Privacy-first, real-time speech-to-text dictation
Local-only voice transcription built in Rust. Hold a hotkey, speak, release - your words appear as text.
Powered by OpenAI's Whisper.
Website • Features • Quick Start • How It Works • Tech Stack • Contributing
I built keyless to create a fast, private dictation workflow I can rely on every day, entirely in Rust. That means:
- Audio capture and DSP in `keyless-audio` (FFT-based EQ, hysteresis VAD; resampling handled in `keyless-whisper` via rubato)
- On-device ML inference in `keyless-whisper` (Candle/Whisper model loading, tokenizer, quantized GGUF support)
- Concurrency and orchestration in `keyless-runtime` (mpsc channels, phased startup artifacts, non-`Send` `cpal::Stream` boundaries)
- A responsive terminal UI in `keyless` (ratatui overlays, hotkeys, previews, logs)
- A modern desktop app in `keyless-desktop` (Tauri v2 + React, system tray, overlay windows, full settings UI)
- Robust downloads and configuration in `keyless-models` and `keyless-core` (reqwest resume/backoff, typed caches, serde config, `KeylessError`)
The goal is a tool you can trust: fully local, no accounts. Choose the interface that fits your workflow: terminal or desktop.
Desktop App (Beta):
TUI (Terminal) Version:
Privacy-First Architecture
- 100% local processing - no cloud, no API keys necessary, no network needed after model download
- All audio and transcription stays on your device
- Open-source and auditable
Real-Time Transcription
- Live preview mirrors the Final by running the same voiced‑mask pipeline incrementally (~120–200 ms cadence)
- Push-to-talk for precise control
- Full, high‑quality final transcription on release
Smart Quality
- Per‑unit silence‑drop (RMS + Whisper no_speech_prob) and overlap dedupe
- Temperature fallback decoding
- Language auto-detection (99 languages supported)
- Quality metrics tracking (confidence, compression ratio)
Performance Optimized
- GPU acceleration (Metal on macOS, CUDA on Linux/Windows)
- High-quality resampling via rubato (supports all device sample rates)
- Auto mic rate (device default; prefers 48 kHz; caps high rates)
Download desktop app:
Releases - Look for keyless-desktop-* assets
Build desktop app from source:
Prerequisites:
- Node.js (LTS version recommended)
- pnpm (version 10+)
- Rust (stable toolchain)
- Tauri CLI - Install with `cargo install tauri-cli` or `pnpm add -D @tauri-apps/cli`
Build steps:
```bash
git clone https://github.com/hate/keyless.git
cd keyless/keyless-desktop

# Install dependencies
pnpm install

# Development mode
pnpm tauri dev

# Build release
pnpm tauri build
```

First run:
- Grant microphone and accessibility permissions when prompted
- Download a Whisper model from the Models screen
- Configure your hotkey and output mode in Settings
- Press your configured hotkey to start dictating
Note: On macOS, Accessibility permission is required for paste mode (keystroke simulation) and global hotkey detection. The app will guide you through granting permissions on first launch.
Download binary:
Releases - Look for keyless-* assets (not desktop)
```bash
# Download from releases page
# Extract and run
./keyless    # macOS/Linux
# or
keyless.exe  # Windows
```

Build from source:
```bash
git clone https://github.com/hate/keyless.git
cd keyless
cargo build --release
./target/release/keyless
```

Install via Cargo (from git):
```bash
# Latest main
cargo install --git https://github.com/hate/keyless --package keyless --locked

# Or pin a release tag
cargo install --git https://github.com/hate/keyless --tag tui-v0.3.0 --package keyless --locked

# Update existing install
cargo install --git https://github.com/hate/keyless --package keyless --locked --force

# Enable CUDA (Linux/Windows with NVIDIA)
cargo install --git https://github.com/hate/keyless --package keyless --locked --features cuda

# After install, ensure ~/.cargo/bin is on PATH
```

Note: CUDA is an optional feature flag on the same install; use `--features cuda` when building on an NVIDIA system with the CUDA toolkit installed. macOS Metal is enabled automatically by target.
First run: downloads the Whisper model. Usage: press Control+Option (configurable) to start dictating.
Both the TUI and Desktop app share the same core audio processing pipeline. The only difference is the user interface layer—the TUI uses a terminal-based interface while the Desktop app uses a modern GUI with overlays.
keyless uses a multi-threaded pipeline to process your voice in real-time (shared by both TUI and Desktop):
- Capture - Microphone capture (auto: device default; prefers 48 kHz; caps high rates)
- Process - Automatic resample to 16 kHz with rubato, VAD gate, EQ analysis
- Transcribe (Preview) - Every ~120–200 ms, run the same voiced‑mask pipeline on the current buffer (≤10s units, 0.5s overlap), reusing cached unit texts via a ~128 ms tail hash; preview text matches what Final would be now
- Transcribe (Final) - On release, run the same unitized pipeline on the full segment; stitch units with overlap dedupe; per‑unit silence‑drop prevents stray boilerplate
- Deliver - Text appears in your app (paste/clipboard/file)
All processing happens on your device using Rust and Candle ML.
┌─────────────┐
│ Microphone │
└──────┬──────┘
│ PCM audio (auto rate; prefers 48 kHz)
▼
┌─────────────────────────────┐
│ Audio Thread (cpal) │
│ • Captures ~100ms frames │
│ • Thread-safe channel │
└──────┬──────────────────────┘
│ Audio frames
▼
┌─────────────────────────────┐
│ Worker Thread │
│ • Resample to 16 kHz │
│ • VAD gating │
│ • EQ spectrum │
│ • Accumulation buffer │
└──────┬──────────────────────┘
│ Gated 16kHz audio
▼
┌─────────────────────────────┐
│ Inference Thread (Whisper) │
│ • Mel spectrogram │
│ • Encoder (audio→features) │
│ • Decoder (features→text) │
│ • Event emission │
└──────┬──────────────────────┘
│ Transcription events
▼
┌─────────────────────────────┐
│ Output Sink │
│ • Paste (keystroke sim) │
│ • Clipboard │
│ • File append │
└─────────────────────────────┘
Click to see detailed technical implementation
Technology: cpal (Cross-Platform Audio Library)
Process:
- Uses the device's default input rate automatically; prefers 48 kHz when available; clamps excessively high defaults to ≤48 kHz
- Produces ~100 ms frames (computed from the chosen sample rate)
- Bounded channel (capacity: 64 frames) introduces backpressure to prevent unbounded memory; realtime callback uses non-blocking try_send so audio never blocks (overflow frames may be dropped)
Why prefer/cap around 48 kHz?
- Many microphones support 48 kHz and it offers smoother EQ visualization
- We resample to 16 kHz using rubato (high‑quality FFT‑based), so recognition accuracy is independent of device rate
- Capping avoids unnecessary CPU load from extreme defaults (e.g., 96/192 kHz)
Threading:
- OS audio callback runs on realtime thread (must not block!)
- Frames forwarded to worker thread via channel
- Separation prevents audio glitches even if processing is slow
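A minimal sketch of this pattern using std channels (the `Frame` type, capacity, and thread structure are illustrative, not keyless's actual types):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Illustrative frame type: ~100 ms of mono PCM samples.
struct Frame(Vec<f32>);

fn main() {
    // Bounded channel: capacity 64 gives backpressure without letting
    // memory grow unbounded if the worker falls behind.
    let (tx, rx) = sync_channel::<Frame>(64);

    // The worker thread may block freely on recv without affecting audio.
    let worker = std::thread::spawn(move || {
        while let Ok(Frame(samples)) = rx.recv() {
            let _ = samples; // resample / VAD / EQ would happen here
        }
    });

    // Inside the realtime audio callback we must never block, so we use
    // try_send and drop the frame on overflow instead of waiting. (A real
    // callback would also avoid the allocation in `to_vec`.)
    let audio_callback = move |samples: &[f32]| {
        match tx.try_send(Frame(samples.to_vec())) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => { /* worker is behind: drop frame */ }
            Err(TrySendError::Disconnected(_)) => { /* pipeline shut down */ }
        }
    };

    audio_callback(&[0.0f32; 4800]); // e.g. 100 ms at 48 kHz
    drop(audio_callback);            // dropping tx lets the worker exit
    worker.join().unwrap();
}
```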
Resampling (arbitrary rate → 16 kHz):
- rubato FFT‑based resampler
- Supports all device sample rates (16k, 32k, 44.1k, 48k, 96k)
- High‑quality sinc interpolation with anti‑aliasing
- No assumptions about microphone capabilities (works with your device rate)
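A rough sketch of that step with rubato (the resampler type, chunk handling, and parameters here are illustrative; a real pipeline would construct the resampler once and stream chunks through it rather than rebuilding it per call):

```rust
use rubato::{FftFixedIn, Resampler};

/// Resample one chunk of mono audio from the device rate to 16 kHz.
fn resample_to_16k(input: &[f32], device_rate: usize) -> Vec<f32> {
    // FFT-based resampler: fixed input chunk size, mono (1 channel).
    let mut resampler = FftFixedIn::<f32>::new(
        device_rate, // e.g. 48_000
        16_000,      // Whisper expects 16 kHz audio
        input.len(), // samples per input chunk
        2,           // sub-chunks (latency/quality trade-off)
        1,           // channels
    )
    .expect("valid resampler parameters");

    // rubato takes per-channel buffers; mono input is a single Vec.
    let mut output = resampler
        .process(&[input.to_vec()], None)
        .expect("resampling failed");
    output.remove(0)
}
```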
EQ Visualizer:
- 1024-point FFT with Hann window
- 64 log-spaced bands (120 Hz to configured Nyquist); EQ runs pre-resample (typically 24 kHz Nyquist at 48 kHz)
- Auto-sensitivity via max normalization for consistent bar heights
- Attack/decay smoothing for fluid animation
- Tunable parameters: noise reduction, gamma curve, window scaling
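A condensed sketch of the analysis step with realfft (the windowing and magnitude math are standard; the log-band grouping is only described in a comment, and none of this is keyless's exact code):

```rust
use realfft::RealFftPlanner;

/// Magnitude spectrum of a 1024-sample frame after a Hann window.
fn spectrum_1024(frame: &[f32; 1024]) -> Vec<f32> {
    // Hann window reduces spectral leakage before the FFT.
    let mut windowed: Vec<f32> = frame
        .iter()
        .enumerate()
        .map(|(n, &x)| {
            let w = 0.5 - 0.5 * (2.0 * std::f32::consts::PI * n as f32 / 1023.0).cos();
            x * w
        })
        .collect();

    let mut planner = RealFftPlanner::<f32>::new();
    let fft = planner.plan_fft_forward(1024);
    let mut spectrum = fft.make_output_vec(); // 513 complex bins
    fft.process(&mut windowed, &mut spectrum).unwrap();

    // A visualizer would then group these magnitudes into 64 log-spaced
    // bands and normalize by the running maximum for auto-sensitivity.
    spectrum.iter().map(|c| c.norm()).collect()
}
```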
VAD (Voice Activity Detection):
- Hysteresis-based gate (defaults: start −45 dBFS RMS, stop −50 dBFS RMS; configurable)
- Minimum duration (200 ms) prevents brief clicks
- Maximum silence (800 ms) allows natural pauses
- Prevents Whisper from receiving silence/background noise
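The core of a hysteresis gate is small; here is a sketch using the default thresholds above (struct and field names are illustrative, and the min-duration/max-silence timers would be layered on top):

```rust
/// Hysteresis VAD gate over per-frame RMS levels.
struct VadGate {
    speaking: bool,
    start_dbfs: f32, // open the gate above this level, e.g. -45.0
    stop_dbfs: f32,  // close it below this level, e.g. -50.0
}

impl VadGate {
    fn update(&mut self, frame: &[f32]) -> bool {
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
        let dbfs = 20.0 * rms.max(1e-9).log10();
        // Hysteresis: the open threshold is louder than the close
        // threshold, so the gate doesn't flutter around a single level.
        if self.speaking {
            if dbfs < self.stop_dbfs {
                self.speaking = false;
            }
        } else if dbfs > self.start_dbfs {
            self.speaking = true;
        }
        self.speaking
    }
}
```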
What is Whisper? OpenAI Whisper is a state-of-the-art speech recognition model trained on 680,000 hours of multilingual data. It achieves human-level accuracy and supports 99 languages.
Our Implementation:
- ML Framework: Candle (Hugging Face's Rust ML library)
- Models: Whisper tiny/base/small (multilingual or .en variants)
- Runs locally: No API calls, all inference on-device
Pipeline:
1. Preprocessing:
   - Convert 16 kHz PCM → Mel spectrogram (80 mel bins)
   - Voiced span unitization: split audio into ≤10 s voiced units with 0.5 s overlap using hysteresis-based detection
   - Mel frames capped to `2 × max_source_positions` to prevent encoder panics
2. Encoder:
   - Processes mel spectrogram → audio features
   - Runs once per unit (cached for language detection and preview reuse)
3. Decoder (Autoregressive):
   - Token generation with temperature-based sampling
   - Per-unit silence-drop (RMS + `no_speech_prob`) prevents hallucinations
   - Temperature fallback (tries 0.0, 0.2, ..., 1.0 until quality is acceptable)
   - Quality metrics: `avg_logprob`, `compression_ratio`
4. Post-processing:
   - Filter special tokens (SOT, EOT, language tags)
   - Stitch units together with overlap dedupe
   - Deliver Final event to output sink
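The temperature fallback in step 3 can be sketched as a retry loop; the quality thresholds below mirror common Whisper defaults and are assumptions here, and `decode_at` stands in for the real Candle decode call:

```rust
struct DecodeResult {
    text: String,
    avg_logprob: f32,
    compression_ratio: f32,
}

/// Retry decoding at increasing temperatures until quality heuristics pass.
fn decode_with_fallback(decode_at: impl Fn(f32) -> DecodeResult) -> DecodeResult {
    let mut last = None;
    for &temperature in &[0.0, 0.2, 0.4, 0.6, 0.8, 1.0] {
        let result = decode_at(temperature);
        // A low avg_logprob means the model was unsure; a high compression
        // ratio usually means repetitive (hallucinated) text.
        let acceptable = result.avg_logprob > -1.0 && result.compression_ratio < 2.4;
        last = Some(result);
        if acceptable {
            break;
        }
    }
    last.expect("at least one temperature is tried")
}
```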
Performance Optimizations:
- Automatically selects the best mic rate
- Single encoder pass (language detection reuses features)
- Quantized model support (GGUF via Candle's quantized loader, 3–4× faster)
Device Selection:
- Prefers GPU backends when compiled and available (Metal on macOS, CUDA on NVIDIA) with CPU fallback
- Selection is automatic among available backends; no manual configuration needed
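With Candle this selection fits in a few lines; a sketch assuming `metal`/`cuda` cargo features that forward to candle-core (not keyless's exact code):

```rust
use candle_core::Device;

/// Pick the best available Candle device, falling back to CPU.
fn best_device() -> Device {
    // Metal first, on macOS builds compiled with the metal feature.
    #[cfg(feature = "metal")]
    {
        if let Ok(device) = Device::new_metal(0) {
            return device;
        }
    }
    // CUDA on Linux/Windows builds compiled with the cuda feature.
    #[cfg(feature = "cuda")]
    {
        if let Ok(device) = Device::new_cuda(0) {
            return device;
        }
    }
    Device::Cpu
}
```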
Three sink modes:
Paste:
- Uses enigo for cross-platform keystroke simulation
- Simulates typing character-by-character
- Appears in focused application
- Requires Accessibility permission on macOS
Clipboard:
- Uses arboard for cross-platform clipboard access
- Copies transcription to system clipboard
- User manually pastes with Cmd+V/Ctrl+V
File:
- Appends to specified file path
- Each transcription on new line
- Useful for logging or feeding to other tools
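A sketch of how the clipboard and file sinks can be implemented (the `Sink` enum and `deliver` function are illustrative; paste mode is elided because enigo's API surface varies across versions):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::PathBuf;

enum Sink {
    Clipboard,
    File(PathBuf),
}

fn deliver(sink: &Sink, text: &str) -> Result<(), Box<dyn std::error::Error>> {
    match sink {
        Sink::Clipboard => {
            // arboard: cross-platform clipboard access.
            let mut clipboard = arboard::Clipboard::new()?;
            clipboard.set_text(text.to_string())?;
        }
        Sink::File(path) => {
            // Append each transcription on its own line.
            let mut file = OpenOptions::new().create(true).append(true).open(path)?;
            writeln!(file, "{text}")?;
        }
    }
    Ok(())
}
```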
Channel-based communication:
```rust
pub struct PipelineChannels {
    pub tx_level: SyncSender<u16>,      // RMS level → TUI/Desktop
    pub tx_log: SyncSender<String>,     // Logs → TUI/Desktop
    pub tx_spec: SyncSender<Vec<u16>>,  // EQ bars → TUI/Desktop
    pub tx_preview: SyncSender<String>, // Preview text → TUI/Desktop
    pub tx_vad: SyncSender<bool>,       // VAD state → TUI/Desktop
}
```

Key design decisions:
- Bounded channels with backpressure (prevents memory bloat)
- Non-blocking sends (audio never blocks on slow consumers)
- Separate worker threads (audio, inference, UI never block each other)
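On the receiving side, a UI loop can drain these channels without blocking; a sketch (the function and receiver pair are illustrative, not keyless's actual TUI code):

```rust
use std::sync::mpsc::Receiver;

/// Drain pending pipeline events without blocking the render loop.
fn drain_ui_events(rx_level: &Receiver<u16>, rx_preview: &Receiver<String>) {
    // try_recv never blocks, so a slow pipeline can't stall rendering.
    while let Ok(level) = rx_level.try_recv() {
        let _ = level; // update the level meter widget
    }
    while let Ok(preview) = rx_preview.try_recv() {
        let _ = preview; // update the preview text widget
    }
}
```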
Push-to-Talk:
- Global hotkey listener (using platform-specific APIs)
- `Arc<AtomicBool>` for thread-safe hold state
- Atomic flag checked in audio callback (realtime-safe)
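The hold-state pattern in miniature (the orderings and thread structure here are illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

fn main() {
    // Shared hold state: written by the hotkey listener, read by audio code.
    let held = Arc::new(AtomicBool::new(false));

    // Hotkey thread flips the flag on press/release.
    let held_for_hotkey = Arc::clone(&held);
    let hotkey = std::thread::spawn(move || {
        held_for_hotkey.store(true, Ordering::Release); // key pressed
        held_for_hotkey.store(false, Ordering::Release); // key released
    });
    hotkey.join().unwrap();

    // Audio callback side: a lock-free load is realtime-safe, unlike
    // taking a mutex, which could block the audio thread.
    let is_dictating = held.load(Ordering::Acquire);
    assert!(!is_dictating);
}
```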
Project-wide KeylessError enum:
```rust
pub enum KeylessError {
    Audio(String),
    Whisper(String),
    Output(String),
    Config(String),
    Io(#[from] io::Error),
    // ...
}
```

Benefits:
- Type-safe error handling (pattern matching)
- Automatic `io::Error` conversion
- Consistent error messages across all 8 crates
- No panics (zero `unwrap`/`expect` in production code)
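The `#[from]` attribute (via the thiserror derive listed under Utilities below) is what makes the conversion automatic; a small illustration, where `load_config` is a hypothetical helper:

```rust
// Assumes KeylessError derives thiserror::Error, as implied by the
// #[from] attribute above.
fn load_config(path: &str) -> Result<String, KeylessError> {
    // `?` converts io::Error → KeylessError::Io through the #[from]
    // impl, so no manual map_err is needed.
    let contents = std::fs::read_to_string(path)?;
    Ok(contents)
}
```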
keyless/
├── keyless/ # TUI binary (Ratatui-based terminal interface)
├── keyless-desktop/ # Desktop app (Tauri v2 + React + TypeScript)
│ ├── src/ # React frontend (components, hooks, views, utils)
│ └── src-tauri/ # Rust backend (IPC commands, services, runtime)
├── keyless-whisper/ # Whisper engine (Candle ML, model loading, inference)
├── keyless-audio/ # Audio capture (cpal, VAD, EQ spectrum, SFX)
├── keyless-output/ # Output sinks (paste, clipboard, file)
├── keyless-models/ # HTTP downloads, HF model management
├── keyless-runtime/ # Pipeline orchestration, worker lifecycle
├── keyless-core/ # Shared types (Config, KeylessError, events)
├── keyless-logging/ # Tracing initialization (stdout, JSON, channel)
└── keyless-website/ # Website (Zola static site generator)
Core:
- Rust (edition 2024) - Systems programming language
- Candle - Hugging Face ML framework
- OpenAI Whisper - Speech recognition model
Audio:
- cpal - Cross-platform audio I/O
- realfft - Fast Fourier Transform
- rubato - High-quality audio resampling
TUI:
- Ratatui - Terminal user interface framework
Desktop:
- Tauri v2 - Cross-platform desktop app framework
- React - UI library
- TypeScript - Type-safe JavaScript
- Tailwind CSS - Utility-first CSS framework
- Vite - Build tool and dev server
Hotkeys:
- rdev - Global keyboard event listening (PTT)
- Custom fork with macOS key-name generation disabled to prevent crashes
Output:
- enigo - Cross-platform keystroke simulation (paste mode)
- arboard - Cross-platform clipboard access
Utilities:
- reqwest - HTTP client (model downloads)
- tracing - Structured logging
- thiserror - Error derive macros
- clap - CLI argument parsing
Website:
- Zola - Static site generator (hosted at keyless.sh)
```bash
# Run all CI checks locally
./check-ci.sh   # macOS/Linux
.\check-ci.ps1  # Windows

# Format code
cargo fmt --all

# Lint
cargo clippy --all-targets -- -D warnings

# Test
cargo test --workspace

# Build TUI release
cargo build --release

# Build desktop app (requires Node.js and pnpm)
cd keyless-desktop
pnpm install
pnpm tauri build
```

- 100% local - No data leaves your machine
- Offline after download - Models cached locally (`~/.cache/keyless/models/`)
- No telemetry - Zero tracking, zero analytics
- Open source - Audit the code yourself
- Security scanning - cargo-deny + cargo-audit in CI
- Website - Project website built with Zola static site generator
- TUI Usage & Troubleshooting - Terminal interface guide
- Desktop App Guide - Desktop app documentation (Beta)
- Whisper Implementation Details - ML engine technical docs
- Contributing Guide - Development standards and workflow
- Changelog - Version history and release notes
Documentation standards are enforced in CI:
- rustdoc warnings as errors (RUSTDOCFLAGS="-D warnings")
- Doctests run across the workspace
- Clippy enforces documentation on private items
Check out the Contributing Guide for:
- Code standards (no `unwrap`, `KeylessError` everywhere)
- Commit format (Conventional Commits)
- Testing requirements
- PR process
MIT - see LICENSE for details

