Privacy-first, real-time speech-to-text dictation
Local-only voice transcription built in Rust. Hold a hotkey, speak, release - your words appear as text.
Powered by OpenAI's Whisper.
Website • Features • Quick Start • How It Works • Tech Stack • Contributing
I built keyless to create a fast, private dictation workflow I can rely on every day, entirely in Rust. That means:
- Audio capture and DSP in `keyless-audio` (FFT-based EQ, hysteresis VAD; resampling handled in `keyless-whisper` via rubato)
- On-device ML inference in `keyless-whisper` (Candle/Whisper model loading, tokenizer, quantized GGUF support)
- Concurrency and orchestration in `keyless-runtime` (mpsc channels, phased startup artifacts, non-`Send` `cpal::Stream` boundaries)
- A responsive terminal UI in `keyless` (ratatui overlays, hotkeys, previews, logs)
- A modern desktop app in `keyless-desktop` (Tauri v2 + React, system tray, overlay windows, full settings UI)
- Robust downloads and configuration in `keyless-models` and `keyless-core` (reqwest resume/backoff, typed caches, serde config, `KeylessError`)
The goal is a tool you can trust: fully local, no accounts. Choose the interface that fits your workflow: terminal or desktop.
Desktop App (Beta):
TUI (Terminal) Version:
Privacy-First Architecture
- 100% local processing - no cloud, no API keys necessary, no network needed after model download
- All audio and transcription stays on your device
- Open-source and auditable
Real-Time Transcription
- Live preview mirrors the Final by running the same voiced‑mask pipeline incrementally (~120–200 ms cadence)
- Push-to-talk for precise control
- Full, high‑quality final transcription on release
Smart Quality
- Per‑unit silence‑drop (RMS + Whisper no_speech_prob) and overlap dedupe
- Temperature fallback decoding
- Language auto-detection (99 languages supported)
- Quality metrics tracking (confidence, compression ratio)
Performance Optimized
- GPU acceleration (Metal on macOS, CUDA on Linux/Windows)
- High-quality resampling via rubato (supports all device sample rates)
- Auto mic rate (device default; prefers 48 kHz; caps high rates)
Download desktop app:
Releases - Look for keyless-desktop-* assets
Build desktop app from source:
Prerequisites:
- Node.js (LTS version recommended)
- pnpm (version 10+)
- Rust (stable toolchain)
- Tauri CLI - Install with `cargo install tauri-cli` or `pnpm add -D @tauri-apps/cli`
Build steps:
```bash
git clone https://github.com/hate/keyless.git
cd keyless/keyless-desktop

# Install dependencies
pnpm install

# Development mode
pnpm tauri dev

# Build release
pnpm tauri build
```

First run:
- Grant microphone and accessibility permissions when prompted
- Download a Whisper model from the Models screen
- Configure your hotkey and output mode in Settings
- Press your configured hotkey to start dictating
Note: On macOS, Accessibility permission is required for paste mode (keystroke simulation) and global hotkey detection. The app will guide you through granting permissions on first launch.
Download binary:
Releases - Look for keyless-* assets (not desktop)
```bash
# Download from releases page
# Extract and run
./keyless    # macOS/Linux
# or
keyless.exe  # Windows
```

Build from source:
```bash
git clone https://github.com/hate/keyless.git
cd keyless
cargo build --release
./target/release/keyless
```

Install via Cargo (from git):
```bash
# Latest main
cargo install --git https://github.com/hate/keyless --package keyless --locked

# Or pin a release tag
cargo install --git https://github.com/hate/keyless --tag tui-v0.3.0 --package keyless --locked

# Update existing install
cargo install --git https://github.com/hate/keyless --package keyless --locked --force

# Enable CUDA (Linux/Windows with NVIDIA)
cargo install --git https://github.com/hate/keyless --package keyless --locked --features cuda

# After install, ensure ~/.cargo/bin is on PATH
```

Note: CUDA is an optional feature flag on the same install; use `--features cuda` when building on an NVIDIA system with the CUDA toolkit installed. macOS Metal is enabled automatically by target.
First run: downloads the Whisper model. Usage: press Control+Option (configurable) to start dictating.
Both the TUI and Desktop app share the same core audio processing pipeline. The only difference is the user interface layer—the TUI uses a terminal-based interface while the Desktop app uses a modern GUI with overlays.
keyless uses a multi-threaded pipeline to process your voice in real-time (shared by both TUI and Desktop):
- Capture - Microphone capture (auto: device default; prefers 48 kHz; caps high rates)
- Process - Automatic resample to 16 kHz with rubato, VAD gate, EQ analysis
- Transcribe (Preview) - Every ~120–200 ms, run the same voiced‑mask pipeline on the current buffer (≤10s units, 0.5s overlap), reusing cached unit texts via a ~128 ms tail hash; preview text matches what Final would be now
- Transcribe (Final) - On release, run the same unitized pipeline on the full segment; stitch units with overlap dedupe; per‑unit silence‑drop prevents stray boilerplate
- Deliver - Text appears in your app (paste/clipboard/file)
All processing happens on your device using Rust and Candle ML.
┌─────────────┐
│ Microphone │
└──────┬──────┘
│ PCM audio (auto rate; prefers 48 kHz)
▼
┌─────────────────────────────┐
│ Audio Thread (cpal) │
│ • Captures ~100ms frames │
│ • Thread-safe channel │
└──────┬──────────────────────┘
│ Audio frames
▼
┌─────────────────────────────┐
│ Worker Thread │
│ • Resample to 16 kHz │
│ • VAD gating │
│ • EQ spectrum │
│ • Accumulation buffer │
└──────┬──────────────────────┘
│ Gated 16kHz audio
▼
┌─────────────────────────────┐
│ Inference Thread (Whisper) │
│ • Mel spectrogram │
│ • Encoder (audio→features) │
│ • Decoder (features→text) │
│ • Event emission │
└──────┬──────────────────────┘
│ Transcription events
▼
┌─────────────────────────────┐
│ Output Sink │
│ • Paste (keystroke sim) │
│ • Clipboard │
│ • File append │
└─────────────────────────────┘
Click to see detailed technical implementation
Technology: cpal (Cross-Platform Audio Library)
Process:
- Uses the device's default input rate automatically; prefers 48 kHz when available; clamps excessively high defaults to ≤48 kHz
- Produces ~100 ms frames (computed from the chosen sample rate)
- Bounded channel (capacity: 64 frames) introduces backpressure to prevent unbounded memory; realtime callback uses non-blocking try_send so audio never blocks (overflow frames may be dropped)
Why prefer/cap around 48 kHz?
- Many microphones support 48 kHz and it offers smoother EQ visualization
- We resample to 16 kHz using rubato (high‑quality FFT‑based), so recognition accuracy is independent of device rate
- Capping avoids unnecessary CPU load from extreme defaults (e.g., 96/192 kHz)
Threading:
- OS audio callback runs on realtime thread (must not block!)
- Frames forwarded to worker thread via channel
- Separation prevents audio glitches even if processing is slow
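A minimal sketch of this pattern using std channels (the `Frame` type, capacity, and thread structure are illustrative, not keyless's actual types):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Illustrative frame type: ~100 ms of mono PCM samples.
struct Frame(Vec<f32>);

fn main() {
    // Bounded channel: capacity 64 gives backpressure without letting
    // memory grow unbounded if the worker falls behind.
    let (tx, rx) = sync_channel::<Frame>(64);

    // The worker thread may block freely on recv without affecting audio.
    let worker = std::thread::spawn(move || {
        while let Ok(Frame(samples)) = rx.recv() {
            let _ = samples; // resample / VAD / EQ would happen here
        }
    });

    // Inside the realtime audio callback we must never block, so we use
    // try_send and drop the frame on overflow instead of waiting. (A real
    // callback would also avoid the allocation in `to_vec`.)
    let audio_callback = move |samples: &[f32]| {
        match tx.try_send(Frame(samples.to_vec())) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => { /* worker is behind: drop frame */ }
            Err(TrySendError::Disconnected(_)) => { /* pipeline shut down */ }
        }
    };

    audio_callback(&[0.0f32; 4800]); // e.g. 100 ms at 48 kHz
    drop(audio_callback);            // dropping tx lets the worker exit
    worker.join().unwrap();
}
```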
Resampling (arbitrary rate → 16 kHz):
- rubato FFT‑based resampler
- Supports all device sample rates (16k, 32k, 44.1k, 48k, 96k)
- High‑quality sinc interpolation with anti‑aliasing
- No assumptions about microphone capabilities (works with your device rate)
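A rough sketch of that step with rubato (the resampler type, chunk handling, and parameters here are illustrative; a real pipeline would construct the resampler once and stream chunks through it rather than rebuilding it per call):

```rust
use rubato::{FftFixedIn, Resampler};

/// Resample one chunk of mono audio from the device rate to 16 kHz.
fn resample_to_16k(input: &[f32], device_rate: usize) -> Vec<f32> {
    // FFT-based resampler: fixed input chunk size, mono (1 channel).
    let mut resampler = FftFixedIn::<f32>::new(
        device_rate, // e.g. 48_000
        16_000,      // Whisper expects 16 kHz audio
        input.len(), // samples per input chunk
        2,           // sub-chunks (latency/quality trade-off)
        1,           // channels
    )
    .expect("valid resampler parameters");

    // rubato takes per-channel buffers; mono input is a single Vec.
    let mut output = resampler
        .process(&[input.to_vec()], None)
        .expect("resampling failed");
    output.remove(0)
}
```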
EQ Visualizer:
- 1024-point FFT with Hann window
- 64 log-spaced bands (120 Hz to configured Nyquist); EQ runs pre-resample (typically 24 kHz Nyquist at 48 kHz)
- Auto-sensitivity via max normalization for consistent bar heights
- Attack/decay smoothing for fluid animation
- Tunable parameters: noise reduction, gamma curve, window scaling
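A condensed sketch of the analysis step with realfft (the windowing and magnitude math are standard; the log-band grouping is only described in a comment, and none of this is keyless's exact code):

```rust
use realfft::RealFftPlanner;

/// Magnitude spectrum of a 1024-sample frame after a Hann window.
fn spectrum_1024(frame: &[f32; 1024]) -> Vec<f32> {
    // Hann window reduces spectral leakage before the FFT.
    let mut windowed: Vec<f32> = frame
        .iter()
        .enumerate()
        .map(|(n, &x)| {
            let w = 0.5 - 0.5 * (2.0 * std::f32::consts::PI * n as f32 / 1023.0).cos();
            x * w
        })
        .collect();

    let mut planner = RealFftPlanner::<f32>::new();
    let fft = planner.plan_fft_forward(1024);
    let mut spectrum = fft.make_output_vec(); // 513 complex bins
    fft.process(&mut windowed, &mut spectrum).unwrap();

    // A visualizer would then group these magnitudes into 64 log-spaced
    // bands and normalize by the running maximum for auto-sensitivity.
    spectrum.iter().map(|c| c.norm()).collect()
}
```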
VAD (Voice Activity Detection):
- Hysteresis-based gate (defaults: start −45 dBFS RMS, stop −50 dBFS RMS; configurable)
- Minimum duration (200 ms) prevents brief clicks
- Maximum silence (800 ms) allows natural pauses
- Prevents Whisper from receiving silence/background noise
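The core of a hysteresis gate is small; here is a sketch using the default thresholds above (struct and field names are illustrative, and the min-duration/max-silence timers would be layered on top):

```rust
/// Hysteresis VAD gate over per-frame RMS levels.
struct VadGate {
    speaking: bool,
    start_dbfs: f32, // open the gate above this level, e.g. -45.0
    stop_dbfs: f32,  // close it below this level, e.g. -50.0
}

impl VadGate {
    fn update(&mut self, frame: &[f32]) -> bool {
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
        let dbfs = 20.0 * rms.max(1e-9).log10();
        // Hysteresis: the open threshold is louder than the close
        // threshold, so the gate doesn't flutter around a single level.
        if self.speaking {
            if dbfs < self.stop_dbfs {
                self.speaking = false;
            }
        } else if dbfs > self.start_dbfs {
            self.speaking = true;
        }
        self.speaking
    }
}
```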
What is Whisper? OpenAI Whisper is a state-of-the-art speech recognition model trained on 680,000 hours of multilingual data. It achieves human-level accuracy and supports 99 languages.
Our Implementation:
- ML Framework: Candle (Hugging Face's Rust ML library)
- Models: Whisper tiny/base/small (multilingual or .en variants)
- Runs locally: No API calls, all inference on-device
Pipeline:
1. Preprocessing:
   - Convert 16 kHz PCM → Mel spectrogram (80 mel bins)
   - Voiced span unitization: split audio into ≤10 s voiced units with 0.5 s overlap using hysteresis-based detection
   - Mel frames capped to `2 × max_source_positions` to prevent encoder panics
2. Encoder:
   - Processes mel spectrogram → audio features
   - Runs once per unit (cached for language detection and preview reuse)
3. Decoder (Autoregressive):
   - Token generation with temperature-based sampling
   - Per-unit silence-drop (RMS + `no_speech_prob`) prevents hallucinations
   - Temperature fallback (tries 0.0, 0.2, ..., 1.0 until quality is acceptable)
   - Quality metrics: `avg_logprob`, `compression_ratio`
4. Post-processing:
   - Filter special tokens (SOT, EOT, language tags)
   - Stitch units together with overlap dedupe
   - Deliver Final event to output sink
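The temperature fallback in step 3 can be sketched as a retry loop; the quality thresholds below mirror common Whisper defaults and are assumptions here, and `decode_at` stands in for the real Candle decode call:

```rust
struct DecodeResult {
    text: String,
    avg_logprob: f32,
    compression_ratio: f32,
}

/// Retry decoding at increasing temperatures until quality heuristics pass.
fn decode_with_fallback(decode_at: impl Fn(f32) -> DecodeResult) -> DecodeResult {
    let mut last = None;
    for &temperature in &[0.0, 0.2, 0.4, 0.6, 0.8, 1.0] {
        let result = decode_at(temperature);
        // A low avg_logprob means the model was unsure; a high compression
        // ratio usually means repetitive (hallucinated) text.
        let acceptable = result.avg_logprob > -1.0 && result.compression_ratio < 2.4;
        last = Some(result);
        if acceptable {
            break;
        }
    }
    last.expect("at least one temperature is tried")
}
```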
Performance Optimizations:
- Automatically selects the best mic rate
- Single encoder pass (language detection reuses features)
- Quantized model support (GGUF via Candle's quantized loader, 3–4× faster)
Device Selection:
- Prefers GPU backends when compiled and available (Metal on macOS, CUDA on NVIDIA) with CPU fallback
- Selection is automatic among available backends; no manual configuration needed
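With Candle this selection fits in a few lines; a sketch assuming `metal`/`cuda` cargo features that forward to candle-core (not keyless's exact code):

```rust
use candle_core::Device;

/// Pick the best available Candle device, falling back to CPU.
fn best_device() -> Device {
    // Metal first, on macOS builds compiled with the metal feature.
    #[cfg(feature = "metal")]
    {
        if let Ok(device) = Device::new_metal(0) {
            return device;
        }
    }
    // CUDA on Linux/Windows builds compiled with the cuda feature.
    #[cfg(feature = "cuda")]
    {
        if let Ok(device) = Device::new_cuda(0) {
            return device;
        }
    }
    Device::Cpu
}
```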
Three sink modes:
Paste:
- Uses enigo for cross-platform keystroke simulation
- Simulates typing character-by-character
- Appears in focused application
- Requires Accessibility permission on macOS
Clipboard:
- Uses arboard for cross-platform clipboard access
- Copies transcription to system clipboard
- User manually pastes with Cmd+V/Ctrl+V
File:
- Appends to specified file path
- Each transcription on new line
- Useful for logging or feeding to other tools
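A sketch of how the clipboard and file sinks can be implemented (the `Sink` enum and `deliver` function are illustrative; paste mode is elided because enigo's API surface varies across versions):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::PathBuf;

enum Sink {
    Clipboard,
    File(PathBuf),
}

fn deliver(sink: &Sink, text: &str) -> Result<(), Box<dyn std::error::Error>> {
    match sink {
        Sink::Clipboard => {
            // arboard: cross-platform clipboard access.
            let mut clipboard = arboard::Clipboard::new()?;
            clipboard.set_text(text.to_string())?;
        }
        Sink::File(path) => {
            // Append each transcription on its own line.
            let mut file = OpenOptions::new().create(true).append(true).open(path)?;
            writeln!(file, "{text}")?;
        }
    }
    Ok(())
}
```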
Channel-based communication:
```rust
pub struct PipelineChannels {
    pub tx_level: SyncSender<u16>,      // RMS level → TUI/Desktop
    pub tx_log: SyncSender<String>,     // Logs → TUI/Desktop
    pub tx_spec: SyncSender<Vec<u16>>,  // EQ bars → TUI/Desktop
    pub tx_preview: SyncSender<String>, // Preview text → TUI/Desktop
    pub tx_vad: SyncSender<bool>,       // VAD state → TUI/Desktop
}
```

Key design decisions:
- Bounded channels with backpressure (prevents memory bloat)
- Non-blocking sends (audio never blocks on slow consumers)
- Separate worker threads (audio, inference, UI never block each other)
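On the receiving side, a UI loop can drain these channels without blocking; a sketch (the function and receiver pair are illustrative, not keyless's actual TUI code):

```rust
use std::sync::mpsc::Receiver;

/// Drain pending pipeline events without blocking the render loop.
fn drain_ui_events(rx_level: &Receiver<u16>, rx_preview: &Receiver<String>) {
    // try_recv never blocks, so a slow pipeline can't stall rendering.
    while let Ok(level) = rx_level.try_recv() {
        let _ = level; // update the level meter widget
    }
    while let Ok(preview) = rx_preview.try_recv() {
        let _ = preview; // update the preview text widget
    }
}
```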
Push-to-Talk:
- Global hotkey listener (using platform-specific APIs)
- `Arc<AtomicBool>` for thread-safe hold state
- Atomic flag checked in audio callback (realtime-safe)
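The hold-state pattern in miniature (the orderings and thread structure here are illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

fn main() {
    // Shared hold state: written by the hotkey listener, read by audio code.
    let held = Arc::new(AtomicBool::new(false));

    // Hotkey thread flips the flag on press/release.
    let held_for_hotkey = Arc::clone(&held);
    let hotkey = std::thread::spawn(move || {
        held_for_hotkey.store(true, Ordering::Release); // key pressed
        held_for_hotkey.store(false, Ordering::Release); // key released
    });
    hotkey.join().unwrap();

    // Audio callback side: a lock-free load is realtime-safe, unlike
    // taking a mutex, which could block the audio thread.
    let is_dictating = held.load(Ordering::Acquire);
    assert!(!is_dictating);
}
```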
Project-wide KeylessError enum:
```rust
pub enum KeylessError {
    Audio(String),
    Whisper(String),
    Output(String),
    Config(String),
    Io(#[from] io::Error),
    // ...
}
```

Benefits:
- Type-safe error handling (pattern matching)
- Automatic `io::Error` conversion
- Consistent error messages across all 8 crates
- No panics (zero `unwrap`/`expect` in production code)
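The `#[from]` attribute (via the thiserror derive listed under Utilities below) is what makes the conversion automatic; a small illustration, where `load_config` is a hypothetical helper:

```rust
// Assumes KeylessError derives thiserror::Error, as implied by the
// #[from] attribute above.
fn load_config(path: &str) -> Result<String, KeylessError> {
    // `?` converts io::Error → KeylessError::Io through the #[from]
    // impl, so no manual map_err is needed.
    let contents = std::fs::read_to_string(path)?;
    Ok(contents)
}
```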
keyless/
├── keyless/ # TUI binary (Ratatui-based terminal interface)
├── keyless-desktop/ # Desktop app (Tauri v2 + React + TypeScript)
│ ├── src/ # React frontend (components, hooks, views, utils)
│ └── src-tauri/ # Rust backend (IPC commands, services, runtime)
├── keyless-whisper/ # Whisper engine (Candle ML, model loading, inference)
├── keyless-audio/ # Audio capture (cpal, VAD, EQ spectrum, SFX)
├── keyless-output/ # Output sinks (paste, clipboard, file)
├── keyless-models/ # HTTP downloads, HF model management
├── keyless-runtime/ # Pipeline orchestration, worker lifecycle
├── keyless-core/ # Shared types (Config, KeylessError, events)
├── keyless-logging/ # Tracing initialization (stdout, JSON, channel)
└── keyless-website/ # Website (Zola static site generator)
Core:
- Rust (edition 2024) - Systems programming language
- Candle - Hugging Face ML framework
- OpenAI Whisper - Speech recognition model
Audio:
- cpal - Cross-platform audio I/O
- realfft - Fast Fourier Transform
- rubato - High-quality audio resampling
TUI:
- Ratatui - Terminal user interface framework
Desktop:
- Tauri v2 - Cross-platform desktop app framework
- React - UI library
- TypeScript - Type-safe JavaScript
- Tailwind CSS - Utility-first CSS framework
- Vite - Build tool and dev server
Hotkeys:
- rdev - Global keyboard event listening (PTT)
- Custom fork with macOS key-name generation disabled to prevent crashes
Output:
- enigo - Cross-platform keystroke simulation (paste mode)
- arboard - Cross-platform clipboard access
Utilities:
- reqwest - HTTP client (model downloads)
- tracing - Structured logging
- thiserror - Error derive macros
- clap - CLI argument parsing
Website:
- Zola - Static site generator (hosted at keyless.sh)
```bash
# Run all CI checks locally
./check-ci.sh   # macOS/Linux
.\check-ci.ps1  # Windows

# Format code
cargo fmt --all

# Lint
cargo clippy --all-targets -- -D warnings

# Test
cargo test --workspace

# Build TUI release
cargo build --release

# Build desktop app (requires Node.js and pnpm)
cd keyless-desktop
pnpm install
pnpm tauri build
```

- 100% local - No data leaves your machine
- Offline after download - Models cached locally (`~/.cache/keyless/models/`)
- No telemetry - Zero tracking, zero analytics
- Open source - Audit the code yourself
- Security scanning - cargo-deny + cargo-audit in CI
- Website - Project website built with Zola static site generator
- TUI Usage & Troubleshooting - Terminal interface guide
- Desktop App Guide - Desktop app documentation (Beta)
- Whisper Implementation Details - ML engine technical docs
- Contributing Guide - Development standards and workflow
- Changelog - Version history and release notes
Documentation standards are enforced in CI:
- rustdoc warnings as errors (RUSTDOCFLAGS="-D warnings")
- Doctests run across the workspace
- Clippy enforces documentation on private items
Check out the Contributing Guide for:
- Code standards (no `unwrap`, `KeylessError` everywhere)
- Commit format (Conventional Commits)
- Testing requirements
- PR process
MIT - see LICENSE for details

