A unified web-based interface for multiple text-to-speech models - featuring Kitten TTS, Piper TTS, and Kokoro TTS running entirely in your browser! Switch between models seamlessly and choose the perfect voice for your needs.
- Kitten TTS - Lightweight 24MB model with 8 voice embeddings
- Piper TTS - High-quality 75MB model with 904 different voices
- Kokoro TTS - Premium 82MB model with 21 expressive voices
- Model Switcher - Seamless switching between TTS engines
- Dynamic Interface - Controls adapt to each model's capabilities
- Voice Selection - From 8 expressive voices (Kitten), 21 premium voices (Kokoro), to 904 diverse voices (Piper)
- Voice Preview - Click to instantly hear any voice before selecting it! 🎧
- Speed Control - Adjustable speech rate from 0.5x to 2.0x
- Sample Rate Control - Multiple quality levels (Kitten & Kokoro)
- WebGPU Acceleration - Optional GPU acceleration (Kitten & Kokoro)
- No server required - runs completely locally
- Real-time generation using WebAssembly
- Smart model loading - only loads selected model
- Intelligent caching for optimal performance
- One-click voice previews with personalized greetings
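The model switcher, adaptive controls and load-on-select behaviour listed above can be pictured as a small registry. This is an illustrative sketch, not the project's API: `createRegistry`, `controlsFor`, `getModel`, `clampSpeed` and the stand-in loader are hypothetical names. Only the voice counts, the WebGPU support per model and the 0.5x-2.0x speed range come from this README.

```javascript
// Hypothetical registry sketch; only the figures below are taken from the README.
const SPEED_MIN = 0.5;
const SPEED_MAX = 2.0;

const MODEL_CAPABILITIES = {
  kitten: { voiceCount: 8, webgpu: true },
  kokoro: { voiceCount: 21, webgpu: true },
  piper: { voiceCount: 904, webgpu: false },
};

function createRegistry(loaders) {
  const pending = new Map(); // modelId -> Promise for the loaded model

  return {
    // Controls the UI should render for the selected model.
    controlsFor(modelId) {
      const caps = MODEL_CAPABILITIES[modelId];
      if (!caps) throw new Error(`Unknown model: ${modelId}`);
      return caps.webgpu ? ['voice', 'speed', 'webgpu'] : ['voice', 'speed'];
    },

    // Start loading a model only when it is first requested and share the same
    // promise afterwards, so switching back never downloads it twice.
    getModel(modelId) {
      if (!pending.has(modelId)) {
        const load = loaders[modelId];
        if (!load) throw new Error(`No loader registered for: ${modelId}`);
        const promise = load().catch((err) => {
          pending.delete(modelId); // forget failures so a later retry can succeed
          throw err;
        });
        pending.set(modelId, promise);
      }
      return pending.get(modelId);
    },
  };
}

// Keep a requested speech rate inside the range the speed slider exposes.
function clampSpeed(speed) {
  return Math.min(SPEED_MAX, Math.max(SPEED_MIN, speed));
}

// Usage with a stand-in loader instead of a real ONNX download.
let kokoroLoads = 0;
const registry = createRegistry({
  kokoro: async () => {
    kokoroLoads += 1;
    return { id: 'kokoro' };
  },
});
registry.getModel('kokoro');
registry.getModel('kokoro');
console.log(kokoroLoads); // 1
console.log(registry.controlsFor('piper')); // [ 'voice', 'speed' ]
console.log(clampSpeed(3)); // 2
```

Caching the promise rather than the resolved model means two quick clicks on the same model share a single download.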
Pull the prebuilt Docker image:
```bash
docker pull ghcr.io/clowerweb/tts-studio:latest
```
Or run it from source:
1. Clone the repository:
   ```bash
   git clone https://github.com/clowerweb/tts-studio
   cd tts-studio
   ```
2. Install dependencies:
   ```bash
   npm install
   ```
3. Start the development server:
   ```bash
   npm run dev
   ```
4. Open your browser and navigate to http://localhost:5173.
5. Choose your model and generate speech!
- Node.js 16+
- Modern browser with WebAssembly support
- ~180MB disk space for all model files
| Feature | Kitten TTS | Kokoro TTS | Piper TTS |
|---|---|---|---|
| Model Size | ~24MB | ~82MB | ~75MB |
| Voices | 8 expressive embeddings | 21 premium voices | 904 diverse speakers |
| Voice Preview | ✅ Instant previews | ✅ Instant previews | ✅ Instant previews |
| Quality | High quality, fast | Highest quality, natural | Premium quality |
| Speed | ~2-3x realtime | ~1-2x realtime | ~3-5x realtime |
| WebGPU | ✅ Optional acceleration | ✅ Optional acceleration | ❌ WASM only |
| Sample Rate | ✅ 8-48kHz configurable | 24kHz fixed | 22kHz fixed |
| Best For | Quick generation, mobile | Natural speech, production | Diverse voices, high-quality |
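The Speed row can be read as a real-time factor: seconds of audio produced per second of generation time. The README does not say how its figures were measured, so the helper below (`realtimeFactor`, a hypothetical name) assumes that reading and simply shows the arithmetic.

```javascript
// Real-time factor read as "seconds of audio produced per second of compute".
// ASSUMPTION: this is one common reading of "Nx realtime"; the README does not
// define how its numbers were obtained.
function realtimeFactor(sampleCount, sampleRate, elapsedMs) {
  if (sampleRate <= 0 || elapsedMs <= 0) {
    throw new RangeError('sampleRate and elapsedMs must be positive');
  }
  const audioSeconds = sampleCount / sampleRate;
  const computeSeconds = elapsedMs / 1000;
  return audioSeconds / computeSeconds;
}

// Example: 144,000 samples at 24 kHz (6 s of audio) generated in 2.5 s.
const rtf = realtimeFactor(144000, 24000, 2500);
console.log(`${rtf.toFixed(1)}x realtime`); // 2.4x realtime
```

In a browser, `elapsedMs` could come from two `performance.now()` readings around a generation call. Many benchmarks define RTF the other way round (compute time divided by audio duration), where values below 1 mean faster than real time, so check which convention a figure uses before comparing.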
TTS Studio provides a unified interface with:
- Model Selection - Choose between Kitten TTS, Kokoro TTS, or Piper TTS
- Dynamic Loading - Only loads the selected model to save bandwidth
- Adaptive UI - Shows relevant controls for each model
- Unified Processing - All models use the same audio pipeline
- Smart Caching - Models are cached locally for faster subsequent loads
Kitten TTS:
- Size: 24MB quantized ONNX model
- Architecture: 15M parameter neural TTS
- Voices: 8 expression-based embeddings
- Features: WebGPU support, configurable sample rates
- Source: KittenML/kitten-tts-nano-0.1
Kokoro TTS:
- Size: 82MB quantized ONNX model
- Architecture: StyleTextToSpeech2 neural architecture
- Voices: 21 premium voices (American & British English)
- Features: WebGPU support, natural speech synthesis, adaptive voice embeddings
- Source: onnx-community/Kokoro-82M-v1.0-ONNX
Piper TTS:
- Size: 75MB ONNX model + config
- Architecture: Neural TTS trained on LibriTTS
- Voices: 904 diverse speakers from the LibriTTS dataset
- Features: High-quality synthesis, voice preview
- Source: rhasspy/piper-voices
- Frontend: Vue 3 + Vite + Tailwind CSS
- ML Runtime: ONNX Runtime Web (WebGPU + WebAssembly)
- Phonemization: phonemizer.js (espeak-ng backend)
- Audio Processing: Web Audio API with WAV encoding
- Text Processing: Smart text cleaning and chunking
- Worker Architecture: Web Workers for non-blocking inference
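As an illustration of the "WAV encoding" step, the sketch below wraps mono floating-point samples (the kind of `Float32Array` a TTS model or the Web Audio API produces) in a standard 16-bit PCM WAV container. It is a generic encoder written for this README, not the project's own code; `encodeWav` is a hypothetical name, and mono 16-bit output is an assumption.

```javascript
// Generic 16-bit PCM WAV encoder for mono Float32 samples in [-1, 1].
// ASSUMPTION: mono, 16-bit output; the project's pipeline may use other settings.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // size of everything after this field
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each sample, then scale it to a signed 16-bit integer (little-endian).
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

const wav = encodeWav(new Float32Array([0, 0.5, -1, 1]), 24000);
console.log(wav.byteLength); // 52 (44-byte header + 4 samples x 2 bytes)
```

In the browser, the result can be turned into a playable or downloadable file with `new Blob([wav], { type: 'audio/wav' })` and `URL.createObjectURL`.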
```
├── index.html                # Main HTML entry point
├── src/
│   ├── App.vue               # Main TTS Studio application
│   ├── main.js               # Application entry point
│   ├── components/           # Vue components
│   │   ├── AudioChunk.vue    # Audio playback component
│   │   ├── ModelSwitcher.vue # Model selection interface
│   │   ├── SampleRateSelector.vue
│   │   ├── SpeedControl.vue
│   │   ├── TextStatistics.vue
│   │   ├── ThemeToggle.vue
│   │   ├── VoiceSelector.vue
│   │   └── WebGPUToggle.vue
│   ├── lib/
│   │   ├── kitten-tts.js     # Kitten TTS implementation
│   │   ├── kokoro-tts.js     # Kokoro TTS implementation
│   │   └── piper-tts.js      # Piper TTS implementation
│   ├── utils/
│   │   ├── model-cache.js    # Intelligent model caching
│   │   ├── text-cleaner.js   # Advanced text processing
│   │   └── utils.js          # Shared utilities
│   └── workers/
│       └── tts-worker.js     # Unified TTS Web Worker
├── public/
│   ├── onnx-runtime/         # ONNX Runtime WASM files
│   └── tts-models/           # Model files
│       ├── kitten-tts/       # Kitten TTS model files
│       │   ├── model_quantized.onnx
│       │   ├── tokenizer.json
│       │   └── voices.json
│       ├── kokoro/           # Kokoro TTS model files
│       │   ├── model_quantized.onnx
│       │   ├── tokenizer.json
│       │   └── voices/       # 21 voice embedding files
│       └── piper/            # Piper TTS model files
│           ├── en_US-libritts_r-medium.onnx
│           ├── en_US-libritts_r-medium.onnx.json
│           └── voices.json
├── piper/                    # Reference Piper TTS code
├── package.json              # Dependencies and scripts
└── vite.config.js            # Vite configuration
```
- Choose Kitten TTS for fast, lightweight synthesis
- Great for prototyping, mobile devices, or real-time applications
- Enable WebGPU for even faster generation (if supported)
- Choose Kokoro TTS for the most natural-sounding voices
- 21 carefully crafted voices with exceptional expressiveness
- Perfect for content creation, audiobooks, and premium applications
- Enable WebGPU for faster generation
- Choose Piper TTS for maximum voice diversity
- 904 voices provide extensive variety for any project
- Use voice preview to find the perfect speaker
Test voices instantly! Every voice across all models includes an instant preview:
- One-Click Preview - Click the play button next to any voice to hear it immediately
- Personalized Greetings - Each voice introduces itself with a unique message
- Zero Setup - No configuration needed; works with all models
- Easy Comparison - Switch between voices to find your favorite
- Smart Controls - Click again to stop; playback cleans up automatically when finished
Perfect for finding the ideal voice for your project without any guesswork!
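The preview behaviour above (one click plays a greeting, a second click on the same voice stops it, finished previews clean up on their own) amounts to a small piece of state. The sketch assumes a hypothetical `startPlayback(voiceId, text, onEnded)` hook that synthesizes and plays the text and returns a handle with `stop()`; the function names and greeting wording are illustrative, not the project's code.

```javascript
// One-click voice preview controller (illustrative sketch).
// ASSUMPTION: startPlayback(voiceId, text, onEnded) is a hypothetical hook that
// plays `text` in the given voice and returns a handle with a stop() method.
function greetingFor(voiceName) {
  return `Hi, I'm ${voiceName}. This is how I sound.`;
}

function createPreviewController(startPlayback) {
  let current = null; // { voiceId, handle } of the preview that is playing

  return {
    // Returns the voice now previewing, or null if the click stopped playback.
    toggle(voiceId, voiceName) {
      const sameVoice = current !== null && current.voiceId === voiceId;
      if (current) {
        current.handle.stop(); // only one preview plays at a time
        current = null;
      }
      if (sameVoice) return null; // a second click on the same voice stops it

      const entry = { voiceId, handle: null, ended: false };
      entry.handle = startPlayback(voiceId, greetingFor(voiceName), () => {
        entry.ended = true;
        if (current === entry) current = null; // automatic cleanup when finished
      });
      if (!entry.ended) current = entry;
      return voiceId;
    },
    get playing() {
      return current ? current.voiceId : null;
    },
  };
}

// Usage with a fake player that only records what happens.
const log = [];
const onEndedByVoice = {};
const preview = createPreviewController((voiceId, text, onEnded) => {
  log.push(`play ${voiceId}: ${text}`);
  onEndedByVoice[voiceId] = onEnded;
  return { stop: () => log.push(`stop ${voiceId}`) };
});

preview.toggle('voice_a', 'Ava'); // starts voice_a
preview.toggle('voice_b', 'Ben'); // stops voice_a, starts voice_b
preview.toggle('voice_b', 'Ben'); // second click on voice_b stops it
console.log(preview.playing); // null
```

The handle is only recorded as current once `startPlayback` has returned, so a playback that fails to start, or finishes immediately, never leaves a stale "playing" state behind.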
- Only one model loads at a time to save memory
- Models are cached locally after first download
- Use shorter text chunks for faster streaming
- Voice previews are lightweight and don't interfere with main generation
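To illustrate why shorter chunks help streaming, here is a simple sentence-based chunker: each chunk can be synthesized and played while the next one is still being generated. It is a sketch under stated assumptions, not the project's `text-cleaner.js`; the name `chunkText`, the 200-character default and the naive sentence rule are all illustrative.

```javascript
// Sentence-based chunking for streaming synthesis (illustrative sketch).
// ASSUMPTION: the default limit and splitting rules are examples only.
function splitWords(sentence, maxChars) {
  const parts = [];
  let part = '';
  for (const word of sentence.split(' ')) {
    if (part && part.length + 1 + word.length > maxChars) {
      parts.push(part);
      part = '';
    }
    part = part ? `${part} ${word}` : word; // a single over-long word stays whole
  }
  if (part) parts.push(part);
  return parts;
}

function chunkText(text, maxChars = 200) {
  const normalized = text.replace(/\s+/g, ' ').trim();
  // Naive rule: a sentence ends at ".", "!" or "?" (so "Dr." or "3.5" also split).
  const sentences = (normalized.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter(Boolean);

  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    // Sentences longer than the limit are broken at word boundaries first.
    const pieces = sentence.length <= maxChars ? [sentence] : splitWords(sentence, maxChars);
    for (const piece of pieces) {
      if (current && current.length + 1 + piece.length > maxChars) {
        chunks.push(current);
        current = '';
      }
      current = current ? `${current} ${piece}` : piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const chunks = chunkText('Hello there. How are you? I am fine!', 30);
console.log(chunks); // [ 'Hello there. How are you?', 'I am fine!' ]
```

Splitting on sentence boundaries first keeps prosody natural inside each chunk; the word-level fallback only kicks in for sentences that exceed the limit on their own.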
Contributions are welcome! Feel free to:
- Report bugs or issues
- Suggest new TTS models to integrate
- Submit pull requests for features
- Improve documentation
- Add more voice options
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Kitten TTS - Apache 2.0 License (KittenML)
- Piper TTS - MIT License (rhasspy/piper)
- Piper Voices - MIT License (rhasspy/piper-voices)
- LibriTTS Dataset - CC-BY 4.0 License (original voice recordings)
- phonemizer.js - MIT License (espeak-ng phonemization)
- KittenML Team for the lightweight and efficient Kitten TTS model
- ONNX Community for the outstanding Kokoro TTS model conversion and optimization
- Rhasspy/Piper Team for the high-quality Piper TTS model and voice collection
- LibriTTS Dataset for the diverse, high-quality voice recordings
- Microsoft ONNX Runtime for excellent WebAssembly inference
- Xenova/transformers.js for ONNX Runtime Web integration
- espeak-ng for robust phonemization support
- Check browser console for CORS or network errors
- Ensure sufficient space (~180MB for all three models)
- Try refreshing to retry model download
- Use a modern browser (Chrome, Firefox, or Safari recommended)
- Check audio permissions in browser settings
- Disable any audio or ad blockers that may be interfering
- Click the page first: browsers only allow the audio context to start after a user interaction
- Try different browsers if issues persist
- Close other tabs to free memory for models
- Use shorter text for faster generation
- Try different models - Kitten is the lightest, Kokoro the most natural, Piper the most diverse
- Check device compatibility - WebGPU requires a modern GPU and a WebGPU-capable browser
- Kitten TTS WebGPU not working? This is experimental; the WASM fallback will activate
- Kokoro TTS generating slowly? Enable the WebGPU toggle for a significant speedup
- Kokoro voices not loading? Wait for all 21 voice embeddings to download
- Piper voice preview silent? Wait for the model to fully load before previewing
- Poor audio quality? Try different voices or adjust speed settings
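The "click the page first" tip exists because browsers usually create an `AudioContext` in the `"suspended"` state until the user interacts with the page. A common workaround is to call `resume()` on the first click or key press. The helper below is a minimal sketch of that pattern; `unlockAudioOnFirstGesture` is a hypothetical name, and in the app `target` would be `document` and the context a real `AudioContext`.

```javascript
// Resume a suspended AudioContext on the first user gesture (sketch).
// ASSUMPTION: helper name is hypothetical; `target` is any EventTarget
// (e.g. document) and `audioContext` exposes `state` and `resume()`.
function unlockAudioOnFirstGesture(target, audioContext) {
  const events = ['pointerdown', 'keydown'];
  const unlock = () => {
    if (audioContext.state === 'suspended') {
      // resume() returns a Promise; log instead of leaving a rejection unhandled.
      audioContext.resume().catch((err) => console.warn('Could not resume audio:', err));
    }
    // One gesture is enough, so stop listening for all of them.
    events.forEach((type) => target.removeEventListener(type, unlock));
  };
  events.forEach((type) => target.addEventListener(type, unlock));
  return unlock;
}

// Browser usage (not run here): unlockAudioOnFirstGesture(document, new AudioContext());
// Demo with a plain EventTarget and a fake context:
const page = new EventTarget();
let resumeCalls = 0;
const fakeContext = {
  state: 'suspended',
  resume() {
    resumeCalls += 1;
    this.state = 'running';
    return Promise.resolve();
  },
};
unlockAudioOnFirstGesture(page, fakeContext);
page.dispatchEvent(new Event('pointerdown'));
page.dispatchEvent(new Event('keydown')); // listener already removed, nothing happens
console.log(resumeCalls, fakeContext.state); // 1 running
```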
Awaiting feedback to update!
🎤 Ready to create amazing speech synthesis? Choose your model and start generating!
Made with ❤️ combining the best of Kitten TTS, Kokoro TTS, and Piper TTS