Welcome to the Robo-Shaul repository! This project enables you to train your own Robo-Shaul or use pre-trained models to convert Hebrew text into speech using the Tacotron 2 TTS framework.
Robo-Shaul was originally developed for a competition, where the winning model was trained for only 5k steps. After the competition, a more advanced model was trained for 90k steps using improved methodologies and a wider range of training data, resulting in significantly better performance.
- Python 3.10
- Clone the repository:

      git clone https://github.com/maxmelichov/Text-To-speech.git
      cd Text-To-speech

- Set up a virtual environment:

      python3.10 -m venv venv
      source venv/bin/activate    # Linux/Mac
      # or, on Windows:
      # venv\Scripts\activate.bat

- Install dependencies:

      pip install -r requirements.txt

- Clone required submodules and dependencies:

      git clone https://github.com/maxmelichov/tacotron2.git
      git submodule init
      git submodule update
      git clone https://github.com/maxmelichov/waveglow.git
      cp waveglow/glow.py ./
The main directories used in this project are:
Text-To-speech/
├── data/ # Place the SASPEECH dataset here
├── checkpoints/ # Stores Tacotron2 model checkpoints (*.pt files)
├── waveglow_weights/ # Stores WaveGlow model checkpoint (*.pt file)
├── tacotron2/ # Tacotron2 source code (cloned as submodule)
├── waveglow/ # WaveGlow source code (cloned as submodule)
├── ...
- data/: Put your downloaded and preprocessed dataset here.
- checkpoints/: Save and load Tacotron2 model weights (e.g., checkpoint_90000.pt).
- waveglow_weights/: Place the WaveGlow model checkpoint file (e.g., waveglow_256channels.pt).
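Before training or inference, it can help to confirm that the directories listed above exist. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and the directory names are simply copied from the layout above.

```python
from pathlib import Path

# Directory names copied from the layout described above.
# The helper below is a hypothetical convenience, not a repository script.
EXPECTED_DIRS = ("data", "checkpoints", "waveglow_weights", "tacotron2", "waveglow")


def missing_dirs(repo_root):
    """Return the expected sub-directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_dirs(".")` from the repository root returns an empty list when every expected directory is present, and otherwise the names that still need to be created or cloned.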
- Download the SASPEECH dataset from OpenSLR.
- Preprocess the data:

      python data_preprocess.py

  After running the script, make sure you have a .txt file in the same format as the examples in the filelists directory:

      path/to/audio.wav|transcript in Hebrew written using English letters

- Train the model:

      python train.py

- Generate speech (inference):

      python inference.py
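Training reads the filelist produced after preprocessing, so a malformed line is worth catching early. The sketch below checks the `path|transcript` format described in the preprocessing step. It assumes each line contains exactly one `|` separator and two non-empty fields; the function names are hypothetical and are not part of `data_preprocess.py` or `train.py`.

```python
def parse_filelist_line(line):
    """Split one 'path|transcript' line; raise ValueError if it is malformed."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != 2:
        # Assumption: exactly one '|' per line, as in the format shown above.
        raise ValueError(f"expected exactly one '|' separator: {line!r}")
    path, transcript = (part.strip() for part in parts)
    if not path or not transcript:
        raise ValueError(f"empty path or transcript: {line!r}")
    return path, transcript


def check_filelist(lines):
    """Return (line_number, message) pairs for malformed lines; skip blank lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            parse_filelist_line(line)
        except ValueError as error:
            problems.append((number, str(error)))
    return problems
```

To check a file, pass its open handle, for example `check_filelist(open(path, encoding="utf-8"))` where `path` points at your own filelist; an empty result means every non-blank line has the expected shape.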
- Live Demo: Project Site
- Demo Page: here
- Quick Start Notebook: Notebook
- Project Podcast: חיות כיס (Hayot Kis) episode
- Training & Synthesis Videos: Part 1 | Part 2
- The system uses the SASPEECH dataset, a collection of unedited recordings from Shaul Amsterdamski for the 'Hayot Kis' podcast.
- The TTS system is based on Nvidia's Tacotron 2, customized for Hebrew.
Note: The model expects diacritized Hebrew (עברית מנוקדת). For diacritization, we recommend Nakdimon (GitHub).
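A quick way to notice undiacritized input is to look for Hebrew vowel points (niqqud) in the text. The sketch below is a heuristic under stated assumptions, not part of this project or of Nakdimon: it treats U+05B0 to U+05BC, the shin and sin dots (U+05C1, U+05C2), and qamats qatan (U+05C7) as niqqud, and the function names are hypothetical. It only detects whether any points are present; it does not verify that the diacritization is complete or correct.

```python
# Hebrew points: U+05B0..U+05BC, shin/sin dots (U+05C1, U+05C2), qamats qatan (U+05C7).
NIQQUD = {chr(code) for code in range(0x05B0, 0x05BD)} | {"\u05C1", "\u05C2", "\u05C7"}


def has_niqqud(text):
    """Return True if the text contains at least one Hebrew vowel point."""
    return any(ch in NIQQUD for ch in text)


def has_hebrew_letters(text):
    """Return True if the text contains at least one Hebrew letter (alef to tav)."""
    return any("\u05D0" <= ch <= "\u05EA" for ch in text)


def needs_diacritization(text):
    """Heuristic: Hebrew letters are present but no vowel points were found."""
    return has_hebrew_letters(text) and not has_niqqud(text)
```

If `needs_diacritization(text)` returns True, the text likely needs to pass through a diacritizer before synthesis.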
| Maxim Melichov | Tony Hasson |
|---|---|
Feel free to reach out with questions or suggestions!