Text-To-Speech (Robo-Shaul)

Welcome to the Robo-Shaul repository! This project enables you to train your own Robo-Shaul or use pre-trained models to convert Hebrew text into speech using the Tacotron 2 TTS framework.

Robo-Shaul was originally developed for a competition, where the winning model was trained for only 5k steps. After the competition, a more advanced model was trained for 90k steps using improved methodologies and a wider range of training data, resulting in significantly better performance.


🚀 Quick Start

Prerequisites

  • Python 3.10

Installation

  1. Clone the repository:

    git clone https://github.com/maxmelichov/Text-To-speech.git
    cd Text-To-speech
  2. Set up a virtual environment:

    python3.10 -m venv venv
    source venv/bin/activate  # Linux/Mac
    # or
    venv\Scripts\activate.bat  # Windows
  3. Install dependencies:

    pip install -r requirements.txt
  4. Clone required submodules and dependencies:

    git clone https://github.com/maxmelichov/tacotron2.git
    git submodule init
    git submodule update
    git clone https://github.com/maxmelichov/waveglow.git
    cp waveglow/glow.py ./

📁 Project Structure

The main directories used in this project are:

Text-To-speech/
├── data/                  # Place the SASPEECH dataset here
├── checkpoints/           # Stores Tacotron2 model checkpoints (*.pt files)
├── waveglow_weights/      # Stores WaveGlow model checkpoint (*.pt file)
├── tacotron2/             # Tacotron2 source code (cloned as submodule)
├── waveglow/              # WaveGlow source code (cloned as submodule)
├── ...
  • data/: Put your downloaded and preprocessed dataset here.
  • checkpoints/: Save and load Tacotron2 model weights (e.g., checkpoint_90000.pt).
  • waveglow_weights/: Place the WaveGlow model checkpoint file (e.g., waveglow_256channels.pt).
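A quick way to confirm the layout above before training or inference is a small check script. This is a minimal sketch, not part of the repository: the directory names come from the tree above, while `EXPECTED_DIRS` and `missing_dirs` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Directory names taken from the project structure listed above.
EXPECTED_DIRS = ["data", "checkpoints", "waveglow_weights", "tacotron2", "waveglow"]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root (assumption: the current directory is Text-To-speech/).
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run it from the repository root; any directory it reports as missing needs to be created or cloned first.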

📦 Download Pre-trained Models


📚 Dataset

  • Download the SASPEECH dataset from OpenSLR.

🛠️ Usage

  1. Preprocess the data:

    python data_preprocess.py

    After running the script, make sure you have a .txt file in the same format as the examples in the filelists directory:

    path/to/audio.wav|Hebrew transcript written in English letters
    
  2. Train the model:

    python train.py
  3. Generate speech (inference):

    python inference.py
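The filelist format from step 1 (one `path|transcript` pair per line) can be built and checked with a few lines of Python. This is a sketch under stated assumptions: the helper names `format_filelist_line` and `parse_filelist_line` are hypothetical and not part of the repository, and the sample path and transcript are placeholders rather than real dataset entries.

```python
def format_filelist_line(wav_path, transcript):
    """Build one 'path|transcript' line for a filelist."""
    transcript = transcript.strip()
    if not transcript:
        raise ValueError("empty transcript")
    # '|' is the field separator, so it must not appear inside the transcript.
    if "|" in transcript or "\n" in transcript:
        raise ValueError("transcript must not contain '|' or newlines")
    return f"{wav_path}|{transcript}"


def parse_filelist_line(line):
    """Split a filelist line into (wav_path, transcript), rejecting malformed lines."""
    wav_path, sep, transcript = line.rstrip("\n").partition("|")
    if not sep or not wav_path or not transcript:
        raise ValueError(f"malformed filelist line: {line!r}")
    return wav_path, transcript


if __name__ == "__main__":
    # Placeholder values for illustration only.
    line = format_filelist_line("path/to/audio.wav", "shalom olam")
    print(line)
    print(parse_filelist_line(line))
```

Parsing every line of a generated filelist with `parse_filelist_line` before training is a cheap way to catch missing separators or empty transcripts.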

💡 Demos & Resources


📝 Model Details

  • The system uses the SASPEECH dataset, a collection of unedited recordings from Shaul Amsterdamski for the 'Hayot Kis' podcast.
  • The TTS system is based on Nvidia's Tacotron 2, customized for Hebrew.

Note: The model expects diacritized Hebrew (עברית מנוקדת). For diacritization, we recommend Nakdimon (GitHub).
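Since the model expects diacritized input, it can help to check text for vowel points before passing it on. The sketch below is a heuristic, not part of the repository or of Nakdimon: it treats the Unicode code points U+05B0 to U+05BC, U+05C1, U+05C2 and U+05C7 as niqqud, and the function names are hypothetical.

```python
# Hebrew vowel points and related marks: U+05B0..U+05BC, shin/sin dots, qamats qatan.
NIQQUD = {chr(cp) for cp in range(0x05B0, 0x05BD)} | {"\u05C1", "\u05C2", "\u05C7"}


def is_hebrew_letter(ch):
    """True for the Hebrew letters alef (U+05D0) through tav (U+05EA)."""
    return "\u05D0" <= ch <= "\u05EA"


def has_niqqud(text):
    """True if the text contains at least one niqqud mark."""
    return any(ch in NIQQUD for ch in text)


def undiacritized_words(text):
    """Return whitespace-separated words that contain Hebrew letters but no niqqud."""
    return [
        word
        for word in text.split()
        if any(is_hebrew_letter(ch) for ch in word) and not has_niqqud(word)
    ]
```

An empty result from `undiacritized_words` suggests every Hebrew word carries at least one mark; it does not guarantee the diacritization is complete or correct.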


👥 Contact

  • Maxim Melichov (LinkedIn)
  • Tony Hasson (LinkedIn)

Feel free to reach out with questions or suggestions!
