If you find this project useful, please give us a star 🌟.
🎉 Discrete Neural Codec with 24 Tokens per Second (24 kHz) for Spoken Language Modeling!
Differently colored lines indicate the data flow used during inference versus the flow used only for training. During inference, the audio is passed through the encoder and VQ1 to produce a discrete quantized representation, which is then refined by the MLP. The decoder and the fine-tuned BigVGAN subsequently reconstruct the Mel-spectrogram and the waveform.
To install HH-Codec, follow these steps:
conda create -n hhcodec python=3.10  # requires Python >= 3.10 (needed by BigVGAN)
conda activate hhcodec
git clone https://github.com/opendilab/HH-Codec.git
cd HH-Codec
pip install -e .
# Install Dependencies for UTMOS Evaluation
pip install fairseq
# If you encounter conflicts, try:
pip install pip==24.0
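After installation, a quick import check helps confirm the environment is consistent. This is a minimal sanity check; it assumes `torch` and `torchaudio` are pulled in by `pip install -e .`:

```bash
# Verify that the core dependencies import cleanly
python -c "import torch, torchaudio, fairseq; print('torch', torch.__version__)"
```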
Ensure your dataset is preprocessed by following the instructions in the `dataset` directory.
Before starting training, update the configuration settings:
# Open and modify configs/train.yaml
# Adjust parameters such as:
# - log settings
# - train_path
# - save_dir
# - device (e.g., CPU/GPU)
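For orientation, here is a minimal sketch of how such a configuration might be laid out. The key names below are illustrative assumptions only; the authoritative schema is `configs/train.yaml` itself:

```yaml
# Illustrative sketch only — key names are assumptions, not the actual schema
trainer:
  accelerator: gpu          # device: CPU/GPU
  devices: 1
  default_root_dir: ./logs  # log settings
data:
  train_path: /path/to/preprocessed/dataset
model:
  save_dir: ./checkpoints
```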
Once the dataset is prepared and the configuration is set, launch the training process:
cd HH-Codec
python train.py fit --config configs/train.yaml
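The `fit` subcommand suggests a PyTorch Lightning CLI entry point; if that holds, individual settings can typically be overridden on the command line without editing the YAML (verify against the repo before relying on it):

```bash
# Example override, assuming a Lightning CLI entry point
python train.py fit --config configs/train.yaml --trainer.devices 2
```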
To reproduce the results reported in the paper in a single run, use the dataset from step 1, the configuration from step 2, and the training script from step 3. Since we are still refining the algorithm, an updated set of optimal model weights will be released once the final version of the paper is accepted by the journal.
# Assumes `model`, `device`, and the `convert_audio` helper are already
# available (convert_audio resamples the waveform to 24 kHz mono).
import torchaudio

wav, sr = torchaudio.load(audio_path)
wav = wav.to(device)
wav = convert_audio(wav, sr, 24000, 1).unsqueeze(0).unsqueeze(0)
# Generate the discrete codes
_, _, _, _, quant, _, index = model.encode(wav)
# Recover quant from the indices alone
quant = model.quantize.indices_to_codes(index)
# Reconstruct the Mel-spectrogram and audio from the quantized representation
reconstructed_mel, reconstructed_audios = model.decode(quant)
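Putting the pieces together, the sketch below shows one way to run the full loop: load a waveform, encode it, sanity-check the advertised 24 tokens-per-second rate, and write the reconstruction to disk. The checkpoint-loading call (`HHCodec.load_from_checkpoint`) and the output tensor shapes are assumptions, not confirmed APIs; adapt them to the repository's actual entry points.

```python
import torch
import torchaudio

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: a Lightning-style checkpoint loader; the real entry point may differ.
model = HHCodec.load_from_checkpoint("path/to/checkpoint.ckpt").to(device).eval()

wav, sr = torchaudio.load("sample.wav")
wav = convert_audio(wav.to(device), sr, 24000, 1).unsqueeze(0).unsqueeze(0)

with torch.no_grad():
    _, _, _, _, quant, _, index = model.encode(wav)
    reconstructed_mel, reconstructed_audios = model.decode(quant)

# Sanity check: roughly 24 discrete tokens per second of 24 kHz audio
duration_s = wav.shape[-1] / 24000
print(f"tokens/sec ≈ {index.numel() / duration_s:.1f}")

# Assumption on output shape: (batch, channels, samples); adjust if needed.
torchaudio.save("reconstructed.wav", reconstructed_audios.squeeze(0).cpu(), 24000)
```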
@article{xue2025hh,
title={HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling},
author={Xue, Rongkun and Niu, Yazhe and Hu, Shuai and Yin, Zixin and Yao, Yongqiang and Yang, Jing},
journal={arXiv preprint arXiv:2507.18897},
year={2025}
}
This project was developed in part on the basis of the following pioneering GitHub repositories. We express our profound gratitude for these foundational resources:
All code within this repository is licensed under the Apache License 2.0.