Nene AI is an advanced voice-based assistant designed for seamless real-time interactions. It can:
- Act as a VTuber AI with a lively and affectionate personality
- Read and respond to live chat messages from YouTube Live
- Accept text input for conversation
- Record and process audio for real-time voice-based interactions
- Convert speech to text using Whisper
- Process text input and generate responses using DeepSeek-R1 14B via Ollama
- Convert text responses to speech using TTS (Text-to-Speech)
- Tune and play back audio using pydub
- Speak in a consistently warm, kind, and loving tone
Hardware
- GPU: Recommended RTX 3070 or higher for optimal performance
- RAM: Minimum 16GB, recommended 32GB+
- Storage: At least ~30GB free space
- OS: Windows 10/11, macOS, or Linux
Ensure you have the following installed:
- Python 3.8+ and < 3.10
- Dependencies:

```bash
pip install openai-whisper ollama TTS pydub torch
```

(OpenAI's Whisper is published on PyPI as `openai-whisper`; the package named `whisper` is unrelated.)
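The project tree includes a requirements.txt; a matching file might look like the following. This is an illustrative sketch of the dependency list, not the project's actual pinned file:

```text
openai-whisper
ollama
TTS
pydub
torch
```

The DeepSeek model itself is fetched separately through Ollama, e.g. `ollama pull deepseek-r1:14b`.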
```
project_root/
├── core/
│   └── audio_utils.py
├── voice/
│   ├── input-th.m4a
│   ├── idle/
│   │   ├── en_idle_1.wav
│   │   ├── en_idle_2.wav
│   │   ├── jp_idle_1.wav
│   │   ├── jp_idle_2.wav
│   │   ├── th_idle_1.wav
│   │   └── th_idle_2.wav
│   └── think/
│       ├── en_think_1.wav
│       ├── en_think_2.wav
│       ├── jp_think_1.wav
│       ├── jp_think_2.wav
│       ├── th_think_1.wav
│       └── th_think_2.wav
├── output/
│   └── ro-th.wav
├── target/
│   ├── speaker-en.wav
│   ├── speaker-jp.wav
│   └── speaker-th.wav
├── other/
│   ├── Nene.png
│   └── Terminal.png
├── run.py
├── requirements.txt
├── README.md
└── .env
```
The assistant is configured with the following personality (TH):
- Name: Nene
- Personality: Sweet, caring, playful, and affectionate
- Response style: Uses polite Thai language with the particles "ค่ะ" and "คะ" to sound gentle
- Restrictions: Cannot use "ครับ", as it is a masculine particle
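The code below references a `setup_role` dictionary holding the model name and system prompt. A minimal sketch of what it might contain follows; the exact prompt wording here is an assumption, not the project's actual prompt:

```python
# Hypothetical setup_role configuration; the prompt text is illustrative.
setup_role = {
    "model": "deepseek-r1:14b",
    "setup-role": (
        "You are Nene, a sweet, caring, playful, and affectionate VTuber AI. "
        "Respond in polite Thai, ending sentences with the feminine particles "
        '"ค่ะ" or "คะ". Never use the masculine particle "ครับ".'
    ),
}
```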
```python
import whisper

def speech_to_text(audio_path):
    # Load the base Whisper model and transcribe the input audio
    model = whisper.load_model("base")
    result = model.transcribe(audio_path, fp16=False)
    return result["text"]
```

Converts input audio to text using OpenAI's Whisper model.
```python
import ollama

def get_response_from_deepseek(text):
    # setup_role holds the model name and the system prompt
    response = ollama.chat(
        model=setup_role["model"],
        messages=[
            {"role": "system", "content": setup_role["setup-role"]},
            {"role": "user", "content": text},
        ],
    )
    return response["message"]["content"]
```

Uses DeepSeek-R1 14B via Ollama to generate a response.
```python
from TTS.api import TTS

def text_to_speech(name, lang, text):
    # Load a Fairseq VITS model for the target language, then clone
    # the reference speaker's voice onto the generated speech
    tts = TTS(model_name=f"tts_models/{lang}/fairseq/vits")
    tts.tts_with_vc_to_file(
        text,
        speaker_wav="./target/speaker-en.wav",
        file_path=f"./output/{name}.wav",
    )
```

Converts text to speech using TTS with voice cloning.

```python
play_audio(f"output/{name}.wav")
```

Plays the generated voice response.
```bash
python Talk_EN.py
```
The program will:
- Take an input audio file (input-th.m4a)
- Convert the speech to text
- Generate a response using DeepSeek-R1 14B
- Convert the response into a voice output
- Play the generated voice
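The steps above can be sketched as a single pipeline function. The wiring below is a hypothetical illustration (the project's actual script may differ); the processing callables are passed in as parameters so the flow is easy to follow:

```python
def run_pipeline(audio_path, name, lang, stt, respond, tts, play):
    """Hypothetical glue tying the steps together; names are illustrative."""
    text = stt(audio_path)        # 1. speech -> text (Whisper)
    reply = respond(text)         # 2. text -> response (DeepSeek-R1 via Ollama)
    tts(name, lang, reply)        # 3. response -> cloned voice (TTS)
    play(f"output/{name}.wav")    # 4. play the generated audio
    return reply
```

With the functions shown earlier, this could be called as `run_pipeline("voice/input-th.m4a", "ro-th", "th", speech_to_text, get_response_from_deepseek, text_to_speech, play_audio)`.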
- The voice tuning applies pitch and filter modifications for a natural Thai accent.
- The response is always in a cheerful, affectionate style.
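The pitch side of that tuning boils down to a resampling ratio: raising pitch by n semitones multiplies the playback rate by 2 ** (n / 12). A small sketch of the arithmetic (the +2 semitone figure is an illustrative assumption, not the project's actual setting):

```python
def shifted_frame_rate(frame_rate, semitones):
    # Raising pitch by n semitones multiplies frequency by 2 ** (n / 12)
    return int(frame_rate * (2 ** (semitones / 12.0)))

# e.g. a +2 semitone shift of 44.1 kHz audio resamples it to 49500 Hz
```

In pydub this rate is typically applied by re-spawning the segment with a `frame_rate` override and then calling `set_frame_rate()` to restore the original rate, which shifts the pitch while keeping the file playable.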
Planned features:
- Support for more languages
- Enhanced voice customization
- Integration with real-time voice input/output
This project is open-source and free to use under the MIT License.