ScriptBloxX/Deepseek-Talk

Nene AI - Voice Chat Assistant (In Development)

Overview

Nene AI is a voice-based assistant built for real-time conversation. It can:

  • Act as a VTuber AI with a lively and affectionate personality

  • Read and respond to live chat messages from YouTube Live

  • Accept text input for conversation

  • Record and process audio for real-time voice-based interactions

Features

  • Converts speech to text using Whisper
  • Processes text input and generates responses using DeepSeek-R1 14B via Ollama
  • Converts text responses to speech using TTS (Text-to-Speech)
  • Audio tuning and playback using pydub
  • Configured to always use a warm, kind, and loving tone

System Requirements

Hardware

  • GPU: RTX 3070 or higher recommended
  • RAM: 16GB minimum, 32GB or more recommended
  • Storage: About 30GB of free space
  • OS: Windows 10/11, macOS, or Linux

Setup Instructions

Prerequisites

Ensure you have the following installed:

  • Python 3.8 or 3.9 (3.10 and later are not supported)
  • Dependencies:
    pip install openai-whisper ollama TTS pydub torch
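Because the supported Python range is narrow, a startup check can fail early with a clear message. The following is a minimal sketch, not part of the project; the function names are hypothetical, and only the version bounds come from the prerequisites above.

```python
import sys

# Supported range taken from the prerequisites: 3.8 <= version < 3.10.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 10)


def is_supported_python(version=None):
    """Return True if the (major, minor) version is inside the supported range."""
    if version is None:
        version = sys.version_info
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION_EXCLUSIVE


def require_supported_python():
    """Exit with a readable message when the interpreter is out of range."""
    if not is_supported_python():
        major, minor = sys.version_info[:2]
        raise SystemExit(f"Unsupported Python {major}.{minor}: use 3.8 or 3.9")
```

Calling require_supported_python() at the top of the entry script would stop the run before any heavy model import is attempted.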

File Structure

project_root/
│-- core/
│   ├── audio_utils.py
│-- voice/
│   ├── input-th.m4a
│   ├── idle/
│   │   ├── en_idle_1.wav
│   │   ├── en_idle_2.wav
│   │   ├── jp_idle_1.wav
│   │   ├── jp_idle_2.wav
│   │   ├── th_idle_1.wav
│   │   ├── th_idle_2.wav
│   ├── think/
│   │   ├── en_think_1.wav
│   │   ├── en_think_2.wav
│   │   ├── jp_think_1.wav
│   │   ├── jp_think_2.wav
│   │   ├── th_think_1.wav
│   │   ├── th_think_2.wav
│-- output/
│   ├── ro-th.wav
│-- target/
│   ├── speaker-en.wav
│   ├── speaker-jp.wav
│   ├── speaker-th.wav
│-- other/
│   ├── Nene.png
│   ├── Terminal.png
│-- run.py
│-- requirements.txt
│-- README.md
│-- .env

Configuration

The assistant is configured with the following personality (TH):

  • Name: Nene
  • Personality: Sweet, caring, playful, and affectionate
  • Response style: Uses polite Thai language with "ค่ะ" and "คะ" to sound gentle
  • Restrictions: Cannot use "ครับ" as it is a masculine term
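One way to express this personality is as a config dict that feeds the system prompt. The sketch below is an assumption, not the project's actual configuration: the key names mirror the setup_role["model"] and setup_role["setup-role"] lookups shown in the code breakdown, while the model tag, the prompt wording, and both helper functions are hypothetical.

```python
# Hypothetical configuration; only the two key names come from the README's code.
setup_role = {
    "model": "deepseek-r1:14b",  # assumed Ollama tag for DeepSeek-R1 14B
    "setup-role": (
        "You are Nene, a sweet, caring, playful, and affectionate assistant. "
        "Reply in polite Thai and end sentences with 'ค่ะ' or 'คะ'. "
        "Never use 'ครับ'."
    ),
}

# Particles the personality must not use (masculine polite form).
FORBIDDEN_PARTICLES = ("ครับ",)


def build_messages(user_text, config=setup_role):
    """Build the chat message list: system prompt first, then the user turn."""
    return [
        {"role": "system", "content": config["setup-role"]},
        {"role": "user", "content": user_text},
    ]


def violates_style(reply):
    """Return True if the reply contains a forbidden particle."""
    return any(particle in reply for particle in FORBIDDEN_PARTICLES)
```

A check like violates_style could be used to retry or post-edit a reply, since a system prompt alone does not guarantee the model follows the restriction.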

Code Breakdown

Speech to Text

import whisper

def speech_to_text(audio_path):
    # Load the "base" Whisper model; fp16=False forces FP32 so it also runs on CPU
    model = whisper.load_model("base")
    result = model.transcribe(audio_path, fp16=False)
    return result["text"]

Converts input audio to text using OpenAI's Whisper model.
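Loading the model inside the function repeats the load on every call. The sketch below shows one possible variation that caches the model; it is not the project's code. The names get_whisper_model and transcribe_file are hypothetical, and the optional model parameter exists only so the function can be exercised without Whisper installed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_whisper_model(size="base"):
    """Load a Whisper model once per size and reuse it on later calls."""
    import whisper  # imported lazily so this module loads without the package

    return whisper.load_model(size)


def transcribe_file(audio_path, model=None):
    """Transcribe an audio file; `model` can be injected for testing."""
    if model is None:
        model = get_whisper_model("base")
    result = model.transcribe(audio_path, fp16=False)
    return result["text"].strip()
```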

Getting AI Response

import ollama

def get_response_from_deepseek(text):
    # setup_role is defined elsewhere in the project: it holds the model name and the system prompt
    response = ollama.chat(model=setup_role["model"], messages=[{"role": "system", "content": setup_role["setup-role"]}, {"role": "user", "content": text}])
    return response["message"]["content"]

Uses DeepSeek-R1 14B via Ollama to generate a response.
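DeepSeek-R1 models typically emit their reasoning inside <think>...</think> tags before the final answer, and that text should not be read aloud. Whether the project already removes it is not stated here, so the helper below is a hypothetical sketch of one way to clean the reply before it reaches text-to-speech.

```python
import re

# Non-greedy match so each <think> block is removed separately;
# DOTALL lets the block span multiple lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text):
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

The cleaned string can then be passed on in place of the raw response['message']['content'].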

Text to Speech

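from TTS.api import TTS

def text_to_speech(name, lang, text):
    # Load the fairseq VITS model for the language, then clone the voice in speaker_wav
    tts = TTS(model_name=f"tts_models/{lang}/fairseq/vits")
    tts.tts_with_vc_to_file(text, speaker_wav="./target/speaker-en.wav", file_path=f"./output/{name}.wav")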

Converts text to speech using TTS with voice cloning.
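The snippet above always clones from speaker-en.wav, while the target/ folder also holds speaker-jp.wav and speaker-th.wav. A small lookup could select the reference clip per language. This is a sketch under assumptions: the helper name is hypothetical, and the short codes en, jp, and th are taken from the file names only; they may not match the language code used in the TTS model name.

```python
from pathlib import Path

TARGET_DIR = Path("target")
DEFAULT_LANG = "en"
# Codes inferred from the speaker-*.wav file names in the file structure.
SUPPORTED_LANGS = ("en", "jp", "th")


def speaker_wav_for(lang, target_dir=TARGET_DIR):
    """Return the reference clip path for `lang`, falling back to English."""
    code = lang if lang in SUPPORTED_LANGS else DEFAULT_LANG
    return (Path(target_dir) / f"speaker-{code}.wav").as_posix()
```

The returned path could be passed as the speaker_wav argument instead of the hard-coded English clip.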

Playing Audio

play_audio(f"output/{name}.wav")

Plays the generated voice response.

Running the Assistant

python Talk_EN.py

The program will:

  1. Take an input audio file (input-th.m4a)
  2. Convert speech to text
  3. Generate a response using DeepSeek-R1 14B
  4. Convert the response into a voice output
  5. Play the generated voice
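The steps above can be sketched as a single function that wires the stages together. This is an illustrative outline, not the project's entry script: run_pipeline is a hypothetical name, and the stages are passed in as callables so the flow can be tried without the models installed.

```python
def run_pipeline(audio_path, name, lang, stt, llm, tts, play):
    """Run one turn: audio file -> text -> reply -> synthesized speech -> playback."""
    text = stt(audio_path)            # steps 1-2: transcribe the input audio
    reply = llm(text)                 # step 3: generate the response
    tts(name, lang, reply)            # step 4: synthesize output/{name}.wav
    out_path = f"output/{name}.wav"
    play(out_path)                    # step 5: play the generated voice
    return reply, out_path
```

With the functions from the code breakdown, the callables would be speech_to_text, get_response_from_deepseek, text_to_speech, and play_audio.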

Notes

  • The voice tuning applies pitch and filter modifications for a natural Thai accent.
  • The response is always in a cheerful, affectionate style.
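The tuning itself lives in the project's audio utilities and is not shown here. As a rough illustration of what pitch and filter modifications can look like with pydub, the sketch below raises pitch by reinterpreting the samples at a higher frame rate and then applies a high-pass filter. The function names, the 2-semitone shift, and the 100 Hz cutoff are assumptions, not the project's values. This simple method also shortens the clip, because pitch and speed change together.

```python
def shifted_frame_rate(frame_rate, semitones):
    """Frame rate that shifts pitch by `semitones` when samples are reinterpreted."""
    return int(round(frame_rate * 2.0 ** (semitones / 12.0)))


def tune_voice(in_path, out_path, semitones=2.0, high_pass_hz=100):
    """Pitch-shift and filter a clip with pydub (assumed parameter values)."""
    from pydub import AudioSegment  # lazy import: only needed when tuning runs

    sound = AudioSegment.from_file(in_path)
    # Reinterpret the same samples at a new rate: this changes pitch and speed.
    new_rate = shifted_frame_rate(sound.frame_rate, semitones)
    raised = sound._spawn(sound.raw_data, overrides={"frame_rate": new_rate})
    # Resample back to the original rate so players handle the file normally.
    tuned = raised.set_frame_rate(sound.frame_rate).high_pass_filter(high_pass_hz)
    tuned.export(out_path, format="wav")
    return out_path
```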

Future Improvements

  • Support for more languages
  • Enhanced voice customization
  • Integration with real-time voice input/output

License

This project is open-source and free to use under the MIT License.
