A Flask web application that converts video files into translated transcripts using AI-powered speech recognition and translation.
- Extract audio from MP4 videos
- Convert audio to 16 kHz WAV format (optimal for speech recognition)
- Transcribe audio using Groq's Whisper model
- Translate transcripts to multiple languages using Gemma3
- Download transcripts as JSON files
- Modern, responsive UI with progress tracking
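The first two steps boil down to pulling the audio track out of the MP4 and resampling it to 16 kHz mono, which is what Whisper-style models expect. Here is a minimal sketch of that conversion using FFmpeg through `subprocess`; the project's own helpers in `functions/` may implement it differently (for example with a dedicated audio library):

```python
import subprocess

def extract_16k_wav(video_path: str, wav_path: str) -> None:
    """Extract the audio track from a video and resample it to 16 kHz mono WAV."""
    subprocess.run(
        [
            "ffmpeg",
            "-y",              # overwrite the output file if it exists
            "-i", video_path,  # input video (e.g. an MP4)
            "-vn",             # drop the video stream
            "-ac", "1",        # downmix to mono
            "-ar", "16000",    # resample to 16 kHz
            wav_path,
        ],
        check=True,
    )

extract_16k_wav("lecture.mp4", "lecture_16k.wav")
```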
Before you begin, ensure you have:
- Python 3.8+
- Ollama running locally with the Gemma3 model pulled
- Groq API key (for Whisper transcription)
- FFmpeg installed (for audio processing)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/video-to-transcript.git
  cd video-to-transcript
  ```
- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up the Groq API key:

  Set your Groq API key in the `functions/convertedwav_to_transcript.py` file (a sketch of a typical transcription call appears after these steps).
- Install FFmpeg (on Debian/Ubuntu):

  ```bash
  sudo apt-get install ffmpeg
  ```
- Download the Gemma3 model: install Ollama on your system, then pull the model:

  ```bash
  ollama pull gemma3:12b
  ```
- Start the Flask server:

  ```bash
  python app.py
  ```
- Access the application: open your browser and navigate to `http://localhost:5000`.
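For the Groq API key step above, this is a hedged sketch of what a typical transcription call looks like with the Groq Python SDK; the exact code in `functions/convertedwav_to_transcript.py` may differ, and reading the key from an environment variable is just one way to supply it:

```python
import os
from groq import Groq  # pip install groq

# Assumption: the key is exposed via an environment variable rather than hard-coded.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

with open("lecture_16k.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=("lecture_16k.wav", audio_file.read()),
        model="whisper-large-v3",  # a Whisper model hosted on Groq; check Groq's model list for current names
    )

print(transcription.text)
```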
- Upload an MP4 video file
- Select target language for translation
- Click "Process Video"
- Wait for processing to complete
- Download the JSON transcript file
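If you prefer to drive the service from a script instead of the browser, a request along these lines should work. The endpoint path and form field names below are hypothetical, so check the routes defined in `app.py` before using it:

```python
import requests

# Hypothetical endpoint ("/process") and field names ("video", "language") --
# confirm the real ones in app.py.
with open("lecture.mp4", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/process",
        files={"video": f},
        data={"language": "Bengali"},
    )

resp.raise_for_status()
print(resp.json())  # or write the returned transcript JSON to a file
```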
```
video-to-transcript/
├── app.py                            # Main Flask application
├── main.py                           # Core processing logic
├── functions/
│   ├── video_to_wav.py               # Video to WAV conversion
│   ├── wav_to_16kwav.py              # Audio format conversion
│   ├── convertedwav_to_transcript.py # Speech recognition
│   └── transcript_lan_covert.py      # Translation
├── templates/
│   └── index.html                    # Frontend interface
├── static/
│   ├── script.js                     # Client-side JavaScript
│   └── style.css                     # Styling
└── outputs/                          # Generated transcripts
```
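`main.py` presumably chains the helpers under `functions/` in the order shown above. The sketch below follows the module names, but the function names, signatures, and output path are assumptions, not the project's actual API:

```python
import json

# Assumed imports and signatures, derived from the module names above.
from functions.video_to_wav import video_to_wav
from functions.wav_to_16kwav import wav_to_16kwav
from functions.convertedwav_to_transcript import convertedwav_to_transcript
from functions.transcript_lan_covert import transcript_lan_covert

def process_video(video_path: str, target_language: str = "Bengali") -> str:
    wav_path = video_to_wav(video_path)                    # extract the audio track
    wav_16k_path = wav_to_16kwav(wav_path)                 # resample to 16 kHz mono
    transcript = convertedwav_to_transcript(wav_16k_path)  # Groq Whisper transcription
    translated = transcript_lan_covert(transcript, target_language)  # Gemma3 via Ollama

    out_path = "outputs/transcript.json"  # example output location
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"language": target_language, "transcript": translated}, f, ensure_ascii=False)
    return out_path
```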
The application supports translation to:
- Bengali (default)
- English
- Hindi
- Spanish
- French
- Portuguese
- German
- Russian
- Italian
- Dutch
- Chinese (Simplified)
- Japanese
- Korean
- Arabic
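Translation works by prompting the locally running Gemma3 model through Ollama. A minimal sketch against Ollama's REST API is shown below; the real prompt and error handling live in `functions/transcript_lan_covert.py`:

```python
import requests

def translate(text: str, target_language: str = "Bengali") -> str:
    """Ask the local Gemma3 model (served by Ollama) to translate a transcript."""
    resp = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        json={
            "model": "gemma3:12b",
            "prompt": f"Translate the following transcript into {target_language}:\n\n{text}",
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(translate("Hello, welcome to the lecture.", "Hindi"))
```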
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a new branch (`git checkout -b feature-branch`)
- Commit your changes (`git commit -m 'Add new feature'`)
- Push to the branch (`git push origin feature-branch`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Audio processing fails: Ensure FFmpeg is installed and in your PATH
- Translation errors: Verify Ollama is running and Gemma3 model is downloaded
- API errors: Check your Groq API key in the functions/convertedwav_to_transcript.py file
- File permission issues: Ensure the `uploads` and `outputs` directories are writable
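A quick way to check the Ollama and file-permission points above from Python:

```python
import os
import requests

# Confirm Ollama is reachable and the Gemma3 model is pulled
# (GET /api/tags lists the locally available models).
models = requests.get("http://localhost:11434/api/tags", timeout=5).json()["models"]
print([m["name"] for m in models])

# Confirm the working directories exist and are writable.
for d in ("uploads", "outputs"):
    os.makedirs(d, exist_ok=True)
    print(d, "writable:", os.access(d, os.W_OK))
```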
For support or questions, please contact [email protected]