🎥 AI-powered video-to-transcript converter 🌐 Extracts audio, generates 📝 accurate transcripts using Whisper AI, and translates to 15+ languages 🤖 via Ollama/Gemma.

cyberytti/Video2Transcript

🎥 Video to Transcript Converter 🌐

A Flask web application that converts video files into translated transcripts using AI-powered speech recognition and translation.

Project Screenshot

✨ Features

  • 🎤 Extract audio from MP4 videos
  • 🔉 Convert audio to 16 kHz WAV format (optimal for speech recognition)
  • 🗣️ Transcribe audio using Groq's Whisper model
  • 🌍 Translate transcripts to multiple languages using Gemma3 AI
  • 💾 Download transcripts as JSON files
  • 🎨 Modern, responsive UI with progress tracking
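The first two features (audio extraction and conversion to 16 kHz WAV) can be approximated with a single FFmpeg call. The following is a minimal sketch, not the code in functions/video_to_wav.py or functions/wav_to_16kwav.py: the helper names `build_ffmpeg_command` and `extract_audio`, the `_16k.wav` suffix, and the choice of mono 16-bit PCM are assumptions made for illustration. It assumes FFmpeg is installed and on your PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that writes mono PCM audio at the given sample rate."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample; 16 kHz by default
        "-c:a", "pcm_s16le",      # 16-bit PCM, the usual WAV encoding
        wav_path,
    ]


def extract_audio(video_path: str, output_dir: str = "outputs") -> Path:
    """Run FFmpeg and return the path of the resulting WAV file (hypothetical helper)."""
    wav_path = Path(output_dir) / (Path(video_path).stem + "_16k.wav")
    wav_path.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if FFmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video_path, str(wav_path)), check=True)
    return wav_path
```

Keeping the command construction separate from the subprocess call makes the arguments easy to inspect or log before anything is executed.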

📋 Prerequisites

Before you begin, ensure you have:

  • Python 3.8+
  • Ollama running locally with the Gemma3 model pulled
  • Groq API key (for Whisper transcription)
  • FFmpeg installed (for audio processing)

🛠️ Installation

  1. Clone the repository:

    git clone https://github.com/cyberytti/Video2Transcript.git
    cd Video2Transcript
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up the Groq API key:

    Set your Groq API key in the functions/convertedwav_to_transcript.py file.
    
  5. Install FFmpeg. On Debian or Ubuntu, run:

    sudo apt-get install ffmpeg
  6. Download the Gemma3 model. Install Ollama on your system first, then run:

    ollama pull gemma3:12b
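Step 4 stores the Groq API key directly in a source file. If you would rather keep the key out of the repository, one option is to read it from an environment variable instead. This is a sketch of that alternative, not how the project currently works: the function `load_groq_api_key` is hypothetical and the variable name `GROQ_API_KEY` is an assumed convention.

```python
import os


def load_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the Groq API key from the environment, or raise if it is not set."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Groq API key"
        )
    return key
```

With this in place you would export the variable in your shell before running `python app.py`, and the transcription module would call `load_groq_api_key()` instead of using a hard-coded string.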

🚀 Running the Application

  1. Start the Flask server:

    python app.py
  2. Access the application: Open your browser and navigate to:

    http://localhost:5000
    

🖥️ Usage

  1. Upload an MP4 video file
  2. Select target language for translation
  3. Click "Process Video"
  4. Wait for processing to complete
  5. Download the JSON transcript file

📂 Project Structure

video-to-transcript/
├── app.py                # Main Flask application
├── main.py               # Core processing logic
├── functions/
│   ├── video_to_wav.py   # Video to WAV conversion
│   ├── wav_to_16kwav.py  # Audio format conversion
│   ├── convertedwav_to_transcript.py  # Speech recognition
│   └── transcript_lan_covert.py       # Translation
├── templates/
│   └── index.html        # Frontend interface
├── static/
│   ├── script.js         # Client-side JavaScript
│   └── style.css         # Styling
└── outputs/              # Generated transcripts
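The four modules under functions/ suggest a linear pipeline: video to WAV, WAV to 16 kHz WAV, transcription, then translation. The sketch below shows one way such stages could be chained. It is an illustration only, not the contents of main.py: `run_pipeline`, the stage signatures, and the keys of the returned dictionary are all assumptions, and each stage is passed in as a callable so the example does not depend on the real modules.

```python
from typing import Callable, Dict


def run_pipeline(
    video_path: str,
    target_language: str,
    extract_audio: Callable[[str], str],
    resample_audio: Callable[[str], str],
    transcribe: Callable[[str], str],
    translate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Chain the four stages and return a JSON-serializable result (assumed shape)."""
    wav_path = extract_audio(video_path)        # video -> WAV
    wav_16k_path = resample_audio(wav_path)     # WAV -> 16 kHz WAV
    transcript = transcribe(wav_16k_path)       # speech recognition
    translation = translate(transcript, target_language)
    return {
        "language": target_language,
        "transcript": transcript,
        "translation": translation,
    }


# Usage with stand-in stages, so the flow can be checked without FFmpeg or any API:
result = run_pipeline(
    "talk.mp4",
    "French",
    extract_audio=lambda path: path.replace(".mp4", ".wav"),
    resample_audio=lambda path: path.replace(".wav", "_16k.wav"),
    transcribe=lambda path: f"text from {path}",
    translate=lambda text, lang: f"[{lang}] {text}",
)
```

Passing the stages as parameters keeps the orchestration testable in isolation; the real application may simply import and call its functions directly.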

🌐 Supported Languages

The application supports translation to:

  • Bengali (default)
  • English
  • Hindi
  • Spanish
  • French
  • Portuguese
  • German
  • Russian
  • Italian
  • Dutch
  • Chinese (Simplified)
  • Japanese
  • Korean
  • Arabic
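To illustrate how a target language from this list might be validated and handed to Gemma3, here is a minimal sketch that talks to Ollama's local REST endpoint (`/api/generate` on port 11434, Ollama's default). It is not the code in functions/transcript_lan_covert.py: the helper names, the prompt wording, and the fallback to Bengali when no language is given are assumptions based on the list above.

```python
import json
import urllib.request
from typing import Dict, Optional

SUPPORTED_LANGUAGES = [
    "Bengali", "English", "Hindi", "Spanish", "French", "Portuguese", "German",
    "Russian", "Italian", "Dutch", "Chinese (Simplified)", "Japanese", "Korean",
    "Arabic",
]
DEFAULT_LANGUAGE = "Bengali"


def resolve_language(requested: Optional[str]) -> str:
    """Return the canonical language name, falling back to the default when empty."""
    if not requested or not requested.strip():
        return DEFAULT_LANGUAGE
    wanted = requested.strip().lower()
    for name in SUPPORTED_LANGUAGES:
        if name.lower() == wanted:
            return name
    raise ValueError(f"Unsupported language: {requested!r}")


def build_translation_request(text: str, language: str, model: str = "gemma3:12b") -> Dict[str, object]:
    """Build the JSON body for Ollama's /api/generate endpoint (prompt wording is assumed)."""
    prompt = (
        f"Translate the following transcript into {language}. "
        f"Return only the translation.\n\n{text}"
    )
    # stream=False asks Ollama for one JSON object instead of a stream of chunks
    return {"model": model, "prompt": prompt, "stream": False}


def translate(text: str, language: Optional[str], host: str = "http://localhost:11434") -> str:
    """Send the request to a locally running Ollama server and return the generated text."""
    body = build_translation_request(text, resolve_language(language))
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `translate("Hello, world", "Spanish")` requires Ollama to be running with the `gemma3:12b` model pulled; `resolve_language` and `build_translation_request` work offline.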

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature-branch)
  3. Commit your changes (git commit -m 'Add new feature')
  4. Push to the branch (git push origin feature-branch)
  5. Open a Pull Request

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

⚠️ Troubleshooting

  • Audio processing fails: Ensure FFmpeg is installed and in your PATH
  • Translation errors: Verify that Ollama is running and the Gemma3 model is downloaded
  • API errors: Check your Groq API key in the functions/convertedwav_to_transcript.py file
  • File permission issues: Ensure the uploads and outputs directories are writable
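Several of these checks can be automated. The sketch below is a hypothetical diagnostic helper, not part of the repository: it looks for FFmpeg on the PATH, probes Ollama's `/api/tags` endpoint on its default port 11434, and tests whether the uploads and outputs directories exist and are writable. The function name `check_environment` and the result keys are assumptions.

```python
import os
import shutil
import urllib.request
from typing import Dict, Sequence


def check_environment(
    ollama_url: str = "http://localhost:11434/api/tags",
    directories: Sequence[str] = ("uploads", "outputs"),
) -> Dict[str, bool]:
    """Return a mapping of check name to pass/fail for the common failure causes."""
    results: Dict[str, bool] = {}

    # FFmpeg must be discoverable on PATH for audio processing to work
    results["ffmpeg"] = shutil.which("ffmpeg") is not None

    # Ollama answers on /api/tags when the server is running
    try:
        with urllib.request.urlopen(ollama_url, timeout=3) as response:
            results["ollama"] = response.status == 200
    except OSError:  # covers URLError, connection refused, and timeouts
        results["ollama"] = False

    # Each directory must exist and be writable by the current user
    for directory in directories:
        results[f"{directory}_writable"] = (
            os.path.isdir(directory) and os.access(directory, os.W_OK)
        )
    return results
```

Running `check_environment()` from the project root and printing the returned dictionary gives a quick overview of which prerequisite is missing. It does not validate the Groq API key, which still has to be checked against the Groq service.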

📧 Contact

For support or questions, please contact [email protected]
