# Meeting Transcriber & Summarizer (FFmpeg v8 Whisper + LM Studio / Ollama)
A tiny Gradio app that records from your browser microphone or uploads an audio file, transcribes it locally using FFmpeg v8 + the `whisper` filter (with Whisper.cpp models), and sends the transcript to a local LLM (LM Studio) to produce clean, structured meeting minutes (Markdown).
## Highlights
- One-click record or upload (Gradio 4.x `Audio`).
- Robust capture: pre‑roll (to avoid missing the first words), optional VAD, and input normalization to WAV 16 kHz mono.
- Flexible output: `text` / `srt` / `json`.
- Built‑in prompt template CRUD persisted to `prompt_templates.json` and used as the system prompt for LM Studio.
- Persistent UI options (language 🇬🇧/🇫🇷 and light/dark theme) saved to `ui_settings.json`.
- New Transcriptions tab with full CRUD to revisit transcripts and their summaries.
- Gradio UI (microphone/upload → `Audio` component).
- FFmpeg 8 + `whisper` filter (with a Whisper.cpp ggml model) → transcript file (a sketch of the underlying FFmpeg call follows this list).
  - Optional pre‑roll (adds a short silence at the start) and VAD.
  - Audio is normalized to WAV 16 kHz mono for reliability.
- LM Studio (OpenAI‑compatible API) → structured Markdown meeting minutes. You can use Ollama too.
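To make the FFmpeg step concrete, here is a minimal sketch of the kind of command the app wraps. The filter option names used below (`model`, `language`, `format`, `destination`) are assumptions about your FFmpeg 8 build; check `ffmpeg -h filter=whisper` for the exact names.

```python
# Sketch only — the app builds a similar command internally. Option names
# (model, language, format, destination) are assumptions; verify them with
# `ffmpeg -h filter=whisper` on your FFmpeg 8 build.
import subprocess
from pathlib import Path

def transcribe(wav_path: str, out_path: str = "transcripts/out.srt") -> None:
    Path("transcripts").mkdir(exist_ok=True)
    whisper_opts = ":".join([
        "model=./models/ggml-large-v3-turbo.bin",  # Whisper.cpp ggml model
        "language=fr",                             # transcription language
        "format=srt",                              # text / srt / json
        f"destination={out_path}",                 # where the transcript is written
    ])
    subprocess.run(
        ["ffmpeg", "-hide_banner", "-y", "-i", wav_path,
         "-af", f"whisper={whisper_opts}",
         "-f", "null", "-"],  # discard the audio output; we only want the transcript
        check=True,
    )

transcribe("meeting.wav")
```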
## Requirements

- Python 3.10+
- FFmpeg 8.0+ compiled with the `whisper` filter (requires Whisper.cpp). Verify with:
  - Linux/macOS: `ffmpeg -hide_banner -filters | grep whisper`
  - Windows: `ffmpeg -hide_banner -filters | findstr whisper`
- LM Studio running in local server/developer mode (OpenAI‑compatible), or Ollama, usually at `http://localhost:1234`.
- Whisper.cpp ggml model file(s), e.g. `ggml-large-v3-turbo.bin`.

Python packages:

```
gradio>=4.44.1
uvicorn>=0.30
starlette>=0.37
anyio>=4.4
h11>=0.14
httpx>=0.27
httpcore>=1.0
python-dotenv
```

Privacy: All audio processing happens locally via FFmpeg; the transcript is summarized by your local LM Studio instance.
- Clone the repo and enter it.

  ```bash
  git clone https://github.com/magicmars35/AutoTranscriptReport
  cd AutoTranscriptReport
  ```

- (Optional but recommended) Create a venv and activate it. On Windows:

  ```bash
  python -m venv venv
  venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Place a Whisper.cpp model under `./models/`, e.g. `./models/ggml-large-v3-turbo.bin`. Models can be found at https://huggingface.co/ggerganov/whisper.cpp/tree/main.

- Ensure FFmpeg 8 supports the `whisper` filter (see Requirements above).

- Start LM Studio (or Ollama) in server mode (Developer tab → Start server). The default base URL is `http://localhost:1234` and the API path is usually `/v1/chat/completions`.
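To confirm the local server is reachable before launching the app, a quick check like the one below works. It assumes the server exposes `GET /v1/models` (standard for OpenAI‑compatible APIs); adjust the base URL or port if yours differs.

```python
# Quick sanity check for the local LM Studio / Ollama OpenAI-compatible server.
# Assumes GET /v1/models is available (standard OpenAI-compatible endpoint);
# adjust the base URL/port if yours differs.
import httpx

base_url = "http://localhost:1234"
try:
    resp = httpx.get(f"{base_url}/v1/models", timeout=5)
    resp.raise_for_status()
    names = [m.get("id") for m in resp.json().get("data", [])]
    print("Server is up. Available models:", names)
except httpx.HTTPError as exc:
    print(f"Could not reach {base_url}: {exc}")
```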
Create a `.env` at the project root if you want to override defaults:

```env
# LM Studio
LMSTUDIO_BASE_URL=http://localhost:1234
LMSTUDIO_API_PATH=/v1/chat/completions
LMSTUDIO_MODEL=Qwen2.5-7B-Instruct
LMSTUDIO_API_KEY=lm-studio

# FFmpeg + Whisper
FFMPEG_BIN=ffmpeg
WHISPER_MODEL_PATH=./models/ggml-large-v3-turbo.bin
WHISPER_LANGUAGE=fr

# Templates storage
TEMPLATES_PATH=prompt_templates.json
```

Defaults are the same as the example above. You can also edit the constants at the top of the Python file.
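These variables are read with python-dotenv (listed in the requirements). Below is a minimal sketch of how they map to the constants mentioned above; the loading code is illustrative, not the app's exact source.

```python
# Minimal sketch of loading the .env shown above with python-dotenv.
# Variable names match the example; the defaults here are illustrative.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root, if present

LMSTUDIO_BASE_URL = os.getenv("LMSTUDIO_BASE_URL", "http://localhost:1234")
LMSTUDIO_API_PATH = os.getenv("LMSTUDIO_API_PATH", "/v1/chat/completions")
LMSTUDIO_MODEL = os.getenv("LMSTUDIO_MODEL", "Qwen2.5-7B-Instruct")
LMSTUDIO_API_KEY = os.getenv("LMSTUDIO_API_KEY", "lm-studio")
FFMPEG_BIN = os.getenv("FFMPEG_BIN", "ffmpeg")
WHISPER_MODEL_PATH = os.getenv("WHISPER_MODEL_PATH", "./models/ggml-large-v3-turbo.bin")
WHISPER_LANGUAGE = os.getenv("WHISPER_LANGUAGE", "fr")
TEMPLATES_PATH = os.getenv("TEMPLATES_PATH", "prompt_templates.json")
```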
Run the app:

```bash
python app.py
```

Gradio will print a local URL (e.g. http://127.0.0.1:7860).
- Record or Upload: use the single `Audio` widget to record from the browser mic or upload an existing file (`.wav`, `.mp3`, `.m4a`, etc.).
- Choose output format: `srt` (default) is great to visually verify timestamps; `text` for plain text; `json` for programmatic use.
- (Optional) Advanced settings:
  - Pre‑roll (ms): add a short silence at the start so the very first words aren’t cut (try 250–500 ms); a pre‑processing sketch follows this list.
  - Queue (ms): buffering window for VAD; larger values may help stabilize segmentation.
  - VAD (Silero): enable only if you provide a Silero VAD ggml model path; otherwise keep it off to preserve natural pauses.
- Click Transcribe → FFmpeg runs the `whisper` filter and writes the transcript file into `./transcripts/`.
- Inspect the transcript/SRT/JSON shown in the UI.
- Pick or edit a Prompt Template and click Summarize → LM Studio returns structured Markdown meeting minutes.
- (Optional) Use the Options tab to switch UI language (English/French) or theme (light/dark). Choices persist to `ui_settings.json`.
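The pre‑roll and WAV normalization mentioned above amount to a small FFmpeg pre‑processing pass. The sketch below is illustrative, not the app's exact code; `adelay` pads the start with silence and `-ar`/`-ac` force 16 kHz mono.

```python
# Illustrative sketch of the input-normalization pass: convert any upload to
# WAV 16 kHz mono and prepend a short silence (pre-roll) so the first words
# are not clipped. Not the app's exact code.
import subprocess

def normalize(src: str, dst: str = "normalized.wav", preroll_ms: int = 300) -> str:
    subprocess.run(
        ["ffmpeg", "-hide_banner", "-y", "-i", src,
         "-af", f"adelay=delays={preroll_ms}:all=1",  # pad all channels with preroll_ms of silence
         "-ar", "16000",                              # resample to 16 kHz
         "-ac", "1",                                  # downmix to mono
         dst],
        check=True,
    )
    return dst
```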
- Templates are persisted to `prompt_templates.json` in the project root.
- Each entry is a mapping of name ➝ prompt content.
- The UI provides buttons to Reload, Save, and Delete templates. The selected template’s content is sent as the system prompt (sketched after the example below).
- You can also hand‑edit `prompt_templates.json` while the app is stopped.

Example `prompt_templates.json`:

```json
{
  "default": "You are an assistant specializing in meeting minutes...",
  "brief": "Write a concise summary focusing on decisions and actions."
}
```

The app trims the template content to a safe length before sending it to LM Studio.
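Under the hood, Summarize is a standard OpenAI-style chat completion with the selected template as the system message and the transcript as the user message. Here is a hedged sketch; the 4000-character trim and the model name are placeholders, and the endpoint comes from the `.env` example above.

```python
# Sketch of the Summarize step: selected template as system prompt, transcript
# as user message, posted to the OpenAI-compatible chat completions endpoint.
# The 4000-character trim and the model name are illustrative placeholders.
import json
import httpx

def summarize(transcript: str, template_name: str = "default") -> str:
    with open("prompt_templates.json", encoding="utf-8") as f:
        templates = json.load(f)
    system_prompt = templates[template_name][:4000]  # trim to a safe length

    resp = httpx.post(
        "http://localhost:1234/v1/chat/completions",
        json={
            "model": "Qwen2.5-7B-Instruct",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": transcript},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```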
- The Options tab lets you choose interface language (English or French) and theme (light or dark).
- Labels for each language are stored in `strings/<lang>.json` for easy translation.
- Selections are persisted to `ui_settings.json` so your preferences are restored on next launch.
- Delete or edit this file to reset the UI settings.
Place your chosen Whisper.cpp ggml model file under `./models` and point `WHISPER_MODEL_PATH` to it. Recommendations:

- `ggml-large-v3-turbo.bin` (~1.5 GB): great quality/speed balance, multilingual.
- `ggml-large-v3.bin` (~2.9 GB): highest quality, heavier.
- `ggml-medium.bin` (~1.5 GB): good quality, lighter than large.
- `ggml-small.bin` (~466 MB): fast and decent for French.

All models are multilingual unless the filename ends with `.en`.
Models can be found at https://huggingface.co/ggerganov/whisper.cpp/tree/main.
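If you prefer scripting the download, something like the sketch below works. It assumes Hugging Face's standard `resolve/main` URL pattern for the repository linked above; double-check the exact filename on that page.

```python
# Sketch: download a Whisper.cpp model into ./models/ from the repository above.
# Assumes Hugging Face's standard resolve/main URL pattern; verify the filename
# on the model page before downloading.
from pathlib import Path
import httpx

name = "ggml-large-v3-turbo.bin"
url = f"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/{name}"
dest = Path("models") / name
dest.parent.mkdir(exist_ok=True)

with httpx.stream("GET", url, follow_redirects=True, timeout=None) as resp:
    resp.raise_for_status()
    with open(dest, "wb") as f:
        for chunk in resp.iter_bytes():
            f.write(chunk)
print(f"Saved {dest}")
```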
- Check `whisper` filter availability: if `ffmpeg -filters` does not list `whisper`, your build isn’t compatible. Install or compile FFmpeg 8 with Whisper support.
- Windows paths: the FFmpeg filter parser doesn’t like drive letters (`D:`) and backslashes. This app writes transcripts using relative POSIX‑style paths (forward slashes `/`) to avoid that.
- Microphone quirks / first words missing: increase Pre‑roll (e.g., 300–800 ms). Keep VAD off unless you really need it. The app also auto‑converts inputs to WAV 16 kHz mono for consistency.
- Only partial audio transcribed: try disabling VAD, increasing Queue (ms), and making sure your input isn’t corrupted.
- LM Studio errors (404/connection refused): make sure LM Studio’s local server is running and that `LMSTUDIO_BASE_URL` and `LMSTUDIO_API_PATH` match your version.
- Slow on CPU: prefer `large‑v3‑turbo` or `small`; quantized variants can help.
- Export DOCX/PDF for the final minutes.
- Diarization / speaker labels.
- Multi‑language post‑processing and translation.
- Batch mode and watch folders.
License: MIT