Direct Transcriber

A local CPU-only batch audio transcription tool optimized for RAGflow integration. Uses OpenAI Whisper models running entirely on your machine - no data leaves your system.

Features

  • 100% Local Processing - No external API calls, all processing on CPU
  • Automatic Model Selection - Detects available RAM and suggests optimal Whisper model
  • Batch Processing - Process entire directories of audio files
  • RAGflow Optimized - Markdown output with metadata and chunking options
  • Progress Tracking - Rich progress bars for individual files and batches
  • Multi-format Support - Handles audio (MP3, WAV, M4A, FLAC, OGG, WMA, AAC) and video (MP4, AVI, MOV, MKV, WMV, FLV, WebM, M4V)
  • Memory Safe - Monitors system resources and prevents OOM errors

Installation

Option 1: Pre-built Docker Images (Recommended)

Using GitHub Container Registry (fastest):

# Use pre-built images - no build required!
docker run --rm -v ./input:/input -v ./output:/output -v ./models:/models \
  ghcr.io/norandom/direct-transriberr:latest \
  direct-transcriber batch /input --output-dir /output --yes

# Or with docker-compose
curl -O https://raw.githubusercontent.com/norandom/direct-transriberr/main/docker-compose.ghcr.yml
docker-compose -f docker-compose.ghcr.yml up

Option 2: Local Docker Build

# Clone repository
git clone https://github.com/norandom/direct-transriberr
cd direct-transriberr

# Build and run with Docker Compose
docker-compose up --build

Option 3: Python Installation

Requirements:

  • Python 3.9+
  • FFmpeg (for audio processing)
  • uv (for dependency management)

Install with uv:

# Clone repository
git clone https://github.com/norandom/direct-transriberr
cd direct-transriberr

# Create a virtual environment and install with uv
uv venv
uv pip install -e .

System Dependencies:

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

macOS:

brew install ffmpeg

CentOS/RHEL:

sudo yum install ffmpeg   # may require the EPEL or RPM Fusion repositories

Usage

Docker Usage (Recommended)

Using the helper script for external files:

# Batch process external directory
./scripts/transcribe-external.sh -m /path/to/your/media -o /path/to/output

# Single external file
./scripts/transcribe-external.sh -f /path/to/video.mp4 -o /path/to/output

# With specific model
./scripts/transcribe-external.sh -m /path/to/media -o /path/to/output --model large-v3

Direct Docker Compose usage:

# Copy files to input directory
cp /path/to/your/media/* ./input/

# Run transcription (models will be downloaded to ./models on first run)
docker-compose up --build

# For external directories, set environment variables
MEDIA_DIR=/path/to/your/media OUTPUT_DIR=/path/to/output docker-compose --profile external up --build transcriber-external

Python/uv Usage

Prerequisites:

# Activate the virtual environment
source .venv/bin/activate

Batch Processing:

# Process all audio and video files in a directory
direct-transcriber batch /path/to/media/files/

# With custom output directory
direct-transcriber batch /audio/ --output-dir /transcriptions/

# Force specific model
direct-transcriber batch /audio/ --model large-v3

# Include timestamps for reference
direct-transcriber batch /audio/ --timestamps

# Chunk output for better RAG performance
direct-transcriber batch /audio/ --chunk-size 500

# Save both markdown and JSON
direct-transcriber batch /audio/ --format both

Single File:

# Transcribe single audio file
direct-transcriber single audio.mp3

# Transcribe single video file (extracts audio automatically)
direct-transcriber single video.mp4

# With custom output  
direct-transcriber single media.mp4 --output transcript.md

# JSON output
direct-transcriber single media.mp4 --format json

# RAG-optimized output with intelligent chunking
direct-transcriber single media.mp4 --rag-optimized --chunking-strategy semantic

# Fixed-size chunking for consistent chunk sizes
direct-transcriber single media.mp4 --rag-optimized --chunking-strategy fixed --chunk-size 1000

# Example: Process a video segment with medium model and RAG optimization
direct-transcriber single test_5min.mp4 --model medium --rag-optimized --chunking-strategy semantic --output test_5min_medium.md

Command Options

  • --model, -m: Whisper model (tiny, base, small, medium, large-v3)
  • --output-dir, -o: Output directory for batch processing
  • --format, -f: Output format (md, json, both)
  • --timestamps: Include timestamps in markdown
  • --chunk-size: Chunk size for RAG optimization (characters)
  • --rag-optimized: Enable RAG-optimized output with intelligent chunking
  • --chunking-strategy: Chunking strategy (semantic, sentence, fixed)
  • --workers, -w: Number of parallel workers (auto-detected)
  • --yes, -y: Skip confirmation prompts

Model Selection

The tool automatically selects the best Whisper model based on available RAM:

Model      RAM Required   Quality   Speed
tiny       1 GB           Lowest    Fastest
base       1.5 GB         Good      Fast
small      2 GB           Better    Moderate
medium     4 GB           High      Slower
large-v3   6 GB           Best      Slowest
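
As an illustration, RAM-based selection might look like the following sketch (assuming the psutil package; the project's actual heuristic may differ):

# Sketch: pick the largest Whisper model that fits in available RAM.
# Thresholds mirror the table above; psutil is an assumed dependency.
import psutil

MODEL_RAM_GB = [
    ("large-v3", 6.0),
    ("medium", 4.0),
    ("small", 2.0),
    ("base", 1.5),
    ("tiny", 1.0),
]

def suggest_model() -> str:
    available_gb = psutil.virtual_memory().available / 1024 ** 3
    for name, required_gb in MODEL_RAM_GB:
        if available_gb >= required_gb:
            return name
    return "tiny"  # nothing fits comfortably; fall back to the smallest model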

Output Formats

Clean Markdown (Default)

# Video Transcription
**File:** `/full/path/to/video/meeting.mp4`
**Duration:** 15:32 | **Model:** large-v3 | **Language:** en
**Source:** video | **Transcribed:** 2024-01-15 14:30:22

---

The speaker discusses the importance of machine learning in modern applications. They explain how neural networks can be trained to recognize patterns in data.

Another topic covered is the implementation of transformer models for natural language processing tasks.

With Timestamps

# Audio Transcription
**File:** `/full/path/to/audio/meeting.mp3`

## [00:00] - [02:15]
The speaker discusses the importance of machine learning in modern applications.

## [02:15] - [04:30]
They explain how neural networks can be trained to recognize patterns in data.

Chunked for RAG

# Audio Transcription
**File:** `/full/path/to/audio/meeting.mp3`

## Segment 1 (00:00-05:00)
[Content chunk optimized for semantic search]

## Segment 2 (05:00-10:00)
[Next semantic chunk]

Docker Image

Optimized Container

The Docker image uses a multi-stage build process to minimize size while maintaining full functionality:

  • Base: Python 3.11-slim for minimal footprint
  • Size: ~800MB (compared to 2GB+ for standard PyTorch images)
  • Security: Runs as non-root user
  • Optimization: Removes test files, caches, and unnecessary components

Available Tags

  • ghcr.io/norandom/direct-transriberr:latest - Latest stable release
  • ghcr.io/norandom/direct-transriberr:main - Latest development build
  • ghcr.io/norandom/direct-transriberr:v1.0.0 - Specific version tags

Docker Configuration

Environment Variables

Create a .env file based on .env.example:

# External media directory (absolute path)
MEDIA_DIR=/path/to/your/media/files

# Output directory for transcriptions (absolute path)
OUTPUT_DIR=/path/to/your/transcriptions

# Model to use (optional, auto-detected if not specified)
WHISPER_MODEL=large-v3

# Memory limit for container (optional)
MEMORY_LIMIT=8G

Volume Mounting

The Docker setup supports several volume mounting options:

  1. Default directories: ./input, ./output, and ./models
  2. Environment variables: Use MEDIA_DIR and OUTPUT_DIR
  3. Single file mounting: Mount specific files for transcription
  4. External script: Use scripts/transcribe-external.sh for easy external file processing
  5. Persistent model storage: Models are downloaded once to ./models and reused

Performance

  • CPU Optimization: Uses all available CPU cores minus one, reserving one core for the system
  • Memory Management: Monitors RAM usage and prevents OOM
  • Batch Processing: Processes multiple files with progress tracking
  • Format Support: Automatic audio format conversion via FFmpeg
  • Docker Isolation: Containerized processing with resource limits
  • Model Persistence: Whisper models downloaded once and cached in ./models
  • Enhanced Progress: Detailed progress tracking with processing times
  • CPU Optimized: Suppresses FP16 warnings and applies CPU-specific optimizations
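
To make the worker heuristic and model caching concrete, here is a minimal sketch (assuming the openai-whisper package; "audio.mp3" is a placeholder, and this is not the project's exact code):

# Sketch: reserve one CPU core for the system and cache model weights locally.
import os
import whisper  # openai-whisper

workers = max(1, (os.cpu_count() or 2) - 1)  # leave one core for the system
print(f"Using {workers} worker(s)")

# download_root caches weights in ./models so they are fetched only once
model = whisper.load_model("base", device="cpu", download_root="./models")
result = model.transcribe("audio.mp3", fp16=False)  # fp16=False avoids FP16 warnings on CPU
print(result["text"])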

RAG Optimization Features

Direct Transcriber now includes advanced RAG (Retrieval-Augmented Generation) optimizations:

Intelligent Chunking Strategies

Semantic Chunking (Recommended):

  • Breaks content at natural topic boundaries
  • Detects discourse markers and transitions
  • Preserves context and meaning
  • Ideal for complex discussions and lectures

Sentence Chunking:

  • Groups sentences into coherent chunks
  • Respects sentence boundaries
  • Good for clear, structured speech
  • Maintains readability

Fixed-Size Chunking:

  • Consistent chunk sizes with smart overlap
  • Predictable for downstream processing
  • Good for batch processing workflows
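
The idea behind fixed-size chunking with overlap can be sketched in a few lines (a simplified illustration, not the project's implementation):

# Sketch: split text into fixed-size chunks with a small overlap so that
# context at chunk boundaries is preserved for retrieval.
def fixed_chunks(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` characters of context
    return chunks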

Enhanced Metadata Extraction

  • Keyword Extraction: Automatic identification of key terms
  • Entity Recognition: Names, numbers, times, and proper nouns
  • Topic Classification: Domain-specific topic identification
  • Quality Scoring: Confidence-based chunk quality assessment
  • Context Linking: Inter-chunk relationships and context
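
For intuition, keyword extraction can be approximated with simple frequency counting (a naive sketch with an ad-hoc stopword list; the tool's actual extraction is more involved):

# Sketch: rank candidate keywords by frequency, ignoring common stopwords.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "that", "it", "for"}

def keywords(text: str, top_n: int = 5) -> list[str]:
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]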

RAG-Optimized Output

# Enable RAG optimization
direct-transcriber batch /audio --rag-optimized

# Choose chunking strategy
direct-transcriber batch /audio --rag-optimized --chunking-strategy semantic

# Custom chunk size
direct-transcriber batch /audio --rag-optimized --chunk-size 1500

Output Features:

  • Structured markdown with semantic sections
  • JSON sidecar files for programmatic access
  • Cross-chunk context preservation
  • Quality metrics and confidence scores
  • Entity and keyword extraction
  • Topic classification

Example RAG Output:

# Audio Transcription (RAG Optimized)
**Chunks:** 15 | **Strategy:** SemanticChunking

## Segment 1 (00:00 - 02:30)
**ID:** `lecture_001` | **Quality:** 0.92 | **Topics:** technology, AI

The speaker discusses machine learning fundamentals...

🏷️ **Entities:** Neural Networks, Deep Learning, PyTorch
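
Because RAG mode also writes JSON sidecar files, chunks can be filtered programmatically. The sketch below assumes a sidecar named lecture.json with "chunks" and "quality" fields; these names are inferred from the example above, not a documented schema:

# Sketch: load an assumed JSON sidecar and keep only high-confidence chunks.
# The filename and the "chunks"/"quality" keys are illustrative assumptions.
import json

with open("lecture.json") as f:
    data = json.load(f)

good = [c for c in data.get("chunks", []) if c.get("quality", 0) >= 0.8]
print(f"Kept {len(good)} of {len(data.get('chunks', []))} chunks")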

Integration Benefits

  • Better Retrieval: Semantic chunks improve search relevance
  • Context Preservation: Overlapping chunks maintain continuity
  • Quality Filtering: Low-confidence segments are flagged
  • Structured Data: JSON output enables programmatic processing
  • Metadata Rich: Enhanced information for better indexing

License

MIT License - see LICENSE file for details.
