ForgeLLM is a comprehensive platform for continued pre-training and instruction fine-tuning of large language models using MLX on Apple Silicon.
- Train: Continued pre-training (CPT) via the web interface (IFT coming soon; see Development Perspectives)
- Monitor: Real-time training dashboards and checkpoint management
- Compare: Compare multiple training sessions by validation loss, perplexity, stability, and generalization gap
- Fuse: Merge LoRA/DoRA adapters with base models for deployment
- Quantize: Convert models to 8-bit or 4-bit precision for efficient deployment
- Chat & Test: Interactive chat with models and adapters via CLI or web
- Publish: Convert and publish trained models with comprehensive documentation
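The comparison metrics above relate to each other in simple ways: perplexity is the exponential of the cross-entropy loss, and the generalization gap is the spread between validation and training loss. A minimal sketch (the run numbers below are made up for illustration, not ForgeLLM output):

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the cross-entropy loss (in nats)."""
    return math.exp(loss)

def generalization_gap(train_loss: float, val_loss: float) -> float:
    """Validation loss minus training loss; a large positive gap
    suggests overfitting to the training corpus."""
    return val_loss - train_loss

# Comparing two hypothetical CPT runs:
runs = {
    "run_a": {"train_loss": 2.10, "val_loss": 2.25},
    "run_b": {"train_loss": 1.80, "val_loss": 2.40},
}
for name, r in runs.items():
    print(name,
          "ppl:", round(perplexity(r["val_loss"]), 2),
          "gap:", round(generalization_gap(r["train_loss"], r["val_loss"]), 2))
```

Note that a run with lower training loss (run_b) can still be the worse model if its validation perplexity and gap are higher.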
# Install latest version
pip install forgellm
# Install specific version
pip install forgellm==0.4.7
# Upgrade existing installation
pip install --upgrade forgellm
git clone https://github.com/lpalbou/forgellm.git
cd forgellm
pip install -e .
Requirements: Python 3.9+ and an Apple Silicon Mac (M1/M2/M3/M4). All dependencies, including MLX, are installed automatically.
# Install HuggingFace CLI
pip install huggingface_hub
# Download a model (examples)
huggingface-cli download mlx-community/gemma-3-1b-it-bf16 # Small model
huggingface-cli download mlx-community/Qwen3-4B-bf16 # Medium model
# Start both servers (recommended)
forgellm start
# Opens web interface at http://localhost:5002
# Model server runs at http://localhost:5001
That's it!
The web interface provides everything you need:
forgellm start # Start both servers
# or
forgellm web --port 5002 # Web interface only
forgellm server --port 5001 # Model server only (separate terminal)
Web Interface Features:
- Training Tab: Configure and start CPT training (IFT support coming soon)
- Monitoring Tab: View training progress and dashboards
- Testing Tab: Chat with models and test different prompts
The CLI is perfect for quick model testing and interactive chat:
# Interactive chat with a model (REPL mode)
forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16
# Single prompt test
forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16 --prompt "Hello, how are you?"
# Get model architecture info
forgellm cli info --model mlx-community/gemma-3-1b-it-bf16
# Test with an adapter (your trained model)
forgellm cli generate --model mlx-community/Qwen3-4B-bf16 --adapter-path models/cpt/my_trained_model
REPL Mode Commands:
- Type normally to chat
- `/help`: Show available commands
- `/q` or `/exit`: Quit
- `/stats`: Show session statistics
- `/system [prompt]`: Set/show system prompt
ForgeLLM works with MLX-compatible models from HuggingFace. All models are cached locally in `~/.cache/huggingface/hub/`.
Small Models (1-2B) - Good for testing:
huggingface-cli download mlx-community/gemma-3-1b-it-bf16
huggingface-cli download mlx-community/gemma-3-1b-pt-bf16
Medium Models (3-4B) - Good balance:
huggingface-cli download mlx-community/Qwen3-4B-bf16
huggingface-cli download mlx-community/gemma-3-4b-it-bf16
Large Models (7-8B) - Best quality:
huggingface-cli download mlx-community/Qwen3-8B-bf16
huggingface-cli download mlx-community/Meta-Llama-3.1-8B-Instruct-bf16
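HuggingFace stores each downloaded repo under the cache directory as a folder named `models--<org>--<name>`. If you want to see which models are already available locally without re-running the CLI, a small sketch (assuming the standard cache layout; edge cases like org names containing `--` are ignored):

```python
from pathlib import Path

def list_cached_models(cache_dir: str = "~/.cache/huggingface/hub") -> list[str]:
    """Return repo ids of models already downloaded to the HF hub cache.
    Cached repos are directories named 'models--<org>--<name>'."""
    hub = Path(cache_dir).expanduser()
    if not hub.is_dir():
        return []
    return sorted(
        d.name.removeprefix("models--").replace("--", "/")
        for d in hub.iterdir()
        if d.is_dir() and d.name.startswith("models--")
    )

# e.g. ['mlx-community/Qwen3-4B-bf16', 'mlx-community/gemma-3-1b-it-bf16']
print(list_cached_models())
```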
- Base Models (`-bf16`, `-pt-`): Ideal for continued pre-training; a clean slate for domain adaptation
- Instruct Models (`-it-`, `-Instruct-`): Can also be used for continued pre-training with careful data mixing
- Quantized Models (`-4bit`, `-8bit`): Smaller memory usage, slightly lower quality
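The memory savings from quantization follow directly from the bit width: model weights take roughly parameters × bits ÷ 8 bytes. A back-of-the-envelope sketch (weights only, ignoring activations, KV cache, and quantization overhead):

```python
def model_memory_gb(n_params_billions: float, bits: int) -> float:
    """Rough weight-memory estimate in decimal GB: params x bits / 8.
    Ignores activations, KV cache, and per-group quantization overhead."""
    bytes_total = n_params_billions * 1e9 * bits / 8
    return bytes_total / 1e9

# A 4B-parameter model at bf16 vs 8-bit vs 4-bit:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(4, bits):.1f} GB")
```

This is why a 4-bit 8B model can fit in roughly the same memory as a bf16 2B model.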
Base Models (Recommended for CPT):
- No instruction-following capabilities to preserve
- Clean foundation for domain-specific knowledge
- Higher learning rates and longer training possible
Instruct Models (Advanced CPT):
- β Better at learning from complex documents (recent research)
- Requires careful data mixing (1-5% original pretraining data)
- Lower learning rates to prevent catastrophic forgetting
- Shorter training to avoid losing instruction-following abilities
Choose base models for straightforward domain adaptation, instruct models when you need better knowledge absorption from complex documents.
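The 1-5% data-mixing guideline above can be sketched as a simple replay blend; this is an illustrative helper, not a ForgeLLM API, and the document lists hold placeholder strings rather than real corpora:

```python
import random

def mix_datasets(domain_docs: list[str], general_docs: list[str],
                 replay_fraction: float = 0.02, seed: int = 0) -> list[str]:
    """Blend a small replay fraction of general pretraining text into the
    domain corpus (the 1-5% mixing mentioned above), then shuffle."""
    rng = random.Random(seed)
    n_replay = max(1, round(len(domain_docs) * replay_fraction))
    replay = rng.sample(general_docs, min(n_replay, len(general_docs)))
    mixed = domain_docs + replay
    rng.shuffle(mixed)
    return mixed

domain = [f"domain doc {i}" for i in range(100)]
general = [f"general doc {i}" for i in range(1000)]
print(len(mix_datasets(domain, general, replay_fraction=0.02)))  # 102
```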
For detailed CPT best practices and the latest research findings, see docs/cpt.md
- Prepare Data: Place text files in the `dataset/` directory
- Start Web Interface: `forgellm start`
- Training Tab: Configure model, data, and parameters
- Monitor: Watch progress in real-time
- Publish: Convert best checkpoints to full models
Training is currently only available through the web interface.
IFT capabilities are currently in development. For technical details and implementation roadmap, see Development Perspectives.
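The data-preparation step can be as simple as splitting a raw corpus into plain-text files under `dataset/`. A hypothetical sketch (the chunk size is an arbitrary choice for illustration, not a ForgeLLM requirement):

```python
from pathlib import Path

def write_dataset(raw_text: str, out_dir: str = "dataset",
                  chars_per_file: int = 50_000) -> int:
    """Split a raw corpus into UTF-8 text chunks under out_dir.
    Returns the number of files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = [raw_text[i:i + chars_per_file]
              for i in range(0, len(raw_text), chars_per_file)]
    for i, chunk in enumerate(chunks):
        (out / f"part_{i:04d}.txt").write_text(chunk, encoding="utf-8")
    return len(chunks)
```

After running this, the files appear in the Training Tab's data selection.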
forgellm/
├── dataset/     # Your training data (text files)
├── models/      # Trained model outputs
│   ├── cpt/     # Continued pre-training models
│   └── ift/     # Instruction fine-tuning models (coming soon)
└── data/        # Processed training data
forgellm start # Start both servers (recommended)
forgellm web [--port 5002] # Web interface only
forgellm server [--port 5001] # Model server only
forgellm cli <command> # Command-line operations
# Interactive chat (REPL mode)
forgellm cli generate --model <model>
# Single prompt
forgellm cli generate --model <model> --prompt "Your question"
# Model information
forgellm cli info --model <model>
# Test with adapter
forgellm cli generate --model <model> --adapter-path <path>
- Hardware: Apple Silicon Mac (M1/M2/M3/M4)
- Memory: 16GB+ RAM recommended
- Storage: 5-20GB per model
- Python: 3.9+
- MLX: Automatically installed
ForgeLLM uses a clean separation:
- Model Server (`forgellm server`): Handles model loading and inference
- Web Server (`forgellm web`): Provides UI and training coordination
- CLI (`forgellm cli`): Direct model interaction and testing
This allows you to use just the CLI for testing, or the full web interface for training.
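Because the two servers listen on separate ports (5001 for the model server, 5002 for the web interface, per the defaults above), you can verify they are running with a plain TCP check; this is a generic port probe, not a ForgeLLM health endpoint:

```python
import socket

def server_up(host: str = "127.0.0.1", port: int = 5001,
              timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("model server (5001):", server_up(port=5001))
print("web server   (5002):", server_up(port=5002))
```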
- Getting Started: Complete setup and first training session
- Architecture: System design and component overview
- Data Flow: How data moves through the system
- API Reference: Complete REST API and CLI documentation
- CPT Best Practices: Advanced continued pre-training techniques
- Development Perspectives: Current capabilities and IFT roadmap
- Architecture: Multi-process design with model server separation
- Training Pipeline: Real-time monitoring with automatic checkpoint management
- Model Publishing: LoRA to full model conversion with comprehensive documentation
- Error Recovery: Robust error handling and automatic recovery mechanisms
All notable changes to this project are documented in the CHANGELOG.md file.
Contributions welcome! Please submit pull requests.
MIT License - see LICENSE file.