
Baseten

Inference docs | Training docs

A curated collection of ready-to-use training recipes for machine learning on Baseten. Whether you’re starting from scratch or fine-tuning an existing model, these recipes provide practical, copy-paste solutions for every stage of your ML pipeline.

What's inside

  • Training recipes - End-to-end examples for training models from scratch
  • Fine-tuning workflows - Adapt pre-trained models to your specific use case
  • Best practices - Optimized configurations and common patterns

From data preprocessing to checkpointed and trained models, these recipes cover the complete ML lifecycle on Baseten's platform.

Table of contents

  • Prerequisites
  • Clone this repository
  • Usage
    • Fine-tune GPT OSS 20B with LoRA and TRL
    • Fine-tune Llama 3.1 8B Instruct with LoRA and Unsloth
    • Train and deploy an MNIST digit classifier with PyTorch
  • Contributing
  • License

Prerequisites

Before getting started, ensure you have the following:

  • A Baseten account. Sign up here if you don't have one.
    • Add any access tokens, API keys (for example, a Hugging Face access token or a Weights & Biases access token), and passwords to your Baseten secrets so your models and training jobs can access credentials securely.
    • This is required to access gated models on Hugging Face. More information on setting up Hugging Face access tokens can be found here.
  • Python 3.8 to 3.11 installed. A conda environment is recommended (see the sketch after this list).
  • Install Truss, Baseten's open-source model packaging tool for configuring and containerizing model code:
    • pip install --upgrade truss
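
If you want an isolated environment, a minimal setup might look like this (the environment name is arbitrary):

# Optional: create and activate a conda environment, then install Truss
conda create -n baseten-training python=3.11 -y
conda activate baseten-training
pip install --upgrade truss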

Clone this repository

git clone https://github.com/basetenlabs/ml-cookbook.git
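
Then change into the repository root so the example paths below resolve correctly:

cd ml-cookbook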

Usage

Fine-tune GPT OSS 20B with LoRA and TRL

If you are using a gated model, make sure you have access to it on Hugging Face and that your access token is uploaded to your secrets. This example requires a Hugging Face access token and, optionally, a Weights & Biases access token. To disable W&B, comment out any lines referencing wandb in examples/oss-gpt-20b-lora/training/config.py and examples/oss-gpt-20b-lora/training/train.py.

Training

examples/oss-gpt-20b-lora/training/train.py contains all training code.

examples/oss-gpt-20b-lora/training/config.py is the entry point for starting training, where you define your training configuration. It also includes the start commands that launch your training job. Make sure these commands include any file permission changes needed to make shell scripts executable (for example, chmod +x); we do not change any file system permissions on your behalf.

Make sure the hf_access_token entry in config.py uses the same name under which that access token is saved in your secrets. In this example, trained checkpoints are written directly to Hugging Face; the Hub IDs for the model and dataset are configured in examples/oss-gpt-20b-lora/training/run.sh. Update run.sh with a repo you have write access to.
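
For orientation, a run.sh excerpt might look roughly like the following; the variable names and repo IDs are illustrative placeholders, not the actual file contents, so check run.sh itself for the exact settings:

# Hypothetical sketch only — use the variable names and repo IDs actually defined in run.sh
export HF_MODEL_REPO="your-username/gpt-oss-20b-lora-finetune"   # Hub repo you can write checkpoints to
export HF_DATASET_REPO="your-username/your-finetuning-dataset"   # Hub dataset used for fine-tuning
python train.py   # launch training (the real script may use a different launcher)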

cd examples/oss-gpt-20b-lora/training
truss train push config.py

Upon successful submission, the CLI will output helpful information about your job:

✨ Training job successfully created!
🪵 View logs for your job via `truss train logs --job-id e3m512w [--tail]`
🔍 View metrics for your job via `truss train metrics --job-id e3m512w`

Keep the Job ID handy, as you’ll use it for managing and monitoring your job.
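
For example, once you have the job ID from the submission output (e3m512w above), you can stream logs and check metrics:

# Replace e3m512w with your own job ID
truss train logs --job-id e3m512w --tail
truss train metrics --job-id e3m512w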

Alternatively, you can view all your training jobs at https://app.baseten.co/training/.

  • As checkpoints are generated, you can access them on Hugging Face at the location defined in run.sh, or pull them locally as sketched below.
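
If you want to pull a checkpoint locally, one option is the Hugging Face CLI (assuming huggingface_hub is installed); the repo ID below is a placeholder for whatever you configured in run.sh:

# Placeholder repo ID — use the checkpoint repo configured in run.sh
huggingface-cli download your-username/gpt-oss-20b-lora-finetune --local-dir ./checkpoints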

Fine-tune Llama 3.1 8B Instruct with LoRA and Unsloth

If you are using a gated model, make sure you have access to it on Hugging Face and that your access token is uploaded to your secrets.

Training

examples/llama-finetune-8b-lora/training/train.py contains the training code.

examples/llama-finetune-8b-lora/training/config.py is the entry point for starting training, where you define your training configuration. It also includes the start commands that launch your training job. Make sure these commands include any file permission changes needed to make shell scripts executable (for example, chmod +x); we do not change any file system permissions on your behalf.

cd examples/llama-finetune-8b-lora/training
truss train push config.py

Upon successful submission, the CLI will output helpful information about your job:

✨ Training job successfully created!
🪵 View logs for your job via `truss train logs --job-id e3m512w [--tail]`
🔍 View metrics for your job via `truss train metrics --job-id e3m512w`

Alternatively, you can view all your training jobs at https://app.baseten.co/training/.

In this example, since checkpointing is enabled in config.py, checkpoints are stored in cloud storage and can be accessed with:

truss train get_checkpoint_urls --job-id $JOB_ID
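
Here $JOB_ID is the ID printed when the job was created, for example:

export JOB_ID=e3m512w   # replace with your own job ID
truss train get_checkpoint_urls --job-id $JOB_ID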

Train and deploy an MNIST digit classifier with PyTorch

Training

examples/mnist-single-gpu/training/train_mnist.py contains a PyTorch example of a CNN-based MNIST classifier.

examples/mnist-single-gpu/training/config.py is the entry point for starting training, where you define your training configuration. It also includes the start commands that launch your training job. Make sure these commands include any file permission changes needed to make shell scripts executable (for example, chmod +x); we do not change any file system permissions on your behalf.

cd examples/mnist-single-gpu/training
truss train push config.py

Upon successful submission, the CLI will output helpful information about your job:

✨ Training job successfully created!
🪵 View logs for your job via `truss train logs --job-id e3m512w [--tail]`
🔍 View metrics for your job via `truss train metrics --job-id e3m512w`

Keep the Job ID handy, as you’ll use it for managing and monitoring your job.

In this example, since checkpointing is enabled in config.py, checkpoints are stored in cloud storage and can be accessed with:

truss train get_checkpoint_urls --job-id $JOB_ID

Contributing

Contributions are welcome! Please open issues or submit pull requests.

License

MIT License
