Debiasing with Energy-Based Models (EBMs)

Train an EBM on your dataset using the provided configurations.

📦 Environment

Conda environment (see environment.yml):

conda env create -f environment.yml
conda activate ebm

# or minimal pip (no CUDA pinning)
pip install torch torchvision lightning hydra-core einops matplotlib wandb

CUDA: the default environment.yml pulls packages from the pytorch and nvidia channels. Adjust the pinned versions if your GPU/driver requires it.
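
After installing, a quick sanity check that PyTorch sees your GPU (plain PyTorch, nothing repo-specific):

import torch
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # should print True on a working CUDA setup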


🗂️ Repository layout

HackathonEBM/
├─ configs/                 # Hydra configs (data/model/train and full recipes)
│  ├─ data/                 # e.g. mnist.yaml, binary_mnist.yaml, prop_*.yaml
│  ├─ model/                # ebm/, vae/, classifier/ (model blueprints)
│  ├─ train/                # trainer, optimizer, scheduler defaults
│  ├─ mnist_ebm.yaml        # ready-to-run full configs
│  ├─ binary_mnist_iwae.yaml
│  └─ ...                   # other experiment recipes
├─ experiments/             # Experiment runners (e.g., sampling utilities)
├─ notebooks/               # ebm.ipynb, miwae.ipynb, debiasing explorations
├─ src/
│  ├─ data/                 # datasets & loaders (MyMNIST, transforms, utils)
│  ├─ layers/               # ConvEncoder/Decoder, MLPs, etc.
│  ├─ models/               # EBM, VAE, IWAE/MIWAE, base Lightning module
│  ├─ samplers/             # SGLD + utilities
│  ├─ callbacks/            # replay buffer, sampler viz, checkpoint helpers
│  └─ utils.py              # W&B id helpers, misc tools
├─ train.py                 # Main training entrypoint (Hydra-driven)
├─ run_experiment.py        # Post-train experiment/analysis runner
├─ theory_notes.md          # Notes on energy models & debiasing ideas
└─ environment.yml          # Reproducible env

🚀 Quickstart

1) Choose a config

Pick one of the full recipes in configs/, for example:

  • configs/mnist_ebm.yaml — EBM on MNIST with SGLD
  • configs/binary_mnist_iwae.yaml — IWAE on Binary MNIST
  • configs/mnist_miwae_mcar.yaml / mnist_miwae_mnar.yaml — MIWAE under MCAR (missing completely at random) / MNAR (missing not at random) settings
  • configs/prop_* — “proportion/bias” variants used for debiasing experiments

2) Train

# Example: EBM on MNIST
python train.py \
  --config-name mnist_ebm.yaml \
  train.batch_size=128 train.epochs=300 \
  train.accelerator=cuda train.devices=1 \
  logger.wandb.project=HackathonEBM  # (optional) if W&B is wired in your config

Hydra tips:

  • Use --config-name to select a full recipe in configs/ (without the path).
  • You can override any dotted key at the CLI, e.g. optimizer.lr=1e-4 or data.dataloader.batch_size=64.
  • Outputs/checkpoints default under ./outputs/<DATE>/<TIME>/ unless overridden by your config.
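
To inspect what a recipe resolves to before launching anything, Hydra's compose API can be used from Python (a quick sketch; the config and override names are the ones above):

from hydra import compose, initialize
from omegaconf import OmegaConf

# Compose the full recipe plus CLI-style overrides and print the merged config.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="mnist_ebm", overrides=["optimizer.lr=1e-4"])
print(OmegaConf.to_yaml(cfg))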

3) Resume / load from checkpoint

# Continue training from a checkpoint
python train.py --config-name mnist_ebm.yaml train.ckpt_path=path/to/checkpoint.ckpt

# Run a post‑train experiment using a checkpoint
python run_experiment.py \
  --config-path ./configs --config-name mnist_ebm.yaml \
  experiment.cfg.ckpt_path=path/to/checkpoint.ckpt

run_experiment.py instantiates experiments/* with your Hydra config to perform tasks like sampling or evaluation against held‑out splits.
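
For orientation, SGLD (the sampler behind the EBM recipes; see src/samplers/) draws samples by noisy gradient descent on the energy. A minimal sketch with illustrative hyperparameters, not the repo's actual implementation or defaults:

import torch

def sgld_sample(energy_fn, x, n_steps=60, step_size=10.0, noise_std=0.005):
    # energy_fn maps a batch of images to per-sample scalar energies.
    # Each step moves downhill along the energy gradient and adds Gaussian noise.
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        (grad,) = torch.autograd.grad(energy_fn(x).sum(), x)
        x = x - step_size * grad + noise_std * torch.randn_like(x)
        x = x.clamp(0.0, 1.0)  # keep pixel values in a valid range
    return x.detach()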


Currently available experiments

Proportion variants for debiasing experiments

A dedicated callback evaluates the debiasing performance of the EBM during training. It relies on a classifier pretrained with the src/models/MNIST_Classifier.py script, whose weights must be referenced in the config file. This pretrained classifier is used to classify samples generated by the EBM/VAE during training; a minimal sketch of the idea follows the config list below.

  • configs/prop_binary_mnist_vae.yaml: VAE on biased Binary MNIST (the baseline for the debiasing experiments). The class proportions are chosen so that digits appear with unequal frequency in the training set, which is further simplified to contain only the digits 0, 1, 2, 3, and 4.
  • configs/prop_binary_mnist_ebm_debiasing.yaml: EBM trained in data space on biased Binary MNIST (with the callback above to evaluate debiasing), using the VAE above as its base model. Note that the VAE checkpoint is not included in the repo, so you will likely need to retrain it first.
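
To make the mechanism concrete, here is a hypothetical Lightning callback along these lines. It is not the repo's actual callback: pl_module.sample(n) and the classifier interface are assumptions for illustration. At the end of each validation epoch it samples from the model, classifies the samples with the frozen pretrained classifier, and logs the predicted class proportions:

import torch
import lightning as L

class DebiasingEval(L.Callback):
    # Hypothetical sketch: `pl_module.sample(n)` returning a batch of generated
    # images and a frozen pretrained classifier are assumed, not the repo's API.
    def __init__(self, classifier, n_samples=512, n_classes=5):
        self.classifier = classifier.eval()
        self.n_samples = n_samples
        self.n_classes = n_classes

    @torch.no_grad()
    def on_validation_epoch_end(self, trainer, pl_module):
        samples = pl_module.sample(self.n_samples)
        preds = self.classifier(samples).argmax(dim=1)
        counts = torch.bincount(preds, minlength=self.n_classes).float()
        for digit, prop in enumerate((counts / counts.sum()).tolist()):
            pl_module.log(f"debias/prop_digit_{digit}", prop)

If debiasing works, the logged proportions should drift toward uniform over the five digits even though the training data is skewed.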

Possible future experiments

TODO


TODOs

  • Add a version of the EBM that operates in the latent space of a pretrained VAE (instead of pixel space)
  • Add other experiments for simpler datasets (UCI?)
