Train an EBM on your dataset using the provided configurations.
Conda environment (see `environment.yml`):

```bash
conda env create -f environment.yml
conda activate ebm

# or minimal pip (no CUDA pinning)
pip install torch torchvision lightning hydra-core einops matplotlib wandb
```
CUDA: the default `environment.yml` pulls from the `pytorch`/`nvidia` channels. Adjust versions if your GPU/driver requires it.
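A quick, optional sanity check that the install works and the GPU is visible (a minimal sketch using only the packages installed above):

```python
# Optional sanity check: confirm the core packages import and CUDA is visible.
import torch
import lightning

print("torch", torch.__version__, "| lightning", lightning.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```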
```text
HackathonEBM 2/
├─ configs/              # Hydra configs (data/model/train and full recipes)
│  ├─ data/              # e.g. mnist.yaml, binary_mnist.yaml, prop_*.yaml
│  ├─ model/             # ebm/, vae/, classifier/ (model blueprints)
│  ├─ train/             # trainer, optimizer, scheduler defaults
│  ├─ mnist_ebm.yaml     # ready-to-run full configs
│  ├─ binary_mnist_iwae.yaml
│  └─ ...                # other experiment recipes
├─ experiments/          # Experiment runners (e.g., sampling utilities)
├─ notebooks/            # ebm.ipynb, miwae.ipynb, debiasing explorations
├─ src/
│  ├─ data/              # datasets & loaders (MyMNIST, transforms, utils)
│  ├─ layers/            # ConvEncoder/Decoder, MLPs, etc.
│  ├─ models/            # EBM, VAE, IWAE/MIWAE, base Lightning module
│  ├─ samplers/          # SGLD + utilities
│  ├─ callbacks/         # replay buffer, sampler viz, checkpoint helpers
│  └─ utils.py           # W&B id helpers, misc tools
├─ train.py              # Main training entrypoint (Hydra-driven)
├─ run_experiment.py     # Post-train experiment/analysis runner
├─ theory_notes.md       # Notes on energy models & debiasing ideas
└─ environment.yml       # Reproducible env
```
Pick one of the full recipes in `configs/`, for example:

- `configs/mnist_ebm.yaml`: EBM on MNIST with SGLD
- `configs/binary_mnist_iwae.yaml`: IWAE on Binary MNIST
- `configs/mnist_miwae_mcar.yaml` / `configs/mnist_miwae_mnar.yaml`: MIWAE with missingness settings
- `configs/prop_*`: “proportion/bias” variants used for debiasing experiments
```bash
# Example: EBM on MNIST
python train.py --config-name mnist_ebm.yaml \
  train.batch_size=128 train.epochs=300 \
  train.accelerator=cuda train.devices=1 \
  logger.wandb.project=HackathonEBM  # (optional) if W&B is wired in your config
```
Hydra tips:
- Use `--config-name` to select a full recipe in `configs/` (without the path).
- You can override any dotted key at the CLI, e.g. `optimizer.lr=1e-4` or `data.dataloader.batch_size=64`.
- Outputs/checkpoints default under `./outputs/<DATE>/<TIME>/` unless overridden by your config.
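To inspect how a recipe resolves without launching a run, you can compose it with Hydra's Python API (a minimal sketch; the config name and override key are taken from the examples above, and your Hydra version may not need `version_base`):

```python
# Minimal sketch: resolve a full recipe and print the merged config without training.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="configs", version_base=None):
    cfg = compose(config_name="mnist_ebm", overrides=["train.batch_size=64"])
print(OmegaConf.to_yaml(cfg))
```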
```bash
# Continue training from a checkpoint
python train.py --config-name mnist_ebm.yaml train.ckpt_path=path/to/checkpoint.ckpt

# Run a post-train experiment using a checkpoint
python run_experiment.py --config-path ./configs --config-name mnist_ebm.yaml experiment.cfg.ckpt_path=path/to/checkpoint.ckpt
```
`run_experiment.py` instantiates `experiments/*` with your Hydra config to perform tasks like sampling or evaluation against held-out splits.
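The pattern is roughly the standard Hydra instantiate flow; a hypothetical sketch only (the actual runner, config keys, and experiment entrypoint in this repo may differ):

```python
# Hypothetical sketch of the runner pattern (not the repo's actual code):
# the experiment node of the config is assumed to carry a _target_ pointing at a class
# under experiments/, which is built from the config and then executed.
import hydra
from hydra.utils import instantiate

@hydra.main(config_path="configs", config_name="mnist_ebm", version_base=None)
def main(cfg):
    experiment = instantiate(cfg.experiment)  # assumed: cfg.experiment holds a _target_
    experiment.run()                          # assumed entrypoint; check run_experiment.py for the real call

if __name__ == "__main__":
    main()
```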
A dedicated callback evaluates the debiasing performance of the EBM during training. A classifier must first be pretrained with the `src/models/MNIST_Classifier.py` script, and the resulting weights must be referenced in the config file. This pretrained classifier is then used to classify samples generated by the EBM/VAE during training.
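Conceptually, the callback does something like the following (a hypothetical sketch of the idea, not the repo's implementation; the `sample` hook, sample count, and logging keys are assumptions):

```python
# Hypothetical sketch: after each epoch, sample from the generative model, label the
# samples with the pretrained MNIST classifier, and log the resulting digit proportions.
import torch
import lightning.pytorch as pl

class DigitProportionCallback(pl.Callback):
    def __init__(self, classifier, num_samples=256):
        self.classifier = classifier.eval()
        self.num_samples = num_samples

    @torch.no_grad()
    def on_train_epoch_end(self, trainer, pl_module):
        classifier = self.classifier.to(pl_module.device)
        samples = pl_module.sample(self.num_samples)  # assumed sampling hook on the Lightning module
        preds = classifier(samples).argmax(dim=1)
        props = torch.bincount(preds, minlength=10).float() / self.num_samples
        pl_module.log_dict({f"debias/prop_{d}": p for d, p in enumerate(props)})
```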
- `configs/prop_binary_mnist_vae.yaml`: VAE on biased Binary MNIST (baseline for debiasing experiments). The class proportions are set so that the digits do not appear equally often in the training set, and the setup is simplified so that only digits 0, 1, 2, 3, and 4 appear.
- `configs/prop_binary_mnist_ebm_debiasing.yaml`: EBM trained in data space on biased Binary MNIST (with the callback to evaluate debiasing), using the VAE above as the base model. (You will probably need to retrain the VAE, as its weights are not included in the repo.)
TODO
- Add a version of the EBM that operates in the latent space of a pretrained VAE (instead of pixel space)
- Add other experiments for simpler datasets (UCI?)