This repository presents two complementary approaches for detecting Minimal Residual Disease (MRD) in flow cytometry data:
- `VAE/`: Deep learning-based anomaly detection using Variational Autoencoders
- `GMM/`: Probabilistic modeling with Gaussian Mixture Models
We work with flow cytometry data collected from 12 patients:
- Healthy Patients (P1–P6):
  - ~27 million cells
  - Used for model training
- Unhealthy Patients (P7–P12):
  - ~20 million cells
  - Used for evaluation and MRD prediction
Each cell is represented by 14 features. The models are trained to learn the healthy distribution and detect anomalous cells in new patient data, which may indicate MRD.
The goal is to accurately identify anomalous (i.e., likely cancerous) cells using only unsupervised learning methods trained on healthy patient data. The proportion of cells flagged as anomalous in a sample gives the estimated MRD (%).
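As a minimal sketch of how per-cell anomaly flags become an MRD estimate, the snippet below computes the percentage of cells whose anomaly score exceeds a threshold. The function name `estimate_mrd_percent`, the convention that higher scores mean more anomalous, and the example values are assumptions for illustration, not code from this repository.

```python
import numpy as np

def estimate_mrd_percent(scores, threshold):
    """Return the percentage of cells whose anomaly score exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        raise ValueError("scores must contain at least one cell")
    flagged = scores > threshold  # boolean mask, one entry per cell
    return 100.0 * flagged.mean()

# Made-up scores for eight cells; two of them exceed the threshold.
example_scores = np.array([0.1, 0.2, 0.15, 0.9, 0.05, 1.2, 0.3, 0.25])
mrd = estimate_mrd_percent(example_scores, threshold=0.5)
print(f"Estimated MRD: {mrd:.1f}%")
```

For a score where lower means more anomalous (such as a log-likelihood), the comparison would be flipped to `scores < threshold`.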
The VAE approach:
- Learns latent representations via probabilistic encoding/decoding
- Detects anomalies based on reconstruction error (MSE)
- Explored different latent dimensions (2 and 4) and β values
- Uses Leave-One-Patient-Out (LOPO) validation with progressive fine-tuning
- Produces per-cell MSE scores → used to estimate MRD
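The steps listed above can be sketched roughly as follows. This is an illustrative PyTorch example under stated assumptions, not the repository's model: the class name `TinyVAE`, the hidden width of 32, the optimizer settings, and the random stand-in data are all hypothetical. Only the 14 input features, the latent dimension of 2, the β weight, and the per-cell MSE score come from the description above. Scoring from the posterior mean (rather than a sampled latent) is also an assumption, chosen so that scores are deterministic.

```python
import torch
from torch import nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Small VAE for 14-feature cells (layer sizes are illustrative)."""
    def __init__(self, n_features=14, hidden=32, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_features)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def beta_vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Reconstruction error plus beta-weighted KL divergence, averaged over cells."""
    recon = F.mse_loss(x_hat, x, reduction="none").sum(dim=1)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    return (recon + beta * kl).mean()

@torch.no_grad()
def per_cell_mse(model, x):
    """Per-cell reconstruction MSE, decoding from the posterior mean."""
    model.eval()
    mu = model.to_mu(model.encoder(x))
    x_hat = model.decoder(mu)
    return ((x_hat - x) ** 2).mean(dim=1)

torch.manual_seed(0)
healthy = torch.randn(512, 14)  # random stand-in for healthy cells
model = TinyVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(50):  # short full-batch training loop
    x_hat, mu, logvar = model(healthy)
    loss = beta_vae_loss(healthy, x_hat, mu, logvar, beta=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

scores = per_cell_mse(model, healthy)  # one anomaly score per cell
print(scores.shape)
```

Cells from a new patient would be scored with the same `per_cell_mse` call, and cells with unusually high scores treated as anomaly candidates.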
📎 Explore VAE Approach
📎 VAE Best Model
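The Leave-One-Patient-Out scheme mentioned above can be illustrated with a small split generator. This sketch assumes LOPO is applied across the six healthy patients, which the description does not state explicitly, and the helper name `lopo_splits` is hypothetical. Progressive fine-tuning is not shown, since its details are not given here.

```python
def lopo_splits(patient_ids):
    """Yield (training_patients, held_out_patient) pairs, one per patient."""
    for held_out in patient_ids:
        train = [p for p in patient_ids if p != held_out]
        yield train, held_out

healthy_patients = ["P1", "P2", "P3", "P4", "P5", "P6"]
splits = list(lopo_splits(healthy_patients))

for train, held_out in splits:
    print(f"train on {train}, validate on {held_out}")
```

Splitting by patient rather than by cell keeps all cells from the held-out patient out of training, so validation reflects performance on an unseen individual.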
The GMM approach:
- Trains a mixture of Gaussians on healthy cell data
- Evaluates likelihood of each new cell under the model
- Low-likelihood cells are flagged as anomalies
- Tried multiple component counts (4, 6, 16)
- Compared `full` vs `tied` covariance structures
- Final threshold: 1.5th percentile of healthy scores
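The GMM workflow above might look roughly like this with scikit-learn's `GaussianMixture`. It is a sketch on synthetic data, not the repository's pipeline: the array shapes, the shifted "abnormal" cluster, and the choice of 4 components with `full` covariance are assumptions made for illustration. The 1.5th-percentile threshold on healthy scores follows the description above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins: healthy cells, and a patient sample with 100 shifted cells.
healthy = rng.normal(size=(5000, 14))
patient = np.vstack([
    rng.normal(size=(1900, 14)),          # normal-looking cells
    rng.normal(loc=6.0, size=(100, 14)),  # clearly abnormal cells
])

# Fit the mixture on healthy cells only.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(healthy)

# Per-cell log-likelihood under the healthy model.
healthy_scores = gmm.score_samples(healthy)
threshold = np.percentile(healthy_scores, 1.5)  # 1.5th percentile of healthy scores

# Flag patient cells that are less likely than the threshold.
patient_scores = gmm.score_samples(patient)
is_anomaly = patient_scores < threshold
mrd_percent = 100.0 * is_anomaly.mean()
print(f"Estimated MRD: {mrd_percent:.2f}%")
```

Because the threshold is a percentile of healthy scores, roughly 1.5% of healthy cells fall below it by construction, which sets an approximate floor on the estimate. Swapping `covariance_type="tied"` or a different `n_components` into the constructor is enough to compare the configurations listed above.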
📎 Explore GMM Approach
📎 GMM Best Model
- Mean Squared Error (MSE) between predicted and actual MRD %
- Mean Absolute Error (MAE)
- MRD Estimation for each patient based on anomaly scores
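The two error metrics above can be computed as in the following sketch. The function name `mrd_errors` is hypothetical, and the MRD percentages are made-up values for illustration only, not results from this repository.

```python
import numpy as np

def mrd_errors(predicted, actual):
    """Return MSE and MAE between predicted and expert-annotated MRD percentages."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if predicted.shape != actual.shape:
        raise ValueError("predicted and actual must have the same shape")
    diff = predicted - actual
    return {
        "mse": float(np.mean(diff ** 2)),
        "mae": float(np.mean(np.abs(diff))),
    }

# Made-up per-patient MRD percentages (illustration only).
predicted = [1.0, 2.5, 0.5]
actual = [1.5, 2.0, 0.5]
errors = mrd_errors(predicted, actual)
print(errors)
```

MSE penalizes large per-patient misses more heavily, while MAE reads directly as the average error in MRD percentage points.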
Both models approximate expert-annotated MRD scores with high accuracy.
Dependencies:
- torch
- scikit-learn
- numpy, pandas
- matplotlib, seaborn
- joblib