SCARF is a large-scale foundation model designed for single-cell ATAC-seq and RNA-seq data.
It provides pretrained weights, preprocessing pipelines, and tutorials to accelerate downstream biological discovery.

- Operating system: Linux (Ubuntu 20.04+)
- Python version: == 3.12.3
- Dependencies:
  - PyTorch >= 2.3.1
  - Scanpy >= 1.11.0
  - AnnData >= 0.9
  - scikit-learn == 1.5.2
  - transformers == 4.46.3
  - numpy, pandas, matplotlib, seaborn, jupyter
- Hardware:
  - CPU: x86_64 architecture (tested on Intel i9 and AMD EPYC)
  - GPU (recommended): NVIDIA GPU with CUDA >= 11.8 (tested on A800, H100)
  - Minimum RAM: 40 GB
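The version constraints listed above can be checked from inside the environment with a short script. This is a minimal sketch, not part of SCARF: the helper names (`parse_version`, `check_packages`) are hypothetical, the distribution names are assumed to match their PyPI names (for example `torch` for PyTorch), and the version parsing is deliberately simple, ignoring suffixes such as `+cu118` or `rc1`.

```python
from importlib import metadata

# (distribution name, required version, exact pin?) copied from the list above.
# Distribution names are an assumption: PyPI names, e.g. "torch" for PyTorch.
REQUIREMENTS = [
    ("torch", "2.3.1", False),
    ("scanpy", "1.11.0", False),
    ("anndata", "0.9", False),
    ("scikit-learn", "1.5.2", True),
    ("transformers", "4.46.3", True),
]


def parse_version(text):
    """Turn '2.3.1+cu118' into (2, 3, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_packages(lookup=metadata.version):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, required, exact in REQUIREMENTS:
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        have, want = parse_version(found), parse_version(required)
        ok = (have == want) if exact else (have >= want)
        if not ok:
            op = "==" if exact else ">="
            problems.append(f"{name}: found {found}, need {op} {required}")
    return problems


if __name__ == "__main__":
    issues = check_packages()
    print("\n".join(issues) if issues else "All listed dependencies look fine.")
```

The `lookup` parameter exists only so the check can be exercised without the real packages installed; in normal use it defaults to `importlib.metadata.version`.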
git clone https://github.com/JiekaiLab/scarf.git
cd scarf
conda env create -n scarf -f environment.yml
We provide example datasets and pretrained models for quick testing.
Run the notebook (download_data.ipynb) to download them automatically:
- Download the demo dataset (demo_hPBMC.tar.gz) into the data/ folder.
- Download the model files (model_files.tar.gz) and extract them:
  - weights/ → into the weights/ folder
  - prior_data/ → into the prior_data/ folder

This ensures all required data and weights are available locally.
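If the model archive is fetched by hand rather than through the notebook, the extraction step can be sketched as follows. This is an illustration under assumptions, not the notebook's actual code: it assumes model_files.tar.gz sits in the repository root and contains top-level weights/ and prior_data/ directories, and the helper name `extract_archive` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, destination):
    """Extract a .tar.gz archive into destination; return its top-level names."""
    archive = Path(archive)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    root = destination.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse entries whose path would land outside the destination folder.
        for member in members:
            target = (destination / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(destination)
    return sorted({member.name.split("/")[0] for member in members})


if __name__ == "__main__":
    # Assumed download location: the repository root.
    model_archive = Path("model_files.tar.gz")
    if model_archive.exists():
        print("Extracted:", extract_archive(model_archive, "."))
    else:
        print(f"{model_archive} not found; run download_data.ipynb first.")
```

The path check is a basic precaution only; it does not inspect symbolic links inside the archive.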
- Preprocess your single-cell data (preprocess.ipynb)
- Run inference (embedding.ipynb)
We provide ready-to-use Jupyter notebooks demonstrating how to apply SCARF to different downstream tasks:
- Cell type prediction (CellType_prediction.ipynb)
  Predicts cell type labels from multi-omic embeddings.
- Cell matching (Cell-matching.ipynb)
  Aligns and matches cells across modalities (scRNA-seq and scATAC-seq).
- Cell RNA inference (RNA-Inference.ipynb)
  Predicts the gene expression of cells from their scATAC-seq data.
SCARF/
├── data/ # data for demo
├── downstream_tasks/ # Jupyter notebooks for demo and usage
├── scarf/ # model file
├── prior_data/ # Token dictionaries and metadata
├── scripts/ # Preprocessing and inference scripts
├── weights/ # Pretrained model weights (download from Zenodo)
└── env_setup.sh # Dependencies
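Before opening the notebooks, it can be useful to confirm that the layout above is in place and that the downloaded folders are not empty. The following is a small sketch using only the standard library; the function names are hypothetical and the directory names are taken from the tree above.

```python
from pathlib import Path

# Directory names taken from the project tree above.
EXPECTED_DIRS = ["data", "downstream_tasks", "scarf", "prior_data", "scripts", "weights"]
# Folders that are filled by the download step and should not be empty.
DOWNLOADED_DIRS = ["weights", "prior_data"]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


def empty_downloads(repo_root):
    """Return the download folders that exist but contain no entries."""
    root = Path(repo_root)
    return [
        name
        for name in DOWNLOADED_DIRS
        if (root / name).is_dir() and not any((root / name).iterdir())
    ]


if __name__ == "__main__":
    # Run from the repository root.
    print("Missing directories:", missing_dirs("."))
    print("Empty download folders:", empty_downloads("."))
```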
This project is released under the GNU General Public License v3.0.
See LICENSE for details.
- GitHub Repository: JiekaiLab/scarf
- Pretrained weights & large files:
If you use SCARF in your research, please cite:
@misc{SCARF2025,
title = {SCARF: A Single Cell ATAC-seq and RNA-seq Foundation Model},
author = {Guole Liu#,Tianyu Wang#,Yingying Zhao#,Quanyou Cai#,Xiaotao Wang#,Ziyi Wen,Yaofeng Wang,Lihui Lin*, Yongbing Zhao*, Ge Yang*,Jiekai Chen*},
year = {2025},
url = {https://github.com/JiekaiLab/scarf},
doi = {10.1101/2025.04.07.647689}
}