cbmi-group/scarf

SCARF: A Single Cell ATAC-seq and RNA-seq Foundation Model

SCARF is a large-scale foundation model designed for single-cell ATAC-seq and RNA-seq.
It provides pretrained weights, preprocessing pipelines, and tutorials to accelerate downstream biological discovery.

🚀 System Requirements

  • Operating system: Linux (Ubuntu 20.04+)
  • Python version: == 3.12.3
  • Dependencies:
    • PyTorch >= 2.3.1
    • Scanpy >= 1.11.0
    • AnnData >= 0.9
    • scikit-learn == 1.5.2
    • transformers == 4.46.3
    • numpy, pandas, matplotlib, seaborn, jupyter
  • Hardware:
    • CPU: x86_64 architecture (tested on Intel i9 and AMD EPYC)
    • GPU (recommended): NVIDIA GPU with CUDA >= 11.8 (tested on A800, H100)
    • Minimum RAM: 40 GB
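
The version constraints above can be checked programmatically before running anything heavier. The following is a minimal sketch, not part of the SCARF repository: the helper names (parse_version, satisfies, check_environment) are hypothetical, and the distribution names passed to importlib.metadata are assumed to match the usual PyPI names.

```python
import sys
from importlib import metadata

# Version constraints copied from the requirements list above.
# "min" means ">=", "exact" means "==".
REQUIREMENTS = {
    "torch": ("min", "2.3.1"),
    "scanpy": ("min", "1.11.0"),
    "anndata": ("min", "0.9"),
    "scikit-learn": ("exact", "1.5.2"),
    "transformers": ("exact", "4.46.3"),
}


def parse_version(text):
    """Turn '2.3.1+cu118' into (2, 3, 1); non-numeric suffixes are ignored."""
    parts = []
    for chunk in text.split("+")[0].split("."):
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, kind, required):
    """Check one installed version string against one constraint."""
    have, want = parse_version(installed), parse_version(required)
    # Pad to equal length so that 0.9 and 0.9.0 compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have == want if kind == "exact" else have >= want


def check_environment():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:3] != (3, 12, 3):
        problems.append(f"Python is {sys.version.split()[0]}, expected 3.12.3")
    for name, (kind, required) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not satisfies(installed, kind, required):
            op = "==" if kind == "exact" else ">="
            problems.append(f"{name} is {installed}, expected {op} {required}")
    return problems


# Print any mismatches found in the current environment.
for line in check_environment():
    print(line)
```

The comparison deliberately ignores local build tags such as "+cu118", so a CUDA build of PyTorch is treated the same as a CPU build with the same version number.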

⚙️ Installation Guide

1. Clone the repository

git clone https://github.com/JiekaiLab/scarf.git
cd scarf

2. Create conda environment and install dependencies

conda env create -n scarf -f environment.yml
conda activate scarf

📊 Quick start

We provide example datasets and pretrained models for quick testing.

Download demo data and pretrained model files

Run the download_data.ipynb notebook, which performs the following steps automatically:

  • Download the demo dataset (demo_hPBMC.tar.gz) into the data/ folder.

  • Download model files (model_files.tar.gz) and extract:

    • weights/ → into the weights/ folder

    • prior_data/ → into the prior_data/ folder

This ensures all required data and weights are available locally.

Expected runtime on a standard desktop (40 GB RAM, no GPU): ~2–3 minutes
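
Once the download notebook has finished, the folder layout described above can be verified with a few lines of Python. This is a sketch under assumptions: missing_dirs is a hypothetical helper, not something shipped with SCARF, and it only checks that data/, weights/ and prior_data/ exist and are non-empty, not that their contents are complete.

```python
from pathlib import Path

# Folders that the download step above is described as populating.
EXPECTED_DIRS = ("data", "weights", "prior_data")


def missing_dirs(repo_root):
    """Return the expected folders that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for name in EXPECTED_DIRS:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


# Assumes the script is run from the repository root.
print(missing_dirs("."))
```

An empty list means all three folders are present and contain at least one entry.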

Run SCARF on your own data

  1. Preprocess your single-cell data (preprocess.ipynb)

  2. Run inference (embedding.ipynb)
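
The two steps above can also be run without opening Jupyter, using the standard jupyter nbconvert command line. The sketch below is an illustration only: the notebook paths are assumptions (their location inside the repository is not stated here), the helper names are hypothetical, and the notebooks will most likely need their input paths edited for your data first.

```python
import subprocess

# The two notebooks named in the steps above, in the order they should run.
# Their location inside the repository is an assumption; adjust the paths.
NOTEBOOKS = ["preprocess.ipynb", "embedding.ipynb"]


def nbconvert_command(notebook):
    """Build a `jupyter nbconvert` call that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_pipeline(notebooks=NOTEBOOKS):
    """Execute each notebook in order, stopping at the first failure."""
    for notebook in notebooks:
        subprocess.run(nbconvert_command(notebook), check=True)


# Show the commands that run_pipeline() would execute, without running them.
for nb in NOTEBOOKS:
    print(" ".join(nbconvert_command(nb)))
```

Calling run_pipeline() inside the scarf conda environment would then execute preprocessing followed by inference; check=True makes the second notebook run only if the first one succeeds.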

🎯 Downstream Tasks

We provide ready-to-use Jupyter notebooks in the downstream_tasks/ folder demonstrating how to apply SCARF to different downstream tasks.

📂 Repository Structure

SCARF/
├── data/                 # Demo data
├── downstream_tasks/     # Jupyter notebooks for demos and usage
├── scarf/                # Model files
├── prior_data/           # Token dictionaries and metadata
├── scripts/              # Preprocessing and inference scripts
├── weights/              # Pretrained model weights (download from Zenodo)
└── env_setup.sh          # Dependencies

📜 License

This project is released under the GNU General Public License v3.0.
See LICENSE for details.


🔗 Links


📖 Citation

If you use SCARF in your research, please cite:

@misc{SCARF2025,
  title   = {SCARF: A Single Cell ATAC-seq and RNA-seq Foundation Model},
  author  = {Guole Liu#, Tianyu Wang#, Yingying Zhao#, Quanyou Cai#, Xiaotao Wang#, Ziyi Wen, Yaofeng Wang, Lihui Lin*, Yongbing Zhao*, Ge Yang*, Jiekai Chen*},
  year    = {2025},
  url     = {https://github.com/JiekaiLab/scarf},
  doi     = {10.1101/2025.04.07.647689}
}
