All raw FASTQ files are available through EGA study EGAS50000000524. All pre-processed and processed multiome and metadata files used by the scripts described here can be downloaded from GEO: GSE271675.
src
├── workflows/
│   └── scripts/
└── analysis/
    ├── tf_var/
    ├── heterogeneity/
    └── cnv/
src/workflows contains the key pipelines used to analyze the data:
- Snakefile_MULTIOME_preAnnotation: Generates multiome files from raw 10X outputs (on GEO) and runs WNN analyses. Uses a common ATAC feature space of 5 kb bins.
- Snakefile_CellBender: Runs CellBender on 10X outputs with sample-specific parameters. Its outputs can also be used to re-run Snakefile_MULTIOME_preAnnotation.
- Snakefile_intraPatPeaks: Runs intra-patient peak calling by cluster, merges the peaks, and regenerates patient-level merged objects (with more accurate chromVAR values computed on the intra-patient merged peaks).
  - Additionally, it obtains promoter and non-promoter peaks, pseudobulks and normalizes values across clusters, and plots correlation ranges.
- Note: Snakefiles for running sample- and patient-level copy number inference are in src/analysis/cnv.
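
The common 5 kb ATAC feature space used by Snakefile_MULTIOME_preAnnotation can be illustrated with a small sketch. This is not the repository's code: the function names `bin_id` and `count_fragments_in_bins`, the 0-based coordinates, and the choice to assign each fragment to a bin by its start position are all assumptions made for illustration.

```python
from collections import Counter

BIN_SIZE = 5_000  # 5 kb bins, as stated in the workflow description


def bin_id(chrom, pos, bin_size=BIN_SIZE):
    """Return a label for the fixed-width bin containing a 0-based position."""
    start = (pos // bin_size) * bin_size
    return f"{chrom}:{start}-{start + bin_size}"


def count_fragments_in_bins(fragments, bin_size=BIN_SIZE):
    """Count fragments per bin; each fragment is assigned by its start (an assumption)."""
    counts = Counter()
    for chrom, start, _end in fragments:
        counts[bin_id(chrom, start, bin_size)] += 1
    return counts


# Toy fragments as (chrom, start, end); hypothetical values, not real data.
fragments = [
    ("chr1", 100, 350),
    ("chr1", 4_999, 5_200),
    ("chr1", 5_000, 5_180),
    ("chr2", 12_345, 12_600),
]
counts = count_fragments_in_bins(fragments)
```

Because the bins depend only on genomic coordinates, every sample ends up with the same feature names, so count matrices can be combined without first reconciling per-sample peak sets.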
src/analysis contains various scripts that run on large/merged multiome samples:
- tf_var/: Runs TF variance analyses and generates the associated plots. Requires multiome objects generated by Snakefile_intraPatPeaks (also uploaded to GEO).
- heterogeneity/: Obtains entropy values, plots boxplots with statistical tests, etc. Uses outputs from Snakefile_intraPatPeaks.
- cnv/: Snakemake workflows for running sample- and patient-level Numbat and plotting the outputs. Patient-level Numbat is contingent on the sample-level Numbat runs.
  - Note: it uses outputs from Snakefile_intraPatPeaks/_preAnnotation and the corresponding annotated metadata.
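
The description of heterogeneity/ above does not say how entropy is defined. As a minimal sketch, assuming Shannon entropy over the proportions of cluster labels within a group of cells (the function name `shannon_entropy` and the toy labels are hypothetical and not taken from the repository):

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Sum p * log2(1 / p) over the observed label proportions.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy cluster assignments for two hypothetical samples.
homogeneous = ["c1", "c1", "c1", "c1"]
mixed = ["c1", "c2", "c3", "c4"]

entropy_homogeneous = shannon_entropy(homogeneous)
entropy_mixed = shannon_entropy(mixed)
```

Under this assumption, a sample whose cells all fall in one cluster has entropy 0, and the value grows as cells spread more evenly across clusters.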