Machine Learning for Aging Analysis in Drosophila Single-Cell Data
TimeFlies is a comprehensive machine learning framework for analyzing aging patterns in Drosophila single-cell RNA sequencing data. It provides deep learning models, model interpretability analysis, batch correction capabilities, and a complete research workflow.
# Download and run the installer
curl -O https://raw.githubusercontent.com/rsinghlab/TimeFlies/main/install_timeflies.sh
chmod +x install_timeflies.sh
./install_timeflies.sh
The installer automatically activates TimeFlies. For new terminal windows:
source .activate.sh
Command Line Interface:
# Complete setup workflow (customize configs/setup.yaml first)
timeflies setup [--batch-correct]
# Web-based GUI (recommended for beginners)
timeflies gui
# Batch correction (automatic environment switching)
timeflies batch-correct
# Train models with automatic evaluation
timeflies train [--with-eda --with-analysis --batch-corrected]
# Evaluate trained models
timeflies evaluate [--with-eda --with-analysis]
# Automated multi-model training queue
timeflies queue [configs/model_queue.yaml] [--no-resume]
# Automated hyperparameter tuning (customize configs/hyperparameter_tuning.yaml first)
timeflies tune [--no-resume]
# Run project-specific analysis
timeflies analyze
Web-Based GUI (Recommended):
For users who prefer a graphical interface, run timeflies gui to launch a modern web-based interface:
- Works everywhere: Any browser, any OS, no system dependencies
- Point-and-click workflow: Setup, training, batch correction, hyperparameter tuning
- Real-time progress: Live updates and comprehensive logging
- Remote access: Can be accessed from other devices (with --share flag)
- Mobile-friendly: Works on tablets and smartphones
- Automatic environment handling: Seamlessly manages virtual environments
Usage:
timeflies gui # Launch on http://localhost:7860
timeflies gui --port 8080 # Use different port
timeflies gui --share # Create public URL (use with caution)
Both CLI and web GUI provide identical functionality - choose what works best for you.
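If the default port is already taken, `--port` selects another one. The following is a minimal sketch, not part of TimeFlies, for checking whether a local port is free before launching the GUI; `port_is_free` is a hypothetical helper name.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
    return True


if __name__ == "__main__":
    # 7860 is the default GUI port mentioned above; 8080 is the documented alternative.
    for candidate in (7860, 8080):
        print(candidate, "free" if port_is_free(candidate) else "in use")
```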
TimeFlies uses modular configuration files in the configs/ directory:
- default.yaml: Main configuration (project, model, data settings)
- setup.yaml: Data splitting settings (split_size, stratify_by)
- batch_correction.yaml: Batch correction and PyTorch settings
- hyperparameter_tuning.yaml: Hyperparameter search ranges for tuning
- model_queue.yaml: Sequential model training configurations
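To make `split_size` and `stratify_by` concrete, here is a minimal sketch of a stratified split in plain Python. It is not TimeFlies' implementation: it assumes `split_size` is the fraction of cells held out for evaluation and that `stratify_by` names a single metadata column. Both are assumptions, as is the function name `stratified_split`.

```python
import random
from collections import defaultdict


def stratified_split(labels, split_size=0.2, seed=42):
    """Split indices into (train, eval) so each label keeps roughly the same share.

    labels     -- one label per cell (the column an assumed `stratify_by` points at)
    split_size -- assumed here to be the fraction of cells held out for evaluation
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, held_out = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_eval = round(len(indices) * split_size)
        held_out.extend(indices[:n_eval])
        train.extend(indices[n_eval:])
    return sorted(train), sorted(held_out)


# Example: 10 young and 10 old cells; 20% of each group is held out.
ages = ["young"] * 10 + ["old"] * 10
train_idx, eval_idx = stratified_split(ages, split_size=0.2)
print(len(train_idx), len(eval_idx))  # 16 4
```

Splitting within each label group, rather than over the whole dataset at once, is what keeps rare age groups represented in both files.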
- Data Setup: Place your *_original.h5ad files in data/[project]/[tissue]/
- Configuration: Edit configs/setup.yaml for data splitting (split_size, stratify_by, etc.)
- Setup: Run timeflies setup to create train/eval splits and verify system
- Training: Run timeflies train for model training with automatic evaluation
- Evaluation: Run timeflies evaluate to assess model performance on test data
- Analysis: Results available in outputs/[project]/ with model interpretability
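Before running `timeflies setup`, it can help to confirm that the input files are where the workflow expects them. A minimal sketch, assuming the `data/[project]/[tissue]/` layout and the `*_original.h5ad` naming described above; the helper name `find_original_files` is hypothetical and not part of TimeFlies.

```python
from pathlib import Path


def find_original_files(data_root, project, tissue):
    """Return the sorted *_original.h5ad files under data_root/project/tissue."""
    tissue_dir = Path(data_root) / project / tissue
    if not tissue_dir.is_dir():
        return []
    return sorted(tissue_dir.glob("*_original.h5ad"))


if __name__ == "__main__":
    files = find_original_files("data", "fruitfly_aging", "head")
    if files:
        for path in files:
            print("found:", path)
    else:
        print("no *_original.h5ad files in data/fruitfly_aging/head/")
```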
If you need to change splitting parameters (e.g., different stratification or split size):
# Edit configs/setup.yaml with new parameters
timeflies setup --force-split # Recreates splits, preserves batch-corrected files
Smart behavior:
- --force-split removes existing train/eval splits and recreates them
- Batch-corrected files are preserved (never deleted)
- Verifies that batch-corrected files still match the new splits exactly
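How this verification is performed is not described here. One plausible approach is to compare cell identifiers: a batch-corrected file matches its split if both contain exactly the same cells. A minimal sketch under that assumption, using plain lists of IDs; `splits_match` is a hypothetical name.

```python
def splits_match(split_cell_ids, corrected_cell_ids):
    """Return (ok, missing, extra) comparing two collections of cell IDs.

    missing -- IDs in the split but absent from the batch-corrected file
    extra   -- IDs in the batch-corrected file that the split no longer contains
    """
    split_set = set(split_cell_ids)
    corrected_set = set(corrected_cell_ids)
    missing = sorted(split_set - corrected_set)
    extra = sorted(corrected_set - split_set)
    return (not missing and not extra), missing, extra


ok, missing, extra = splits_match(["c1", "c2", "c3"], ["c1", "c2", "c4"])
print(ok, missing, extra)  # False ['c3'] ['c4']
```

With AnnData files, the IDs would typically come from each object's `obs_names`.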
TimeFlies generates comprehensive outputs organized by project and analysis type:
outputs/
├── fruitfly_aging/  # Project-specific results
│   ├── experiments/  # Model training results
│   │   ├── uncorrected/  # Non-batch-corrected results
│   │   │   └── all_runs/
│   │   │       └── head_cnn_age/  # Config-specific experiments (tissue_model_target)
│   │   │           ├── 2024-08-25_10-30-15/  # Individual experiment
│   │   │           │   ├── model.h5  # Trained TensorFlow model
│   │   │           │   ├── training/  # Training artifacts
│   │   │           │   │   ├── history.json  # Training metrics & loss curves
│   │   │           │   │   ├── logs/  # Training logs
│   │   │           │   │   └── plots/  # Training visualizations
│   │   │           │   ├── evaluation/  # Test results
│   │   │           │   │   ├── metrics.json  # Performance metrics (accuracy, F1, precision, recall, AUC, baselines)
│   │   │           │   │   ├── predictions.csv  # Model predictions
│   │   │           │   │   └── plots/  # Performance visualizations
│   │   │           │   │       ├── confusion_matrix.png
│   │   │           │   │       ├── roc_curve.png
│   │   │           │   │       └── classification_report.png
│   │   │           │   ├── shap_analysis/  # SHAP interpretability
│   │   │           │   │   ├── shap_values.csv
│   │   │           │   │   ├── shap_summary.png
│   │   │           │   │   └── feature_importance.png
│   │   │           │   └── metadata.json  # Experiment reproducibility info
│   │   │           ├── latest -> 2024-08-25_10-30-15/  # Symlink to most recent
│   │   │           └── best -> 2024-08-25_10-30-15/  # Symlink to best performance
│   │   ├── batch_corrected/  # Batch-corrected results (same structure)
│   │   └── queue_experiment_2024-08-25/  # Model queue results
│   │       ├── model_comparison_report.md  # Queue summary report
│   │       ├── model_metrics.csv  # All models comparison
│   │       └── individual_model_results/  # Links to experiment dirs
│   ├── hyperparameter_tuning/  # Hyperparameter optimization
│   │   └── hyperparameter_search_2024-08-25_16-30-45/
│   │       ├── hyperparameter_search_report.md  # Best trials & selection reasoning
│   │       ├── hyperparameter_search_metrics.csv  # All trials data for analysis
│   │       ├── checkpoint.json  # Resume capability for interrupted searches
│   │       ├── search_config.yaml  # Configuration backup for reproducibility
│   │       └── optuna_study.db  # Bayesian optimization database (if using Optuna)
│   └── eda/  # Exploratory data analysis
│       └── head/  # Tissue-specific analysis
│           ├── uncorrected/  # Raw data EDA
│           │   ├── eda_report.html  # Interactive analysis report
│           │   ├── plots/  # EDA visualizations
│           │   │   ├── age_distribution.png
│           │   │   ├── correlation_matrix.png
│           │   │   └── dimensionality_reduction.png
│           │   └── eda_summary.json  # Statistical summaries
│           └── batch_corrected/  # Batch-corrected EDA (same structure)
└── fruitfly_alzheimers/  # Separate project outputs
    └── [same structure as above]
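Because each run lands in a timestamped directory with `latest` and `best` symlinks beside it, results can be located without knowing the timestamp. A minimal sketch assuming the layout shown above; the keys inside `metrics.json` are not documented here, so the code only loads whatever JSON object is there. `load_metrics` is a hypothetical helper, not a TimeFlies function.

```python
import json
from pathlib import Path


def load_metrics(experiment_dir, which="latest"):
    """Load evaluation/metrics.json from the run that `latest` or `best` points to.

    experiment_dir -- the config-specific directory (head_cnn_age in the tree above)
    Returns (run directory name, parsed metrics).
    """
    run_dir = (Path(experiment_dir) / which).resolve()  # follow the symlink
    metrics_path = run_dir / "evaluation" / "metrics.json"
    with metrics_path.open() as handle:
        return run_dir.name, json.load(handle)
```

Passing `which="best"` reads the best-performing run instead of the most recent one.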
- Experiment Results: Each training run gets its own timestamped directory with model files, predictions, and analysis
- Hyperparameter Reports: Comprehensive analysis of why best parameters were selected with trial comparisons
- Model Queue Reports: Comparison across multiple model configurations with links to individual experiments
- EDA Reports: Data quality and distribution analysis organized by tissue and batch correction
- SHAP Analysis: Model interpretability and feature importance stored within each experiment
- Fruitfly Aging: Healthy aging analysis in Drosophila head tissue
- Fruitfly Alzheimer's: Disease model analysis with neurodegeneration patterns
- Custom Projects: Any single-cell transcriptomics data in AnnData format
- Deep Learning Models: CNN, MLP architectures for single-cell analysis
- Traditional ML: XGBoost, Random Forest, Logistic Regression for comparison studies
- Automated Evaluation: Built-in performance metrics and automatic post-training evaluation
- Baseline Comparisons: Automatic comparison against random classifier, majority class, and stratified random baselines
- Model Interpretability: Feature importance analysis with SHAP (configurable)
- Model Queue System: Automated sequential training of multiple models with different configurations
- Hyperparameter Tuning: Grid, random, and Bayesian optimization with CNN architecture variants
- Batch Correction: scVI-tools integration with automatic environment management
  - Per-project enable/disable configuration
  - Proper ML workflow preventing data leakage (train/eval splits)
  - Seamless environment switching for dependencies
- Smart Splitting: Stratified train/eval splits preserving biological structure
- Quality Control: Automated data validation and preprocessing
- 3-Tier Test Data: Tiny/synthetic/real fixtures for reliable development
- Comprehensive EDA: Exploratory data analysis with automated reporting
- Flexible Configuration: YAML-based project and model settings
- Sequential Training: Train multiple models automatically with progress tracking
- Configuration Overrides: Per-model settings for preprocessing, hyperparameters, and analysis options
- Checkpoint/Resume: Automatic saving and resuming of interrupted training sessions
- Comprehensive Reports: Markdown summaries and CSV exports for model comparison
- Flexible Preprocessing: Different batch correction, filtering, and splitting methods per model
- Model Queue System Guide - Complete guide for automated multi-model training
- Hyperparameter Tuning Guide - Grid, random, and Bayesian optimization with CNN variants
- Analysis Templates Guide - Custom analysis script templates and examples
- Development Roadmap - Current development status and future plans
- Templates: Pre-built analysis scripts in templates/ directory
- Configuration: YAML examples in configs/ directory
- Test Data: 3-tier test fixtures in tests/fixtures/
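The three baselines named in the feature list (random classifier, majority class, stratified random) have simple expected accuracies, which is what makes them useful reference points. A minimal sketch, not TimeFlies' code: for class proportions p_i over k classes, a uniform random guesser scores 1/k in expectation, always predicting the majority class scores max p_i, and guessing in proportion to class frequency scores the sum of p_i squared.

```python
from collections import Counter


def baseline_accuracies(labels):
    """Expected accuracy of three naive classifiers on the given labels."""
    counts = Counter(labels)
    total = len(labels)
    proportions = [count / total for count in counts.values()]
    return {
        "random": 1 / len(counts),                      # uniform guess over k classes
        "majority": max(proportions),                   # always predict the largest class
        "stratified": sum(p * p for p in proportions),  # guess with class frequencies
    }


# Example: 6 young, 3 middle-aged, and 1 old cell.
print(baseline_accuracies(["young"] * 6 + ["mid"] * 3 + ["old"] * 1))
```

On imbalanced age groups the majority baseline can be high, so a model's accuracy is only meaningful relative to it.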
All 12 CLI commands with their full options:
timeflies setup [--batch-correct] [--force-split] [--dev] # Complete setup workflow
timeflies train [--with-eda] [--with-analysis] # Train models (includes automatic evaluation)
timeflies evaluate [--with-eda] [--with-analysis] [--interpret] [--visualize] # Evaluate models on test data
timeflies analyze [--predictions-path PATH] [--analysis-script PATH] [--with-eda] # Project-specific analysis scripts
timeflies queue [configs/model_queue.yaml] [--no-resume] # Automated multi-model training queue (see docs/model_queue_guide.md)
timeflies tune [--no-resume] # Hyperparameter optimization using configs/hyperparameter_tuning.yaml (see docs/hyperparameter_tuning_guide.md)
timeflies split [--force-split] # Create train/eval splits
timeflies eda [--save-report] # Exploratory data analysis
timeflies batch-correct # Create batch-corrected files (requires .venv_batch)
timeflies verify # System verification
timeflies test [unit|integration|functional|system|all] [--coverage] [--verbose] [--fast] [--debug] [--rerun]
timeflies create-test-data [--tier tiny|synthetic|real|all] [--cells N] [--genes N] [--batch-versions]
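The `--no-resume` flag on `queue` and `tune` implies that both commands resume interrupted work by default. How TimeFlies stores that state is not described beyond a `checkpoint.json` file, so the following is only a sketch of the general pattern: record finished items in a JSON file and skip them on the next run. The file layout (a `completed` list) and the function names are assumptions.

```python
import json
from pathlib import Path


def load_completed(checkpoint_path):
    """Return the set of item names recorded as finished (empty if no checkpoint)."""
    path = Path(checkpoint_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()).get("completed", []))


def run_queue(items, checkpoint_path, run_item, resume=True):
    """Run each item once, skipping items already recorded when resume is True."""
    path = Path(checkpoint_path)
    completed = load_completed(path) if resume else set()
    executed = []
    for name in items:
        if name in completed:
            continue  # finished in an earlier, interrupted session
        run_item(name)
        completed.add(name)
        executed.append(name)
        # Persist after every item so an interruption loses at most one item.
        path.write_text(json.dumps({"completed": sorted(completed)}))
    return executed
```

Calling `run_queue(..., resume=False)` ignores the existing checkpoint and starts over, which is the behavior the flag name suggests.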
Keep TimeFlies Updated: Use timeflies update to get the latest features and bug fixes:
# Update to latest version from GitHub main branch
timeflies update
What happens during update:
- Downloads latest TimeFlies code from GitHub
- Updates the installed package via pip
- Smart file management - updates system files while preserving your work:
Files that get UPDATED:
- .timeflies_src/ - source code and templates (completely refreshed)
- TimeFlies_Launcher.py - GUI launcher (only if content changed)
- Official templates - README.md, analysis examples (updated for new features)
- Missing config files - adds new configs like setup.yaml, hyperparameter_tuning.yaml
Files that are PRESERVED (never touched):
- data/ - your datasets and H5AD files
- outputs/ - all experiments, analysis results, and trained models
- configs/ - your customized configuration settings
- Custom templates - any analysis scripts you created
Requires Git to be installed on your system.
GUI Users: Use the "Update TimeFlies" button in the Results tab for the same functionality.
--verbose # Detailed logging
--batch-corrected # Use existing batch-corrected data (any command)
--tissue head|body # Override tissue type
--model CNN|MLP|xgboost|random_forest|logistic # Override model type
--target age # Override target variable
--aging # Use fruitfly_aging project
--alzheimers # Use fruitfly_alzheimers project
TimeFlies uses YAML configuration files to control model training, evaluation, and analysis settings. The main configuration is in configs/default.yaml.
Control SHAP interpretation and visualizations:
# Feature importance analysis
interpretation:
  shap:
    enabled: false        # Enable/disable SHAP interpretation (includes visualizations)
    load_existing: false  # Load existing SHAP values instead of computing
    reference_size: 100   # Reference size for SHAP analysis
# Visualizations
visualizations:
  enabled: true  # Enable general visualizations (training plots, confusion matrix, ROC curves, etc.)
Configure project-specific analysis workflows:
analysis:
  # Exploratory data analysis
  eda:
    enabled: false
  # Run project-specific analysis scripts
  run_analysis_script:
    enabled: false  # Set to true to run project-specific analysis after training
Override configuration settings using command-line flags:
# Force SHAP interpretation (overrides config)
timeflies evaluate --interpret
# Force visualizations (overrides config)
timeflies evaluate --visualize
# Use custom analysis script
timeflies analyze --analysis-script templates/my_custom_analysis.py
# Combine flags
timeflies evaluate --interpret --visualize --with-analysis
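The precedence described here (a command-line flag forces a feature on regardless of the YAML value) can be sketched as follows. This is an illustration, not TimeFlies' code: the nested dictionary assumes `interpretation.shap.enabled` and a top-level `visualizations.enabled` key, and `apply_overrides` is a hypothetical helper.

```python
import copy


def apply_overrides(config, interpret=False, visualize=False):
    """Return a copy of config with features forced on by CLI-style flags.

    A flag that is not passed (False) leaves the configured value untouched.
    """
    merged = copy.deepcopy(config)
    if interpret:
        merged.setdefault("interpretation", {}).setdefault("shap", {})["enabled"] = True
    if visualize:
        merged.setdefault("visualizations", {})["enabled"] = True
    return merged


defaults = {
    "interpretation": {"shap": {"enabled": False, "reference_size": 100}},
    "visualizations": {"enabled": True},
}
effective = apply_overrides(defaults, interpret=True)
print(effective["interpretation"]["shap"]["enabled"])  # True
print(defaults["interpretation"]["shap"]["enabled"])   # False (original is unchanged)
```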
Create custom analysis workflows using templates:
# Copy template and customize
cp templates/aging_analysis_template.py templates/my_analysis.py
# Run your custom analysis
timeflies analyze --analysis-script templates/my_analysis.py
Available templates:
- templates/custom_analysis_example.py - Basic template with all features
- templates/aging_analysis_template.py - Aging-specific analysis patterns
- templates/README.md - Full documentation and examples
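The template contents are not reproduced here, so as a rough idea of what a custom analysis step might do, the sketch below reads a predictions file and reports per-class accuracy. The column names `true_label` and `predicted_label` are assumptions; check the header of your own `predictions.csv` before adapting it.

```python
import csv
from collections import defaultdict


def per_class_accuracy(predictions_path, true_col="true_label", pred_col="predicted_label"):
    """Return {class: fraction of its rows predicted correctly} from a CSV file."""
    correct = defaultdict(int)
    total = defaultdict(int)
    with open(predictions_path, newline="") as handle:
        for row in csv.DictReader(handle):
            total[row[true_col]] += 1
            if row[true_col] == row[pred_col]:
                correct[row[true_col]] += 1
    return {label: correct[label] / total[label] for label in sorted(total)}
```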
# Clone repository
git clone https://github.com/rsinghlab/TimeFlies.git
cd TimeFlies
# Setup development environments (creates .venv + .venv_batch with all dependencies)
python3 run_timeflies.py setup --dev
# Activate development environment
source .activate.sh
# Now you can use timeflies command directly
timeflies verify
timeflies test --coverage
timeflies create-test-data --tier tiny # (optional - already included)
# For batch correction development (specialized)
source .activate_batch.sh # PyTorch + scVI environment for testing batch correction code
- Tiny: 50 cells, 100 genes (committed, fast CI/CD)
- Synthetic: 500 cells, 1000 genes (generated from metadata)
- Real: 5000 cells, 2000 genes (performance testing)
timeflies create-test-data --tier tiny --batch-versions # (optional - already committed)
timeflies create-test-data --tier synthetic --batch-versions # Generate on-demand for testing
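How `create-test-data` builds its fixtures is not described here. Purely to illustrate the scale of the tiny tier (50 cells by 100 genes), the sketch below builds a random count matrix of that shape with the standard library; the sparsity level, count range, and age labels are arbitrary assumptions.

```python
import random


def make_tiny_counts(n_cells=50, n_genes=100, zero_fraction=0.8, seed=0):
    """Return (counts, ages): a mostly-zero integer matrix and one age label per cell."""
    rng = random.Random(seed)
    counts = [
        [0 if rng.random() < zero_fraction else rng.randint(1, 20) for _ in range(n_genes)]
        for _ in range(n_cells)
    ]
    ages = [rng.choice([5, 30, 50, 70]) for _ in range(n_cells)]  # arbitrary age groups
    return counts, ages


counts, ages = make_tiny_counts()
print(len(counts), len(counts[0]), len(ages))  # 50 100 50
```

A fixed seed keeps the fixture identical across runs, which is what makes a small dataset usable in CI.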
TimeFlies/
├── configs/  # YAML configuration files
├── src/  # Source code
│   └── common/  # Framework components
│       ├── analysis/  # EDA and visualization tools
│       ├── cli/  # Command-line interface
│       ├── core/  # Pipeline and configuration management
│       ├── data/  # Data loading and preprocessing
│       ├── evaluation/  # Model evaluation and metrics
│       ├── models/  # ML model implementations
│       └── utils/  # Utilities and helpers
├── tests/  # Test suite with 3-tier test data
│   ├── fixtures/  # Test data (tiny/synthetic/real)
│   └── outputs/  # Test outputs (temporary)
├── templates/  # Analysis script templates
├── docs/  # Documentation and notebooks
├── install_timeflies.sh  # One-click installer
├── run_timeflies.py  # Main CLI entry point
└── TimeFlies_Launcher.py  # GUI Launcher
After installation, users work with this structure in their project directory:
your_project/
├── configs/  # Configuration directory created by TimeFlies setup
│   ├── default.yaml  # Main configuration (customize your settings)
│   ├── setup.yaml  # Data splitting configuration
│   └── ...  # Other config files
├── templates/  # Analysis script templates (created by setup)
│   ├── aging_analysis_template.py
│   ├── custom_analysis_example.py
│   └── README.md
├── data/  # Your input datasets
│   ├── fruitfly_aging/
│   │   └── head/
│   │       ├── *_original.h5ad  # Your raw data files
│   │       ├── *_train.h5ad  # Generated by 'split' command
│   │       └── *_eval.h5ad  # Generated by 'split' command
│   └── fruitfly_alzheimers/
│       └── head/
│           └── *_original.h5ad  # Your raw data files
└── outputs/  # All results generated by TimeFlies
    └── [see the output structure above]
- Install TimeFlies: curl -O https://raw.githubusercontent.com/.../install_timeflies.sh && chmod +x install_timeflies.sh && ./install_timeflies.sh
- Activate: source .activate.sh (installs timeflies command to system)
- Add your data: Place *_original.h5ad files in data/[project]/[tissue]/
- Setup: timeflies setup (creates configs/, templates/, splits data, verifies system)
- Configure: Edit configs for your project settings (or use GUI: python TimeFlies_Launcher.py)
- Run workflow: timeflies train && timeflies evaluate
- Python: 3.12+
- OS: Linux, macOS, Windows (WSL2)
- Memory: 8GB+ recommended for larger datasets
- Storage: 2GB+ for environments and test data
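The Python requirement can be checked programmatically before installing. A minimal sketch using only the standard library; `meets_minimum` is a hypothetical helper, not something TimeFlies ships.

```python
import sys

MINIMUM_PYTHON = (3, 12)  # from the requirements list above


def meets_minimum(version=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (major, minor, ...) version satisfies the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        found = ".".join(map(str, sys.version_info[:3]))
        print(f"Python {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]}+ is required, found {found}")
```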
TimeFlies is designed for researchers studying:
- Aging mechanisms in model organisms
- Single-cell transcriptomic changes over time
- Disease models and neurodegeneration
- Cross-tissue aging comparisons
- Batch effect correction in scRNA-seq
- Fork the repository
- Create feature branch: git checkout -b feature-name
- Run tests: timeflies test --coverage
- Submit pull request
This project is licensed under the TimeFlies Academic Research License with pre-publication restrictions - see the LICENSE file for details.
Pre-Publication Period: All rights reserved. Commercial use, redistribution, and derivative works require explicit written permission from the Singh Lab, Brown University.
Post-Publication: License will transition to a more permissive open-source license after publication of associated research.
Developed by the Singh Lab for advancing aging research through machine learning.
Contact: Singh Lab
Repository: TimeFlies
TimeFlies v1.0 - Advancing aging research through machine learning