
Digital Twins of Ex Vivo Human Lungs

Python Version Project Page GitHub Code Hugging Face Model Hugging Face Dataset

Colab: DT Lung Demo Streamlit Web App Docker Hub

This is the official repository for Digital Twins of Ex Vivo Human Lungs.


⚠️ Important Notice 🚧

Streamlit Server Issues (Updated September 24, 2025)

Streamlit released a new version on September 24, 2025 at 5:30 PM (UTC−4).

⏳ The web app may be temporarily unavailable while the Streamlit server updates to the new version. Please wait a moment and try again shortly.




Background and Overview

✅ Ex vivo lung perfusion (EVLP) is a cutting-edge platform that maintains isolated human lungs in a physiologically active state outside the body, enabling comprehensive functional assessment and targeted therapeutic interventions

✅ The concept of a "digital twin" – a dynamic, high-fidelity computational replica of a physical system – is rapidly gaining traction in medicine for its ability to simulate complex biological processes in silico

✅ We've built a high-fidelity digital twin of ex vivo human lungs, powered by the world's largest annotated ex vivo lung function dataset

✅ The DT model has been validated as a robust digital control for enhanced preclinical therapeutic evaluation

✅ Complete pipeline: training scripts, inference modules, Docker image, Google Colab notebook, and an interactive web-based app

Ex vivo lung perfusion system

EVLPvideo.mp4

Getting Started

1. 🌐 Web-based App (Code-free Deployment + Tutorial)

The web app offers an easy-to-follow, code-free user interface for seamless DT development that can be tailored to lung-specific conditions at your fingertips.
🚀 Launch DT Web-app

🎥 App Tutorial

We also provide a short tutorial demonstrating how to use the web-based app.

🔴 YouTube: ▶️🎬 Watch the Tutorial

2. 📒 Google Colab Notebook

🚀 Launch in Colab opens a demo with pre-written code cells showing how to build a digital twin using our demo data: no code edits or local environment setup required.

3. 🐳 Running with Docker

Prerequisites:

Docker must be installed and running on your machine.
All Docker images are published on Docker Hub.

This repository provides ready-to-use helper scripts that launch the web application and main.py in Docker containers.

For macOS/Linux users:

To grant execute permission to the Docker start scripts, please run:

chmod +x docker_run_app.sh

OR

chmod +x docker_run_main.sh

Run Docker container to host the web-based Streamlit app for a code-free deployment experience of the DT demo:

./docker_run_app.sh

Run Docker container to execute main.py to run the DT demo locally:

./docker_run_main.sh

For Windows users:

Run Docker container to host the web-based Streamlit app for a code-free deployment experience of the DT demo:

Double-click docker_run_app.bat in File Explorer and click on the Local URL in the terminal to open the app in the browser.

Run Docker container to execute main.py to run the DT demo locally:

Double-click docker_run_main.bat in File Explorer and view DT results in work_dir/DT_Lung/Output folder.

4. 🐍 Python users

  1. This DT model works with Python 3.12. Please make sure you have the correct version of Python installed before getting started. Check if you have the correct version installed using:

    python --version
    
  2. Clone this repository:

    git clone https://github.com/Sage-Lab-ai/DT_Lung.git
    
  3. Create a virtual environment (optional but recommended):

    python -m venv env
    source env/bin/activate  # On Windows use `env\Scripts\activate`
    
  4. All system requirements are listed in the requirements.txt file. To set up the environment, please run:

    pip install -r requirements.txt
    
  5. Run the main.py script

    python main.py
    

    Note: Model weights and demo data are fetched automatically; no manual setup required.

  6. Once completed, your digital twins built from the demo data are ready! ✅
    DT results are saved to work_dir/DT_Lung/Output. Please view results in the Output folder.

  7. Alternatively, you can host the Streamlit web-based app locally on your machine by running:

    streamlit run app.py
    

DT Workflow


💡 Did you know? These models are trained on the largest dataset of its kind.


Code Structure

The digital-twin pipeline in this repository is implemented using two core machine learning architectures: gated recurrent units (GRU) and XGBoost (XGB). The GRU/ and XGB/ directories each contain scripts needed for model training, including data loading and preprocessing scripts, model architecture definitions and calibration, and utility functions that support the training pipeline.

All inference scripts are located in the inference/ folder, which provides code to load a trained model and generate predicted lung function parameters (inference). For detailed instructions on running inference and building your own digital twin, please see the Getting Started section above and choose the inference method (command-line, Docker, Google Colab, or web-based app) that best suits you.
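For orientation, the repository's pipelines follow a load-then-forecast pattern, and the GRU utilities include simple baselines (historical average and moving average) for comparison. The sketch below is illustrative only and is not the repository's API: it implements a generic moving-average forecaster in the spirit of GRU/util/baseline.py, with hypothetical function and variable names.

```python
def moving_average_forecast(series, window=5, horizon=3):
    """Forecast the next `horizon` points of a breath-by-breath series
    by repeatedly appending the mean of the last `window` observations.

    Illustrative baseline only; the repository's actual baseline lives
    in GRU/util/baseline.py and may differ in detail.
    """
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        # Average over the most recent `window` points (or fewer, early on).
        nxt = sum(history[-window:]) / min(window, len(history))
        forecasts.append(nxt)
        history.append(nxt)  # roll the forecast back into the history
    return forecasts

# Hypothetical usage: forecast a lung-function parameter such as
# dynamic compliance (Dy_Comp) from 1st-hour observations.
first_hour = [21.0, 20.5, 20.8, 21.2, 20.9]
print(moving_average_forecast(first_hour, window=3, horizon=2))
```

Such baselines serve as the "no-change" reference a digital control must beat; the trained GRU and XGBoost models replace the averaging step with learned dynamics.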

project-root/
├── main.py                                  # Primary entry point for the DT_Lung pipeline
├── GRU                                      # Pipeline for GRU model training and calibration
│   ├── __init__.py
│   ├── scripts
│   │   ├── 20_fold_cv_all.sh                # 20-fold cross-validation for all breath setups and variables
│   │   ├── lock_all_models.sh               # Train and save models for all breath setups and variables
│   ├── util
│   │   ├── baseline.py                      # Baseline implementations (historical average and moving average)
│   │   ├── static_feats.py                  # Defines the list of static features to include in GRU training
│   ├── EVLPMultivariateBreathDataset.py     # Custom dataset class for breath-by-breath time-series data with static features
│   ├── forecast_parameters.py               # Model training for single-stage breath setups (A1_A2, A1_A3, A1A2_A3)
│   ├── forecast_parameters_w_pred.py        # Model training for two-stage breath setups (A1PA2_A3)
│   ├── forecasting_pipeline.py              # Reusable functions used in the training pipelines
│   ├── GRU.py                               # GRU model class defining the model architecture
├── XGB                                      # Pipeline for XGBoost model training and calibration
│   ├── __init__.py
│   ├── BaselineModels.py                    # Baseline model definitions
│   ├── Dataset.py                           # Dataset class
│   ├── TemporalOrder.py                     # Helper class to parse temporal order in column names
│   ├── TabularForecasting.py                # Classes for training, evaluation, and output organization
│   ├── pipelines.py                         # Pipelines and training example
│   ├── image_gridsearch_static.py           # Image static DT model hyperparameter grid search
│   ├── image_gridsearch_dynamic.py          # Image dynamic DT model hyperparameter grid search
│   ├── image_train_static.py                # Image static DT model training
│   ├── image_train_dynamic.py               # Image dynamic DT model training
│   ├── utils.py                             # Utility functions
├── inference
│   ├── __init__.py
│   ├── GRU_inference.py                     # Pipeline for all GRU-based model inference
│   ├── XGB_inference.py                     # Pipeline for all XGB-based model inference
│   ├── reformat.py                          # Reformats outputs for readability for general users
│   ├── visualization.py                     # Helper functions for DT visualization
├── Dockerfile                               # Defines the container image
├── docker_build.sh                          # Builds the Docker image
├── docker_run_app.sh                        # Runs the Docker container for the web app on macOS/Linux
├── docker_run_app.bat                       # Runs the Docker container for the web app on Windows
├── docker_run_main.sh                       # Runs the Docker container for the main service on macOS/Linux
├── docker_run_main.bat                      # Runs the Docker container for the main service on Windows
├── requirements.txt                         # System requirements
├── GLOSSARY.md                              # Units, data-range references, acronyms, and domain terms
├── docs                                     # Project page HTML documents

Naming convention

This project follows a custom naming convention for model configurations and variable identifiers. Detailed explanations are provided below to enhance clarity and improve code readability.

GRU model training setups

A breath setup defines which breaths are included as input and output data to the model. They include:

A1_A2 Setups for Static Digital Lung Forecasting (Forecast 2nd hour lung function using 1st hour baseline data):

  • A1F50_A2F50
  • A1F50L50_A2F50
  • N1L20A1F50L50_A2F50

A1PA2_A3 Setups for Static Digital Lung Forecasting (Forecast 3rd hour lung function using 1st hour baseline and 2nd hour predicted data):

  • A1F50PA2F50_A3F50
  • A1F50L50PA2F50_A3F50
  • N1L20A1F50L50PA2F50_A3F50

A1_A3 Setups for Static Digital Lung Forecasting (Forecast 3rd hour lung function using 1st hour baseline data):

  • A1F50_A3F50
  • A1F50L50_A3F50
  • N1L20A1F50L50_A3F50

A1A2_A3 Setups for Dynamic Digital Lung Forecasting (Forecast 3rd hour lung function using 1st and 2nd hour observed data):

  • A1F50A2F50_A3F50
  • A1F50L50A2F50_A3F50
  • N1L20A1F50L50A2F50_A3F50

Legend: A = assessment period, N = normal breathing period, F = first breaths, L = last breaths, P = predicted data (two-stage setups), numbers = the number of breaths included

Note: everything before the underscore (_) denotes the input variables, and everything after it denotes the target variable.
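As an illustration, the naming convention above can be decoded mechanically. The following sketch is not repository code: it splits a setup name at the underscore and parses each period token per the legend (treating a leading P as marking predicted data, an assumption drawn from the A1PA2_A3 setups).

```python
import re

# One token = optional 'P' (predicted), period letter A/N, period number,
# then zero or more breath-window specifiers (F or L plus a breath count).
TOKEN = re.compile(r"(P?)([AN])(\d+)((?:[FL]\d+)*)")

def parse_setup(name):
    """Split a breath-setup name like 'N1L20A1F50L50_A2F50' into decoded
    input and target period tokens. Illustrative helper, not project API."""
    inputs, target = name.split("_")

    def decode(segment):
        periods = []
        for predicted, kind, num, windows in TOKEN.findall(segment):
            periods.append({
                "predicted": bool(predicted),                # 'P' prefix (assumed: predicted data)
                "period": f"{kind}{num}",                    # e.g. 'A1' = 1st assessment period
                "windows": re.findall(r"[FL]\d+", windows),  # e.g. ['F50', 'L50']
            })
        return periods

    return decode(inputs), decode(target)
```

For example, parse_setup("N1L20A1F50L50_A2F50") reports inputs from the N1 and A1 periods and a target of the first 50 breaths of A2.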

GRU model variables

The variable defines which parameter is forecasted. Variables include:

  • Dynamic Compliance (Dy_Comp)
  • Peak Pressure (P_peak)
  • Mean Pressure (P_mean)
  • Expiratory Volume (Ex_vol)

XGBoost model training setups

H1_to_H2: Static digital lung forecasting (Forecast 2nd hour lung function using 1st hour baseline data)

H1_to_H3: Static digital lung forecasting (Forecast 3rd hour lung function using 1st hour baseline data)

H1_predH2_to_H3: Static digital lung forecasting (Forecast 3rd hour lung function using 1st hour baseline data and predicted 2nd hour data)

H1_H2_to_H3: Dynamic digital lung forecasting (Forecast 3rd hour lung function using 1st and 2nd hour observed data)
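The four setups above differ only in which hours feed the model and which hour is the target. A small summary-as-code (illustrative only; these dictionary keys are not repository identifiers):

```python
# Maps each XGBoost setup name to its input hours, target hour, and mode.
# "pred_h2" marks the setup that also consumes predicted 2nd-hour data.
XGB_SETUPS = {
    "H1_to_H2":        {"inputs": [1],    "target": 2, "pred_h2": False, "mode": "static"},
    "H1_to_H3":        {"inputs": [1],    "target": 3, "pred_h2": False, "mode": "static"},
    "H1_predH2_to_H3": {"inputs": [1],    "target": 3, "pred_h2": True,  "mode": "static"},
    "H1_H2_to_H3":     {"inputs": [1, 2], "target": 3, "pred_h2": False, "mode": "dynamic"},
}
```

Only H1_H2_to_H3 is "dynamic" because it is the only setup that uses observed 2nd-hour data rather than baseline or predicted values.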

XGBoost model variables

Due to the large number of parameters, see GLOSSARY.md for all abbreviation definitions.


🤖 Inference (Create digital twins using your data 📊🫁)

All trained models developed in this project are published on our HuggingFace Model Repository.

We also provide a Demo Dataset on HuggingFace for users to try out our digital twin models.

We provide 4 distinct methods in the Getting Started section to run DT inference for creating digital twins of human lungs using either our demo data or your own data!


πŸ› οΈ Troubleshooting Errors (Last Update: Aug 2025)

🌐 Web-app

Streamlit Server Issues

The web app may be temporarily unavailable while the Streamlit server updates to a new version. Please wait a moment and try again shortly.

HuggingFace Download Issues

If the model or demo data fails to download from Hugging Face on the first attempt (often due to high request volume), click "Optional: Redownload Models and Data" to try again. Alternatively, refresh the web page and retry.

🐳 Docker

Please remember to install Docker on your device and grant permission to Docker scripts (as described in Running with Docker).

📒 Colab

HuggingFace Download Issues

If the model or demo data fails to download from Hugging Face on the first attempt (often due to high request volume), please re-run the cell containing:

%run main.py

OR

Restart the runtime and re-run the notebook.


📢 Stay up to date & 🐞 Report Issues

Interested in staying up-to-date with our work? Join our mailing list here: Mailing List

If you encounter any bugs or have ideas for improvement, please file an issue here: Open a new issue

License

This dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Commercial use is prohibited.
