This is the official repository for Digital Twins of Ex Vivo Human Lungs.
Streamlit Server Issues (Updated on September 24, 2025)
Streamlit released a new version on September 24, 2025 at 5:30 PM (UTC-4).
The web app may be temporarily unavailable while the Streamlit server updates to the new version. Please wait a moment and try again shortly.
- Background and Overview
- Getting Started
- DT Workflow
- Inference
- Troubleshooting Errors
- Stay up-to-date & Report Issues
- Ex vivo lung perfusion (EVLP) is a cutting-edge platform that maintains isolated human lungs in a physiologically active state outside the body, enabling comprehensive functional assessment and targeted therapeutic interventions.
- The concept of a "digital twin" (a dynamic, high-fidelity computational replica of a physical system) is rapidly gaining traction in medicine for its ability to simulate complex biological processes in silico.
- We've built a high-fidelity digital twin of ex vivo human lungs, powered by the world's largest annotated ex vivo lung function dataset.
- The DT model has been validated as a robust digital control for enhanced preclinical therapeutic evaluation.
- Complete pipeline: training scripts, inference modules, Docker, Google Colab notebook, and an interactive web-based app.
Video: Ex vivo lung perfusion system (EVLPvideo.mp4)
The web app offers an easy-to-follow, code-free user interface for seamless DT development that can be tailored to lung-specific conditions at your fingertips.
Launch DT Web-app
We also provide a short tutorial demonstrating how to use the web-based app.
YouTube:
Launch in Colab contains a demo with pre-written code cells showing how to build a digital twin using our demo data: no code edits or local environment setup required.
Docker must be installed and running on your machine.
All Docker images are published on Docker Hub.
This repository provides ready-to-use helper scripts that launch both the web application and main.py in Docker containers.
To grant execute permission to the Docker start scripts, run:
chmod +x docker_run_app.sh
OR
chmod +x docker_run_main.sh
Run Docker container to host the web-based Streamlit app for a code-free deployment experience of the DT demo:
./docker_run_app.sh
Run Docker container to execute main.py to run the DT demo locally:
./docker_run_main.sh
Run Docker container to host the web-based Streamlit app for a code-free deployment experience of the DT demo:
Double-click docker_run_app.bat in File Explorer and click the Local URL in the terminal to open the app in your browser.
Run Docker container to execute main.py to run the DT demo locally:
Double-click docker_run_main.bat in File Explorer and view DT results in the work_dir/DT_Lung/Output folder.
- This DT model works with Python 3.12. Please make sure the correct version is installed before getting started; check with:
python --version
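Equivalently, the version check can be done from Python itself. This is a small illustrative sketch (not part of the repository), with the helper name `version_ok` being an assumption:

```python
import sys

def version_ok(version_info, required=(3, 12)):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == required

# Warn early instead of failing mid-pipeline on an unsupported interpreter.
if not version_ok(sys.version_info):
    print("Warning: this pipeline is tested on Python 3.12.")
```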
- Clone this repository:
git clone https://github.com/Sage-Lab-ai/DT_Lung.git
- Create a virtual environment (optional but recommended):
python -m venv env
source env/bin/activate  # On Windows use `env\Scripts\activate`
- All dependencies are listed in the requirements.txt file. To set up the environment, run:
pip install -r requirements.txt
- Run the main.py script:
python main.py
Note: Model weights and demo data are fetched automatically; no manual setup required.
- Once completed, your digital twins built from the demo data are ready! DT results are saved to work_dir/DT_Lung/Output; view them in the Output folder.
- Alternatively, you can host the Streamlit web-based app locally on your machine by running:
streamlit run app.py
Did you know? These models are trained on the largest dataset of its kind.
The digital-twin pipeline in this repository is implemented using two core machine learning architectures: gated recurrent units (GRU) and XGBoost (XGB). The GRU/ and XGB/ directories each contain the scripts needed for model training, including data loading and preprocessing scripts, model architecture definitions and calibration, and utility functions that support the training pipeline.
All inference scripts are located in the inference/ folder, which provides code to load a trained model and generate predicted lung function parameters (inference). For detailed instructions on running inference and building your own digital twin, see the Getting Started section above and choose the inference method (command line, Docker, Google Colab, or web-based app) that best suits you.
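For readers unfamiliar with GRUs, the core recurrence can be illustrated in a few lines of NumPy. This is a conceptual sketch only, not the repository's GRU.py; the weight shapes and the idea of rolling the cell over a sequence of "breaths" are assumptions for illustration:

```python
import numpy as np

def gru_cell(x, h, params):
    """One GRU update step: gates decide how much of the hidden state to keep."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_tilde        # blend old state with candidate

# Tiny demo: 3 input features per breath, hidden size 4, random weights.
rng = np.random.default_rng(0)
params = [rng.standard_normal((4, 3)) if i % 2 == 0 else rng.standard_normal((4, 4))
          for i in range(6)]
h = np.zeros(4)
for t in range(5):  # unroll the cell over 5 time steps
    h = gru_cell(rng.standard_normal(3), h, params)
print(h.shape)
```

The gating is what lets the model carry information across a long breath-by-breath sequence without it washing out.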
project-root/
├── main.py                              # Primary entry point for the DT_Lung pipeline
├── GRU                                  # Pipeline for GRU model training and calibration
│   ├── __init__.py
│   ├── scripts
│   │   ├── 20_fold_cv_all.sh            # 20-fold cross-validation for all breath setups and variables
│   │   └── lock_all_models.sh           # Train and save models for all breath setups and variables
│   ├── util
│   │   ├── baseline.py                  # Baseline implementation (historical average and moving average)
│   │   └── static_feats.py              # Defines the list of static features to include in GRU training
│   ├── EVLPMultivariateBreathDataset.py # Custom dataset class for breath-by-breath time-series data with static features
│   ├── forecast_parameters.py           # Model training for single-stage breath setups (A1_A2, A1_A3, A1A2_A3)
│   ├── forecast_parameters_w_pred.py    # Model training for two-stage breath setups (A1PA2_A3)
│   ├── forecasting_pipeline.py          # Reusable functions used in the training pipelines
│   └── GRU.py                           # GRU model class that defines the model architecture
├── XGB                                  # Pipeline for XGBoost model training and calibration
│   ├── __init__.py
│   ├── BaselineModels.py                # Baseline model definitions
│   ├── Dataset.py                       # Dataset class
│   ├── TemporalOrder.py                 # Helper class to parse temporal order in column names
│   ├── TabularForecasting.py            # Classes for training, evaluation, and output organization
│   ├── pipelines.py                     # Pipelines and training example
│   ├── image_gridsearch_static.py       # Image static DT model hyperparameter grid search
│   ├── image_gridsearch_dynamic.py      # Image dynamic DT model hyperparameter grid search
│   ├── image_train_static.py            # Image static DT model training
│   ├── image_train_dynamic.py           # Image dynamic DT model training
│   └── utils.py                         # Utility functions
├── inference
│   ├── __init__.py
│   ├── GRU_inference.py                 # Pipeline for all GRU-based model inference
│   ├── XGB_inference.py                 # Pipeline for all XGB-based model inference
│   ├── reformat.py                      # Improves readability of outputs for general users
│   └── visualization.py                 # Helper functions for DT visualization
├── Dockerfile                           # Defines the container image
├── docker_build.sh                      # Builds the Docker image
├── docker_run_app.sh                    # Runs the Docker container for the web app on macOS/Linux
├── docker_run_app.bat                   # Runs the Docker container for the web app on Windows
├── docker_run_main.sh                   # Runs the Docker container for the main service on macOS/Linux
├── docker_run_main.bat                  # Runs the Docker container for the main service on Windows
├── requirements.txt                     # Python dependencies
├── GLOSSARY.md                          # Units, data range references, acronyms, and domain terms
└── docs                                 # Project page HTML documents
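The baseline module above mentions historical-average and moving-average baselines. As an illustration only (not the repository's implementation), a moving-average forecaster predicts the next value from the mean of the most recent observations:

```python
import numpy as np

def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    series = np.asarray(series, dtype=float)
    return series[-window:].mean()

# Hypothetical compliance readings over four breaths; forecast the fifth.
print(moving_average_forecast([10.0, 12.0, 11.0, 13.0], window=3))  # 12.0
```

Simple baselines like this give the learned GRU and XGB models a floor to beat during evaluation.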
This project follows a custom naming convention for model configurations and variable identifiers. Detailed explanations are provided below to enhance clarity and improve code readability.
A breath setup defines which breaths are included as input and output data to the model. They include:
A1_A2
Setups for Static Digital Lung Forecasting (Forecast 2nd hour lung function using 1st hour baseline data):
A1F50_A2F50
A1F50L50_A2F50
N1L20A1F50L50_A2F50
A1PA2_A3
Setups for Static Digital Lung Forecasting (Forecast 3rd hour lung function using 1st hour baseline and 2nd hour predicted data):
A1F50PA2F50_A3F50
A1F50L50PA2F50_A3F50
N1L20A1F50L50PA2F50_A3F50
A1_A3
Setups for Static Digital Lung Forecasting (Forecast 3rd hour lung function using 1st hour baseline data):
A1F50_A3F50
A1F50L50_A3F50
N1L20A1F50L50_A3F50
A1A2_A3
Setups for Dynamic Digital Lung Forecasting (Forecast 3rd hour lung function using 1st and 2nd hour observed data):
A1F50A2F50_A3F50
A1F50L50A2F50_A3F50
N1L20A1F50L50A2F50_A3F50
Legend: A = assessment period, N = normal breathing period, F = first breaths, L = last breaths, numbers = the number of breaths included.
Note: everything before _ denotes the input variables, and everything after _ denotes the target variable.
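The naming convention above is regular enough to parse mechanically. The helper below is hypothetical (not part of the repository) and exists only to make the convention concrete, splitting a setup name into its input and target segments:

```python
import re

# One token per segment piece: optional "P" (predicted), a period code, a count.
TOKEN = re.compile(r"(P?)([ANFL])(\d+)")

LEGEND = {
    "A": "assessment period",
    "N": "normal breathing period",
    "F": "first breaths",
    "L": "last breaths",
}

def parse_setup(name):
    """Everything before '_' describes the inputs; everything after, the target."""
    inputs, target = name.split("_")
    def expand(segment):
        return [("predicted " if pred else "") + f"{LEGEND[code]} {num}"
                for pred, code, num in TOKEN.findall(segment)]
    return {"inputs": expand(inputs), "target": expand(target)}

print(parse_setup("A1F50PA2F50_A3F50"))
```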
The variable defines which parameter will be forecasted. They include:
- Dynamic Compliance (Dy_Comp)
- Peak Pressure (P_peak)
- Mean Pressure (P_mean)
- Expiratory Volume (Ex_vol)
H1_to_H2: Static digital lung forecasting (Forecast 2nd hour lung function using 1st hour baseline data)
H1_to_H3: Static digital lung forecasting (Forecast 3rd hour lung function using 1st hour baseline data)
H1_predH2_to_H3: Static digital lung forecasting (Forecast 3rd hour lung function using 1st hour baseline data and predicted 2nd hour data)
H1_H2_to_H3: Dynamic digital lung forecasting (Forecast 3rd hour lung function using 1st and 2nd hour observed data)
Due to the large number of parameters, see GLOSSARY.md for all abbreviation definitions.
All trained models developed in this project are published on our HuggingFace Model Repository.
We also provide a Demo Dataset on HuggingFace for users to try out our digital twin models.
We provide 4 distinct methods in the Getting Started section to run DT inference for creating digital twins of human lungs using either our demo data or your own data!
Streamlit Server Issues
The web app may be temporarily unavailable while the Streamlit server updates to a new version. Please wait a moment and try again shortly.
HuggingFace Download Issues
If the model or demo data fails to download from Hugging Face on the first attempt (often due to high request volume), click "Optional: Redownload Models and Data" to try again. Alternatively, refresh the web page and retry.
Please remember to install Docker on your device and grant execute permission to the Docker scripts (as described in Running with Docker).
HuggingFace Download Issues
If the model or demo data fails to download from Hugging Face on the first attempt (often due to high request volume), re-run the cell containing %run main.py, or restart the runtime and re-run the notebook.
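For scripted local setups, transient download failures can also be handled with a generic retry wrapper. This is a sketch under assumptions (the function name and the idea of wrapping a download call are ours, not repository code):

```python
import time

def with_retries(fn, attempts=3, delay=1.0):
    """Call fn(), retrying on any exception; useful around flaky downloads."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the original error
            time.sleep(delay)  # brief back-off before the next try
```

Any download call, such as a Hugging Face fetch, can then be passed in as `fn`.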
Interested in staying up-to-date with our work? Join our mailing list here: Mailing List
If you encounter any bugs or have ideas for improvement, please file an issue here: Open a new issue
This dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Commercial use is prohibited.