Skip to content

isayevlab/AIMNet-X2D

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AIMNet-X2D

AIMNet-X2D is a Graph Neural Network-based model for molecular property prediction with multi-task learning capabilities, designed to scale from small to large datasets. The architecture is capable of scaling to foundation model level, allowing researchers to create their own molecular foundation models with limited compute resources or scale efficiently when more hardware is available.

Stay tuned for our upcoming paper with detailed results and methodology!

Features

  • Multi-task learning for molecular properties
  • Size-extensive additive (SAE) normalization for energy properties
  • In-memory and iterable (streaming) dataset loading options for scalability
  • HDF5 support for large datasets
  • Multi-GPU training support via DistributedDataParallel
  • Attention-based graph pooling
  • Multi-hop message passing
  • Stereochemical feature inclusion
  • Partial charges prediction
  • Molecule embedding extraction
  • Foundation model scaling capabilities

Installation

Requirements

  • Python 3.8+
  • PyTorch 1.10+
  • RDKit
  • CUDA (optional, for GPU acceleration)

Setup

  1. Clone the repository:
git clone https://github.com/isayevlab/aimnet-x2d.git
cd aimnet-x2d
  1. Create a conda environment:
conda create -n aimnet-x2d python=3.12
conda activate aimnet-x2d
  1. Install dependencies:
pip install -r requirements.txt

Quick Start

After installation, see USAGE.md for detailed instructions on how to run AIMNet-X2D.

Project Structure

aimnet-x2d/
├── main.py                    # Main entry point
├── src/                       # Modular source code
│   ├── config/               # Configuration management
│   │   ├── args.py           # Argument parsing
│   │   ├── validation.py     # Input validation
│   │   ├── experiment.py     # Experiment tracking
│   │   └── paths.py          # Path management
│   ├── datasets/             # Data handling
│   │   ├── molecular.py      # PyTorch Geometric datasets
│   │   ├── features.py       # Molecular feature computation
│   │   ├── loaders.py        # Data loaders
│   │   └── io.py             # File I/O operations
│   ├── models/               # Model architecture
│   │   ├── gnn.py            # Main GNN model
│   │   ├── layers.py         # Neural network layers
│   │   ├── pooling.py        # Graph pooling mechanisms
│   │   ├── losses.py         # Loss functions
│   │   └── normalizers.py    # Data normalization
│   ├── training/             # Training pipeline
│   │   ├── trainer.py        # Training loops
│   │   ├── evaluator.py      # Model evaluation
│   │   ├── predictor.py      # Prediction methods
│   │   └── extractors.py     # Feature extraction
│   ├── inference/            # Inference pipeline
│   │   ├── engine.py         # Inference orchestration
│   │   ├── pipeline.py       # Processing pipeline
│   │   ├── uncertainty.py    # Uncertainty estimation
│   │   └── embeddings.py     # Embedding extraction
│   ├── data/                 # Data preprocessing
│   │   └── preprocessing.py  # SAE & scaling pipelines
│   ├── main/                 # Execution management
│   │   ├── runner.py         # Main execution logic
│   │   ├── cli.py            # Command-line interface
│   │   ├── hyperopt.py       # Hyperparameter optimization
│   │   └── utils.py          # Execution utilities
│   └── utils/                # General utilities
│       ├── distributed.py    # Multi-GPU utilities
│       ├── optimization.py   # Training optimizations
│       ├── activation.py     # Activation functions
│       └── random.py         # Reproducibility tools
├── sample-data/              # Example datasets
├── requirements.txt          # Python dependencies
├── USAGE.md                  # Detailed usage guide
└── README.md                 # This file

Authors

Rohit Nandakumar, Roman Zubatyuk, Olexandr Isayev

License

[MIT License]

About

Scalable 2D GNN framework for molecular property prediction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages