PhysicsNeMo-Curator


What is PhysicsNeMo Curator?

PhysicsNeMo Curator is a sub-module of the PhysicsNeMo framework. It is a Pythonic library designed to streamline and accelerate data curation at scale for the engineering and scientific datasets used in training and inference. It uses GPUs to speed up curation.

It provides customizable interfaces and pipelines for extracting, transforming, and loading (ETL) data in supported formats and schemas. The DoMINO ETL example illustrates the concept.

This package is intended to be used as part of the PhysicsNeMo framework.

Installation and Usage

The recommended way of using PhysicsNeMo-Curator is to leverage the PhysicsNeMo docker image. This can be pulled from the NVIDIA Container Registry.

Current limitations:

Only the linux/amd64 platform is supported.
No PyPI wheel is provided; we only support installing from source.

PhysicsNeMo Container (Recommended)

The instructions to get started with PhysicsNeMo-Curator within the PhysicsNeMo docker container are shown below.

docker pull nvcr.io/nvidia/physicsnemo/physicsnemo:25.06

# Install from source
git clone git@github.com:NVIDIA/physicsnemo-curator.git && cd physicsnemo-curator

pip install --upgrade pip
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
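
After installing, a quick sanity check can confirm that the package is visible to Python and that the host matches the supported platform. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the module name `physicsnemo_curator` is assumed from the repository's package directory.

```python
import importlib.util
import platform


def curator_is_installed(module_name: str = "physicsnemo_curator") -> bool:
    """Return True if the named package can be located on the Python path.

    The default module name is an assumption based on the repository layout.
    """
    return importlib.util.find_spec(module_name) is not None


def platform_is_supported() -> bool:
    """Return True on Linux x86-64, the only platform listed as supported."""
    return platform.system() == "Linux" and platform.machine() in ("x86_64", "AMD64")


if __name__ == "__main__":
    print("package found:", curator_is_installed())
    print("supported platform:", platform_is_supported())
```

If the first check prints False inside the container, the editable install most likely ran in a different Python environment than the one executing the script.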

Getting Started

New to PhysicsNeMo-Curator?

If you're new to the framework, start with our comprehensive Tutorial. It walks you through building a complete ETL pipeline from scratch. You'll learn how to:

Define data schemas
Implement schema validation, data sources, transformations, and sinks
Convert HDF5 data to ML-optimized Zarr format
Configure and run parallel processing pipelines
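
To make the steps above concrete, here is a minimal, standard-library sketch of the same flow: a schema, a validation step, a transformation, a sink, and a parallel run. Every name in it (`Sample`, `validate`, `normalize`, `run_pipeline`) is hypothetical and is not the PhysicsNeMo-Curator API. In-memory lists and a dict stand in for the HDF5 source and Zarr sink that the Tutorial uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Sample:
    """Hypothetical schema: one simulation case with a name and pressure values."""

    name: str
    pressure: List[float]


def validate(sample: Sample) -> Sample:
    """Schema validation: reject samples that are unnamed or empty."""
    if not sample.name:
        raise ValueError("sample has no name")
    if not sample.pressure:
        raise ValueError(f"sample {sample.name!r} has no pressure values")
    return sample


def normalize(sample: Sample) -> Sample:
    """Transformation: scale pressure values by the largest absolute value."""
    scale = max(abs(p) for p in sample.pressure) or 1.0  # avoid dividing by zero
    return Sample(sample.name, [p / scale for p in sample.pressure])


def run_pipeline(
    source: Iterable[Sample],
    transforms: List[Callable[[Sample], Sample]],
    sink: Dict[str, Sample],
    num_workers: int = 2,
) -> Dict[str, Sample]:
    """Apply each transform in order to every sample, in parallel, then store it."""

    def process(sample: Sample) -> Sample:
        for transform in transforms:
            sample = transform(sample)
        return sample

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Results are written to the sink from the calling thread only.
        for result in pool.map(process, source):
            sink[result.name] = result
    return sink


raw = [Sample("case_a", [2.0, -4.0]), Sample("case_b", [1.0, 0.5])]
store = run_pipeline(raw, [validate, normalize], sink={})
print(store["case_a"].pressure)  # [0.5, -1.0]
```

Keeping validation as an ordinary transform means a malformed sample fails before any later step touches it; the error surfaces when the results are collected.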

Working with Your CFD Data

Have CFD simulation data from a solver like Fluent? PhysicsNeMo-Curator can process your data through the following approaches:

Option 1: Convert to Supported Formats (Recommended)

Currently Supported Formats:

VTK formats: VTU (volume mesh data), VTP (surface mesh data)
STL: Geometry files
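
Before organizing converted data, it can help to check which files already match the formats listed above. The sketch below groups paths by file extension; the function name and the role labels are illustrative, and the check looks only at the extension, not at file contents.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Extensions come from the supported-format list; the labels are illustrative.
SUPPORTED_FORMATS = {
    ".vtu": "volume mesh",
    ".vtp": "surface mesh",
    ".stl": "geometry",
}


def group_by_format(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by supported format, case-insensitively by extension."""
    groups: Dict[str, List[str]] = {role: [] for role in SUPPORTED_FORMATS.values()}
    groups["unsupported"] = []
    for path in paths:
        role = SUPPORTED_FORMATS.get(Path(path).suffix.lower(), "unsupported")
        groups[role].append(path)
    return groups


files = ["run_1/volume.vtu", "run_1/boundary.vtp", "run_1/car.STL", "run_1/case.cas.h5"]
groups = group_by_format(files)
print(groups["unsupported"])  # ['run_1/case.cas.h5']
```

Anything that lands in the "unsupported" group needs either conversion to VTU, VTP, or STL, or a custom reader.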

Next Steps:

Organize your converted data according to one of the supported dataset formats
Use the built-in DoMINO pipeline to convert your data to a format ready for AI model training
Train your DoMINO Model on your own data by following the example in PhysicsNeMo!

Option 2: Extend the Framework for Custom Formats

If your data is in a format other than the directly supported ones (VTU, VTP, STL), you can extend the framework. The Tutorial demonstrates creating a complete pipeline that reads HDF5 data and converts it to Zarr.
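
The general shape of such an extension is a reader class that implements a common interface for a new format. The sketch below shows that pattern only: `DataSource` and `CsvPointSource` are hypothetical names, not the framework's base classes (the Tutorial covers the real ones), and the CSV layout is made up for illustration.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, List


class DataSource(ABC):
    """Hypothetical reader interface; not the framework's actual base class."""

    @abstractmethod
    def read(self, text: str) -> Dict[str, List[float]]:
        """Parse raw content into named columns of floats."""


class CsvPointSource(DataSource):
    """Example reader for a made-up CSV format with a header row."""

    def read(self, text: str) -> Dict[str, List[float]]:
        reader = csv.DictReader(io.StringIO(text))
        columns: Dict[str, List[float]] = {name: [] for name in (reader.fieldnames or [])}
        for row in reader:
            for name, value in row.items():
                columns[name].append(float(value))
        return columns


text = "x,y,pressure\n0.0,0.0,101.3\n1.0,0.5,99.8\n"
columns = CsvPointSource().read(text)
print(columns["pressure"])  # [101.3, 99.8]
```

Because downstream steps depend only on the interface, supporting another format means adding another reader class rather than changing the rest of the pipeline.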

Getting Help

Domain-Specific Examples: Check whether your use case matches our automotive aerodynamics pipeline, an example ETL pipeline for training DoMINO models for automotive aerodynamics applications. For more details on the formats, refer to the Data Processing Reference.
Architecture Questions: See the Tutorial for framework concepts and for how to extend the pipeline.
Anything else: Please open a GitHub issue and we'll work with you to answer your questions!

Contributing to PhysicsNeMo-Curator

PhysicsNeMo-Curator and PhysicsNeMo are open-source collaborations, and their success is rooted in community contributions that further the field of Physics-ML. Thank you for contributing to the project so others can build on your work.

For guidance on contributing to PhysicsNeMo-Curator, please refer to the contributing guidelines.

Cite PhysicsNeMo-Curator

If PhysicsNeMo-Curator helped your research and you would like to cite it, please refer to the guidelines.

Communication

GitHub Discussions: Discuss new data formats, transformations, Physics-ML research, etc.
GitHub Issues: Bug reports, feature requests, install issues, etc.
PhysicsNeMo Forum: The PhysicsNeMo Forum hosts an audience of new to moderate-level users and developers for general chat, online discussions, collaboration, etc.

Feedback

Want to suggest some improvements to PhysicsNeMo-Curator? Use our feedback form.

License

PhysicsNeMo-Curator is provided under the Apache License 2.0. See LICENSE.txt for the full license text.
