
Conversation


@amorehead commented Sep 10, 2025

Summary:
Makes test_utils.py (and torchtnt in general) safe to use with start_method=fork for multi-GPU training via torchelastic. One project that benefits from this change is fairchem, which uses torchelastic and torchtnt together for multi-GPU training.

Test plan:
I verified that this change allows me to train models in the fairchem codebase with start_method=fork for elastic_launch. Without it, any process that imports torchtnt ends up with a CUDA context, so a parent process that imports the package can no longer fork CUDA-using workers for multi-GPU training.
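The underlying issue is general: a module-level CUDA query runs at import time, so any process that merely imports the package may initialize a CUDA context and can then no longer fork CUDA-using workers. Below is a minimal sketch of the fork-safe pattern of deferring the query to call time; the names are illustrative, not the actual torchtnt helpers, and whether a particular query initializes a context depends on the PyTorch version.

```python
import torch

# Fork-unsafe pattern (illustrative): a CUDA query evaluated at import time.
# Depending on the PyTorch version, this can initialize a CUDA context in
# whatever process performs the import, poisoning later fork() calls:
#
#   _HAS_GPU = torch.cuda.device_count() > 0  # runs on `import`


def has_gpu() -> bool:
    """Fork-safe alternative: the CUDA query runs only when a caller
    actually needs it, so a parent process that just imports this module
    never touches the CUDA driver and remains safe to fork."""
    return torch.cuda.is_available() and torch.cuda.device_count() > 0
```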

Fixes:
Together with this fairchem PR, this change fixes crashes in local (non-SLURM) multi-GPU model training in the fairchem codebase when start_method=fork.
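For context, here is a minimal sketch of the launch configuration this enables: a local, single-node elastic_launch with start_method="fork". The trainer function and worker count are placeholders, not fairchem's actual entry point.

```python
import os

from torch.distributed.launcher.api import LaunchConfig, elastic_launch


def train() -> None:
    # Each forked worker initializes CUDA itself; this only works if the
    # parent process never created a CUDA context before forking.
    import torch

    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchelastic
    torch.cuda.set_device(local_rank)
    print(f"worker {local_rank} using device {torch.cuda.current_device()}")


if __name__ == "__main__":
    config = LaunchConfig(
        min_nodes=1,
        max_nodes=1,
        nproc_per_node=2,      # e.g. one process per local GPU
        start_method="fork",   # requires a CUDA-free parent process
        rdzv_backend="c10d",
        rdzv_endpoint="localhost:29500",
    )
    elastic_launch(config, train)()
```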
