Unable to replicate DM Humanoid results

I’m trying to replicate the PPO results presented in the paper using the `learning/train_jax_ppo.py` script. The results for most environments match the paper, but the returns for the humanoid tasks differ significantly from the reported ones (see attached image).

I’m running the scripts using the [NVIDIA JAX Container 25.01](https://docs.nvidia.com/deeplearning/frameworks/jax-release-notes/rel-25-01.html) on H100 GPUs.

The commands I’m using are:

```
python learning/train_jax_ppo.py --env_name HumanoidStand --num_timesteps 100000000
python learning/train_jax_ppo.py --env_name HumanoidWalk --num_timesteps 100000000
python learning/train_jax_ppo.py --env_name HumanoidRun --num_timesteps 100000000
```

Each command was run with five random seeds: 0, 1, 2, 3, and 4.
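
For clarity, the full sweep (3 environments × 5 seeds = 15 runs) can be enumerated roughly as in the sketch below. It only builds and prints the command lines and does not launch any training. Note that the `--seed` flag is an assumption here, since it does not appear in the commands listed above, and `build_commands` is a hypothetical helper name used only for illustration.

```python
import itertools
import shlex

# Environments, seeds, and step budget taken from the issue description.
ENVS = ["HumanoidStand", "HumanoidWalk", "HumanoidRun"]
SEEDS = [0, 1, 2, 3, 4]
NUM_TIMESTEPS = 100_000_000


def build_commands(envs=ENVS, seeds=SEEDS, num_timesteps=NUM_TIMESTEPS):
    """Return one argv list per (environment, seed) pair."""
    commands = []
    for env_name, seed in itertools.product(envs, seeds):
        commands.append([
            "python", "learning/train_jax_ppo.py",
            "--env_name", env_name,
            "--num_timesteps", str(num_timesteps),
            # ASSUMPTION: the script exposes a `--seed` flag; it is not
            # shown in the original commands, so check the script's flags.
            "--seed", str(seed),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands only; nothing is executed.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Each returned list could be passed to `subprocess.run` to launch the corresponding run, if the flag assumption holds.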

<img width="4550" height="2390" alt="Image" src="https://github.com/user-attachments/assets/bbfaa7c5-b153-4920-b07e-1fe2ec166363" />


