Unable to replicate DM Humanoid results #171

@mttga

I’m trying to replicate the results for PPO presented in the paper using the learning/train_jax_ppo.py script. The results for most environments appear correct, but the returns for the humanoid tasks are significantly off (see attached image).

I’m running the scripts using the NVIDIA JAX Container 25.01 on H100 GPUs.

The commands I’m using are:

python learning/train_jax_ppo.py --env_name HumanoidStand --num_timesteps 100000000
python learning/train_jax_ppo.py --env_name HumanoidWalk --num_timesteps 100000000
python learning/train_jax_ppo.py --env_name HumanoidRun --num_timesteps 100000000

Each script was run with 5 different random seeds: 0, 1, 2, 3, and 4.
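The commands above do not show how the seed was set, so here is a minimal sketch of how the 15 runs (3 environments × 5 seeds) might be enumerated. It assumes the training script accepts a `--seed` flag, which is not confirmed anywhere in this report. The `build_commands` helper is hypothetical and only prints the commands rather than launching them.

```python
import itertools
import shlex

# Values taken from the report: three humanoid tasks, five seeds, 100M timesteps.
ENVS = ["HumanoidStand", "HumanoidWalk", "HumanoidRun"]
SEEDS = [0, 1, 2, 3, 4]
NUM_TIMESTEPS = 100_000_000


def build_commands(envs=ENVS, seeds=SEEDS, num_timesteps=NUM_TIMESTEPS):
    """Return one argv list per (environment, seed) pair."""
    commands = []
    for env_name, seed in itertools.product(envs, seeds):
        commands.append([
            "python", "learning/train_jax_ppo.py",
            "--env_name", env_name,
            "--num_timesteps", str(num_timesteps),
            # ASSUMPTION: the script exposes a --seed flag; the original
            # commands in this report do not show how the seed was passed.
            "--seed", str(seed),
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print each command so it can be inspected or piped to a scheduler.
    for argv in build_commands():
        print(shlex.join(argv))
```

Listing the exact per-seed commands this way makes it easier for others to check whether a replication used the same settings as the paper.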

[Attached image]

Labels: wontfix (This will not be worked on)
