Closed
Labels
wontfix (This will not be worked on)
Description
I’m trying to replicate the PPO results presented in the paper using the learning/train_jax_ppo.py script. The results for most environments appear correct, but the returns for the humanoid tasks are significantly off (see attached image).
I’m running the scripts using the NVIDIA JAX Container 25.01 on H100 GPUs.
The commands I’m using are:
python learning/train_jax_ppo.py --env_name HumanoidStand --num_timesteps 100000000
python learning/train_jax_ppo.py --env_name HumanoidWalk --num_timesteps 100000000
python learning/train_jax_ppo.py --env_name HumanoidRun --num_timesteps 100000000
Each script was run with 5 different random seeds: 0, 1, 2, 3, and 4.
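For anyone trying to reproduce this, the sweep described above (3 environments × 5 seeds) can be sketched as a small helper that only builds and prints the command lines. This is a minimal sketch, not the exact launcher used for these runs: the commands above do not show how the seed was passed, so the `--seed` flag name here is an assumption and should be checked against the script's own flag definitions.

```python
import itertools
import shlex

# Environments, seeds, and step budget taken from the issue text.
ENVS = ["HumanoidStand", "HumanoidWalk", "HumanoidRun"]
SEEDS = [0, 1, 2, 3, 4]
NUM_TIMESTEPS = 100_000_000


def build_commands(envs=ENVS, seeds=SEEDS, num_timesteps=NUM_TIMESTEPS):
    """Return one argument list per (environment, seed) pair."""
    commands = []
    for env_name, seed in itertools.product(envs, seeds):
        commands.append([
            "python", "learning/train_jax_ppo.py",
            "--env_name", env_name,
            "--num_timesteps", str(num_timesteps),
            # ASSUMPTION: the flag name `--seed` does not appear in the
            # commands quoted in the issue; verify it before use.
            "--seed", str(seed),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected
    # or pasted into a job scheduler.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch side-effect free; each line can be launched by hand or handed to whatever job runner is in use.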
