Adds dexterous lift and reorientation manipulation environments #3378

ooctipus · 2025-09-08T06:05:34Z

Description

This PR provides a remake and extension of the original kuka-allegro-reorientation environment implemented in the paper:
DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training
(https://arxiv.org/abs/2305.12127)
Aleksei Petrenko, Arthur Allshire, Gavriel State, Ankur Handa, Viktor Makoviychuk

and another environment, kuka-allegro-lift, implemented in the paper:
Visuomotor Policies to Grasp Anything with Dexterous Hands
(https://dextrah-rgb.github.io/)
Ritvik Singh, Arthur Allshire, Ankur Handa, Nathan Ratliff, Karl Van Wyk

Though this is a remake, it ends up differing quite a lot in environment details, for reasons such as:

1. Simplifying the reward structure,
2. Unifying the environment implementation,
3. Standardizing the MDP,
4. Using the manager-based API

These changes, in my opinion, make the environments more accessible and easier to study, analyze, and extend. For example, you can train a lift policy first and then continue training from that checkpoint in the reorientation environment, since the two environments share the same observation space. : ))
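To illustrate why a shared observation space makes this warm start possible, here is a minimal sketch that needs no simulator. It assumes a plain MLP policy; the helper names (`mlp_param_shapes`, `can_warm_start`), the hidden sizes, and the observation/action dimensions are all hypothetical placeholders, not values taken from this PR. The point is only that a checkpoint can be loaded into another task when every parameter shape matches, which holds when the observation and action dimensions agree.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def mlp_param_shapes(
    obs_dim: int, act_dim: int, hidden: Tuple[int, ...] = (512, 256, 128)
) -> Dict[str, Shape]:
    """Return the weight and bias shapes of a plain MLP policy (hypothetical layout)."""
    shapes: Dict[str, Shape] = {}
    sizes = (obs_dim, *hidden, act_dim)
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        shapes[f"layer{i}.weight"] = (n_out, n_in)
        shapes[f"layer{i}.bias"] = (n_out,)
    return shapes


def can_warm_start(src: Dict[str, Shape], dst: Dict[str, Shape]) -> bool:
    """A checkpoint transfers directly only if all parameter names and shapes match."""
    return src == dst


# Placeholder dimensions, NOT the real Dexsuite values.
OBS_DIM, ACT_DIM = 100, 23

lift_policy = mlp_param_shapes(OBS_DIM, ACT_DIM)
reorient_policy = mlp_param_shapes(OBS_DIM, ACT_DIM)
mismatched_policy = mlp_param_shapes(OBS_DIM + 20, ACT_DIM)

print(can_warm_start(lift_policy, reorient_policy))    # same spaces
print(can_warm_start(lift_policy, mismatched_policy))  # different obs size
```

If the two tasks had different observation sizes, the first layer's weight matrix would differ in shape and the checkpoint could not be loaded without surgery on the network.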

It is best to consider this a very careful re-interpretation, rather than an exact reproduction, in migrating them to IsaacLab.

Here are the training curves you get if you simply train with:
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Dexsuite-Kuka-Allegro-Lift-v0 --num_envs 8192 --headless

./isaaclab.sh -p -m torch.distributed.run --nnodes=1 --nproc_per_node=4 scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Dexsuite-Kuka-Allegro-Reorient-v0 --num_envs 40960 --headless --distributed

lift training ~ 4 hours
reorientation training ~ 2 days

Note that reorientation requires an order of magnitude more data and time to converge compared to lift, under an almost identical setup.
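As a rough back-of-the-envelope check of that claim, the sketch below compares the two runs using only the numbers quoted above (environment counts, process count, and approximate wall-clock times). It uses "environment-hours" as a crude proxy for the amount of data, which assumes a similar per-environment step rate in both runs. Whether `--num_envs` in the distributed run is the total or the per-process count is not stated here, so both readings are computed; the helper `env_hours` is hypothetical.

```python
def env_hours(num_envs: int, hours: float, num_procs: int = 1) -> float:
    """Crude data proxy: number of parallel environments times wall-clock hours."""
    return num_envs * hours * num_procs


# Numbers quoted in this PR description (training times are approximate).
LIFT_ENVS, LIFT_HOURS = 8192, 4.0
REORIENT_ENVS, REORIENT_HOURS = 40960, 48.0  # ~2 days
REORIENT_PROCS = 4  # --nproc_per_node=4

time_ratio = REORIENT_HOURS / LIFT_HOURS

# Reading 1: --num_envs is the total across all processes.
data_ratio_total = env_hours(REORIENT_ENVS, REORIENT_HOURS) / env_hours(
    LIFT_ENVS, LIFT_HOURS
)

# Reading 2 (assumption): --num_envs is per process.
data_ratio_per_proc = env_hours(
    REORIENT_ENVS, REORIENT_HOURS, REORIENT_PROCS
) / env_hours(LIFT_ENVS, LIFT_HOURS)

print(f"wall-clock ratio: {time_ratio:.0f}x")
print(f"env-hours ratio (num_envs total): {data_ratio_total:.0f}x")
print(f"env-hours ratio (num_envs per process): {data_ratio_per_proc:.0f}x")
```

Under either reading, the reorientation run consumes well over ten times the environment-hours of the lift run, which is consistent with the "order of magnitude" statement.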

Training curve (screen capture from Wandb), reward:
Cyan: reorient, Purple: lift

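video results
lift
![cone_lift](https://github.com/user-attachments/assets/e626eadb-b281-4ec9-af16-57f626fcc6aa)
![fat_capsule_lift](https://github.com/user-attachments/assets/cde57d4c-ceb2-40ab-88dd-44320da689c5)

reorient
![cube_reorient](https://github.com/user-attachments/assets/752809cb-ea19-4701-b124-20c1909e4566)
![rod_reorient](https://github.com/user-attachments/assets/f009605a-d93c-491c-b124-ff08606c63ec)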

Memo:
I really enjoyed working on this remake, and I hope whoever plans to play with and extend it finds it as helpful and joyful as I did. I will be very excited to see what you come up with : ))

Octi

CAUTION:
Do Not Merge until the asset is uploaded to S3 bucket!

Fixes # (issue)

New feature (non-breaking change which adds functionality)


Checklist

- [x] I have run the `pre-commit` checks with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/dexsuite/config/__init__.py

source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/dexsuite/mdp/__init__.py

source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/dexsuite/mdp/curriculums.py

source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/dexsuite/mdp/observations.py

...e/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/dexsuite/mdp/pose_commands_cfg.py

...sks/manager_based/manipulation/dexsuite/config/kuka_allegro/dexsuite_kuka_allegro_env_cfg.py

source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/dexsuite/mdp/utils.py

docs/source/overview/environments.rst


romesco · 2025-09-10T19:49:24Z

Just want to comment and say, awesome work Octi!!


ooctipus requested review from Mayankm96, Toni-SM, jtigue-bdai and kellyguo11 as code owners September 8, 2025 06:05

ooctipus force-pushed the dexsuite_state_only branch from a1bf1d9 to 0b5a9b9 Compare September 8, 2025 06:06

ooctipus changed the title ~~Dexsuite state only~~ Adds dexterous lift, dexterous reorientation manipulation environments Sep 8, 2025