
ITER_KER_GER

Description

This repo accompanies the paper Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning. The code will be fully open-sourced once our paper is accepted.

Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. We propose two novel data augmentation techniques for DRL based on invariant transformations of trajectories, in order to reuse observed interactions more efficiently. The first, called Kaleidoscope Experience Replay, exploits reflectional symmetries, while the second, called Goal-augmented Experience Replay, takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show a large increase in learning speed.

This repo is built on top of OpenAI Baselines and OpenAI Gym.

Installation

This implementation requires the installation of the OpenAI Baselines module (commit version 2bca79).

Conda Virtual Environment

We provide further guidance here on installing this older codebase.

  • As of 2025, we recommend using conda to install Python 3.6. Python 3.6 is necessary for compatibility with tensorflow-gpu 1.14 as used by Baselines.
conda create -n iter python==3.6
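Then activate the new environment before installing anything else:

conda activate iter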

MuJoCo Install

  • You will also need MuJoCo 2.1.0 installed.
wget https://github.com/google-deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz

You can extract it to your home directory, or to /opt if multiple users need access.

mkdir -p ~/.mujoco
tar -xzf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco
# tar -xzf mujoco210-linux-x86_64.tar.gz -C /opt/mujoco/

Tell your favorite shell (.bashrc or .zshrc) where to look for MuJoCo (adjust the path depending on the folder chosen above). The single quotes keep the variables from being expanded until the shell sources the file:

echo 'export MUJOCO_HOME=$HOME/.mujoco/mujoco210' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia:$MUJOCO_HOME/bin' >> ~/.bashrc

If you have multiple MuJoCo installations, be careful which libraries/files are being read. You can either run the above export commands directly in your terminal or tmux session, or create a small script that sets them on demand.
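For example, a minimal helper script (here hypothetically named mujoco_env.sh) could collect the same two exports so you can source it whenever you need this particular MuJoCo:

# mujoco_env.sh -- source this file to expose MuJoCo 2.1.0 to the current shell
# Adjust MUJOCO_HOME if you extracted MuJoCo under /opt instead of ~/.mujoco.
export MUJOCO_HOME=$HOME/.mujoco/mujoco210
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia:$MUJOCO_HOME/bin

Load it in any terminal or tmux session with source mujoco_env.sh.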

OpenAI Baselines Clone

Clone the main OpenAI Baselines repository and enter it:

git clone https://github.com/openai/baselines.git
cd baselines

Install the baselines package (if you meet any problems during the installation, please check here).

Build-Time Dependencies

You may first need to install some build-time dependencies that are not included in setup.py. Binaries:

sudo apt update && sudo apt install libopenmpi-dev openmpi-bin

Pip packages:

pip install tensorflow-gpu==1.14.0 # or the non-GPU tensorflow==1.14.0
pip install Cython==0.29.30
pip install mpi4py==4.0.3
pip install patchelf==0.17.2.1
pip install ipdb
pip install tensorboardX

Setup.py

Now install the package from the root of the baselines repo (this runs setup.py):

pip install -e .

Errors

If you get errors while building the wheel for opencv-python, do the following:

  • Install python3-opencv:
sudo apt install python3-opencv
  • Then re-run the install with the --no-build-isolation option:
pip install -e . --no-build-isolation

Copy ITER files into Baselines

After the installation, create a new folder for this repo and go inside it.

mkdir ITER_KER_GER && cd $_

Download the code held in this repo and enter the cloned folder.

git clone https://github.com/birlrobotics/ITER_KER_GER.git
cd ITER_KER_GER

Copy the files held in the her folder into baselines/baselines/ to overwrite the vanilla HER implementation and its related files (adjust ~/baselines below if you cloned Baselines elsewhere).

cp -rf her ~/baselines/baselines/
cp -f her/run.py ~/baselines/baselines
cp -f her/cmd_util.py ~/baselines/baselines/common

Finally, make sure tensorflow 1.14.0, mpi4py, ipdb, and tensorboardX (installed above) are available in the environment.

Usage

To reproduce the best result in our paper, please run:

python -m baselines.run --alg=her --env=FetchPickAndPlace-v1 --num_timesteps=1e6 --n_cycles=100 --save_path=/home/user/policies/her/iter --log_path=/home/user/log_data/her/iter --before_GER_minibatch_size=256 --n_KER=8 --n_GER=4

Options include (an illustrative example combining several of them follows this list):

  • --num_cpu: Number of workers (threads/CPUs). The results in our paper used just 1 worker in order to show the significant improvement in learning speed. This HER implementation is the one presented in the original HER paper. (Please note that, as the HER authors state, running the code with a different number of CPUs is NOT equivalent. For more information about this issue, please check here.)
  • --env: The experimental environment for each run. Possible choices are FetchPickAndPlace-v1, FetchSlide-v1, FetchPush-v1. (There will be more choices based on the Baxter robot in the near future, so please keep watching this repo :). )
  • --before_GER_minibatch_size: The original minibatch size, i.e. the minibatch size before GER augmentation.
  • --n_KER: The KER hyperparameter. More specifically, it sets how many reflectional planes are used to augment the samples. For more information, please check our Paper.
  • --n_GER: The GER hyperparameter. Specifically, it sets how many augmented goals are generated for the sampled transitions. For more information, please check our Paper.
  • --log_path: The path where log files are saved.
  • --save_path: The path where policy parameters are saved.
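As an illustration only (not one of the configurations reported in the paper; the save and log paths are placeholders), a run on the pushing task with lighter augmentation could combine these options as follows:

python -m baselines.run --alg=her --env=FetchPush-v1 --num_timesteps=1e6 --n_cycles=100 --save_path=/home/user/policies/her/iter_push --log_path=/home/user/log_data/her/iter_push --before_GER_minibatch_size=256 --n_KER=4 --n_GER=2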

Loading and visualizing models

This page from OpenAI Baselines gives a good indication of how to load and visualize models.
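For example, assuming the save path used in the training command above, a command along these lines should reload the trained policy and render it (--load_path and --play are standard Baselines run.py options):

python -m baselines.run --alg=her --env=FetchPickAndPlace-v1 --num_timesteps=0 --load_path=/home/user/policies/her/iter --play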

Training Environment

Fig. 1 Testing robotic tasks: Fetch Pushing, Sliding, and Pick-and-Place without obstacles (each column left) and with obstacles (each column right).

Results

ITER greatly improves the robot's generalization ability by augmenting the observed transition samples with KER and GER, leading to a highly efficient learning process. The learning curves (Testing Success Rate vs Epoch) plotted below show the significant improvements on the three robotic tasks, both with and without obstacles:

Fig. 2 Training results for the aforementioned robotic tasks without obstacles, contrasting training with and without ITER.
Fig. 3 Training results for the aforementioned robotic tasks with obstacles, contrasting training with and without ITER.

For more experimental results please read our Paper.

Quick Visualization of Learning Results

You can visualize the testing results during training with TensorBoard. SummaryWriter saves the testing result after each epoch in the running directory. You can open a new terminal in that directory and run

tensorboard --logdir ~/

Learned Performance with ITER in a More Complex Dynamical Environment

Our method preserves any contact that may occur between the robot and any object it may encounter (table included) as long as a symmetry is applied to all the objects and obstacles in the robot's workspace. Therefore, our approach also works in any contact-rich robotic task (a more complex dynamical environment), including problems where some obstacles may limit the movements of objects or the robot. When the poses of obstacles are observed in each state but not fixed across episodes, the agent can learn the effects of contact. For example, the agent can avoid obstacles or leverage contact to reach a goal (e.g. in the pushing task it may learn to push an object and let the obstacle stop the moving object).

The following GIFs compare the behaviors learned with HER and with ITER.

Learned behaviors at training epoch 80 in Pushing task.
Learned behaviors at training epoch 100 in Sliding task.
Learned behaviors at training epoch 230 in Pick-and-Place task.

Deployment on a Physical Robot

We also applied a well-trained ITER policy to a real Baxter robot. To do that, we first trained a pick-and-place policy in simulation (Baxter in Gym). Then we transferred it to the real robot, where the object pose is detected using ALVAR (more information is in the Appendix).

A virtual Baxter robot is trained on the pick-and-place task with ITER (the goal is located at the red ball)
A real Baxter robot running a pick-and-place policy trained via ITER (the goal is located at the orange rectangle)

More Information

For more information please check:

  1. Website Blog
  2. Paper
  3. Video: Youtube, Youku
  4. Appendix

Credits

ITER_KER_GER is maintained by the BIRL Intelligent Manipulation group. Contributors include:
