This project is a C++ application designed to train an agent to master Atari games, with a specific focus on the classic game "Breakout". It leverages reinforcement learning, implementing the Proximal Policy Optimisation (PPO) algorithm to enable the agent to learn and improve its gameplay through trial and error. Multi-threading is used to interact with the game environment in parallel, achieving greater throughput.
Built using Bazel, this project integrates libtorch (the C++ frontend for PyTorch) for its neural network components and the Arcade Learning Environment (ALE) to interface with the Atari games. This combination provides a high-performance environment for cutting-edge AI research.
While Python-based libraries dominate the open-source RL scene with their ease of use and vast ecosystems, ALE-libtorch-PPO contributes a high-performance, C++-native alternative. It is designed for developers and researchers who need that performance together with a clean C++ integration path and a transparent, focused implementation of a strong, popular RL algorithm.
To run the project, follow these steps:
- Install Bazel by following the Bazel installation guide for your operating system.
- Install FFmpeg, which the environment video recorder uses to generate MP4s of the agent playing the game.
- Clone the repository:

  ```sh
  git clone https://github.com/cemlyn007/ALE-libtorch-PPO.git
  cd ALE-libtorch-PPO
  ```
- Download the ROMs:

  ```sh
  mkdir roms
  ./scripts/download_unpack_roms.sh
  ```
- Train the agent using Bazel:

  ```sh
  bazel run //src/bin:train --compilation_mode=opt -- $(pwd)/roms/breakout.bin $(pwd)/logs/train $(pwd)/videos/train train $(pwd)/configs/v0.yaml
  ```
Alternatively, with VS Code, you can run the provided tasks. The command line arguments are, in order:
- Specify which ROM to use.
- Specify the directory to write TensorBoard logs to.
- Specify the directory to write videos to.
- Specify the group name used for logging parameters to TensorBoard.
- Specify the path to the YAML file containing the config to use for running the application.
- Optional: specify the location to write a libtorch profile to, which can be examined using Perfetto (see the example invocation below).
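As an illustration, a full invocation that also requests a libtorch trace might look like the following; the profile path is only hypothetical, and viewing the logs assumes you have TensorBoard installed separately:

```sh
# Same arguments as the training command above, plus an optional sixth
# argument giving the path where the libtorch profile is written
# (the profiles/ path is only an example).
bazel run //src/bin:train --compilation_mode=opt -- \
  $(pwd)/roms/breakout.bin \
  $(pwd)/logs/train \
  $(pwd)/videos/train \
  train \
  $(pwd)/configs/v0.yaml \
  $(pwd)/profiles/train_trace.json

# Inspect the training curves written to the log directory:
tensorboard --logdir $(pwd)/logs/train
```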
The project has been evaluated using the following hardware:
- ASUS ROG STRIX X670E-F GAMING WIFI
- AMD Ryzen™ 9 7950X3D × 32
- NVIDIA GeForce RTX™ 4090
There are three ways to profile this application:
- Run ./scripts/flamegraph.sh to generate a flamegraph of the application, which can be viewed in a web browser.
- Run the application with a sixth command line argument specifying where to save the libtorch profile, then open the resulting file in the Perfetto UI.
- Use NVIDIA Nsight Systems (nsys) to profile the application by pointing it at the train binary and opening the report in the Nsight Systems UI (see the sketch below).
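As a rough sketch of the Nsight Systems route (the bazel-bin/src/bin/train path assumes Bazel's default output layout, and the report name is only illustrative):

```sh
# Build the optimised binary first so nsys can launch it directly.
bazel build //src/bin:train --compilation_mode=opt

# Profile a training run; the output report name is an assumption.
nsys profile --output=train_report \
  ./bazel-bin/src/bin/train \
  $(pwd)/roms/breakout.bin \
  $(pwd)/logs/train \
  $(pwd)/videos/train \
  train \
  $(pwd)/configs/v0.yaml
```

The generated report (e.g. train_report.nsys-rep) can then be opened in the Nsight Systems UI.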
I welcome contributions from the community! If you're interested in improving ALE-libtorch-PPO, here are some ways you can help:
- Reporting Bugs: If you find a bug, please open an issue and provide as much detail as possible.
- Suggesting Enhancements: Have an idea for a new feature or an improvement to an existing one? I'd love to hear it.
- Code Contributions: If you'd like to contribute code, please fork the repository and submit a pull request. I appreciate all contributions, from small bug fixes to major new features.
I look forward to collaborating with you!
Obviously, the authors of the libraries and tools used in this project deserve credit, including but not limited to libtorch (PyTorch), the Arcade Learning Environment, Bazel, and FFmpeg.
Additionally, kudos to Costa Huang, who authored CleanRL, which served as a baseline for comparing the results of this project.
This project is dedicated to my late Gran, who always supported my endeavours. I love you, Gran.