
ALE-libtorch-PPO

This project is a C++ application designed to train an agent to master Atari games, with a specific focus on the classic game "Breakout". It leverages reinforcement learning, implementing the Proximal Policy Optimisation (PPO) algorithm so the agent can learn and improve its gameplay through trial and error. Multi-threading is used to step game environments in parallel for greater throughput.
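At its core, PPO optimises a clipped surrogate objective over the probability ratio between the updated policy and the policy that collected the rollout data. Below is a minimal sketch of that loss in libtorch; the function name, tensor layout, and clip coefficient are illustrative assumptions rather than this repository's actual code.

```cpp
#include <torch/torch.h>

// Minimal sketch of the PPO clipped surrogate loss (illustrative, not the
// repository's implementation). All inputs are per-sample 1-D tensors.
torch::Tensor ppo_clipped_loss(const torch::Tensor& new_log_probs,
                               const torch::Tensor& old_log_probs,
                               const torch::Tensor& advantages,
                               double clip_epsilon = 0.2) {
  // Probability ratio between the current policy and the behaviour policy.
  auto ratio = (new_log_probs - old_log_probs).exp();
  // Unclipped and clipped surrogate objectives.
  auto unclipped = ratio * advantages;
  auto clipped =
      torch::clamp(ratio, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages;
  // PPO maximises the element-wise minimum, so the loss is its negated mean.
  return -torch::min(unclipped, clipped).mean();
}
```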

Built using Bazel, this project integrates libtorch (the C++ frontend for PyTorch) for its neural network components and the Arcade Learning Environment (ALE) to interface with the Atari games. This combination provides a high-performance environment for cutting-edge AI research.
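For context, a single ALE environment can be driven roughly as sketched below; the include path, settings, and random action selection are assumptions for illustration, whereas the actual application steps many environments across threads and selects actions from the policy network.

```cpp
#include <ale/ale_interface.hpp>

#include <cstdio>
#include <cstdlib>

int main() {
  ale::ALEInterface env;
  env.setInt("random_seed", 0);
  env.loadROM("roms/breakout.bin");  // Path assumed to match the download step below.

  // Actions that are meaningful for the loaded game.
  const ale::ActionVect actions = env.getMinimalActionSet();

  int episode_return = 0;
  while (!env.game_over()) {
    // A trained agent would sample this from the policy network instead.
    ale::Action action = actions[std::rand() % actions.size()];
    episode_return += env.act(action);
  }
  env.reset_game();
  std::printf("episode return: %d\n", episode_return);
  return 0;
}
```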

While Python-based libraries dominate the open-source RL scene with their ease of use and vast ecosystem, ALE-libtorch-PPO offers a C++-native alternative. It is designed for developers and researchers who need high performance, a clean C++ integration path, and a transparent, focused implementation of a strong and popular RL algorithm.

Run Instructions

To run the project, follow these steps:

  1. Install Bazel by following the Bazel installation guide for your operating system.

  2. Install FFmpeg, which the environment video recorder uses to generate MP4s of the agent playing the game.

  3. Clone the repository:

    git clone https://github.com/cemlyn007/ALE-libtorch-PPO.git
    cd ALE-libtorch-PPO
  4. Download the ROMs:

    mkdir roms
    ./scripts/download_unpack_roms.sh
  5. Train the agent using Bazel:

    bazel run //src/bin:train --compilation_mode=opt -- $(pwd)/roms/breakout.bin $(pwd)/logs/train $(pwd)/videos/train train $(pwd)/configs/v0.yaml

    Alternatively, with VS Code, you can run the tasks. The command line arguments do the following (see the sketch after this list):

    1. Specify which ROM to use.
    2. Specify the directory to write TensorBoard logs to.
    3. Specify the directory to write videos to.
    4. Specify the group name used for logging parameters to TensorBoard.
    5. Specify the path to the YAML file containing the config to use for running the application.
    6. Optional: specify the location to write a libtorch profile to, which can then be examined using Perfetto.
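For clarity, a hypothetical mapping of those positional arguments onto the binary's argv might look like the sketch below; the struct and field names are invented for illustration and may not match the project's actual parsing code.

```cpp
#include <optional>
#include <string>

// Hypothetical argument layout matching the list above (not the real parser).
struct TrainArgs {
  std::string rom_path;             // 1. ROM to use.
  std::string tensorboard_dir;      // 2. TensorBoard log directory.
  std::string video_dir;            // 3. Video output directory.
  std::string group_name;           // 4. Group name for TensorBoard parameters.
  std::string config_yaml;          // 5. YAML config file.
  std::optional<std::string> profile_path;  // 6. Optional libtorch/Perfetto profile.
};

TrainArgs parse_args(int argc, char** argv) {
  TrainArgs args{argv[1], argv[2], argv[3], argv[4], argv[5], std::nullopt};
  if (argc > 6) {
    args.profile_path = argv[6];
  }
  return args;
}
```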

Results

Evaluated using the following hardware:

  • ASUS ROG STRIX X670E-F GAMING WIFI
  • AMD Ryzen™ 9 7950X3D × 32
  • NVIDIA GeForce RTX™ 4090

V0 Config

TensorBoard showing PPO achieving a score of 400 on Breakout within 10 million agent steps.

Achieved 10 million agent steps in 37 minutes and 39 seconds using the v0 config, averaging ~4,426 steps per second with video recording enabled.

V1 Config

TensorBoard showing PPO achieving the maximum Breakout score of 864.

Achieved an average of ~26,289 steps per second with video recording enabled, with the hardware still not fully utilised.

Profiling

There are three ways to profile this application:

  • Run ./scripts/flamegraph.sh to generate a flamegraph of the application, which can be viewed in a web browser.
  • Run the application with the optional 6th command line argument to write a libtorch profile, which can then be opened in the Perfetto UI.
  • Use NVIDIA Nsight Systems (nsys), pointing it at the train binary.
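As an aside, one way to emit such a trace from libtorch's C++ frontend is the legacy profiler guard shown below; whether this project uses this exact mechanism is an assumption, and the header path may vary between libtorch versions.

```cpp
#include <torch/torch.h>
#include <torch/csrc/autograd/profiler.h>  // Legacy profiler guard (path may vary by version).

#include <string>

// Sketch: record libtorch ops while this scope is alive and dump a trace
// file (viewable in chrome://tracing or the Perfetto UI) on destruction.
void profiled_step(const std::string& trace_path) {
  torch::autograd::profiler::RecordProfile guard(trace_path);
  auto x = torch::randn({256, 256});
  torch::matmul(x, x);  // Stand-in for a real training step; ops get recorded.
}
```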

Contributions Welcome

I welcome contributions from the community! If you're interested in improving ALE-libtorch-PPO, here are some ways you can help:

  • Reporting Bugs: If you find a bug, please open an issue and provide as much detail as possible.
  • Suggesting Enhancements: Have an idea for a new feature or an improvement to an existing one? I'd love to hear it.
  • Code Contributions: If you'd like to contribute code, please fork the repository and submit a pull request. I appreciate all contributions, from small bug fixes to major new features.

I look forward to collaborating with you!

Acknowledgements

The authors of the libraries and tools used in this project deserve credit, including but not limited to those behind libtorch, the Arcade Learning Environment, Bazel, and FFmpeg.

Additionally, kudos to Costa Huang, who authored CleanRL, which served as a baseline for comparing the results of this project.

In Memory

This project is dedicated to my late Gran, who always supported my endeavours. I love you, Gran.
