Game Reinforcement Learning (GRL) for post‑training large language models
GRL is an open-source framework that post-trains LLMs with multi-turn reinforcement learning on games, yielding general capability gains across diverse benchmarks.
[2025/08/27] We release GRL to reproduce the paper's results and to demonstrate general cross-benchmark gains from post-training LLMs with reinforcement learning.
```bash
# clone the repo
git clone --recurse-submodules https://github.com/lmgame-org/GRL.git
cd GRL

# create a conda environment
conda create --name grl python=3.10
conda activate grl

# install all dependencies
source scripts/install_submodules.sh

# avoid compiling flash-attn from source
pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install flash-attn==2.8.0.post2 --no-build-isolation
pip install -e .

# export environment variables
export WANDB_API_KEY=your_wandb_api_key
export WANDB_ENTITY=your_wandb_entity
export HF_TOKEN=your_huggingface_token
```
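As an optional sanity check after installation, you can confirm that the pinned torch build sees a CUDA device and that the prebuilt flash-attn wheel imports cleanly. A minimal sketch using only the packages installed above:

```python
# Optional post-install sanity check: verify the pinned torch build detects
# a CUDA device and the prebuilt flash-attn wheel imports without compiling.
import torch
import flash_attn

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)
```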
If you want to reproduce paper results and validate BIRD SQL performance or WebShop full dataset performance:

```bash
source scripts/install_dataset.sh --all
```
For quick experimentation, train on 6×6 (1-box) Sokoban and evaluate transferability to Tetris, Blocksworld, and GSM8K:

```bash
source quick_train_qwen_halfb.sh
```
Note: RL training runs may fluctuate relative to the reported numbers, but the overall trends and gains remain consistent.
Sokoban Agent Training:

```bash
source examples/sokoban_ppo/qwen_7b.sh
```

Tetris Agent Training:

```bash
source examples/tetris_ppo/qwen_7b.sh
```
Note: BirdAgent may wait on SQLite file readiness or locks; heavy SQL queries can stall rollouts and prolong validation.
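If validation seems stuck, one way to rule out a held lock before launching is to probe the database directly. A minimal sketch using only the Python standard library; the database path is a placeholder for your local BIRD data, not a GRL setting:

```python
import sqlite3
import time

DB_PATH = "path/to/bird/database.sqlite"  # placeholder, adjust to your setup

# Open read-only (fails if the file is absent) with a busy timeout, and retry
# a few times so a held lock fails fast here instead of stalling rollouts.
for attempt in range(3):
    try:
        conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True, timeout=5.0)
        conn.execute("PRAGMA quick_check;")
        conn.close()
        print("database ready")
        break
    except sqlite3.OperationalError as err:  # e.g. "database is locked"
        print(f"attempt {attempt + 1}: {err}; retrying...")
        time.sleep(2)
```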
The framework is pre‑configured for different GPU setups:
| GPU Type | GPUs | Agent Groups | Group Size | Total Agents | Default Model | Task |
|---|---|---|---|---|---|---|
| A100 | 1 | 8 | 16 | 128 | Qwen/Qwen2.5-0.5B-Instruct | Sokoban |
| L40 | 1 | 4 | 8 | 32 | Qwen/Qwen2.5-0.5B-Instruct | Sokoban |
| A100 | 8 | 8 | 16 | 128 | Qwen/Qwen2.5-7B-Instruct | Sokoban |
| H200 | 4 | 8 | 16 | 128 | Qwen/Qwen2.5-7B-Instruct | Sokoban |
| A100 | 8 | 8 | 16 | 128 | Qwen/Qwen2.5-7B-Instruct | Tetris |
Note: The framework automatically scales based on available hardware. Adjust parameters in the training scripts for best performance on your setup.
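The sizing invariant behind the table is simply total agents = agent groups × group size. A minimal sketch with the presets transcribed from the table above; this is illustrative, not a GRL API:

```python
# Illustrative rollout-sizing presets transcribed from the table above.
PRESETS = {
    ("A100", 1): {"agent_groups": 8, "group_size": 16},
    ("L40", 1): {"agent_groups": 4, "group_size": 8},
    ("A100", 8): {"agent_groups": 8, "group_size": 16},
    ("H200", 4): {"agent_groups": 8, "group_size": 16},
}

def total_agents(cfg: dict) -> int:
    """Total agents is the product of group count and group size."""
    return cfg["agent_groups"] * cfg["group_size"]

assert total_agents(PRESETS[("L40", 1)]) == 32    # matches the L40 row
assert total_agents(PRESETS[("A100", 1)]) == 128  # matches the A100 rows
```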
- Sokoban: Puzzle-solving game requiring spatial reasoning
- Tetris: Decision-making and planning
- GSM8K: Mathematical reasoning tasks
- BlocksWorld: Logical planning and manipulation
- WebShop: E‑commerce navigation and decision‑making
- BIRD: SQL query generation and database reasoning
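All of these environments are driven by the same multi-turn loop: the model reads a text observation, emits an action, receives a reward, and the collected trajectories feed PPO updates. A minimal sketch of that loop; the `env` and `llm` interfaces here are illustrative stand-ins, not the actual GRL classes:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    observation: str
    action: str
    reward: float

def rollout(env, llm, max_turns: int = 20) -> list[Turn]:
    """Play one episode: observe, act, and collect rewards turn by turn.

    `env` and `llm` are illustrative stand-ins; in GRL, trajectories
    gathered this way are scored and passed to the PPO trainer.
    """
    trajectory = []
    obs = env.reset()
    for _ in range(max_turns):
        action = llm.generate(obs)            # e.g. "Up", a SQL query, a click
        obs, reward, done = env.step(action)  # environment applies the action
        trajectory.append(Turn(obs, action, reward))
        if done:
            break
    return trajectory
```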
- Tutorial - Getting started and usage walkthrough
- System Design Overview - Architecture and design principles
- Development Guide - Contributing and development workflow
Our work is powered by VERL, an open-source RLHF library, and draws insights from RAGEN.
If you find this repository helpful, please kindly cite:
```bibtex
@article{hu2025lmgame,
  title={lmgame-Bench: How Good are LLMs at Playing Games?},
  author={Hu, Lanxiang and Huo, Mingjia and Zhang, Yuxuan and Yu, Haoyang and Xing, Eric P and Stoica, Ion and Rosing, Tajana and Jin, Haojian and Zhang, Hao},
  journal={arXiv preprint arXiv:2505.15146},
  year={2025}
}
```
This project is licensed under the MIT License - see the LICENSE file for details.