Reasoning language models have achieved impressive success on a wide range of complex reasoning tasks, but their ability to solve complex graph problems remains underexplored, especially for small language models. To bridge this gap, we present Graph-R1, a small reasoning language model designed specifically for complex graph problems. Our approach integrates cold-start Rejection Sampling Supervised Fine-Tuning (RFT) with a Reinforcement Learning (RL) framework built on fine-grained rewards and curriculum learning, improving both performance and efficiency. This repository contains the reproduction code for Graph-R1.
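The fine-grained rewards grade partial solution quality rather than giving a binary correct/incorrect signal. The snippet below is an illustrative sketch only, not the repository's actual reward code (which appears to live under `verl/verl/utils/reward_score/tasks/`); the `tsp_reward` helper and its inputs are hypothetical.

```python
# Illustrative sketch only (hypothetical helper, not the repo's reward code):
# a fine-grained reward gives partial credit for near-optimal answers instead
# of a 0/1 correctness signal.

def tsp_reward(tour: list[int], dist: list[list[float]], ref_len: float) -> float:
    """Score a proposed TSP tour in [0, 1]."""
    n = len(dist)
    if sorted(tour) != list(range(n)):            # invalid tour -> no reward
        return 0.0
    tour_len = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return min(1.0, ref_len / tour_len)           # closer to the reference, higher reward
```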
Model | Base | Parameters | Description |
---|---|---|---|
Graph-R1-7B | Qwen2.5-7B-Instruct-1M | 7.62B | Main reasoning model |
Graph-R1-1.5B | Qwen2.5-1.5B | 1.78B | Lightweight version |
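For quick inference with a released checkpoint, the standard `transformers` loading pattern works. The Hub repository id below is a guess based on the organization name and should be adjusted to the actual release location.

```python
# Minimal inference sketch; the model id is assumed, not confirmed by the repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HKUST-DSAIL/Graph-R1-7B"  # hypothetical hub id; adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Solve the TSP for the following weighted graph: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=4096)  # leave room for long CoT traces
print(tokenizer.decode(out[0], skip_special_tokens=True))
```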
Dataset | Size | Description |
---|---|---|
Graph-R1-SFT-30K | 30K samples | Ultra-long CoT reasoning traces |
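To inspect the SFT data, the usual `datasets` loading pattern applies; the Hub id below is assumed from the table above and may differ from the actual release.

```python
# Sketch: peek at the Graph-R1-SFT-30K reasoning traces (hub id assumed).
from datasets import load_dataset

ds = load_dataset("HKUST-DSAIL/Graph-R1-SFT-30K", split="train")
print(ds.column_names)   # discover the actual schema rather than assuming it
print(ds[0])             # one ultra-long CoT reasoning trace
```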
Graph-R1 is developed and released by HKUST-DSAIL, the Data Science & AI Lab at HKUST.
Model | TSP Acc. | GED Acc. | MCP Acc. | Average |
---|---|---|---|---|
QwQ-32B | 89.4 | 70.2 | 96.2 | 85.3 |
Claude-3.5-Sonnet | 45.4 | 37.2 | 62.2 | 48.3 |
GPT-4o | 44.2 | 32.6 | 62.4 | 46.4 |
Graph-R1-7B | 91.8 | 68.2 | 97.0 | 85.7 |
Graph-R1-1.5B | 44.6 | 28.4 | 53.0 | 42.0 |
Model | TSP Acc. | GED Acc. | MCP Acc. | Average |
---|---|---|---|---|
QwQ-32B | 7.6 | 7.4 | 64.6 | 26.5 |
Claude-3.5-Sonnet | 2.8 | 3.4 | 19.0 | 8.4 |
GPT-4o | 0.8 | 2.4 | 15.2 | 6.1 |
Graph-R1-7B | 11.2 | 6.2 | 63.6 | 27.0 |
Graph-R1-1.5B | 1.8 | 3.4 | 21.4 | 8.9 |
Model | AIME25 (pass@64) | AIME24 (pass@64) | Math500 (pass@8) | Avg Improvement |
---|---|---|---|---|
Base Model | 26.7 | 33.3 | 86.2 | - |
RFT Model | 33.3 (+24.7%) | 40.0 (+20.1%) | 87.0 (+0.9%) | +17.9% |
Graph-R1-7B | 33.3 (+24.7%) | 30.0 (-9.9%) | 88.0 (+2.1%) | +7.6% |
Before starting training, you need to download and prepare the source data files. These files are large and have been excluded from the repository to keep it lightweight.
```bash
# Create the source directory
mkdir -p verl/verl/utils/reward_score/tasks/source

# Download the required data files
cd verl/verl/utils/reward_score/tasks/source

# Method 1: using gdown (recommended)
pip install gdown
gdown 1meKois5K3SVfTlEhn1FQNfXzq2S6NFvq

# Method 2: download manually from Google Drive
# https://drive.google.com/file/d/1meKois5K3SVfTlEhn1FQNfXzq2S6NFvq/view?usp=sharing

# Extract the compressed data
tar -xzf source.tar.gz

# Clean up the compressed file (optional)
rm source.tar.gz
```
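Before launching training, a quick sanity check (a minimal sketch, run from the repository root) can confirm the archive extracted into the expected directory:

```python
# Minimal sanity check: make sure the extracted source data is in place.
import os

src = "verl/verl/utils/reward_score/tasks/source"
assert os.path.isdir(src), f"missing {src}; re-run the download and extraction steps"
print(f"found {len(os.listdir(src))} entries under {src}")
```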
Please follow the VERL environment setup described in verl/README.md.

We recommend VERL's SFT framework, which provides roughly a 3x speedup over the original 360-llama-factory approach.
```bash
cd verl/recipe/graph-r1/sft/
bash run_sft.sh
```
```bash
# Full curriculum learning (5 levels)
bash curriculum_learning_full.sh
```
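Conceptually, the curriculum stage runs RL repeatedly over increasingly difficult problem levels, each stage initialized from the previous checkpoint. The sketch below is illustrative only; `train_rl_stage` is a hypothetical stand-in for the per-level RL run that `curriculum_learning_full.sh` launches.

```python
# Conceptual sketch of 5-level curriculum RL (not the repo's implementation).

def train_rl_stage(init_ckpt: str, level: str) -> str:
    """Hypothetical stand-in for one RL run; returns the new checkpoint path."""
    print(f"training on {level} starting from {init_ckpt}")
    return f"checkpoints/{level}"

def run_curriculum(start_ckpt: str, levels: list[str]) -> str:
    ckpt = start_ckpt                      # cold-start RFT checkpoint
    for level in levels:                   # easy -> hard
        ckpt = train_rl_stage(ckpt, level)
    return ckpt

final_ckpt = run_curriculum("checkpoints/rft", [f"level{i}" for i in range(1, 6)])
```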
```
Graph-R1/
├── README.md
├── requirements.txt
├── verl/
│   └── recipe/
│       └── graph-r1/
│           ├── sft/                          # Stage 1: SFT Training
│           │   ├── run_sft.sh                # Training script
│           │   ├── sft_trainer.yaml          # Configuration
│           │   └── SFT_README.md             # Documentation
│           ├── curriculum_learning_full.sh   # Stage 2: Full curriculum
│           ├── README.md                     # RL training guide
│           └── CURRICULUM_README.md          # Curriculum guide
├── eval/                                     # TODO: Evaluation scripts
├── data/                                     # TODO: Data processing
└── docs/                                     # Documentation
```
Component | Specification |
---|---|
GPU | 8x A800 80GB |
If you use this code or models in your research, please cite:
```bibtex
@misc{graph-r1-2025,
  title={Graph-R1: A Small Reasoning Language Model for Complex Graph Problems},
  author={HKUST-DSAIL},
  year={2025},
  url={https://github.com/Graph-Reasoner/Graph-R1},
  note={arXiv preprint under review}
}
```