This repository contains the code for our paper "Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards".
RewardSQL enhances Text-to-SQL generation through a comprehensive process-level reward modeling approach. Our framework consists of three interconnected stages:
- Cold Start with Policy Model and PRM
- Online RL Training
- Reward-assisted Inference
We provide all necessary datasets in our Google Drive folder:

- RewardSQL-Datasets: contains all training and testing data
  - Bird training data
  - Bird dev data
  - Spider test data
After downloading, extract the datasets:
```bash
mkdir -p data/
unzip RewardSQL-Datasets.zip -d data/
```
The extracted structure should be:
```
data/
├── spider/
│   ├── test.json
│   └── database/
└── bird/
    ├── train.json
    ├── dev.json
    └── database/
```
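As a quick sanity check (this helper is our suggestion, not part of the repository), you can confirm the layout with a few lines of Python:

```python
# Verify that the expected dataset files exist after extraction.
from pathlib import Path

expected = [
    "data/spider/test.json",
    "data/spider/database",
    "data/bird/train.json",
    "data/bird/dev.json",
    "data/bird/database",
]
missing = [p for p in expected if not Path(p).exists()]
print("Missing:", missing if missing else "none - layout looks correct")
```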
Please install the required packages:
```bash
pip install -r requirements.txt
```
Prepare the following folders:
```bash
mkdir -p checkpoints/cocte_model
mkdir -p checkpoints/prm_model
mkdir -p checkpoints/grpo_model
mkdir -p results
```
- CoCTE SFT Model: put it under `checkpoints/cocte_model`.
- Process Reward Model: put it under `checkpoints/prm_model`.
- GRPO Trained Model: put it under `checkpoints/grpo_model`.
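Assuming the checkpoints are stored in the standard Hugging Face format (an assumption on our part), you can smoke-test one with `transformers`:

```python
# Smoke test: load the policy model and tokenizer from the expected directory.
# Assumes a standard Hugging Face checkpoint layout.
from transformers import AutoModelForCausalLM, AutoTokenizer

policy_dir = "checkpoints/cocte_model"
tokenizer = AutoTokenizer.from_pretrained(policy_dir)
model = AutoModelForCausalLM.from_pretrained(policy_dir)
print(f"Loaded {model.config.model_type}: {model.num_parameters():,} parameters")
```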
Our updated inference process consists of three steps (a sketch of the resulting best-of-N pipeline follows this list):

- First, start the LLM service:

  ```bash
  sh evaluation/apply_llm.sh
  ```

- Then start the Process Reward Model service:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python evaluation/prm_api.py --port 5050
  ```

- Finally, run the evaluation:

  ```bash
  sh evaluation/evaluate_bestofN.sh
  ```
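For reference, the best-of-N selection performed by `evaluation/evaluate_bestofN.sh` can be pictured roughly as below. This is an illustrative sketch, not the repository's actual script: it assumes the LLM service is an OpenAI-compatible vLLM server on `localhost:8000`, and that `prm_api.py` exposes a `/score` route taking `question` and `sql` fields — the endpoint names, payload, and model name are all assumptions.

```python
# Illustrative best-of-N reranking sketch (not the repo's actual script).
# Assumed endpoints: OpenAI-compatible vLLM server on :8000, PRM on :5050/score.
import requests

def generate_candidates(question: str, n: int = 32) -> list[str]:
    """Sample n candidate SQL queries from the policy model."""
    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={
            "model": "cocte_model",  # placeholder model name
            "prompt": question,
            "n": n,
            "temperature": 0.8,
            "max_tokens": 512,
        },
    )
    return [choice["text"] for choice in resp.json()["choices"]]

def prm_score(question: str, sql: str) -> float:
    """Score one candidate with the process reward model (assumed API)."""
    resp = requests.post(
        "http://localhost:5050/score",  # assumed route and payload
        json={"question": question, "sql": sql},
    )
    return resp.json()["score"]

def best_of_n(question: str, n: int = 32) -> str:
    """Pick the candidate the PRM scores highest (PRM@N)."""
    candidates = generate_candidates(question, n)
    return max(candidates, key=lambda sql: prm_score(question, sql))
```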
Our updated training process consists of two steps (a conceptual sketch of the executor service follows this list):

- First, start the SQL executor service:

  ```bash
  python verl/sql_executor.py
  ```

- Then start the GRPO training:

  ```bash
  sh verl/scripts/run_grpo.sh
  ```
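Conceptually, the executor service lets the trainer run candidate queries against the ground-truth databases so that execution results can feed the reward. Below is a minimal sketch of what such a service could look like; the actual `verl/sql_executor.py` may expose a different interface, and the `/execute` route, payload fields, and port are assumptions.

```python
# Minimal sketch of a SQL executor service for execution-based rewards.
# The real verl/sql_executor.py may differ; route, payload, and port are assumed.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/execute", methods=["POST"])
def execute():
    payload = request.get_json()
    db_path = payload["db_path"]  # e.g. a .sqlite file under data/bird/database/
    sql = payload["sql"]
    try:
        with sqlite3.connect(db_path) as conn:
            conn.execute("PRAGMA query_only = ON")  # read-only safety
            rows = conn.execute(sql).fetchmany(100)  # cap the result size
        return jsonify({"ok": True, "rows": rows})
    except sqlite3.Error as exc:
        return jsonify({"ok": False, "error": str(exc)})

if __name__ == "__main__":
    app.run(port=5000)  # assumed port
```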
We recommend using `tmux` to manage these services in separate windows.
Our RewardSQL framework achieves strong performance on multiple Text-to-SQL benchmarks (execution accuracy, %):
| Model | Bird Dev | Spider Test |
|---|---|---|
| Qwen2.5-7B | 52.5 | 75.6 |
| RewardSQL (Greedy) | 59.7 | 77.0 |
| RewardSQL (PRM@32) | 68.9 | 81.7 |
If our code is helpful to you, please cite our work:
```bibtex
@article{rewardsql2025,
  title={Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards},
  author={Zhang, Yuxin and Fan, Meihao and Fan, Ju and Yi, Mingyang and Luo, Yuyu and Tan, Jian and Li, Guoliang},
  journal={arXiv preprint arXiv:2505.04671},
  year={2025}
}
```
Our reinforcement learning implementation extends the veRL framework. We use vLLM for efficient inference, and our evaluation scripts build on the BIRD and Spider benchmarks. Thanks for their great contributions!