This repository contains the code for our paper "Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards".
RewardSQL enhances Text-to-SQL generation through a comprehensive process-level reward modeling approach. Our framework consists of three interconnected stages:
- Cold Start with Policy Model and PRM
- Online RL Training
- Reward-assisted Inference
We provide all necessary datasets in our Google Drive folder:

- RewardSQL-Datasets: contains all training and testing data
  - Bird training data
  - Bird dev data
  - Spider test data
After downloading, extract the datasets:
```bash
mkdir -p data/
unzip RewardSQL-Datasets.zip -d data/
```
The extracted structure should be:
```
data/
├── spider/
│   ├── test.json
│   └── database/
└── bird/
    ├── train.json
    ├── dev.json
    └── database/
```
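As a quick sanity check (this helper is our suggestion, not part of the repository), you can confirm the layout with a few lines of Python:

```python
# Verify that the expected dataset files exist after extraction.
from pathlib import Path

expected = [
    "data/spider/test.json",
    "data/spider/database",
    "data/bird/train.json",
    "data/bird/dev.json",
    "data/bird/database",
]
missing = [p for p in expected if not Path(p).exists()]
print("Missing:", missing if missing else "none - layout looks correct")
```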
Please install the required packages:
```bash
pip install -r requirements.txt
```
Prepare the following folders:
```bash
mkdir -p checkpoints/cocte_model
mkdir -p checkpoints/prm_model
mkdir -p checkpoints/grpo_model
mkdir -p results
```
- CoCTE SFT Model: put it under `checkpoints/cocte_model`.
- Process Reward Model: put it under `checkpoints/prm_model`.
- GRPO Trained Model: put it under `checkpoints/grpo_model`.
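Assuming the checkpoints are stored in the standard Hugging Face format (an assumption on our part), you can smoke-test one with `transformers`:

```python
# Smoke test: load the policy model and tokenizer from the expected directory.
# Assumes a standard Hugging Face checkpoint layout.
from transformers import AutoModelForCausalLM, AutoTokenizer

policy_dir = "checkpoints/cocte_model"
tokenizer = AutoTokenizer.from_pretrained(policy_dir)
model = AutoModelForCausalLM.from_pretrained(policy_dir)
print(f"Loaded {model.config.model_type}: {model.num_parameters():,} parameters")
```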
Our updated inference process consists of three steps (a sketch of the resulting best-of-N pipeline follows this list):

- First, start the LLM service:

  ```bash
  sh evaluation/apply_llm.sh
  ```

- Then start the Process Reward Model service:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python evaluation/prm_api.py --port 5050
  ```

- Finally, run the evaluation:

  ```bash
  sh evaluation/evaluate_bestofN.sh
  ```
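For reference, the best-of-N selection performed by `evaluation/evaluate_bestofN.sh` can be pictured roughly as below. This is an illustrative sketch, not the repository's actual script: it assumes the LLM service is an OpenAI-compatible vLLM server on `localhost:8000`, and that `prm_api.py` exposes a `/score` route taking `question` and `sql` fields — the endpoint names, payload, and model name are all assumptions.

```python
# Illustrative best-of-N reranking sketch (not the repo's actual script).
# Assumed endpoints: OpenAI-compatible vLLM server on :8000, PRM on :5050/score.
import requests

def generate_candidates(question: str, n: int = 32) -> list[str]:
    """Sample n candidate SQL queries from the policy model."""
    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={
            "model": "cocte_model",  # placeholder model name
            "prompt": question,
            "n": n,
            "temperature": 0.8,
            "max_tokens": 512,
        },
    )
    return [choice["text"] for choice in resp.json()["choices"]]

def prm_score(question: str, sql: str) -> float:
    """Score one candidate with the process reward model (assumed API)."""
    resp = requests.post(
        "http://localhost:5050/score",  # assumed route and payload
        json={"question": question, "sql": sql},
    )
    return resp.json()["score"]

def best_of_n(question: str, n: int = 32) -> str:
    """Pick the candidate the PRM scores highest (PRM@N)."""
    candidates = generate_candidates(question, n)
    return max(candidates, key=lambda sql: prm_score(question, sql))
```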
Our updated training process consists of two steps (a conceptual sketch of the executor service follows this list):

- First, start the SQL executor service:

  ```bash
  python verl/sql_executor.py
  ```

- Then start the GRPO training:

  ```bash
  sh verl/scripts/run_grpo.sh
  ```
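Conceptually, the executor service lets the trainer run candidate queries against the ground-truth databases so that execution results can feed the reward. Below is a minimal sketch of what such a service could look like; the actual `verl/sql_executor.py` may expose a different interface, and the `/execute` route, payload fields, and port are assumptions.

```python
# Minimal sketch of a SQL executor service for execution-based rewards.
# The real verl/sql_executor.py may differ; route, payload, and port are assumed.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/execute", methods=["POST"])
def execute():
    payload = request.get_json()
    db_path = payload["db_path"]  # e.g. a .sqlite file under data/bird/database/
    sql = payload["sql"]
    try:
        with sqlite3.connect(db_path) as conn:
            conn.execute("PRAGMA query_only = ON")  # read-only safety
            rows = conn.execute(sql).fetchmany(100)  # cap the result size
        return jsonify({"ok": True, "rows": rows})
    except sqlite3.Error as exc:
        return jsonify({"ok": False, "error": str(exc)})

if __name__ == "__main__":
    app.run(port=5000)  # assumed port
```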
We recommend using `tmux` to manage these services in separate windows.
Our RewardSQL framework achieves strong performance on multiple Text-to-SQL benchmarks (execution accuracy, %):
| Model | Bird Dev | Spider Test |
|---|---|---|
| Qwen2.5-7B | 52.5 | 75.6 |
| RewardSQL (Greedy) | 59.7 | 77.0 |
| RewardSQL (PRM@32) | 68.9 | 81.7 |
If our code is helpful to you, please cite our work:
```bibtex
@article{rewardsql2025,
  title={Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards},
  author={Zhang, Yuxin and Fan, Meihao and Fan, Ju and Yi, Mingyang and Luo, Yuyu and Tan, Jian and Li, Guoliang},
  journal={arXiv preprint arXiv:2505.04671},
  year={2025}
}
```
Our reinforcement learning implementation extends the veRL framework. We use vLLM for efficient inference, and our evaluation scripts build on the BIRD and Spider benchmarks. Thanks for their great contributions!