ruc-datalab/RewardSQL

Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards

💭 Introduction

This repository contains the code for our paper "Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards".

RewardSQL enhances Text-to-SQL generation through a comprehensive process-level reward modeling approach. Our framework consists of three interconnected stages:

  1. Cold Start with Policy Model and PRM
  2. Online RL Training
  3. Reward-assisted Inference

[Figure: Overview of the RewardSQL framework]

📂 Data Preparation

We provide all necessary datasets in our Google Drive repository.

Download Datasets

  • RewardSQL-Datasets: Contains all training and testing data
    • Bird training data
    • Bird dev data
    • Spider test data

After downloading, extract the datasets:

mkdir -p data/
unzip RewardSQL-Datasets.zip -d data/

The extracted structure should be:

data/
├── spider/
│   ├── test.json
│   └── database/
└── bird/
    ├── train.json
    ├── dev.json
    └── database/
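
As a quick sanity check after extraction, the layout above can be verified with a short script. This is a minimal sketch, not part of the repository: `missing_entries` and `EXPECTED` are hypothetical names, and the list of paths is copied from the tree shown above.

```python
from pathlib import Path

# Expected entries, copied from the directory tree above.
EXPECTED = [
    "spider/test.json",
    "spider/database",
    "bird/train.json",
    "bird/dev.json",
    "bird/database",
]


def missing_entries(root="data"):
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty result means every file and folder from the tree is present; anything listed usually points to an incomplete download or a different extraction directory.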

💻 Environment Preparation

Requirements: Python, PyTorch, Transformers

Please install the required packages:

pip install -r requirements.txt

Prepare the following folders:

mkdir -p checkpoints/cocte_model
mkdir -p checkpoints/prm_model
mkdir -p checkpoints/grpo_model
mkdir -p results

⚡ Quick Start

Download pre-trained models

Text-to-SQL inference

Our updated inference process consists of three steps:

  1. First, start the LLM service:
sh evaluation/apply_llm.sh
  2. Then start the Process Reward Model service:
CUDA_VISIBLE_DEVICES=0 python evaluation/prm_api.py --port 5050
  3. Finally, run the evaluation:
sh evaluation/evaluate_bestofN.sh
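
The script name `evaluate_bestofN.sh` suggests Best-of-N selection: sample several candidate SQL queries from the LLM service and keep the one the Process Reward Model scores highest. The sketch below illustrates that idea only. `select_best`, `score_fn`, and `toy_score` are hypothetical stand-ins; the actual request and response format of `prm_api.py` is not documented in this README, so no call to it is shown.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], score_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (first one wins on ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score_fn)


def toy_score(sql: str) -> float:
    """Toy stand-in for a reward model: prefers shorter queries.

    A real scorer would query the PRM service instead (assumption).
    """
    return -float(len(sql))


if __name__ == "__main__":
    samples = [
        "SELECT name FROM singer WHERE age > 30 ORDER BY name",
        "SELECT name FROM singer WHERE age > 30",
    ]
    print(select_best(samples, toy_score))
```

Passing the scorer as a function keeps the selection logic independent of how the reward model is served.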

👐 Train with GRPO

Our updated training process consists of two steps:

  1. First, start the SQL executor service:
python verl/sql_executor.py
  2. Then start the GRPO training:
sh verl/scripts/run_grpo.sh

We recommend using tmux for managing these different services in separate windows.
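
Because both workflows start long-running services before the step that depends on them, it can help to confirm a service is accepting connections before launching the next command. The following is a minimal sketch under the assumption that the services listen on TCP on the local machine. `wait_for_port` is a hypothetical helper, not part of this repository; only port 5050 (the PRM service) appears in this README, and the SQL executor's port is not stated here.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example (assumes the PRM service was started locally with --port 5050):
#   if wait_for_port("127.0.0.1", 5050, timeout=120):
#       print("PRM service is reachable")
```

A successful connection only shows that the port is open; it does not guarantee that the model behind it has finished loading.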

📊 Results

RewardSQL improves over the Qwen2.5-7B baseline on both benchmarks. The table below reports scores on the Bird dev set and the Spider test set:

Model                 Bird Dev   Spider Test
Qwen2.5-7B            52.5       75.6
RewardSQL (Greedy)    59.7       77.0
RewardSQL (PRM@32)    68.9       81.7
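
To make the differences between rows explicit, the short script below computes each RewardSQL setting's absolute gain over the Qwen2.5-7B baseline. The scores are copied from the table above; `SCORES` and `gains_over` are illustrative names introduced here, not part of the repository.

```python
# Scores copied from the results table above: (Bird dev, Spider test).
SCORES = {
    "Qwen2.5-7B": (52.5, 75.6),
    "RewardSQL (Greedy)": (59.7, 77.0),
    "RewardSQL (PRM@32)": (68.9, 81.7),
}


def gains_over(baseline: str) -> dict:
    """Return per-benchmark point gains of every other row over `baseline`."""
    base = SCORES[baseline]
    return {
        name: tuple(round(s - b, 1) for s, b in zip(vals, base))
        for name, vals in SCORES.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (bird, spider) in gains_over("Qwen2.5-7B").items():
        print(f"{name}: +{bird} Bird dev, +{spider} Spider test")
```

Differences are rounded to one decimal place to match the precision of the table.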

💬 Citation

If our code is helpful to you, please cite our work:

@article{rewardsql2025,
  title={Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards},
  author={Zhang, Yuxin and Fan, Meihao and Fan, Ju and Yi, Mingyang and Luo, Yuyu and Tan, Jian and Li, Guoliang},
  journal={arXiv preprint arXiv:2505.04671},
  year={2025}
}

🌻 Acknowledgement

Our reinforcement learning implementation extends the veRL framework. We use vLLM for efficient inference, and our evaluation scripts are based on the BIRD and Spider datasets. We thank the authors of these projects for their contributions!
