See the project explanation: here.

Wandb project: here.

| 🚩 Uncertainty Marking | 📝 Progressive Summarization | ✅ Self Verification | 🌐 Multilingual Switching | 
|---|---|---|---|
| Flag ambiguous steps for verification | Maintain intermediate conclusions | First verify then answer | Chinese reasoning traces with English answers | 
  
Figures: Test Score Plot, Average Output Length Plot, Model Output Example.

| Model | 2ppl | 3ppl | 4ppl | 5ppl | 6ppl | 7ppl | 8ppl | 
|---|---|---|---|---|---|---|---|
| o1-2024-12-17 | 0.83 | 0.51 | 0.38 | 0.38 | 0.35 | 0.30 | 0.20 | 
| GPT-4o | 0.68 | 0.57 | 0.49 | 0.32 | 0.23 | 0.21 | 0.11 | 
| Deepseek-Math-7b | 0.35 | 0.21 | 0.08 | 0.06 | 0.02 | 0.00 | 0.00 | 
| Qwen2.5-7B-Instruct-1M | 0.49 | 0.40 | 0.25 | 0.11 | 0.02 | 0.06 | 0.01 | 
| Qwen2.5-7B-Logic-RL (ours) | 0.68 | 0.59 | 0.44 | 0.34 | 0.22 | 0.16 | 0.15 | 

Our model was trained on only 2K training examples for 400 training steps. More benchmark results will be added later this week.
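
For a quick comparison across puzzle sizes, the scores in the table above can be collapsed into one number per model. The sketch below takes an unweighted mean over the seven columns (2ppl to 8ppl). This summary is an illustrative assumption, not a metric reported by the authors, and the names `SCORES`, `mean_accuracy`, and `ranked_means` are hypothetical.

```python
# Scores per puzzle size (2ppl to 8ppl), copied from the table above.
SCORES = {
    "o1-2024-12-17": [0.83, 0.51, 0.38, 0.38, 0.35, 0.30, 0.20],
    "GPT-4o": [0.68, 0.57, 0.49, 0.32, 0.23, 0.21, 0.11],
    "Deepseek-Math-7b": [0.35, 0.21, 0.08, 0.06, 0.02, 0.00, 0.00],
    "Qwen2.5-7B-Instruct-1M": [0.49, 0.40, 0.25, 0.11, 0.02, 0.06, 0.01],
    "Qwen2.5-7B-Logic-RL (ours)": [0.68, 0.59, 0.44, 0.34, 0.22, 0.16, 0.15],
}


def mean_accuracy(scores):
    """Unweighted mean over all puzzle sizes (an illustrative summary only)."""
    return sum(scores) / len(scores)


def ranked_means(table):
    """Return (model, mean) pairs sorted from highest to lowest mean."""
    pairs = [(model, mean_accuracy(scores)) for model, scores in table.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for model, mean in ranked_means(SCORES):
        print(f"{model:28s} {mean:.3f}")
```

An unweighted mean treats every puzzle size equally, so it hides the steep drop in accuracy on the larger puzzles; the per-column values remain the more informative view.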
```bash
conda create -n logic python=3.9
conda activate logic
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e .  # For verl integration
pip install wandb IPython matplotlib
```

You can use the data in `/data` directly.
For your own data generation, here's a demo:
```bash
python ./examples/data_preprocess/kk.py \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}
```

With the `qwen-instruct` template:

```bash
python ./examples/data_preprocess/kk.py \
    --template_type=qwen-instruct \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}
```

Then run the GRPO script:

```bash
conda activate logic
bash main_grpo.sh  # 4×A100 80G
```

| Component | Location | 
|---|---|
| Reward Modeling | verl/utils/reward_score/kk.py | 
| Data Preprocessing | examples/data_preprocess/kk.py | 
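
The table above points to `verl/utils/reward_score/kk.py` for reward modeling. As a rough illustration of how a rule-based reward for this kind of puzzle could be structured, here is a minimal sketch. It is not the repository's implementation. It assumes that `kk` refers to knights-and-knaves puzzles, that responses use `<think>` and `<answer>` tags, and that answers contain sentences like "Alice is a knight". The function names, regular expressions, and score values are all hypothetical.

```python
import re

# Assumed response layout: reasoning in <think> tags, final answer in <answer> tags.
RESPONSE_RE = re.compile(
    r"^\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)
# Assumed answer phrasing, e.g. "Alice is a knight" or "Bob is a knave".
ROLE_RE = re.compile(r"(\w+) is an? (knight|knave)", re.IGNORECASE)


def parse_roles(answer_text):
    """Extract {name: role} pairs from the answer text."""
    return {name: role.lower() for name, role in ROLE_RE.findall(answer_text)}


def compute_score(response, solution, format_reward=1.0,
                  answer_reward=2.0, wrong_penalty=1.5):
    """Score a response against the ground-truth role assignment.

    Hypothetical scheme: a malformed response is penalized outright; a
    well-formed one earns the format reward, plus the answer reward if
    every role matches, or minus a penalty otherwise.
    """
    match = RESPONSE_RE.match(response)
    if match is None:
        return -format_reward
    predicted = parse_roles(match.group(2))
    if predicted == solution:
        return format_reward + answer_reward
    return format_reward - wrong_penalty


if __name__ == "__main__":
    solution = {"Alice": "knight", "Bob": "knave"}
    response = (
        "<think>Bob's claim contradicts itself.</think>"
        "<answer>Alice is a knight. Bob is a knave.</answer>"
    )
    print(compute_score(response, solution))
```

Scoring the format separately from the answer lets a well-structured but wrong response be distinguished from one that ignores the expected layout entirely.
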
```bibtex
@misc{logic-rl,
author       = {Tian Xie and Qingnan Ren and Yuqian Hong and Zitian Gao},
title        = {Logic-RL},
howpublished = {https://github.com/Unakar/Logic-RL},
note         = {Accessed: 2025-02-03},
year         = {2025}
}
```


