A novel self-backtracking method for improving language model reasoning.
This repository implements Self-BackTracking, a method that equips LLMs with the ability to backtrack during both training and inference. This mechanism not only enhances reasoning ability but also improves efficiency by transforming slow-thinking processes into fast-thinking ones through self-improvement.
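To make the idea concrete, here is a minimal sketch of backtracking during search. This is not the repository's implementation: in the actual method the model itself learns when to step back, whereas here that decision is mocked by a simple `should_backtrack` rule on a toy problem.

```python
def self_backtrack_search(start, expand, should_backtrack, is_goal, max_steps=1000):
    """Depth-first search over partial solutions that can step back.

    expand(state)         -> candidate continuations of a partial solution
    should_backtrack(state) -> True when the current path looks unpromising
                              (a stand-in for the model's learned backtrack signal)
    is_goal(state)        -> True when the solution is complete
    """
    stack = [(start, iter(expand(start)))]
    for _ in range(max_steps):
        if not stack:
            return None                      # search space exhausted
        state, alts = stack[-1]
        if is_goal(state):
            return state
        if should_backtrack(state):
            stack.pop()                      # step back to the previous partial solution
            continue
        nxt = next(alts, None)
        if nxt is None:
            stack.pop()                      # no untried continuations left here
        else:
            stack.append((nxt, iter(expand(nxt))))
    return None

# Toy instance: extend a tuple with digits from {3, 5, 7}; back off when the
# running sum overshoots the target.
TARGET = 12
def expand(state):
    return [state + (d,) for d in (3, 5, 7)]
def should_backtrack(state):
    return sum(state) > TARGET
def is_goal(state):
    return bool(state) and sum(state) == TARGET

result = self_backtrack_search((), expand, should_backtrack, is_goal)
```

The key design point is that backtracking is a first-class action in the search loop, so a bad partial path is abandoned rather than extended.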
The project utilizes the Countdown dataset, which is pre-constructed and accessible on Hugging Face. Additionally, we have open-sourced our trained model based on Llama-3.2-1B.
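For readers unfamiliar with the task: a Countdown instance asks for an arithmetic expression that combines the given numbers with +, -, *, / to reach a target value. The brute-force solver below illustrates the problem format; it is an illustrative sketch, not code from this repository.

```python
import itertools
import operator

def solve_countdown(nums, target):
    """Brute-force a Countdown instance: combine all given numbers with the
    four basic operations to hit the target. Returns an expression string,
    or None if no combination works."""
    ops = [('+', operator.add), ('-', operator.sub),
           ('*', operator.mul), ('/', operator.truediv)]

    def search(items):  # items: list of (value, expression-string) pairs
        if len(items) == 1:
            val, expr = items[0]
            return expr if abs(val - target) < 1e-6 else None
        # pick an ordered pair (order matters for - and /), combine, recurse
        for i, j in itertools.permutations(range(len(items)), 2):
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            (a, ea), (b, eb) = items[i], items[j]
            for sym, fn in ops:
                if sym == '/' and abs(b) < 1e-6:
                    continue  # avoid division by zero
                found = search(rest + [(fn(a, b), f"({ea}{sym}{eb})")])
                if found:
                    return found
        return None

    return search([(float(n), str(n)) for n in nums])

expr = solve_countdown([3, 5, 7], 36)  # e.g. some expression evaluating to 36
```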
To train the model:
```bash
CUDA_VISIBLE_DEVICES=0 python train.py \
    --config ../configs/sft.conf
```
You can change the parameters in the configs/sft.conf file.
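For orientation, an SFT config of this kind typically sets the base model, optimizer hyperparameters, and batch size. The fragment below is purely illustrative; the actual key names and values are the ones defined in configs/sft.conf.

```
# Hypothetical sft.conf fragment -- key names are illustrative assumptions,
# not the repository's actual options.
model_name = meta-llama/Llama-3.2-1B
learning_rate = 1e-5
num_epochs = 3
per_device_batch_size = 8
```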
If you want to use multiple GPUs:
```bash
accelerate launch \
    --config_file ../configs/accelerate.yaml \
    train.py \
    --config ../configs/sft.conf
```
To run inference with our self-backtracking method, use the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python eval_search.py \
    --num 5000 \
    --ckpt [your_model_ckpt] \
    --data [val/val_new] \
    --decoder self_backtrack \
    --b 1 \
    --n 32
```
The `--ckpt` argument defaults to `yangxw/Llama-3.2-1B-countdown-backtrack`, our trained model available on Hugging Face.
To further improve the model, you can run the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python train_self_improvement.py \
    --num 5000 \
    --past_model [your_model_ckpt] \
    --data [val/val_new]
```
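Conceptually, this self-improvement step samples candidate solutions from the current model, keeps only those a verifier accepts, and reuses them as new fine-tuning data. The sketch below illustrates that data-filtering loop with stand-in functions; it is an assumption-laden simplification, not the logic of train_self_improvement.py.

```python
def build_improvement_set(problems, sample_fn, verify_fn, n_samples=4):
    """Collect verified (problem, solution) pairs for the next training round.

    sample_fn(problem) -> one candidate solution (stand-in for model sampling)
    verify_fn(problem, candidate) -> True if the candidate actually solves it
    """
    new_data = []
    for prob in problems:
        for cand in (sample_fn(prob) for _ in range(n_samples)):
            if verify_fn(prob, cand):
                new_data.append((prob, cand))
                break  # one verified solution per problem is enough here
    return new_data

# Toy usage: the "model" doubles the input, and the verifier checks exactly that,
# so every problem yields one verified training pair.
problems = [2, 5, 9]
data = build_improvement_set(problems,
                             sample_fn=lambda p: p * 2,
                             verify_fn=lambda p, c: c == p * 2)
```

Filtering on verified correctness is what lets the model bootstrap from its own outputs without reinforcing wrong answers.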
If you use this work, please cite it as follows:
```bibtex
@article{selfbacktracking,
  title={Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models},
  author={Xiao-Wen Yang and Xuan-Yi Zhu and Wen-Da Wei and Ding-Chu Zhang and Jie-Jing Shao and Zhi Zhou and Lan-Zhe Guo and Yu-Feng Li},
  journal={arXiv preprint arXiv:2502.04404},
  year={2025}
}
```