✍️ LEMMA: Learning from Errors for MatheMatical Advancement in LLMs


| Model | Checkpoint | Paper | GSM8k | MATH | License |
|---|---|---|---|---|---|
| LEMMA-LLAMA-3-8B | 🤗 HF Link | 📃 LEMMA | 79.2 | 38.3 | Llama 3 |
| LEMMA-LLAMA-3-70B | 🤗 HF Link | 📃 LEMMA | 91.5 | 51.8 | Llama 3 |

📝 Key Takeaways

💡 Systematic analysis of error types: Categorizes common model-generated mathematical reasoning errors, revealing consistent error patterns across models and guiding targeted improvements.

💡 Error-type grounded error augmentation: Introduces diverse and meaningful errors by leveraging a teacher model to intentionally inject representative mistakes, with error types sampled from the analyzed distribution, enhancing the model's ability to learn from failures.

💡 Two complementary self-correction mechanisms: Combines Fix & Continue (correcting mistakes within the original reasoning) and Fresh & Restart (restarting the reasoning process from scratch) to generate effective revision trajectories.
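The error-augmentation idea above can be sketched in a few lines: sample an error type from an empirical distribution, then ask a teacher model to inject a mistake of that type. The error-type names and weights below are purely illustrative, not the paper's actual analyzed distribution, and the teacher-model call is left abstract.

```python
import random

# Hypothetical error-type distribution (illustrative only; the paper's
# analyzed distribution over error categories differs).
ERROR_TYPES = {
    "calculation_error": 0.4,
    "question_misunderstanding": 0.3,
    "missing_step": 0.2,
    "logical_error": 0.1,
}

def sample_error_type(rng: random.Random) -> str:
    """Draw one error type, weighted by the (assumed) empirical frequencies."""
    types, weights = zip(*ERROR_TYPES.items())
    return rng.choices(types, weights=weights, k=1)[0]

# A teacher model would then be prompted to inject a mistake of this type
# into a correct solution, yielding an erroneous trajectory to correct.
error_type = sample_error_type(random.Random(0))
print(error_type)
```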

LEMMA – A novel framework that fine-tunes LLMs on error-corrective trajectories, enabling autonomous error detection and correction during mathematical reasoning.

📊 Result – Up to 13.3% accuracy improvement for LLaMA3-8B with only 90k synthesized data.

The framework of LEMMA. LEMMA uses an error-type grounded mistake augmentation module and explores two error-correction strategies to construct error-corrective trajectories as the training corpus.


Experiments demonstrate that LEMMA significantly outperforms SOTA baselines. LEMMA-trained models also generalize well to out-of-distribution (OOD) benchmarks.


🎯 Quick Start

LEMMA mainly requires the following two packages:

```shell
# Install LLaMA-Factory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

# Install the Qwen evaluation toolkit, which is adapted from math-evaluation-harness
git clone https://github.com/QwenLM/Qwen2.5-Math
cd Qwen2.5-Math/latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt
pip install vllm==0.5.1 --no-build-isolation
pip install transformers==4.42.3
```

💾 Dataset Usage

Run the following command to load the data:

```python
from datasets import load_dataset

dataset = load_dataset("panzs19/LEMMA", split="train")
```

📈 Training

Download the LEMMA dataset from Hugging Face and convert it to JSON format.

```python
import json

from datasets import load_dataset

dataset = load_dataset("panzs19/LEMMA", split="train")
dataset_list = dataset.to_list()

with open('your_data_dir/dataset.json', 'w', encoding='utf-8') as f:
    json.dump(dataset_list, f, indent=4, ensure_ascii=False)
```
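As a quick sanity check of the export format, a mock record can be round-tripped through the same `json.dump` settings. The field names here are hypothetical; inspect the real LEMMA dataset for its actual schema.

```python
import json
import tempfile

# Mock record with hypothetical field names (not the real LEMMA schema).
records = [{"instruction": "What is 2 + 2?",
            "output": "Let's think step by step. 2 + 2 = 4."}]

with tempfile.NamedTemporaryFile("w+", suffix=".json") as f:
    # Same settings as the export above: indented, non-ASCII preserved.
    json.dump(records, f, indent=4, ensure_ascii=False)
    f.flush()
    f.seek(0)
    reloaded = json.load(f)

assert reloaded == records
```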

Specify the data path in scripts/train.sh and LLaMA-Factory/data/dataset_info.json.
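LLaMA-Factory discovers datasets via entries in `data/dataset_info.json`. A minimal entry might look like the following sketch; the dataset name and column mapping are assumptions and must be matched to the actual fields in the exported JSON file.

```json
{
  "lemma": {
    "file_name": "your_data_dir/dataset.json",
    "columns": {
      "prompt": "instruction",
      "response": "output"
    }
  }
}
```

The key (`lemma` here) is the name referenced by the training script's dataset argument.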

```shell
bash scripts/train.sh
```

📏 Evaluation

We use the evaluation toolkit from the Qwen2.5-Math repository. We provide a shell script to launch all evaluations after training the model.

```shell
# Specify the model path in scripts/eval.sh
bash scripts/eval.sh
```

The inference prompt is:

"### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
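Concretely, filling the template with Python's `str.format` looks like this (the sample instruction is illustrative):

```python
# Inference prompt template used at evaluation time (from the README above).
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Response: Let's think step by step."
)

# Illustrative question; any benchmark instruction is substituted here.
prompt = PROMPT_TEMPLATE.format(
    instruction="Natalia sold 48 clips in April. How many dozens is that?"
)
print(prompt)
```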

For evaluation on mawps and deepmind_math, we use the data provided in the RefAug repository to ensure a fair comparison.

⚙️ Dataset Collection

To collect your own LEMMA data, please refer to the following scripts:

```shell
# Error type and step analysis
bash scripts/error_type.sh
bash scripts/error_step.sh
# Error augmentation
bash scripts/error_inject.sh
# Fresh & Restart correction
bash scripts/error_connect.sh
# Fix & Continue correction
bash scripts/error_correct.sh
# Smoothing
bash scripts/smooth.sh
```

Thanks to the open-source code of LLaMA-Factory, math-evaluation-harness, and Qwen2.5-Math; parts of our code are based on these projects.

Citation

Please cite the paper if you use our model, code, data, or paper.

```bibtex
@article{LEMMA,
  title={LEMMA: Learning from Errors for MatheMatical Advancement in LLMs},
  author={Zhuoshi Pan and Yu Li and Honglin Lin and Qizhi Pei and Zinan Tang and Wei Wu and Chenlin Ming and H. Vicky Zhao and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2503.17439},
  year={2025}
}
```
