| Model | Checkpoint | Paper | GSM8k (%) | MATH (%) | License |
|---|---|---|---|---|---|
| LEMMA-LLAMA-3-8B | 🤗 HF Link | 📃 [LEMMA] | 79.2 | 38.3 | Llama 3 |
| LEMMA-LLAMA-3-70B | 🤗 HF Link | 📃 [LEMMA] | 91.5 | 51.8 | Llama 3 |
💡 Systematic analysis of error types: Categorizes common model-generated mathematical reasoning errors, revealing consistent error patterns across models and guiding targeted improvements.
💡 Error-type grounded error augmentation: Introduces diverse and meaningful errors by leveraging a teacher model to intentionally inject representative mistakes, with error types sampled from the analyzed distribution, enhancing the model’s ability to learn from failures.
💡 Two complementary self-correction mechanisms: Combines Fix & Continue (correcting mistakes within the original reasoning) and Fresh & Restart (restarting the reasoning process from scratch) to generate effective revision trajectories.
✅ LEMMA – A novel framework that fine-tunes LLMs on error-corrective trajectories, enabling autonomous error detection and correction during mathematical reasoning.
📊 Result – Up to 13.3% accuracy improvement for LLaMA3-8B with only 90K synthesized training examples.
The framework of LEMMA. LEMMA uses an error-type grounded mistake augmentation module and explores two error-correction strategies to construct error-corrective trajectories as the training corpus.
Experiments demonstrate that LEMMA significantly outperforms SOTA baselines. LEMMA-trained models also generalize well to out-of-distribution (OOD) benchmarks.
LEMMA mainly requires the following two packages:
- LLaMA-Factory for model training.
- math-evaluation-harness for evaluation. We use the adapted version from the Qwen2.5-Math repository.
# Install LLaMA-Factory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
# Install the Qwen2.5-Math evaluation toolkit, which is adapted from math-evaluation-harness.
git clone https://github.com/QwenLM/Qwen2.5-Math
cd Qwen2.5-Math
cd latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt
pip install vllm==0.5.1 --no-build-isolation
pip install transformers==4.42.3
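As a quick sanity check that the pinned versions were installed, the following minimal Python sketch can be run (it assumes nothing beyond the two version pins above):
# Verify that the pinned dependency versions from the install steps are importable.
import transformers
import vllm

print("transformers:", transformers.__version__)  # expected: 4.42.3
print("vllm:", vllm.__version__)                  # expected: 0.5.1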
Run the following code to load the dataset:
from datasets import load_dataset
dataset = load_dataset("panzs19/LEMMA", split="train")
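To get a quick look at what each training example contains, a minimal sketch using the standard datasets API (no field names are assumed; they are read from the dataset itself):
from datasets import load_dataset

dataset = load_dataset("panzs19/LEMMA", split="train")
print(len(dataset))           # number of training examples
print(dataset.column_names)   # fields of each example
print(dataset[0])             # inspect the first error-corrective trajectory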
Download the LEMMA dataset from Hugging Face and convert it to JSON format:
from datasets import load_dataset
import json

dataset = load_dataset("panzs19/LEMMA", split="train")
dataset_list = dataset.to_list()
with open('your_data_dir/dataset.json', 'w', encoding='utf-8') as f:
    json.dump(dataset_list, f, indent=4, ensure_ascii=False)
Specify the data path in scripts/train.sh and LLaMA-Factory/data/dataset_info.json.
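For reference, a minimal sketch of registering the converted file in LLaMA-Factory/data/dataset_info.json (the dataset name "lemma" and the column mapping are illustrative assumptions; align them with the fields in your exported JSON and with the dataset name used by scripts/train.sh):
import json

INFO_PATH = "LLaMA-Factory/data/dataset_info.json"

# Hypothetical entry: point LLaMA-Factory at the exported LEMMA JSON file.
# Adjust file_name and the column mapping to your actual paths and fields.
new_entry = {
    "lemma": {
        "file_name": "your_data_dir/dataset.json",
        "columns": {"prompt": "instruction", "response": "output"},
    }
}

with open(INFO_PATH, "r", encoding="utf-8") as f:
    info = json.load(f)
info.update(new_entry)
with open(INFO_PATH, "w", encoding="utf-8") as f:
    json.dump(info, f, indent=2, ensure_ascii=False)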
bash scripts/train.sh
We use the evaluation toolkit from the Qwen2.5-Math repository. We provide a shell script to launch all evaluations after training the model.
# Specify the model path in scripts/eval.sh
bash scripts/eval.sh
The inference prompt is:
"### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
For evaluation on mawps and deepmind_math, we use the data provided in the RefAug repository to ensure a fair comparison.
To collect your own LEMMA data, please refer to the following scripts:
# Error type and step analysis
bash scripts/error_type.sh
bash scripts/error_step.sh
# Error Augmentation
bash scripts/error_inject.sh
# Fresh & Restart Correction
bash scripts/error_connect.sh
# Fix & Continue Correction
bash scripts/error_correct.sh
# Smooth
bash scripts/smooth.sh
Thanks to the open-source code of LLaMA-Factory, math-evaluation-harness, and Qwen2.5-Math. Part of our code is based on these projects.
Please cite our paper if you use our model, code, or data.
@article{LEMMA,
  title={LEMMA: Learning from Errors for MatheMatical Advancement in LLMs},
  author={Zhuoshi Pan and Yu Li and Honglin Lin and Qizhi Pei and Zinan Tang and Wei Wu and Chenlin Ming and H. Vicky Zhao and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2503.17439},
  year={2025}
}