- March 20, 2025: We released the first survey on efficient reasoning for LLMs, "Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models". Feel free to cite it, contribute, or open a pull request to add recent related papers!
- April 22, 2025: Updated the paper list.
In this paper, we present the first structured survey that systematically investigates and organizes the current progress in achieving efficient reasoning in LLMs.
Below is a taxonomy graph summarizing the current landscape of efficient reasoning research for LLMs:
- Awesome-Efficient-Reasoning-LLM
- Model-based Efficient Reasoning
- Reasoning Output-based Efficient Reasoning
- Input Prompt-based Efficient Reasoning
- Reasoning Abilities with Efficient Data and Small Language Models
- Evaluation and Benchmark
"(.)" stands for "To Be Updated" in the survey paper.
- Demystifying Long Chain-of-Thought Reasoning in LLMs [Paper]
- O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [Paper]
- Kimi k1.5: Scaling Reinforcement Learning with LLMs [Paper]
- Training Language Models to Reason Efficiently [Paper]
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning [Paper]
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models [Paper]
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning [Paper]
- HAWKEYE: Efficient Reasoning with Model Collaboration [Paper]
- THINKPRUNE: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning [Paper]
- Think When You Need: Self-Adaptive Chain-of-Thought Learning [Paper]
- Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning (.) [Paper]
- ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models (.) [Paper]
- Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning (.) [Paper]
- Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning (.) [Paper]
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs [Paper]
- C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness [Paper]
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning [Paper]
- Self-Training Elicits Concise Reasoning in Large Language Models [Paper]
- Distilling System 2 into System 1 [Paper]
- Can Language Models Learn to Skip Steps? [Paper]
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models [Paper]
- Z1: Efficient Test-time Scaling with Code [Paper]
- Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models (.) [Paper]
- DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models (.) [Paper]
- AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models (.) [Paper]
- Training Large Language Models to Reason in a Continuous Latent Space [Paper]
- Compressed Chain of Thought: Efficient Reasoning through Dense Representations [Paper]
- Efficient Reasoning with Hidden Thinking (MLLM) [Paper]
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [Paper]
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning [Paper]
- Reasoning with Latent Thoughts: On the Power of Looped Transformers [Paper]
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation [Paper]
- Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models (.) [Paper]
- Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains (.) [Paper]
- Efficiently Serving LLM Reasoning Programs with Certaindex [Paper]
- When More is Less: Understanding Chain-of-Thought Length in LLMs [Paper]
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [Paper]
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning [Paper]
- Fast Best-of-N Decoding via Speculative Rejection [Paper]
- FastMCTS: A Simple Sampling Strategy for Data Synthesis [Paper]
- Dynamic Parallel Tree Search for Efficient LLM Reasoning [Paper]
- Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding [Paper]
- LightThinker: Thinking Step-by-Step Compression (training LLMs to compress thoughts into gist tokens) [Paper]
- InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models [Paper]
- Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing [Paper]
- SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning [Paper]
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [Paper]
- Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time [Paper]
- Can Atomic Step Decomposition Enhance the Self-Structured Reasoning of Multimodal Large Models? [Paper]
- Think Smarter Not Harder: Adaptive Reasoning with Inference Aware Optimization [Paper]
- Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [Paper]
- Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning [Paper]
- Confidence Improves Self-Consistency in LLMs [Paper]
- Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning [Paper]
- Path-Consistency: Prefix Enhancement for Efficient Inference in LLM [Paper]
- Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [Paper]
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning [Paper]
- Think Deep, Think Fast: Investigating Efficiency of Verifier-Free Inference-Time-Scaling Methods [Paper]
- Reasoning Models Can Be Effective Without Thinking [Paper]
- Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning [Paper]
- Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models [Paper]
- Sleep-Time Compute: Beyond Inference Scaling at Test-Time [Paper]
- Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought [Paper]
- THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models [Paper]
- Dynamic Early Exit in Reasoning Models [Paper]
- Accelerated Test-Time Scaling with Model-Free Speculative Sampling (.) [Paper]
- Token-Budget-Aware LLM Reasoning [Paper]
- Chain of Draft: Thinking Faster by Writing Less [Paper]
- How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach [Paper]
- The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models [Paper]
- Claude 3.7 Sonnet and Claude Code [website]
- Learning to Route LLMs with Confidence Tokens [Paper]
- Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [Paper]
- RouteLLM: Learning to Route LLMs with Preference Data [Paper]
- LIMO: Less is More for Reasoning [Paper]
- s1: Simple test-time scaling [Paper]
- S2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [Paper]
- Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond [Paper]
- Small Models Struggle to Learn from Strong Reasoners [Paper]
- Towards Reasoning Ability of Small Language Models [Paper]
- Mixed Distillation Helps Smaller Language Models Reason Better [Paper]
- Small Language Models Need Strong Verifiers to Self-Correct Reasoning [Paper]
- Teaching Small Language Models Reasoning through Counterfactual Distillation [Paper]
- Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation [Paper]
- Probe then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models [Paper]
- Distilling Reasoning Ability from Large Language Models with Adaptive Thinking [Paper]
- SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models [Paper]
- TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation [Paper]
- TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance [Paper]
- When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks [Paper]
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling [Paper]
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks [Paper]
- Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights [Paper]
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs [Paper]
- The Impact of Reasoning Step Length on Large Language Models [Paper]
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models [Paper]
- Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models [Paper]
If you find this work useful, please consider citing us:
@misc{sui2025stopoverthinkingsurveyefficient,
title={Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models},
author={Yang Sui and Yu-Neng Chuang and Guanchu Wang and Jiamu Zhang and Tianyi Zhang and Jiayi Yuan and Hongyi Liu and Andrew Wen and Shaochen Zhong and Hanjie Chen and Xia Hu},
year={2025},
eprint={2503.16419},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.16419},
}
🧩 Layout inspired by zzli2022/Awesome-System2-Reasoning-LLM. Many thanks for the great structure!