- March 20, 2025: We released the first survey on efficient reasoning for LLMs, "Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models". Feel free to cite it, contribute, or open a pull request to add recent related papers!
- April 22, 2025: Updated the paper list.
In this paper, we present the first structured survey that systematically investigates and organizes the current progress in achieving efficient reasoning in LLMs.
Below is a taxonomy graph summarizing the current landscape of efficient reasoning research for LLMs:
- Awesome-Efficient-Reasoning-LLM
- Model-based Efficient Reasoning
- Reasoning Output-based Efficient Reasoning
- Input Prompt-based Efficient Reasoning
- Reasoning Abilities with Efficient Data and Small Language Models
- Evaluation and Benchmark
"(.)" stands for "To Be Updated" in the survey paper.
- Demystifying Long Chain-of-Thought Reasoning in LLMs [Paper]
- O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [Paper]
- Kimi k1.5: Scaling Reinforcement Learning with LLMs [Paper]
- Training Language Models to Reason Efficiently [Paper]
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning [Paper]
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models [Paper]
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning [Paper]
- HAWKEYE: Efficient Reasoning with Model Collaboration [Paper]
- THINKPRUNE: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning [Paper]
- Think When You Need: Self-Adaptive Chain-of-Thought Learning [Paper]
- Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning (.) [Paper]
- ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models (.) [Paper]
- Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning (.) [Paper]
- Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning (.) [Paper]
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs [Paper]
- C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness [Paper]
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning [Paper]
- Self-Training Elicits Concise Reasoning in Large Language Models [Paper]
- Distilling System 2 into System 1 [Paper]
- Can Language Models Learn to Skip Steps? [Paper]
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models [Paper]
- Z1: Efficient Test-time Scaling with Code [Paper]
- Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models (.) [Paper]
- DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models (.) [Paper]
- AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models (.) [Paper]
- Training Large Language Models to Reason in a Continuous Latent Space [Paper]
- Compressed Chain of Thought: Efficient Reasoning through Dense Representations [Paper]
- Efficient Reasoning with Hidden Thinking (MLLM) [Paper]
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [Paper]
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning [Paper]
- Reasoning with Latent Thoughts: On the Power of Looped Transformers [Paper]
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation [Paper]
- Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models (.) [Paper]
- Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains (.) [Paper]
- Efficiently Serving LLM Reasoning Programs with Certaindex [Paper]
- When More is Less: Understanding Chain-of-Thought Length in LLMs [Paper]
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [Paper]
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning [Paper]
- Fast Best-of-N Decoding via Speculative Rejection [Paper]
- FastMCTS: A Simple Sampling Strategy for Data Synthesis [Paper]
- Dynamic Parallel Tree Search for Efficient LLM Reasoning [Paper]
- Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding [Paper]
- LightThinker: Thinking Step-by-Step Compression (training LLMs to compress thoughts into gist tokens) [Paper]
- InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models [Paper]
- Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing [Paper]
- SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning [Paper]
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [Paper]
- Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time [Paper]
- Can Atomic Step Decomposition Enhance the Self-Structured Reasoning of Multimodal Large Models? [Paper]
- Think Smarter Not Harder: Adaptive Reasoning with Inference Aware Optimization [Paper]
- Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [Paper]
- Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning [Paper]
- Confidence Improves Self-Consistency in LLMs [Paper]
- Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning [Paper]
- Path-Consistency: Prefix Enhancement for Efficient Inference in LLM [Paper]
- Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [Paper]
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning [Paper]
- Think Deep, Think Fast: Investigating Efficiency of Verifier-Free Inference-Time-Scaling Methods [Paper]
- Reasoning Models Can Be Effective Without Thinking [Paper]
- Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning [Paper]
- Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models [Paper]
- Sleep-Time Compute: Beyond Inference Scaling at Test-Time [Paper]
- Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought [Paper]
- THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models [Paper]
- Dynamic Early Exit in Reasoning Models [Paper]
- Accelerated Test-Time Scaling with Model-Free Speculative Sampling (.) [Paper]
- Token-Budget-Aware LLM Reasoning [Paper]
- Chain of Draft: Thinking Faster by Writing Less [Paper]
- How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach [Paper]
- The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models [Paper]
- Claude 3.7 Sonnet and Claude Code [website]
- Learning to Route LLMs with Confidence Tokens [Paper]
- Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [Paper]
- RouteLLM: Learning to Route LLMs with Preference Data [Paper]
- LIMO: Less is More for Reasoning [Paper]
- s1: Simple test-time scaling [Paper]
- S2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [Paper]
- Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond [Paper]
- Small Models Struggle to Learn from Strong Reasoners [Paper]
- Towards Reasoning Ability of Small Language Models [Paper]
- Mixed Distillation Helps Smaller Language Models Reason Better [Paper]
- Small Language Models Need Strong Verifiers to Self-Correct Reasoning [Paper]
- Teaching Small Language Models Reasoning through Counterfactual Distillation [Paper]
- Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation [Paper]
- Probe then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models [Paper]
- Distilling Reasoning Ability from Large Language Models with Adaptive Thinking [Paper]
- SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models [Paper]
- TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation [Paper]
- TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance [Paper]
- When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks [Paper]
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling [Paper]
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks [Paper]
- Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights [Paper]
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs [Paper]
- The Impact of Reasoning Step Length on Large Language Models [Paper]
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models [Paper]
- Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models [Paper]
If you find this work useful, please consider citing us:
@misc{sui2025stopoverthinkingsurveyefficient,
title={Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models},
author={Yang Sui and Yu-Neng Chuang and Guanchu Wang and Jiamu Zhang and Tianyi Zhang and Jiayi Yuan and Hongyi Liu and Andrew Wen and Shaochen Zhong and Hanjie Chen and Xia Hu},
year={2025},
eprint={2503.16419},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.16419},
}
🧩 Layout inspired by zzli2022/Awesome-System2-Reasoning-LLM. Many thanks for the great structure!