Awesome-Efficient-Reasoning-LLMs

[arXiv:2503.16419](https://arxiv.org/abs/2503.16419)

📢 Want to add related papers? Feel free to open a pull request!

📢 News

Pipeline

In this paper, we present the first structured survey to systematically investigate and organize current progress toward efficient reasoning in LLMs.

📊 Taxonomy

Below is a taxonomy graph summarizing the current landscape of efficient reasoning research for LLMs:

*(Taxonomy figure)*


📚 Table of Contents

  • Section I: RL with Length Reward Design
  • Section II: SFT with Variable-Length CoT Data
  • Section III: Compressing Reasoning Steps into Fewer Latent Representations
  • Section IV: Dynamic Reasoning Paradigm during Inference
  • Section V: Prompt-Guided Efficient Reasoning
  • Section VI: Prompt Attribute-Driven Reasoning Routing
  • Section VII: Reasoning Abilities via Efficient Training Data and Model Compression
  • Section VIII: Evaluation and Benchmark
  • Citation
  • Acknowledgment

"(.)" stands for "To Be Updated" in the survey paper.

Section I: RL with Length Reward Design

  • Demystifying Long Chain-of-Thought Reasoning in LLMs [Paper]
  • O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [Paper]
  • Kimi k1.5: Scaling Reinforcement Learning with LLMs [Paper]
  • Training Language Models to Reason Efficiently [Paper]
  • L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning [Paper]
  • DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models [Paper]
  • Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning [Paper]
  • HAWKEYE: Efficient Reasoning with Model Collaboration [Paper]
  • THINKPRUNE: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning [Paper]
  • Think When You Need: Self-Adaptive Chain-of-Thought Learning [Paper]
  • Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning (.) [Paper]
  • ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models (.) [Paper]
  • Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning (.) [Paper]
  • Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning (.) [Paper]

Section II: SFT with Variable-Length CoT Data

  • TokenSkip: Controllable Chain-of-Thought Compression in LLMs [Paper]
  • C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness [Paper]
  • CoT-Valve: Length-Compressible Chain-of-Thought Tuning [Paper]
  • Self-Training Elicits Concise Reasoning in Large Language Models [Paper]
  • Distilling System 2 into System 1 [Paper]
  • Can Language Models Learn to Skip Steps? [Paper]
  • Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models [Paper]
  • Z1: Efficient Test-time Scaling with Code [Paper]
  • Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models (.) [Paper]
  • DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models (.) [Paper]
  • Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning (.) [Paper]
  • AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models (.) [Paper]

Section III: Compressing Reasoning Steps into Fewer Latent Representations

  • Training Large Language Models to Reason in a Continuous Latent Space [Paper]
  • Compressed Chain of Thought: Efficient Reasoning through Dense Representations [Paper]
  • Efficient Reasoning with Hidden Thinking (MLLM) [Paper]
  • SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [Paper]
  • Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning [Paper]
  • Reasoning with Latent Thoughts: On the Power of Looped Transformers [Paper]
  • CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation [Paper]
  • Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models (.) [Paper]
  • Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains (.) [Paper]

Section IV: Dynamic Reasoning Paradigm during Inference

  • Efficiently Serving LLM Reasoning Programs with Certaindex [Paper]
  • When More is Less: Understanding Chain-of-Thought Length in LLMs [Paper]
  • Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [Paper]
  • Reward-Guided Speculative Decoding for Efficient LLM Reasoning [Paper]
  • Fast Best-of-N Decoding via Speculative Rejection [Paper]
  • FastMCTS: A Simple Sampling Strategy for Data Synthesis [Paper]
  • Dynamic Parallel Tree Search for Efficient LLM Reasoning [Paper]
  • Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding [Paper]
  • LightThinker: Thinking Step-by-Step Compression (training LLMs to compress thoughts into gist tokens) [Paper]
  • InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models [Paper]
  • Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing [Paper]
  • SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning [Paper]
  • AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [Paper]
  • Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time [Paper]
  • Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? [Paper]
  • Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization [Paper]
  • Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [Paper]
  • Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning [Paper]
  • Confidence Improves Self-Consistency in LLMs [Paper]
  • Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning [Paper]
  • Path-Consistency: Prefix Enhancement for Efficient Inference in LLM [Paper]
  • Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [Paper]
  • Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning [Paper]
  • Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [Paper]
  • Reasoning Models Can Be Effective Without Thinking [Paper]
  • Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning [Paper]
  • Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models [Paper]
  • Sleep-time Compute: Beyond Inference Scaling at Test-time [Paper]
  • Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought [Paper]
  • THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models [Paper]
  • Dynamic Early Exit in Reasoning Models [Paper]
  • Accelerated Test-Time Scaling with Model-Free Speculative Sampling (.) [Paper]

Section V: Prompt-Guided Efficient Reasoning

  • Token-Budget-Aware LLM Reasoning [Paper]
  • Chain of Draft: Thinking Faster by Writing Less [Paper]
  • How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach [Paper]
  • The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models [Paper]

Section VI: Prompt Attribute-Driven Reasoning Routing

  • Claude 3.7 Sonnet and Claude Code [Website]
  • Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [Paper]
  • Learning to Route LLMs with Confidence Tokens [Paper]
  • Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [Paper]
  • RouteLLM: Learning to Route LLMs with Preference Data [Paper]

Section VII: Reasoning Abilities via Efficient Training Data and Model Compression

  • LIMO: Less is More for Reasoning [Paper]
  • s1: Simple test-time scaling [Paper]
  • S2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [Paper]
  • Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond [Paper]
  • Small Models Struggle to Learn from Strong Reasoners [Paper]
  • Towards Reasoning Ability of Small Language Models [Paper]
  • Mixed Distillation Helps Smaller Language Models Reason Better [Paper]
  • Small Language Models Need Strong Verifiers to Self-Correct Reasoning [Paper]
  • Teaching Small Language Models Reasoning through Counterfactual Distillation [Paper]
  • Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation [Paper]
  • Probe then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models [Paper]
  • Distilling Reasoning Ability from Large Language Models with Adaptive Thinking [Paper]
  • SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models [Paper]
  • TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation [Paper]
  • TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance [Paper]
  • When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks [Paper]

Section VIII: Evaluation and Benchmark

  • Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling [Paper]
  • The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks [Paper]
  • Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights [Paper]
  • Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs [Paper]
  • The Impact of Reasoning Step Length on Large Language Models [Paper]
  • S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models [Paper]
  • When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks [Paper]
  • Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models [Paper]

Citation

If you find this work useful, please consider citing us:

@misc{sui2025stopoverthinkingsurveyefficient,
      title={Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models}, 
      author={Yang Sui and Yu-Neng Chuang and Guanchu Wang and Jiamu Zhang and Tianyi Zhang and Jiayi Yuan and Hongyi Liu and Andrew Wen and Shaochen Zhong and Hanjie Chen and Xia Hu},
      year={2025},
      eprint={2503.16419},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.16419}, 
}

Acknowledgment

🧩 Layout inspired by zzli2022/Awesome-System2-Reasoning-LLM. Many thanks for the great structure!
