Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Here are the key points:
The Problem: When LLMs generate text, they typically use either greedy decoding (always picking the most likely token) or temperature sampling. Current sampling methods often struggle to balance diversity with accuracy, especially for reasoning tasks.
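To make the contrast concrete, here is a minimal Python sketch of the two baseline decoding strategies the paper compares against (function names are illustrative, not from llama.cpp):

```python
import numpy as np

def greedy_decode(logits):
    # Greedy decoding: always pick the single highest-scoring token.
    return int(np.argmax(logits))

def temperature_sample(logits, temperature=1.0, rng=None):
    # Temperature sampling: softmax over scaled logits, then draw one token.
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(logits), p=probs))
```

Greedy decoding is deterministic but can get stuck in repetitive outputs; temperature sampling adds diversity, but at high temperatures it also admits low-quality tokens, which is exactly the trade-off top-nσ targets.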
The Innovation: The authors discovered that when LLMs generate tokens, the logits (pre-softmax scores) naturally separate into two regions:
- A "noisy" region following a Gaussian distribution (background noise)
- An "informative" region containing the actually relevant tokens
The Solution: Top-nσ works by:
- Identifying the maximum logit value
- Selecting tokens whose logits are within n standard deviations (σ) of this maximum
- Sampling only from this selected set
- Using temperature to control sampling within the filtered set
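The four steps above can be sketched in Python as follows. This is a hedged sketch based purely on the issue's description: the parameter name `n` and the use of the full logit vector's standard deviation are assumptions, not code from the paper.

```python
import numpy as np

def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=None):
    """Sketch of top-n-sigma sampling as described above (illustrative only).

    Keeps only tokens whose logit lies within n standard deviations of the
    maximum logit, then applies temperature softmax over that subset.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    max_logit = logits.max()              # step 1: find the maximum logit
    sigma = logits.std()                  # std over the whole logit vector (assumption)
    mask = logits >= max_logit - n * sigma  # step 2: keep tokens within n*sigma of the max
    filtered = np.where(mask, logits, -np.inf)  # step 3: exclude everything else
    scaled = filtered / temperature       # step 4: temperature on the filtered set
    scaled -= scaled.max()                # numerical stability
    probs = np.exp(scaled)                # exp(-inf) = 0, so masked tokens get zero mass
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```

Note that the filter operates directly on logits, before any softmax, which is where the claimed efficiency comes from: no sorting of the vocabulary is required, only a max, a standard deviation, and a comparison.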
Key Benefits:
- Maintains consistent performance even at high temperatures, unlike other methods
- Computationally efficient, since it operates directly on logits
- Outperforms both existing sampling methods and greedy decoding on reasoning tasks
- Works particularly well for tasks requiring careful reasoning
Results: The method was tested on four reasoning-focused datasets and showed superior performance, especially at higher temperatures where other methods typically fail.
The paper essentially shows that by being more selective about which tokens to sample from based on their statistical properties, you can get better and more reliable results from language models, particularly for tasks that require careful reasoning.
Motivation
Looks to be the best sampler yet, and will be a clear differentiator for llama.cpp
Possible Implementation
See the paper: "Top-nσ: Not All Logits Are You Need"