Conversation

@gspeter-max

Fixes #38268

Feature Description

This PR implements the feature request to add sampling capabilities (e.g., Top-K, Top-P, temperature) to Group Beam Search, which previously used purely greedy, deterministic token selection.

Problem

Currently, _group_beam_search in the GenerationMixin is implemented as a deterministic process. After applying the diversity penalty to the logits, it always selects the highest-probability tokens using torch.topk. This prevents users from leveraging the creative and diverse outputs that stochastic sampling methods provide, which is especially useful for tasks like biological sequence or code generation.
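
For context, the existing greedy step boils down to a top-k over the penalized scores. A simplified sketch of that selection (not the verbatim transformers source; the helper name is hypothetical):

```python
import torch

def greedy_group_candidates(next_token_scores: torch.Tensor, group_size: int):
    # The greedy path keeps 2 * group_size candidates per batch item so the
    # beam scorer can still fill the group when some candidates hit EOS.
    return torch.topk(next_token_scores, 2 * group_size, dim=-1)
```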

Solution

This implementation modifies _group_beam_search by adding a conditional path that is triggered when generation_config.do_sample=True. The new sampling path includes the following logic (a minimal sketch follows the list):

  1. Applies All Processors & Warpers: It applies all LogitsProcessors (including the HammingDiversityLogitsProcessor, which implements the diversity penalty) and then applies the LogitsWarpers (for Temperature, Top-K, Top-P) to the scores.
  2. Safe Candidate Selection: It calculates the number of candidates to sample as the min() of what the beam_scorer requires and the number of tokens left after warping, so torch.multinomial is never asked to draw more samples (without replacement) than there are non-zero probabilities.
  3. Stochastic Sampling: It uses torch.multinomial to stochastically sample candidate tokens from the resulting probability distribution.
  4. Score Gathering: It gathers the log-scores of the sampled tokens to ensure compatibility with the rest of the beam search algorithm.
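
For illustration, here is a minimal, self-contained sketch of steps 2-4 (the helper name sample_group_candidates and its signature are hypothetical and simplified; the real logic lives inline in _group_beam_search):

```python
import torch

def sample_group_candidates(next_token_scores: torch.Tensor, num_candidates: int):
    # `next_token_scores`: log-scores after all processors and warpers have
    # run, shape (batch_size * group_size, vocab_size); warped-out tokens
    # are -inf, so they get zero probability below.
    probs = torch.nn.functional.softmax(next_token_scores, dim=-1)

    # Step 2: cap the sample size at the number of tokens that survived
    # warping, so multinomial (without replacement) cannot fail.
    available = int((probs > 0).sum(dim=-1).min())
    k = min(num_candidates, available)

    # Step 3: stochastically sample candidate tokens.
    next_tokens = torch.multinomial(probs, num_samples=k)

    # Step 4: gather the log-scores of the sampled tokens so downstream
    # beam bookkeeping is unchanged, and keep candidates sorted by score
    # as the greedy path does.
    sampled_scores = torch.gather(next_token_scores, -1, next_tokens)
    sampled_scores, order = torch.sort(sampled_scores, descending=True, dim=-1)
    next_tokens = torch.gather(next_tokens, -1, order)
    return next_tokens, sampled_scores
```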

Additionally, the validation check in generation/configuration_utils.py that previously raised a ValueError for do_sample=True with group beam search has been removed to enable this new feature.
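
The removed check was along these lines (paraphrased, not the exact diff):

```python
# Previously in GenerationConfig (generation/configuration_utils.py):
if self.num_beam_groups > 1 and self.do_sample:
    raise ValueError(
        "Diverse beam search cannot be used in sampling mode. "
        "Make sure that `do_sample` is set to `False`."
    )
```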

Testing

The feature has been tested locally by running model.generate with do_sample=True and various sampling parameters (temperature, top_k, top_p); a reproduction sketch follows the list below. The tests confirm that:

  1. The code runs without errors.
  2. The generated output is stochastic and differs from the deterministic greedy output.
  3. The generated output changes on subsequent runs, confirming that sampling is active.
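
A script along these lines reproduces the runs below (the model choice and exact sampling parameters are assumptions; the pad_token_id:50256 log suggests GPT-2):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer(
    "The best way to learn about large language models is", return_tensors="pt"
)

# Shared group-beam-search settings (values are illustrative).
common = dict(
    num_beams=4,
    num_beam_groups=2,
    diversity_penalty=1.0,
    max_new_tokens=50,
    num_return_sequences=4,
)

# Existing deterministic behavior.
greedy = model.generate(**inputs, do_sample=False, **common)

# New sampling path (requires this PR's branch).
torch.manual_seed(0)
sampled = model.generate(
    **inputs, do_sample=True, temperature=0.9, top_k=50, top_p=0.95, **common
)

for i, seq in enumerate(sampled):
    print(f"{i}: {tokenizer.decode(seq, skip_special_tokens=True)}")
```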

--- Generating with Greedy Group Beam Search ---
Setting pad_token_id to eos_token_id:50256 for open-end generation.

Greedy Outputs:
0: The best way to learn about large language models is to learn about the language.
1: The best way to learn about large language models is to learn about the language.
2: The best way to learn about large language models is to look at a few examples of how to use them.
The first step is to look at a few examples of how to use them. The second step is to look at a few examples of how to use them. The third step
3: The best way to learn about large language models is to look at a few examples of how to use them.
The first step is to look at a few examples of how to use them. The first step is to look at a few examples of how to use them. The second step

--- Generating with Sampling Group Beam Search ---

Sampling Outputs:
0: The best way to learn about large language models is to look at a few simple examples of how a language can be used to understand languages. For example, a simple example can look at a simple example of how a language can be used to understand languages. For example, a simple example can look at
1: The best way to learn about large language models is to find new ways to work around them.
When you start making your own languages, you should be careful not to think about how you are doing this.
You should not be just focusing on the tools that you
2: The best way to learn about large language models is to look at a few simple examples of how a language can be used to understand languages. For example, a simple example can look at a simple example of how an interpreter can be used to understand languages.
3: The best way to learn about large language models is to find new ways to work around them.
When you start making your own languages, you should be careful not to think about how you are doing this.
You should not be just focusing on the tools that are

--- Generating with MORE RANDOM Sampling Group Beam Search ---

More Random Sampling Outputs:
0: The best way to learn about large language models is through the research paper published in Psychological Science .
I was so lucky (because everyone else I was lucky to be with has been with) that it was very helpful to do some small tasks so I couldn't stop feeling inspired by
1: The best way to learn about large language models is through the research paper published in Psychological Science .
I was so lucky (because everyone else I was lucky to be with has been with) that it was very helpful to do some small tasks so I couldn't stop feeling very good
2: The best way to learn about large language models is to take a look at a few simple language modeling tutorials you can be sure you’ll learn a lot about the languages and frameworks that you’ll be used to writing them. These tutorials can be found in The Cucumber, Python
3: The best way to learn about large language models is to take a look at a few simple language modeling tutorials you can be sure you’ll learn a lot about the languages and frameworks that you’ll be used to writing them. These tutorials can be found in The A Language Modeler blog

@gspeter-max changed the title from "Add sampling support to group beam search #38648" to "Add sampling support to group beam search" on Jun 7, 2025
@Rocketknight1 (Member)

cc @gante
