Add sampling support to group beam search #38653
Fixes #38268
Feature Description
This PR implements the feature request to add sampling capabilities (e.g., Top-K, Top-P, temperature) to Group Beam Search, which was previously a purely greedy algorithm.
Problem
Currently, `_group_beam_search` in the `GenerationMixin` is implemented as a deterministic process. After applying the diversity penalty to the logits, it always selects the highest-probability tokens using `torch.topk`. This prevents users from leveraging the creative and diverse outputs that stochastic sampling methods provide, which is especially useful for tasks like biological sequence or code generation.
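As a minimal illustration of the current behavior (a sketch with simplified shapes and names, not the exact transformers source):

```python
import torch

# Sketch of the existing greedy selection in group beam search
# (simplified; not the exact transformers source). After the diversity
# penalty, the highest-scoring tokens are always taken with torch.topk,
# so identical inputs always yield identical beams.
batch_size, group_size, vocab_size = 1, 2, 50257
logits = torch.randn(batch_size * group_size, vocab_size)
beam_scores = torch.zeros(batch_size * group_size)

next_token_scores = torch.log_softmax(logits, dim=-1) + beam_scores[:, None]
# Deterministic: always the top 2 * group_size candidates per batch item.
topk_scores, topk_tokens = torch.topk(
    next_token_scores.view(batch_size, -1), 2 * group_size, dim=1
)
```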
Solution

This implementation modifies `_group_beam_search` by adding a conditional path that is triggered when `generation_config.do_sample=True`. The new sampling path (sketched after this list) includes the following logic:

- Applies the `LogitsProcessor`s (including the `ForcedDiversityLogitsProcessor`) and then the `LogitsWarper`s (for temperature, Top-K, Top-P) to the scores.
- Caps the number of candidates drawn at the `min()` of what the `beam_scorer` requires and the number of tokens available after warping, preventing potential `torch.multinomial` errors.
- Uses `torch.multinomial` to stochastically sample candidate tokens from the resulting probability distribution.
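A minimal, hedged sketch of this path (tensor shapes and names are simplified for illustration; the actual change lives inside `_group_beam_search`):

```python
import torch

# Simplified sketch of the new sampling path (illustrative names, not
# the exact PR diff).
batch_size, group_size, vocab_size = 1, 2, 50257
n_wanted = 2 * group_size  # candidates the beam scorer expects per batch

# Scores after the LogitsProcessors and LogitsWarpers have run; warpers
# such as top-k/top-p push filtered tokens to -inf.
warped_scores = torch.randn(batch_size, group_size * vocab_size)
warped_scores[:, n_wanted + 3:] = float("-inf")  # pretend top-k kept only a few

probs = torch.softmax(warped_scores, dim=-1)

# Cap the draw count: torch.multinomial (without replacement) errors if
# asked for more samples than there are non-zero-probability tokens.
n_available = int((probs > 0).sum(dim=-1).min())
n_draw = min(n_wanted, n_available)

next_tokens = torch.multinomial(probs, num_samples=n_draw)        # stochastic pick
next_token_scores = torch.gather(warped_scores, -1, next_tokens)  # their scores
```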
Additionally, the validation check in `generation/configuration_utils.py` that previously raised a `ValueError` for `do_sample=True` with group beam search has been removed to enable this new feature.
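For context, the removed guard was along these lines (a paraphrase, not an exact excerpt from `generation/configuration_utils.py`):

```python
# Paraphrased sketch of the kind of validation removed by this PR
# (not an exact excerpt from GenerationConfig.validate()).
if self.num_beam_groups > 1 and self.do_sample:
    raise ValueError(
        "Diverse beam search cannot be combined with sampling; set "
        "`do_sample=False` or `num_beam_groups=1`."
    )
```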
Testing

The feature has been tested locally by running `model.generate` with `do_sample=True` and various sampling parameters (`temperature`, `top_k`, `top_p`). The console output below confirms that the greedy path is unchanged while the sampling paths produce diverse, temperature-dependent generations:
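The comparison can be reproduced with a script along these lines (the model, prompt, and parameter values are assumptions for illustration, and this PR's branch is required, since released versions reject `do_sample=True` with `num_beam_groups > 1`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative reproduction script; model, prompt, and generation
# parameters are assumptions, not necessarily what was used here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer(
    "The best way to learn about large language models is", return_tensors="pt"
)

common = dict(
    num_beams=4,
    num_beam_groups=4,
    diversity_penalty=1.0,
    num_return_sequences=4,
    max_new_tokens=60,
)

# Existing behavior: greedy group beam search.
greedy = model.generate(**inputs, do_sample=False, **common)

# New path from this PR: group beam search with sampling. On released
# versions this combination raises a ValueError.
sampled = model.generate(
    **inputs, do_sample=True, temperature=0.7, top_k=50, top_p=0.95, **common
)

for i, seq in enumerate(sampled):
    print(f"{i}: {tokenizer.decode(seq, skip_special_tokens=True)}")
```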
```text
--- Generating with Greedy Group Beam Search ---
Setting pad_token_id to eos_token_id:50256 for open-end generation.
Greedy Outputs:
0: The best way to learn about large language models is to learn about the language.
1: The best way to learn about large language models is to learn about the language.
2: The best way to learn about large language models is to look at a few examples of how to use them.
The first step is to look at a few examples of how to use them. The second step is to look at a few examples of how to use them. The third step
3: The best way to learn about large language models is to look at a few examples of how to use them.
The first step is to look at a few examples of how to use them. The first step is to look at a few examples of how to use them. The second step
--- Generating with Sampling Group Beam Search ---
Sampling Outputs:
0: The best way to learn about large language models is to look at a few simple examples of how a language can be used to understand languages. For example, a simple example can look at a simple example of how a language can be used to understand languages. For example, a simple example can look at
1: The best way to learn about large language models is to find new ways to work around them.
When you start making your own languages, you should be careful not to think about how you are doing this.
You should not be just focusing on the tools that you
2: The best way to learn about large language models is to look at a few simple examples of how a language can be used to understand languages. For example, a simple example can look at a simple example of how an interpreter can be used to understand languages.
3: The best way to learn about large language models is to find new ways to work around them.
When you start making your own languages, you should be careful not to think about how you are doing this.
You should not be just focusing on the tools that are
--- Generating with MORE RANDOM Sampling Group Beam Search ---
More Random Sampling Outputs:
0: The best way to learn about large language models is through the research paper published in Psychological Science .
I was so lucky (because everyone else I was lucky to be with has been with) that it was very helpful to do some small tasks so I couldn't stop feeling inspired by
1: The best way to learn about large language models is through the research paper published in Psychological Science .
I was so lucky (because everyone else I was lucky to be with has been with) that it was very helpful to do some small tasks so I couldn't stop feeling very good
2: The best way to learn about large language models is to take a look at a few simple language modeling tutorials you can be sure you’ll learn a lot about the languages and frameworks that you’ll be used to writing them. These tutorials can be found in The Cucumber, Python
3: The best way to learn about large language models is to take a look at a few simple language modeling tutorials you can be sure you’ll learn a lot about the languages and frameworks that you’ll be used to writing them. These tutorials can be found in The A Language Modeler blog
```