-
Couldn't load subscription status.
- Fork 31k
Description
Feature request
In the current generation code, group beam search is necessarily greedy. From a theoretical point of view, it is not very clear why that should be the case, since the diversity penalty is applied on the logits anyway, yielding a full distribution from which sampling can still be performed.
Motivation
I think there is a reasonable use case for such a feature: diversity beam search is very useful in particular for modalities like biological sequences which increasingly use the transformers library, but I could see it be useful as well for natural language or code, to generate diverse paths without falling to the drawbacks of greedy generation. From a more abstract point of view it is also seemingly unjustified to allow sampling for standard beam search and not for diversity beam search.
Your contribution
I am aware of the work in #30810 so don't want to disrupt but would be happy to look into it.