Description
Motivation
The min_p sampling parameter is becoming quite popular. It's conceptually simple and "makes sense", and, at least anecdotally (according to many model fine-tuners and users in the LocalLlama community), it tends to perform better than the usual top_p + top_k approach. The READMEs of many recent model fine-tunes and merges on Hugging Face recommend using min_p instead of top_p and top_k.
Related resources
min_p: Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
So, for example, a min_p of 0.07 means that a token is disqualified if its probability is less than 7% of the probability of the most likely token. A min_p of 0.5 means that a token is disqualified unless its probability is at least half that of the most likely token. Said another way, min_p sets a cutoff as a fraction of the top token's probability, and any token below that cutoff cannot be sampled.
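To make the rule concrete, here is a minimal sketch in plain Python. The function names `min_p_filter` and `sample_min_p` and the example distribution are hypothetical, chosen only for illustration; they are not taken from any of the linked implementations, which operate on logits tensors and may differ in details such as how min_p is ordered relative to temperature and other samplers.

```python
import random


def min_p_filter(probs, min_p):
    """Drop tokens below min_p * max(probs), then renormalize.

    Hypothetical helper for illustration: `probs` is a list of token
    probabilities that sums to 1, and `min_p` is in [0, 1].
    """
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    # The cutoff scales with the probability of the most likely token.
    threshold = min_p * max(probs)
    # Tokens below the cutoff get zero probability; the top token always survives.
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_min_p(probs, min_p, rng=random):
    """Sample a token index from the min_p-filtered distribution."""
    filtered = min_p_filter(probs, min_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


# Example distribution over five tokens (made-up numbers).
probs = [0.60, 0.25, 0.10, 0.03, 0.02]

# min_p = 0.07: cutoff is 0.07 * 0.60 = 0.042, so the last two tokens are removed.
filtered = min_p_filter(probs, 0.07)

# min_p = 0.5: cutoff is 0.30, so only the top token remains.
greedy_like = min_p_filter(probs, 0.5)
```

Because the cutoff is relative to the top token, the filter is strict when the model is confident and permissive when the distribution is flat. With `min_p=0` the threshold is zero and every token is kept, which matches the "set to 0 to disable" behavior described above.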
- Support Min P Sampler vllm-project/vllm#1642
- Implement Min P as a sampler option in HF loaders oobabooga/text-generation-webui#4449
- Min P sampler implementation [alternative to Top P/Top K] ggml-org/llama.cpp#3841
- (Edit - SGLang recently added it:) Support min-p sampling sgl-project/sglang#1167
Please see the above links for more info.
