[Feature] min_p sampling parameter #1745
Comments
@irexyc Please put this feature in the work list
Wondering if there's any chance this could get implemented soon? It's currently supported in vLLM, SGLang, llama.cpp, and text-generation-webui, with increasing usage across the community. It seems to basically be a "free" intelligence boost without hurting creativity: it lets you drop tokens that fall after any sudden/large "cliff" in the probability distribution. To be clear, this isn't a small improvement; it has a non-trivial impact on output quality.
Hi, @josephrocca
Thanks! I've tested this, and there are no issues as far as I can tell, so I think this can now be closed. If I come across any issues I'll re-open. To test it via the official docker image, I had to add lmdeploy/lmdeploy/serve/openai/api_server.py Line 610 in e2aa4bd
And here: lmdeploy/lmdeploy/serve/openai/protocol.py Line 254 in e2aa4bd
Motivation
The min_p sampling parameter is becoming quite popular. It's conceptually simple and "makes sense", and (at least anecdotally, according to the opinions of many model fine-tuners and users in the LocalLlama community) it tends to perform better than the usual top_p + top_k approach. The readmes of many new model finetunes/merges on HF recommend using min_p instead of top_p and top_k.
Related resources
So e.g. a min_p of 0.07 means that if a token's probability is less than 7% of the probability of the highest-probability token, it will be disqualified. A min_p of 0.5 would mean that if a token's probability is not at least half that of the highest-probability token, then it is disqualified. Said another way, min_p lets you set a minimum fraction of the most likely token's probability; any token below that cannot be sampled.
Please see the above links for more info.
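To make the rule above concrete, here is a minimal pure-Python sketch of min_p filtering over an already-normalized probability list. This is only an illustration of the idea as described in this issue, not lmdeploy's (or any other engine's) implementation; the function names min_p_filter and sample_min_p are hypothetical, and real engines typically apply the cutoff to logits on the GPU and combine it with temperature and other samplers.

```python
import random


def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p * max(probs), then renormalize.

    `probs` is assumed to be a list of non-negative probabilities with at
    least one positive entry. Hypothetical helper, for illustration only.
    """
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    # The cutoff scales with the most likely token's probability.
    threshold = min_p * max(probs)
    # Tokens below the cutoff get probability zero; the top token always survives.
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_min_p(probs, min_p, rng=random):
    """Sample a token index from the min_p-filtered distribution."""
    filtered = min_p_filter(probs, min_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


# Toy distribution over five tokens; the top token has probability 0.50.
probs = [0.50, 0.30, 0.15, 0.04, 0.01]

# min_p = 0.07 -> cutoff is 0.035, so only the 0.01 token is disqualified.
print(min_p_filter(probs, 0.07))

# min_p = 0.5 -> cutoff is 0.25, so only the 0.50 and 0.30 tokens remain.
print(min_p_filter(probs, 0.5))
```

Because the cutoff is relative to the top token, the filter is strict when the model is confident (one dominant token) and permissive when the distribution is flat, which is the "cliff" behaviour mentioned in the comments.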