
[Feature] min_p sampling parameter #1745

Closed
josephrocca opened this issue Jun 8, 2024 · 4 comments
josephrocca commented Jun 8, 2024

Motivation

The min_p sampling parameter is becoming quite popular. It's conceptually simple and "makes sense", and (at least anecdotally, according to many model fine-tuners and users in the LocalLlama community) it tends to perform better than the usual top_p + top_k approach. The READMEs of many new model finetunes/merges on HF recommend using min_p instead of top_p and top_k.

Related resources

min_p: Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.

So e.g. a min_p of 0.07 means that if a token's probability is less than 7% of the highest-probability token's, it is disqualified. A min_p of 0.5 means a token is disqualified unless its probability is at least half that of the highest-probability token. Said another way, min_p sets a minimum fraction of the most likely token's probability, below which a token cannot be sampled.

Please see the above links for more info.
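
For concreteness, here is a minimal NumPy sketch of the filtering step (the function name and structure are illustrative, not taken from lmdeploy or any of the implementations above):

import numpy as np

def min_p_filter(logits: np.ndarray, min_p: float) -> np.ndarray:
    # Convert logits to probabilities with a numerically stable softmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # The cutoff scales with the most likely token's probability.
    threshold = min_p * probs.max()
    # Disqualified tokens get -inf logits so they can never be sampled.
    return np.where(probs >= threshold, logits, -np.inf)

# e.g. min_p_filter(np.array([2.0, 1.5, -1.0, -3.0]), min_p=0.07)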


@lvhan028 (Collaborator)

@irexyc Please put this feature in the work list

josephrocca commented Aug 31, 2024

Wondering if there's any chance this could get implemented soon? It's currently supported in vLLM, SGLang, llama.cpp, and text-generation-webui, with increasing usage across the community. It seems to be basically a "free" intelligence boost that doesn't hurt creativity.

It basically lets you drop tokens after any sudden, large "cliff" in the probability distribution. To be clear, this isn't a small improvement: it has a non-trivial impact on output quality.
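
A toy example of the cliff behavior (the numbers are invented for illustration):

probs = [0.55, 0.38, 0.04, 0.02, 0.01]  # hypothetical next-token probabilities
threshold = 0.10 * max(probs)           # min_p = 0.10 -> threshold = 0.055
kept = [p for p in probs if p >= threshold]
# kept == [0.55, 0.38]; everything past the cliff is dropped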

lvhan028 commented Sep 2, 2024

Hi, @josephrocca
@irexyc implemented it in PR #1966, but that PR bundles other features and improvements, making it hard to review.
So @irexyc is splitting it into smaller PRs.
I think min_p will be supported soon.
Stay tuned.

@josephrocca (Author)

Thanks! I've tested this and found no issues as far as I can tell, so I think this can now be closed; if I come across any issues, I'll re-open. To test it via the official Docker image, I had to add min_p as an additional field here:

top_p=request.top_p,

And here:

seed: Optional[int] = None
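
For anyone reproducing this, a hedged sketch of the two additions (the min_p field name matches the feature, but the exact insertion points and the default value are assumptions based on the excerpts above, not verified against lmdeploy's source):

# next to the top_p excerpt above (request-to-engine mapping):
top_p=request.top_p,
min_p=request.min_p,  # assumed: forward the new field alongside top_p

# next to the seed excerpt above (request schema):
seed: Optional[int] = None
min_p: float = 0.0  # assumed default: 0.0 disables min_p, per the vLLM docstring quoted earlier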
