
Implement Min P as a sampler option in HF loaders #4449

Merged: 4 commits into oobabooga:dev on Nov 2, 2023

Conversation

@kalomaze (Contributor) commented Nov 2, 2023

Checklist:

A backport of the Min P sampler option that was recently merged into llama.cpp. I think it is a suitable alternative that compensates for the flaws of both Top P (which doesn't work well when the probability mass isn't concentrated in the top tokens) and Top K (which can limit diversity unnecessarily), and it has gained a very positive reception.
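For illustration, here is a minimal sketch of the Min P rule described above, assuming PyTorch logits; the function name is hypothetical and this is not the merged implementation. A token is kept only if its probability is at least min_p times the probability of the most likely token.

```python
import torch

def min_p_filter(scores: torch.Tensor, min_p: float, filter_value: float = float("-inf")) -> torch.Tensor:
    """Mask out tokens whose probability is below min_p * (probability of the top token)."""
    probs = torch.softmax(scores, dim=-1)
    top_prob, _ = probs.max(dim=-1, keepdim=True)
    # Tokens far below the top token's probability are removed from consideration.
    tokens_to_remove = probs < min_p * top_prob
    return scores.masked_fill(tokens_to_remove, filter_value)
```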


@kalomaze (Contributor, Author) commented Nov 2, 2023

Technically, min_tokens_to_keep is not being properly respected here. Even though that option usually goes unused, this probably needs to be fixed before the PR is merged.
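One way to respect min_tokens_to_keep would be to exempt the highest-probability tokens from the mask, similar to what the stock transformers warpers do. A hedged sketch building on the hypothetical function above (not the code that was eventually merged):

```python
import torch

def min_p_filter_with_floor(scores: torch.Tensor, min_p: float,
                            min_tokens_to_keep: int = 1,
                            filter_value: float = float("-inf")) -> torch.Tensor:
    probs = torch.softmax(scores, dim=-1)
    top_prob, _ = probs.max(dim=-1, keepdim=True)
    tokens_to_remove = probs < min_p * top_prob
    # Never mask the min_tokens_to_keep most probable tokens, regardless of min_p.
    keep_indices = torch.topk(probs, min_tokens_to_keep, dim=-1).indices
    tokens_to_remove.scatter_(-1, keep_indices, torch.zeros_like(keep_indices, dtype=torch.bool))
    return scores.masked_fill(tokens_to_remove, filter_value)
```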

@kalomaze (Contributor, Author) commented Nov 2, 2023

Should be good to merge now.

EDIT: Somehow, temperature is affecting which tokens this sampler keeps in the list of considered tokens. In llama.cpp, Min P is measured and run before temperature, which comes last, so the fact that temperature impacts the measurement suggests the HF samplers are applied after temperature scaling.
That is potentially worrying, since I assumed the standard was to apply temperature last, though this might just be a design decision that hasn't been discussed. I know that in koboldcpp you can customize the sampler order; do we have that at all?
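To make the ordering question concrete, here is a toy illustration of the two orderings under discussion, reusing the hypothetical min_p_filter sketch from the PR description above (illustrative names, not the webui's actual code):

```python
import torch

scores = torch.tensor([[4.0, 3.0, 1.0, 0.5]])   # toy next-token logits
temperature, min_p = 1.5, 0.1

# transformers-style order: temperature first, so it changes which tokens survive the Min P cut
temp_first = min_p_filter(scores / temperature, min_p)

# llama.cpp-style order: Min P measures the untempered distribution; temperature only rescales afterwards
temp_last = min_p_filter(scores, min_p) / temperature
```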

@kalomaze changed the title from "Implement Min P as a sampler option in HF sampler hijack" to "Implement Min P as a sampler option in HF loaders" on Nov 2, 2023
@oobabooga (Owner)

I was about to add min p manually since it looks cool. Thanks for the PR.

About the order, temperature does come before top p/top k in the transformers library:

https://github.com/huggingface/transformers/blob/acc394c4f5e1283c19783581790b3dc3105a3697/src/transformers/generation/utils.py#L825

I couldn't find any information as to how OpenAI does it.

Personally, I think that it makes sense to use temperature first.
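
For reference, the default warper order in transformers looks roughly like this (simplified from the linked utils.py; exact construction lives there):

```python
from transformers import (LogitsProcessorList, TemperatureLogitsWarper,
                          TopKLogitsWarper, TopPLogitsWarper)

# Temperature is appended before the truncation warpers, so it is applied first.
warpers = LogitsProcessorList([
    TemperatureLogitsWarper(0.7),
    TopKLogitsWarper(top_k=40),
    TopPLogitsWarper(top_p=0.9),
])
# Calling warpers(input_ids, scores) applies each warper in list order.
```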

@kalomaze (Contributor, Author) commented Nov 2, 2023

> I was about to add min p manually since it looks cool. Thanks for the PR.
>
> About the order, temperature does come before top p/top k in the transformers library:
>
> https://github.com/huggingface/transformers/blob/acc394c4f5e1283c19783581790b3dc3105a3697/src/transformers/generation/utils.py#L825
>
> I couldn't find any information as to how OpenAI does it.
>
> Personally, I think that it makes sense to use temperature first.

I can see arguments both for applying it last, as llama.cpp does, and for applying it the 'official' way, so I think making this configurable by the end user would be helpful. Is that possible without changing how the transformers library functions?

Btw, this is how it looks in GPT-2; temp comes first there:

[screenshot: GPT-2 sampling code, temperature applied first]

@kalomaze (Contributor, Author) commented Nov 2, 2023

Changing how the sampler order logic works is out of scope for this PR, though, so I'm ready for this to be merged in the meantime.

@oobabooga merged commit 367e5e6 into oobabooga:dev on Nov 2, 2023
@BadisG (Contributor) commented Nov 3, 2023

I think you should put the default min_p at 0 instead of 1; having it at 1 makes everything else useless because it's equivalent to top_k = 1.
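A quick check of that claim using the hypothetical min_p_filter sketch from earlier: with min_p = 1.0, only tokens whose probability equals the top token's probability survive, which is effectively greedy decoding (top_k = 1).

```python
import torch

logits = torch.tensor([[2.0, 1.0, 0.5]])
print(min_p_filter(logits, min_p=1.0))
# tensor([[2., -inf, -inf]])  -- everything except the most likely token is masked out
```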

@kalomaze (Contributor, Author) commented Nov 3, 2023

> I think you should put the default min_p at 0 instead of 1; having it at 1 makes everything else useless because it's equivalent to top_k = 1.

Is it not 0 by default? I thought I set it to that.

@oobabooga (Owner)

Should be fixed now: 4537853

@BadisG (Contributor) commented Nov 3, 2023

It's still at min_p = 1 by default for me

@oobabooga (Owner)

I messed up again; it should be good after 7f9c1cb.

@BadisG (Contributor) commented Nov 3, 2023

All right it's working now, thanks!

@BadisG (Contributor) commented Nov 3, 2023

I've also noticed that min_p and presence_penalty don't mix well: from time to time my generation stops abruptly because min_p removes some tokens and presence_penalty then removes the remaining ones, leaving nothing to go on. I think this could be fixed by reversing the order of the samplers, so that presence_penalty is applied before min_p. Maybe we should let users choose the sampler order; that would be a good feature to add, a bit like KoboldAI actually.

Vechtomov pushed a commit to Vechtomov/text-generation-webui that referenced this pull request Nov 9, 2023
commit e18a046
Author: kabachuha <[email protected]>
Date:   Sat Nov 4 22:12:51 2023 +0300

    fix openai extension not working because of absent new defaults (oobabooga#4477)

commit b7a409e
Merge: b5c5304 fb3bd02
Author: oobabooga <[email protected]>
Date:   Sat Nov 4 15:04:43 2023 -0300

    Merge pull request oobabooga#4476 from oobabooga/dev

    Merge dev branch

commit fb3bd02
Author: oobabooga <[email protected]>
Date:   Sat Nov 4 11:02:24 2023 -0700

    Update docs

commit 1d8c7c1
Author: oobabooga <[email protected]>
Date:   Sat Nov 4 11:01:15 2023 -0700

    Update docs

commit b5c5304
Merge: 262f8ae 40f7f37
Author: oobabooga <[email protected]>
Date:   Sat Nov 4 14:19:55 2023 -0300

    Merge pull request oobabooga#4475 from oobabooga/dev

    Merge dev branch

commit 40f7f37
Author: oobabooga <[email protected]>
Date:   Sat Nov 4 10:12:06 2023 -0700

    Update requirements

commit 2081f43
Author: Orang <[email protected]>
Date:   Sun Nov 5 00:00:24 2023 +0700

    Bump transformers to 4.35.* (oobabooga#4474)

commit 4766a57
Author: feng lui <[email protected]>
Date:   Sun Nov 5 00:59:33 2023 +0800

    transformers: add use_flash_attention_2 option (oobabooga#4373)

commit add3593
Author: wouter van der plas <[email protected]>
Date:   Sat Nov 4 17:41:42 2023 +0100

    fixed two links in the ui (oobabooga#4452)

commit cfbd108
Author: Casper <[email protected]>
Date:   Sat Nov 4 17:09:41 2023 +0100

    Bump AWQ to 0.1.6 (oobabooga#4470)

commit aa5d671
Author: oobabooga <[email protected]>
Date:   Sat Nov 4 13:09:07 2023 -0300

    Add temperature_last parameter (oobabooga#4472)

commit 1ab8700
Author: oobabooga <[email protected]>
Date:   Fri Nov 3 17:38:19 2023 -0700

    Change frequency/presence penalty ranges

commit 45fcb60
Author: oobabooga <[email protected]>
Date:   Fri Nov 3 11:29:31 2023 -0700

    Make truncation_length_max apply to max_seq_len/n_ctx

commit 7f9c1cb
Author: oobabooga <[email protected]>
Date:   Fri Nov 3 08:25:22 2023 -0700

    Change min_p default to 0.0

commit 4537853
Author: oobabooga <[email protected]>
Date:   Fri Nov 3 08:13:50 2023 -0700

    Change min_p default to 1.0

commit 367e5e6
Author: kalomaze <[email protected]>
Date:   Thu Nov 2 14:32:51 2023 -0500

    Implement Min P as a sampler option in HF loaders (oobabooga#4449)

commit fcb7017
Author: oobabooga <[email protected]>
Date:   Thu Nov 2 12:24:09 2023 -0700

    Remove a checkbox

commit fdcaa95
Author: Julien Chaumond <[email protected]>
Date:   Thu Nov 2 20:20:54 2023 +0100

    transformers: Add a flag to force load from safetensors (oobabooga#4450)

commit c065547
Author: oobabooga <[email protected]>
Date:   Thu Nov 2 11:23:04 2023 -0700

    Add cache_8bit option

commit 42f8163
Merge: 77abd9b a56ef2a
Author: oobabooga <[email protected]>
Date:   Thu Nov 2 11:09:26 2023 -0700

    Merge remote-tracking branch 'refs/remotes/origin/dev' into dev

commit 77abd9b
Author: oobabooga <[email protected]>
Date:   Thu Nov 2 08:19:42 2023 -0700

    Add no_flash_attn option

commit a56ef2a
Author: Julien Chaumond <[email protected]>
Date:   Thu Nov 2 18:07:08 2023 +0100

    make torch.load a bit safer (oobabooga#4448)

commit deba039
Author: deevis <[email protected]>
Date:   Tue Oct 31 22:51:00 2023 -0600

    (fix): OpenOrca-Platypus2 models should use correct instruction_template and custom_stopping_strings (oobabooga#4435)

commit aaf726d
Author: Mehran Ziadloo <[email protected]>
Date:   Tue Oct 31 21:29:57 2023 -0700

    Updating the shared settings object when loading a model (oobabooga#4425)

commit 9bd0724
Author: oobabooga <[email protected]>
Date:   Tue Oct 31 20:57:56 2023 -0700

    Change frequency/presence penalty ranges

commit 6b7fa45
Author: Orang <[email protected]>
Date:   Wed Nov 1 05:12:14 2023 +0700

    Update exllamav2 version (oobabooga#4417)

commit 41e159e
Author: Casper <[email protected]>
Date:   Tue Oct 31 23:11:22 2023 +0100

    Bump AutoAWQ to v0.1.5 (oobabooga#4410)

commit 0707ed7
Author: Meheret <[email protected]>
Date:   Wed Nov 1 01:09:05 2023 +0300

    updated wiki link (oobabooga#4415)

commit 262f8ae
Author: oobabooga <[email protected]>
Date:   Fri Oct 27 06:49:14 2023 -0700

    Use default gr.Dataframe for evaluation table

commit f481ce3
Author: James Braza <[email protected]>
Date:   Thu Oct 26 21:02:28 2023 -0700

    Adding `platform_system` to `autoawq` (oobabooga#4390)

commit af98587
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Fri Oct 27 00:46:16 2023 -0300

    Update accelerate requirement from ==0.23.* to ==0.24.* (oobabooga#4400)

commit 839a87b
Author: oobabooga <[email protected]>
Date:   Thu Oct 26 20:26:25 2023 -0700

    Fix is_ccl_available & is_xpu_available imports

commit 778a010
Author: Abhilash Majumder <[email protected]>
Date:   Fri Oct 27 08:09:51 2023 +0530

    Intel Gpu support initialization (oobabooga#4340)

commit 317e2c8
Author: GuizzyQC <[email protected]>
Date:   Thu Oct 26 22:03:21 2023 -0400

    sd_api_pictures: fix Gradio warning message regarding custom value (oobabooga#4391)

commit 92b2f57
Author: oobabooga <[email protected]>
Date:   Thu Oct 26 18:57:32 2023 -0700

    Minor metadata bug fix (second attempt)

commit 2d97897
Author: oobabooga <[email protected]>
Date:   Wed Oct 25 11:21:18 2023 -0700

    Don't install flash-attention on windows + cuda 11

commit 0ced78f
Author: LightningDragon <[email protected]>
Date:   Wed Oct 25 09:15:34 2023 -0600

    Replace hashlib.sha256 with hashlib.file_digest so we don't need to load entire files into ram before hashing them. (oobabooga#4383)

commit 72f6fc6
Author: tdrussell <[email protected]>
Date:   Wed Oct 25 10:10:28 2023 -0500

    Rename additive_repetition_penalty to presence_penalty, add frequency_penalty (oobabooga#4376)

commit ef1489c
Author: oobabooga <[email protected]>
Date:   Mon Oct 23 20:45:43 2023 -0700

    Remove unused parameter in AutoAWQ

commit 1edf321
Author: oobabooga <[email protected]>
Date:   Mon Oct 23 13:09:03 2023 -0700

    Lint

commit 280ae72
Author: oobabooga <[email protected]>
Date:   Mon Oct 23 13:07:17 2023 -0700

    Organize

commit 49e5eec
Merge: 82c11be 4bc4113
Author: oobabooga <[email protected]>
Date:   Mon Oct 23 12:54:05 2023 -0700

    Merge remote-tracking branch 'refs/remotes/origin/main'

commit 82c11be
Author: oobabooga <[email protected]>
Date:   Mon Oct 23 12:49:07 2023 -0700

    Update 04 - Model Tab.md

commit 306d764
Author: oobabooga <[email protected]>
Date:   Mon Oct 23 12:46:24 2023 -0700

    Minor metadata bug fix

commit 4bc4113
Author: adrianfiedler <[email protected]>
Date:   Mon Oct 23 19:09:57 2023 +0200

    Fix broken links (oobabooga#4367)

    ---------

    Co-authored-by: oobabooga <[email protected]>

commit 92691ee
Author: oobabooga <[email protected]>
Date:   Mon Oct 23 09:57:44 2023 -0700

    Disable trust_remote_code by default

@kalomaze (Contributor, Author) commented Nov 11, 2023

> I've also noticed that min_p and presence_penalty don't mix well: from time to time my generation stops abruptly because min_p removes some tokens and presence_penalty then removes the remaining ones, leaving nothing to go on. I think this could be fixed by reversing the order of the samplers, so that presence_penalty is applied before min_p. Maybe we should let users choose the sampler order; that would be a good feature to add, a bit like KoboldAI actually.

Presence penalty... removes tokens? I thought rep pen (and its associated variations) biases the logits rather than actually truncating them; that's weird. And in koboldcpp, rep pen comes first, so having it applied after the other truncation samplers is also foreign to me. I think the default kobold sampler order should be adopted and made customizable.
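For context, presence penalty in the usual (OpenAI-style) formulation is an additive bias on the logits of tokens that have already appeared, not a truncation. A minimal sketch, with hypothetical names, assuming scores has shape [batch, vocab] and generated_ids holds previously generated token ids:

```python
import torch

def apply_presence_penalty(scores: torch.Tensor, generated_ids: torch.Tensor, penalty: float) -> torch.Tensor:
    # Subtract a flat penalty from every token that has appeared at least once.
    seen = torch.zeros_like(scores, dtype=torch.bool)
    seen.scatter_(-1, generated_ids, torch.ones_like(generated_ids, dtype=torch.bool))
    return scores - penalty * seen.to(scores.dtype)
```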

@ariez-xyz commented

As far as I can tell, minP is missing from the generation parameters panel when llama.cpp is selected as a loader, though it works with llamacpp_HF. Could it be added to the llama.cpp settings as well?
