Add repetition_penalty aligned with huggingface #866
Conversation
I am waiting for this; I need some way to make HF and vLLM compatible when I use penalty arguments.
I have tested repetition_penalty under greedy search mode with Llama 7B and Baichuan models. The results are all aligned with Hugging Face Transformers.
Co-authored-by: Dong-Yong Lee <[email protected]>
This PR solved my problem. For my Llama 2 model, repetition_penalty is necessary; otherwise the result is incorrect.
Great! Looking forward to testing this, as my perception is that vLLM currently handles repetition very poorly, and that the existing repetition penalty params (frequency_penalty etc.) do little or nothing. Although part of that problem is that there's no per-request seed, something we also really need.
Beautiful!
Very nice
Hey, what's wrong? I did not find the repetition_penalty attribute in
@@ -162,30 +170,61 @@ def _apply_penalties(
        indices.append(i)

    # Return early if all sequences have zero penalties.
Is this comment misleading now?
logits[indices] -= frequency_penalties.unsqueeze(dim=1) * bin_counts
presence_mask = (bin_counts > 0.0).to(dtype=logits.dtype)
logits[indices] -= presence_penalties.unsqueeze(dim=1) * presence_mask
else:
Why `else`? Will `presence_penalty` and `repetition_penalty` work together?
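To illustrate the question, here is a minimal sketch of how frequency, presence, and repetition penalties could all apply to the same row of logits rather than being mutually exclusive. The function name, standalone form, and `bin_counts` argument are illustrative assumptions, not vLLM's actual code:

```python
import torch

def apply_penalties(logits: torch.Tensor,
                    bin_counts: torch.Tensor,
                    presence_penalty: float,
                    frequency_penalty: float,
                    repetition_penalty: float) -> torch.Tensor:
    # Frequency penalty: subtract in proportion to how often each token appeared.
    logits = logits - frequency_penalty * bin_counts
    # Presence penalty: subtract a flat amount for every token that appeared at all.
    presence_mask = (bin_counts > 0).to(logits.dtype)
    logits = logits - presence_penalty * presence_mask
    # Repetition penalty (CTRL-style, as in HF): rescale logits of seen tokens,
    # dividing positive values and multiplying negative ones.
    seen = bin_counts > 0
    rescaled = torch.where(logits > 0,
                           logits / repetition_penalty,
                           logits * repetition_penalty)
    return torch.where(seen, rescaled, logits)
```

Under a composition like this, no `else` branch is needed; whether the PR gates the penalties behind an `if`/`else` is exactly what this review question is probing.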
@WoosukKwon please check if the fix can be merged.
@Abraham-Xu @heshuguo This PR conflicts a bit with #1048, which refactors the sampler code. @zhuohan123 Can we merge this PR before yours?
Hi @Abraham-Xu, sorry for the late response. Could you update the PR? There are some merge conflicts because we refactored the sampler in #1048.
I want to know if
Supported with #1424. But again, thank you for your contribution! Let us know if there is any other issue.
Sorry for the late reply; I have been busy with work and other things recently. It is great to support repetition penalty in any way.
In the huggingface/transformers generate method:
https://github.com/huggingface/transformers/blob/ae320fa53f74cc4dfa0e4fc3c95b6129a86b0512/src/transformers/generation/utils.py#L1295
https://github.com/huggingface/transformers/blob/ae320fa53f74cc4dfa0e4fc3c95b6129a86b0512/src/transformers/generation/utils.py#L1540
https://github.com/huggingface/transformers/blob/ae320fa53f74cc4dfa0e4fc3c95b6129a86b0512/src/transformers/generation/utils.py#L2457
Specifically, repetition penalty is a frequently used pre-processing step in logits_processor. It prevents repetition through a penalty: this penalized sampling works by discounting the scores of previously generated tokens (https://arxiv.org/pdf/1909.05858.pdf).
I introduced the repetition penalty pre-processing step to sampler.py, aligned with the Hugging Face implementation:
https://github.com/huggingface/transformers/blob/ae320fa53f74cc4dfa0e4fc3c95b6129a86b0512/src/transformers/generation/logits_process.py#L328
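For reference, a sketch of the sign-dependent rescaling the linked processor performs. The standalone function form is an illustrative simplification; in Transformers this logic lives in `RepetitionPenaltyLogitsProcessor.__call__`:

```python
import torch

def apply_repetition_penalty(input_ids: torch.LongTensor,
                             scores: torch.FloatTensor,
                             penalty: float) -> torch.FloatTensor:
    # Gather the current scores of tokens that already appear in the context.
    score = torch.gather(scores, 1, input_ids)
    # penalty > 1 discourages repetition: negative scores become more negative,
    # positive scores shrink toward zero.
    score = torch.where(score < 0, score * penalty, score / penalty)
    # Write the rescaled scores back at the same vocabulary positions.
    return scores.scatter(1, input_ids, score)
```

A penalty of 1.0 is a no-op; values above 1.0 push previously seen tokens down, which is why positive scores are divided while negative ones are multiplied.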