[BugFix] Fix handling of stop strings and stop token ids #3672
This addresses the following bugs:
- Stop string ends having to align with token boundaries
- Stop string not being excluded properly from output when it spans multiple tokens and include_stop_str_in_output==False
- Incorrect output truncation when stopping due to a token in stop_token_ids that is a special token when skip_special_tokens==True
LGTM. Thanks a lot for the tests
Thank you @dgoupil!
LGTM. Thanks for the fix.
    # Check if the sequence has generated the EOS token.
    if ((not sampling_params.ignore_eos)
            and seq.get_last_token_id() == seq.eos_token_id):
        seq.status = SequenceStatus.FINISHED_STOPPED
Do we want to set seq.stop_reason to eos_token_id here?
Left a few comments/questions - Thanks for the fix!
Thanks a lot for the review, @ywang96!
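This addresses the following bugs:
- Stop string ends having to align with token boundaries
- Stop string not being excluded properly from output when it spans multiple tokens and include_stop_str_in_output==False (primarily a problem when streaming output)
- Incorrect output truncation when stopping due to a token in stop_token_ids that is a special token when skip_special_tokens==True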
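
For illustration only, here is a minimal sketch of stop-string matching that operates on the detokenized text rather than on token boundaries. It is not the code in this PR: the function name `check_stop_strings` and its parameters are hypothetical, and it assumes the caller knows how many characters the latest token appended to the output text.

```python
from typing import List, Optional, Tuple


def check_stop_strings(
    output_text: str,
    new_char_count: int,
    stop: List[str],
    include_stop_str_in_output: bool = False,
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (matched_stop_string, final_text), or None.

    output_text is the full detokenized output so far, and new_char_count is
    the number of characters appended by the most recent token.
    """
    for stop_str in stop:
        if not stop_str:
            continue
        # Only the newly appended characters can complete a match, so start
        # the search just far enough back to catch a stop string that
        # straddles the boundary between the old and the new text.
        search_start = max(
            0, len(output_text) - new_char_count - len(stop_str) + 1)
        idx = output_text.find(stop_str, search_start)
        if idx == -1:
            continue
        # Cut at the start of the stop string, or just after it when the
        # caller wants the stop string kept in the output.
        end = idx + len(stop_str) if include_stop_str_in_output else idx
        return stop_str, output_text[:end]
    return None
```

Because the match is found by character offset, the stop string may begin or end in the middle of a token and may span several tokens; in both cases the text is cut at the stop string itself rather than at the last token boundary.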
Fixes #3574
Fixes #3572
Fixes #2577
Fixes #3026