
[Bugfix] Fix bug in detokenizer.py #8112

Closed
wants to merge 2 commits into from
Conversation

cafeii

@cafeii cafeii commented Sep 3, 2024

Fix the bug where the lengths of prev_tokens and next_iter_tokens in detokenizer.py grow exponentially in some cases.

The original code assigns the reference of next_iter_tokens directly to prev_tokens, making both names point to the same object. When prev_tokens is then extended by next_iter_tokens, the length of both doubles, so the length grows exponentially.

ps: I didn't find any issue that directly points out this bug.
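The aliasing described above can be sketched in a few lines of plain Python (the variable names follow the PR description; this is an illustration of the pattern, not the exact vLLM code):

```python
# Buggy pattern: assignment makes both names alias the same list object.
prev_tokens = ["a", "b"]
next_iter_tokens = prev_tokens  # alias, not a copy

# Extending prev_tokens by next_iter_tokens now extends the list with
# itself, doubling its length on every iteration: 2 -> 4 -> 8 -> 16.
for _ in range(3):
    prev_tokens.extend(next_iter_tokens)
print(len(prev_tokens))  # 16

# Fixed pattern: copy the list so the two names refer to distinct objects.
prev_tokens = ["a", "b"]
next_iter_tokens = list(prev_tokens)  # independent copy

# Each extend now adds a fixed number of tokens: 2 -> 4 -> 6 -> 8.
for _ in range(3):
    prev_tokens.extend(next_iter_tokens)
print(len(prev_tokens))  # 8
```

With the alias, the list length after n iterations is 2^(n+1); with a copy, it grows linearly, which is the behavior the patch restores.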

Fix the bug where in some cases the lengths of prev_tokens and next_iter_tokens grow exponentially

github-actions bot commented Sep 3, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@cafeii
Author

cafeii commented Sep 3, 2024

/ready

@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 3, 2024
@zifeitong
Contributor

Do you have a reproducer?

#5919 is almost the same.

#6223 is also related.

@cafeii
Author

cafeii commented Sep 4, 2024

reproduce.zip

  • vllm: 0.5.1
  • transformers 4.45.0.dev0
  • tokenizers 0.19.1

If successfully reproduced, generation slows down at the 8th sample and gets completely stuck at the 9th sample.
The bug's behavior is similar to #5872, but I didn't encounter the same bug as #5872 while reproducing it.
#6223 can also fix the bug.
#5919 is almost the same as this PR.

@njhill
Member

njhill commented Oct 22, 2024

Addressed by #5919

@njhill njhill closed this Oct 22, 2024