Revert "[BugFix] Fix tokenizer out of vocab size" #3740
Closed
Reverts #3685
#3685 caused a severe performance regression in vLLM.
After that commit, performance on the L4 OPT benchmark dropped sharply:
before: https://buildkite.com/vllm/ci/builds/3695
after: https://buildkite.com/vllm/ci/builds/3716
before
Avg latency: 0.446163778666687 seconds
Throughput: 21.25 requests/s, 10879.19 tokens/s
after
Avg latency: 4.369133323666698 seconds
Throughput: 0.93 requests/s, 474.10 tokens/s
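
To put the regression in relative terms, here is a minimal sketch that computes the slowdown factors from the numbers reported above. The helper name `slowdown` and the variable names are illustrative only; they are not part of vLLM or its benchmark scripts.

```python
def slowdown(baseline: float, regressed: float) -> float:
    """Return how many times larger `regressed` is than `baseline`."""
    return regressed / baseline


# Values copied from the benchmark output quoted in this PR description.
latency_before, latency_after = 0.446163778666687, 4.369133323666698  # seconds
req_before, req_after = 21.25, 0.93                                    # requests/s
tok_before, tok_after = 10879.19, 474.10                               # tokens/s

# Latency got worse by growing, throughput got worse by shrinking,
# so the throughput ratios are taken as before / after.
latency_factor = slowdown(latency_before, latency_after)
req_factor = slowdown(req_after, req_before)
tok_factor = slowdown(tok_after, tok_before)

print(f"latency: {latency_factor:.2f}x slower")
print(f"requests/s: {req_factor:.2f}x lower")
print(f"tokens/s: {tok_factor:.2f}x lower")
```

By these figures, average latency is roughly 9.8x higher and throughput roughly 23x lower after #3685.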