[Performance] [Speculative decoding]: Replace scoring spec tokens via batched 1-step generation by n-step prefill #7255

sergeykochetkov · 2024-08-07T08:28:31Z

Speculative decoding in vLLM currently shows little or no performance improvement, partly because scoring the speculative tokens is inefficient. vLLM uses batch_expansion: to score n speculative tokens, it builds a batch of n+1 one-step generation requests. This PR instead scores all n speculative tokens with a single prefill request.
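The equivalence the PR relies on can be illustrated with a minimal sketch. It assumes a hypothetical toy causal model (`toy_causal_forward`, not a vLLM API) whose prediction at each position depends only on the tokens up to that position, as causal attention masking guarantees. Under that assumption, the n+1 expanded one-step requests and one prefill over the prefix plus all n speculative tokens yield the same n+1 target predictions: n to verify each draft token and one extra "bonus" token. The helper names (`score_by_batch_expansion`, `score_by_single_prefill`, `greedy_accept`) are illustrative only and do not mirror vLLM's scorer classes.

```python
from typing import List, Tuple

VOCAB_SIZE = 50  # hypothetical toy vocabulary size


def toy_causal_forward(tokens: List[int]) -> List[int]:
    """Hypothetical stand-in for a decoder-only target model.

    Returns one greedy next-token prediction per input position. As with
    causal attention masking, the prediction at position t depends only on
    tokens[: t + 1]; this property is what makes the two strategies agree.
    """
    predictions = []
    state = 0
    for tok in tokens:
        state = (state * 31 + tok + 7) % 1_000_003
        predictions.append(state % VOCAB_SIZE)
    return predictions


def score_by_batch_expansion(prefix: List[int], spec: List[int]) -> Tuple[List[int], int]:
    """n + 1 one-step requests: request i sees prefix + spec[:i]."""
    preds, calls = [], 0
    for i in range(len(spec) + 1):
        # Keep only the last position's prediction, like a 1-step decode.
        preds.append(toy_causal_forward(prefix + spec[:i])[-1])
        calls += 1
    return preds, calls


def score_by_single_prefill(prefix: List[int], spec: List[int]) -> Tuple[List[int], int]:
    """One pass over prefix + all n spec tokens; keep the last n + 1 positions."""
    outputs = toy_causal_forward(prefix + spec)
    return outputs[-(len(spec) + 1):], 1


def greedy_accept(spec: List[int], target_preds: List[int]) -> List[int]:
    """Accept the longest matching prefix of spec, then emit one target token."""
    accepted = []
    for proposed, target in zip(spec, target_preds):
        if proposed != target:
            break
        accepted.append(proposed)
    # target_preds has n + 1 entries, so this index is always valid.
    return accepted + [target_preds[len(accepted)]]


prefix = [3, 14, 15, 92]  # assumed non-empty (prompt + already accepted tokens)
spec = [6, 5, 35]         # n = 3 hypothetical draft tokens
expanded, expanded_calls = score_by_batch_expansion(prefix, spec)
prefilled, prefill_calls = score_by_single_prefill(prefix, spec)
assert expanded == prefilled
print(expanded_calls, prefill_calls)  # 4 1
print(greedy_accept(spec, prefilled))
```

The sketch only counts model invocations (n+1 versus 1). In a real serving engine the actual cost difference also depends on KV-cache handling and attention kernels, which this toy model does not capture.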

github-actions · 2024-08-07T08:28:44Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

ccamacho · 2024-08-26T12:38:06Z

++

sergeykochetkov changed the title ~~initial~~ [Performance] [Speculative decoding]: Replace scoring spec tokens via batched 1-step generation by n-step prefill Aug 17, 2024

sergeykochetkov closed this Aug 29, 2024

sergeykochetkov force-pushed the main branch from ba95539 to f205c09 on August 29, 2024 06:41
