[Speculative decoding 2/9] Multi-step worker for draft model #2424

cadedaniel · 2024-01-12T01:30:16Z

This PR implements the worker which contains the draft model in speculative decoding. See #2188 and the speculative decoding open sourcing plan for more info. This is contributed by Anyscale.

The MultiStepWorker extends the vanilla Worker class and augments execute_model to invoke the underlying model multiple times in a single scheduling iteration. To work end-to-end in vLLM, it requires the scheduler to support scheduling more than one token per scheduling iteration; that support will come in a future PR.
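
For readers unfamiliar with the idea, here is a minimal sketch of the multi-step pattern described above. It is not the code in this PR: `ToyWorker`, `ToyMultiStepWorker`, `execute_model_multi_step`, and `toy_draft_model` are hypothetical names chosen for illustration, and the "model" is a plain Python function rather than a real vLLM model. The sketch only shows a subclass calling the single-step path several times within one call and feeding each new token back in.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its next token.
NextTokenFn = Callable[[List[int]], int]


class ToyWorker:
    """Runs one forward pass per call, producing one new token per sequence."""

    def __init__(self, next_token_fn: NextTokenFn) -> None:
        self.next_token_fn = next_token_fn

    def execute_model(self, sequences: List[List[int]]) -> List[int]:
        return [self.next_token_fn(seq) for seq in sequences]


class ToyMultiStepWorker(ToyWorker):
    """Invokes the underlying single-step path several times in one call."""

    def execute_model_multi_step(
        self, sequences: List[List[int]], num_steps: int
    ) -> List[List[int]]:
        # Work on copies so the caller's sequences are not mutated.
        working = [list(seq) for seq in sequences]
        drafted: List[List[int]] = [[] for _ in sequences]
        for _ in range(num_steps):
            step_tokens = super().execute_model(working)
            for seq, out, tok in zip(working, drafted, step_tokens):
                seq.append(tok)  # feed the new token back in for the next step
                out.append(tok)  # collect the drafted tokens for this sequence
        return drafted


def toy_draft_model(seq: List[int]) -> int:
    # Assumption for the example only: the "draft model" predicts last token + 1.
    return (seq[-1] + 1) % 100


worker = ToyMultiStepWorker(toy_draft_model)
draft_tokens = worker.execute_model_multi_step([[1, 2, 3], [10]], num_steps=3)
print(draft_tokens)
```

In this toy version the multi-step call is a separate method, whereas the PR description says the real worker augments execute_model itself; the separate name is used here only to keep the single-step and multi-step paths easy to tell apart.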

LiuXiaoxuanPKU

LGTM!

multi step worker

b20ed29

cadedaniel changed the title ~~[Speculative decoding 2/9] Multi-step worker for draft model~~ [Draft] [Speculative decoding 2/9] Multi-step worker for draft model Jan 12, 2024

fix block table size miscalculation

5ffceca

cadedaniel marked this pull request as ready for review January 17, 2024 22:50

Merge remote-tracking branch 'upstream/main' into public-vllm-upstrea…

6a22d07

…m-multi-step-worker-pr

cadedaniel changed the title ~~[Draft] [Speculative decoding 2/9] Multi-step worker for draft model~~ [Speculative decoding 2/9] Multi-step worker for draft model Jan 17, 2024

lint

0e1c3b3

cadedaniel requested a review from LiuXiaoxuanPKU January 17, 2024 23:00

LiuXiaoxuanPKU self-assigned this Jan 18, 2024

LiuXiaoxuanPKU approved these changes Jan 19, 2024

LiuXiaoxuanPKU merged commit 18bfcdd into vllm-project:main Jan 22, 2024
15 checks passed

This was referenced Jan 22, 2024

ValueError: Port could not be cast to integer value as '<function get_open_port at 0x7fc97190a7a0> #2540

Closed

Fix "Port could not be cast to integer value as <function get_open_port>" #2545

Merged

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

[Speculative decoding 2/9] Multi-step worker for draft model (vllm-pr…

5e0cfd0

…oject#2424)

sighingnow mentioned this pull request Feb 25, 2024

Introduce speculative decoding with draft models to vLLM #3029

Closed
