[Speculative decoding 2/9] Multi-step worker #1

Closed
wants to merge 30 commits into from



@cadedaniel cadedaniel commented Jan 5, 2024

Speculative decoding requires running the draft model several times per step, once for each proposed token. If the LLMEngine process orchestrates each of these forward passes, we incur high overhead from the round trips to the workers that hold the draft model.

To fix this, we instead allow a worker to execute multiple forward passes in a single call.
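The difference between the two orchestration styles can be sketched in Python. This is a minimal, hypothetical illustration, not the PR's implementation: `ToyDraftModel`, `DraftWorker`, `execute_model`, `execute_model_multi_step`, and `engine_driven_proposals` are invented names, the draft model is a toy that predicts "last token + 1", and the engine-to-worker RPC is simulated by a counter. It compares the engine driving each of the k draft steps (k round trips) against a worker that runs the k-step autoregressive loop locally (one round trip).

```python
"""Minimal sketch, assuming a toy draft model and a simulated RPC boundary.

All names here are hypothetical and do not correspond to any real library's API.
"""
from dataclasses import dataclass, field
from typing import List


class ToyDraftModel:
    """Hypothetical stand-in for a draft model: the next token is last token + 1."""

    def forward(self, token_ids: List[int]) -> int:
        return token_ids[-1] + 1


@dataclass
class DraftWorker:
    """Hypothetical worker holding the draft model.

    `rpc_calls` counts how many times the engine crossed the process boundary.
    """
    model: ToyDraftModel = field(default_factory=ToyDraftModel)
    rpc_calls: int = 0

    def execute_model(self, token_ids: List[int]) -> int:
        # One engine -> worker round trip per forward pass.
        self.rpc_calls += 1
        return self.model.forward(token_ids)

    def execute_model_multi_step(self, token_ids: List[int], num_steps: int) -> List[int]:
        # One engine -> worker round trip covers all num_steps forward passes;
        # the autoregressive loop runs locally on the worker.
        self.rpc_calls += 1
        tokens = list(token_ids)
        proposals: List[int] = []
        for _ in range(num_steps):
            next_token = self.model.forward(tokens)
            tokens.append(next_token)
            proposals.append(next_token)
        return proposals


def engine_driven_proposals(worker: DraftWorker, token_ids: List[int], k: int) -> List[int]:
    """The engine orchestrates each draft step itself: k round trips."""
    tokens = list(token_ids)
    proposals: List[int] = []
    for _ in range(k):
        next_token = worker.execute_model(tokens)
        tokens.append(next_token)
        proposals.append(next_token)
    return proposals


single_step_worker = DraftWorker()
multi_step_worker = DraftWorker()
k = 4
assert engine_driven_proposals(single_step_worker, [10], k) == \
    multi_step_worker.execute_model_multi_step([10], k)
print(single_step_worker.rpc_calls, multi_step_worker.rpc_calls)  # prints: 4 1
```

Both paths produce the same proposals; only the number of engine-to-worker round trips differs, which in this sketch grows linearly with k for the engine-driven loop and stays at one for the multi-step worker.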

@cadedaniel cadedaniel changed the base branch from rejection-sampler to main January 5, 2024 23:46
@cadedaniel cadedaniel changed the base branch from main to rejection-sampler January 5, 2024 23:46
@cadedaniel cadedaniel changed the title "Multi step worker" to "[Speculative decoding 2/9] Multi-step worker" Jan 5, 2024
@cadedaniel cadedaniel force-pushed the multi-step-worker branch 2 times, most recently from 391f76a to 4269c84 January 12, 2024 00:32
@cadedaniel cadedaniel closed this Apr 22, 2024