Cherry pick of delayed sampling #263

Merged: tzielinski-habana merged 3 commits into v1.18.0 from cherry_pick_of_delayed_sampling on Sep 19, 2024.

tzielinski-habana

Cherry-pick of delayed sampling from habana_next. To use it, pass these additional flags to the vLLM server:
--num-lookahead-slots 1 --use-v2-block-manager --enable-delayed-sampling
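A minimal sketch of a launch command with those flags follows. It assumes the server is started through vLLM's OpenAI-compatible entrypoint module (`vllm.entrypoints.openai.api_server`) and uses a hypothetical model id; `build_server_cmd` is an illustrative helper, not part of vLLM. `--enable-delayed-sampling` is specific to the Habana fork that carries this cherry-pick and may not exist in upstream vLLM.

```python
import shlex
import sys
from typing import List, Optional

# The three flags named in this PR. --enable-delayed-sampling is assumed to be
# available only in the Habana fork that contains this cherry-pick.
DELAYED_SAMPLING_FLAGS = [
    "--num-lookahead-slots", "1",
    "--use-v2-block-manager",
    "--enable-delayed-sampling",
]


def build_server_cmd(model: str, extra_args: Optional[List[str]] = None) -> List[str]:
    """Return an argv list that launches the vLLM OpenAI-compatible server
    (assumed entrypoint) with the delayed-sampling flags appended."""
    cmd = [sys.executable, "-m", "vllm.entrypoints.openai.api_server",
           "--model", model]
    cmd += DELAYED_SAMPLING_FLAGS
    if extra_args:
        # Any further server options (port, dtype, ...) go after the PR's flags.
        cmd += extra_args
    return cmd


if __name__ == "__main__":
    # Hypothetical model id; substitute the checkpoint you actually serve.
    argv = build_server_cmd("meta-llama/Llama-2-7b-hf")
    print(shlex.join(argv))
    # To actually start the server: subprocess.run(argv, check=True)
```

Building the command as a list (rather than a single string) avoids shell-quoting issues when it is later passed to `subprocess.run`.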

Tested on one Gaudi2 card with Llama 2 7B using a 4k offline accuracy test. Accuracy is good (>100%), and the observed performance improvement is ~38%.

hsubramony force-pushed the cherry_pick_of_delayed_sampling branch from b60cf3d to d6616c6 on September 10, 2024 18:31
tzielinski-habana added the habana label (Issues or PRs submitted by Habana Labs) on Sep 12, 2024
tzielinski-habana merged commit 401acbd into v1.18.0 on Sep 19, 2024
Labels
habana Issues or PRs submitted by Habana Labs
3 participants