Enables GQA support in the prefix prefill kernels #3007

sighingnow · 2024-02-23T09:45:08Z

No description provided.
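For context on the title: in grouped-query attention (GQA), each key/value head is shared by a group of consecutive query heads, and num_queries_per_kv (the test parameter discussed in the review thread) is that group size. The pure-Python sketch below illustrates only the head sharing. It is not the Triton kernel from this PR, and it leaves out the causal mask and the cached-prefix KV layout that the prefix prefill kernel deals with. The helper names (kv_head_for, gqa_attention, repeat_kv) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # [seq_len][head_dim]


def kv_head_for(q_head: int, num_queries_per_kv: int) -> int:
    """Index of the KV head shared by query head q_head (consecutive grouping)."""
    return q_head // num_queries_per_kv


def gqa_attention(q: List[Matrix], k: List[Matrix], v: List[Matrix],
                  num_queries_per_kv: int) -> List[Matrix]:
    """Softmax attention where k/v have num_heads // num_queries_per_kv heads.

    Unmasked sketch: every query attends to every key. A prefill kernel
    would additionally apply a causal mask over the new tokens.
    """
    num_heads, num_kv_heads = len(q), len(k)
    if num_heads != num_kv_heads * num_queries_per_kv:
        raise ValueError("num_heads must equal num_kv_heads * num_queries_per_kv")
    out: List[Matrix] = []
    for h in range(num_heads):
        kh = kv_head_for(h, num_queries_per_kv)  # the shared K/V head
        scale = 1.0 / math.sqrt(len(q[h][0]))
        head_out: Matrix = []
        for qi in q[h]:
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k[kh]]
            m = max(scores)  # subtract the max for numerical stability
            w = [math.exp(s - m) for s in scores]
            z = sum(w)
            dim_v = len(v[kh][0])
            head_out.append([sum(wj * vj[c] for wj, vj in zip(w, v[kh])) / z
                             for c in range(dim_v)])
        out.append(head_out)
    return out


def repeat_kv(kv: List[Matrix], num_queries_per_kv: int) -> List[Matrix]:
    """Copy each KV head num_queries_per_kv times (the naive alternative to sharing)."""
    return [head for head in kv for _ in range(num_queries_per_kv)]
```

Expanding K/V with repeat_kv and then calling gqa_attention with num_queries_per_kv=1 gives the same output as sharing the heads; a GQA-aware kernel can index the shared head directly instead of materializing that copy.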

sighingnow · 2024-02-23T12:35:41Z

The failure in CI's "Model Test" should not be caused by this pull request; I have noticed the same failure in other PRs as well as on main.

WoosukKwon

@sighingnow Awesome! Thanks for submitting the PR! Left a minor comment on a variable name.

vllm/model_executor/layers/triton_kernel/prefix_prefill.py

Signed-off-by: Tao He <[email protected]>

WoosukKwon · 2024-02-27T06:34:11Z

@sighingnow Thanks for the fix! I will merge the PR once it passes the CI tests.

sighingnow · 2024-02-27T06:42:06Z

> @sighingnow Thanks for the fix! I will merge the PR once it passes the CI tests.

Thank you!

sighingnow · 2024-02-27T08:30:25Z

> @sighingnow Thanks for the fix! I will merge the PR once it passes the CI tests.

Hi @WoosukKwon, CI is green now. (Just a polite reminder.)


UranusSeven · 2024-03-07T02:10:34Z

tests/kernels/test_prefix_prefill.py

@@ -17,12 +18,14 @@


 @pytest.mark.parametrize("num_heads", NUM_HEADS)
+@pytest.mark.parametrize("num_queries_per_kv", NUM_HEADS)


Should this parameter be NUM_QUERIES_PER_KV?

sighingnow · Mar 7, 2024

Fixed in #3246.
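To illustrate why the reviewer's question matters: stacked parametrize decorators run the Cartesian product of their lists, and each case implies num_heads // num_queries_per_kv KV heads. The sketch below uses plain itertools instead of pytest, and its list values are hypothetical (not taken from the test file); the actual change in #3246 is not reproduced here. It shows how drawing num_queries_per_kv from NUM_HEADS can collapse the GQA coverage to a single case.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical values for illustration only; the real lists are defined in
# tests/kernels/test_prefix_prefill.py and may differ.
NUM_HEADS = [64]
NUM_QUERIES_PER_KV = [1, 8, 64]


def num_kv_heads(num_heads: int, num_queries_per_kv: int) -> int:
    """Number of KV heads implied by a (num_heads, num_queries_per_kv) case."""
    if num_heads % num_queries_per_kv != 0:
        raise ValueError("num_heads must be divisible by num_queries_per_kv")
    return num_heads // num_queries_per_kv


def param_grid(heads: List[int], queries_per_kv: List[int]) -> List[Tuple[int, int, int]]:
    """Mimic stacked parametrize decorators: the Cartesian product of both lists."""
    return [(h, g, num_kv_heads(h, g)) for h, g in product(heads, queries_per_kv)]


# As written in the diff: both parameters are drawn from NUM_HEADS.
as_written = param_grid(NUM_HEADS, NUM_HEADS)
# As the reviewer suggests: num_queries_per_kv gets its own list.
suggested = param_grid(NUM_HEADS, NUM_QUERIES_PER_KV)

print(as_written)  # [(64, 64, 1)]: only the single-KV-head (MQA) case
print(suggested)   # [(64, 1, 64), (64, 8, 8), (64, 64, 1)]: MHA, GQA and MQA
```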


sighingnow mentioned this pull request Feb 23, 2024

GQA models have not supported prefix caching #2873

Closed

sighingnow force-pushed the ht/prefix-gqa branch from c99806b to 1f91dfb on February 23, 2024 09:52

sighingnow mentioned this pull request Feb 23, 2024

Introduce flash-attn (>= 2.5.0). #3010

Closed

sighingnow mentioned this pull request Feb 25, 2024

Introduce speculative decoding with draft models to vLLM #3029

Closed

WoosukKwon self-requested a review February 26, 2024 19:46

WoosukKwon approved these changes Feb 26, 2024


vllm/model_executor/layers/triton_kernel/prefix_prefill.py: three review threads (outdated, resolved)

Fixes GQA support in prefix prefill kernels

877deb8

Signed-off-by: Tao He <[email protected]>

sighingnow force-pushed the ht/prefix-gqa branch from 1f91dfb to 877deb8 on February 27, 2024 06:21

WoosukKwon merged commit 71bcaf9 into vllm-project:main Feb 27, 2024
21 checks passed

sighingnow deleted the ht/prefix-gqa branch February 27, 2024 14:06

xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024

Enable GQA support in the prefix prefill kernels (vllm-project#3007)

7bd5b89

Signed-off-by: Tao He <[email protected]>

UranusSeven reviewed Mar 7, 2024


sighingnow mentioned this pull request Mar 7, 2024

Fixes the incorrect argument in the prefix-prefill test cases #3246

Merged

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024

Enable GQA support in the prefix prefill kernels (vllm-project#3007)

3b847e9

Signed-off-by: Tao He <[email protected]>
