Enables GQA support in the prefix prefill kernels #3007
c99806b to 1f91dfb
The failure in CI's "Model Test" does not appear to be caused by this pull request; I have seen the same failure in other PRs as well as on main.
@sighingnow Awesome! Thanks for submitting the PR! Left a minor comment on a variable name.
Signed-off-by: Tao He <[email protected]>
1f91dfb to 877deb8
@sighingnow Thanks for the fix! I will merge the PR once it passes the CI tests.
Thank you!
Hi @WoosukKwon, CI is green now. (Just a polite reminder.)
@@ -17,12 +18,14 @@
@pytest.mark.parametrize("num_heads", NUM_HEADS)
@pytest.mark.parametrize("num_queries_per_kv", NUM_HEADS)
Should this parameter be NUM_QUERIES_PER_KV?
Fixed in #3246.
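For readers following the review comment on `num_queries_per_kv`, here is a minimal sketch of why that parameter would normally get its own value list rather than reusing `NUM_HEADS`. In grouped-query attention, several query heads share one KV head, so the group size has to divide the number of query heads. The value lists and both helper functions below are hypothetical illustrations, not code from this PR or from #3246.

```python
from itertools import product

# Hypothetical value lists for illustration only; the real test constants
# are not shown in this thread.
NUM_HEADS = [8, 64]
NUM_QUERIES_PER_KV = [1, 4, 8]


def kv_head_for_query_head(query_head: int, num_queries_per_kv: int) -> int:
    """Return the KV head index shared by a given query head under GQA.

    Consecutive blocks of `num_queries_per_kv` query heads map to one KV head.
    """
    return query_head // num_queries_per_kv


def gqa_cases(num_heads_values, num_queries_per_kv_values):
    """Build (num_heads, num_queries_per_kv, num_kv_heads) test cases.

    Combinations where the group size does not divide the number of query
    heads are skipped, since they do not describe a valid GQA layout.
    """
    cases = []
    for num_heads, group in product(num_heads_values, num_queries_per_kv_values):
        if num_heads % group != 0:
            continue
        cases.append((num_heads, group, num_heads // group))
    return cases
```

A list built this way could be passed to `pytest.mark.parametrize` as a single three-field parameter set, which keeps the head count and the group size as separate, independently chosen inputs.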