Fast Multi-head Attention support on AMD ROCM #977
Conversation
…ming for inference
…mark-attn-decoding add benchmark_attn_decoding from upstream xformers; run ck fw ops for decoding
…or ck-tiled integration)
…added ensure ck_decoder does not dispatch in test_attn_bias_padded
Apply the existing linters (1/n)
add rocm_ci workflow
This reverts commit 12fb41c.
@@ -0,0 +1,12 @@
/*
rename this file?
@@ -3,63 +3,430 @@
# This source code is licensed under the BSD license found in the
# LICENSE file in the root directory of this source tree.

"""
IIUC, this file implements flash attention here rather than importing it from third_party/flash-attention/flash_attn/flash_attn_triton.py, is that right? Is there any AMD-specific optimization here (btw, it's great to support more masks than just the lower-triangular one), or is it similar?
Yes, this is from a patch PRed to http://github.com/ROCmSoftwarePlatform/xformers
by [email protected], but we did not keep the PR records because we just re-built the repo.
I will re-submit the PR from the http://github.com/ROCmSoftwarePlatform/xformers
branch rather than from my personal repo.
Thanks
was this scrapped or something?
This PR adds three flash-attention implementations for AMD ROCm.
In more detail, the following code is added in this PR:
The following scripts are used to verify the implementation:
The following scripts are used to benchmark the performance of the implementation:
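As a minimal sketch (not the PR's actual test or benchmark scripts, which are not reproduced here), assuming an xformers build with ROCm support, the new memory_efficient_attention path could be checked against a naive PyTorch reference and roughly timed like this. The shapes, tolerances, and timing loop are illustrative only.

```python
import torch
import xformers.ops as xops

B, M, H, K = 2, 1024, 8, 64  # batch, sequence length, heads, head dim (illustrative)
device = "cuda"  # ROCm builds of PyTorch also expose the "cuda" device string
dtype = torch.float16

q = torch.randn(B, M, H, K, device=device, dtype=dtype)
k = torch.randn(B, M, H, K, device=device, dtype=dtype)
v = torch.randn(B, M, H, K, device=device, dtype=dtype)

def reference_attention(q, k, v):
    # Naive attention in float32 as a correctness reference.
    q, k, v = (t.float().transpose(1, 2) for t in (q, k, v))  # -> [B, H, M, K]
    attn = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return (attn.softmax(-1) @ v).transpose(1, 2)            # -> [B, M, H, K]

# Correctness check against the reference (tolerances are illustrative for fp16).
out = xops.memory_efficient_attention(q, k, v)
ref = reference_attention(q, k, v)
torch.testing.assert_close(out.float(), ref, atol=2e-2, rtol=2e-2)

# Rough timing of the fused op with CUDA/HIP events.
torch.cuda.synchronize()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(10):
    xops.memory_efficient_attention(q, k, v)
end.record()
torch.cuda.synchronize()
print(f"avg time: {start.elapsed_time(end) / 10:.3f} ms")
```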