
Fast Multi-head Attention support on AMD ROCm #977

Closed
wants to merge 480 commits

Conversation

@qianfengz (Contributor) commented Feb 7, 2024

This PR adds three flash-attention implementations for AMD ROCm

  1. A generic FMHA forward based on composable_kernel kernel components
  2. A decoder FMHA forward implemented directly in a HIP kernel
  3. An FMHA forward operation based on Triton

In more detail, the following code is added in this PR (a usage sketch follows this list)

  1. The Xformers operator and its C++ implementation for the generic FMHA forward, as well as the underlying composable_kernel_tiled submodule

    xformers/ops/fmha/ck.py
    xformers/csrc/attention/hip_fmha/
    third_party/composable_kernel_tiled/

  2. Xformers operators and their C++ implementations for the decoder FMHA forward

    xformers/ops/fmha/ck_decoder.py, ck_splitk.py
    xformers/csrc/attention/hip_fmha/

  3. Xformers operator for the Triton FMHA forward

    xformers/ops/fmha/triton.py

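For illustration only (not a file in this PR), here is a minimal usage sketch of the new forward-only path through the Xformers dispatch API. It assumes the ck.FwOp operator added in xformers/ops/fmha/ck.py and a ROCm build of PyTorch; tensors use the [batch, seqlen, heads, head_dim] layout that Xformers expects.

    # Hypothetical usage sketch; ck.FwOp is the generic forward operator added here
    import torch
    import xformers.ops as xops
    from xformers.ops import fmha

    q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
    k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
    v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

    # Generic FMHA forward via the composable_kernel-backed operator
    out = xops.memory_efficient_attention_forward(q, k, v, op=fmha.ck.FwOp)

    # Causal attention through the usual attention-bias mechanism
    out_causal = xops.memory_efficient_attention_forward(
        q, k, v, attn_bias=fmha.attn_bias.LowerTriangularMask(), op=fmha.ck.FwOp
    )
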
The following commands are used to verify the implementation (a minimal correctness sketch follows the list)

#> pytest tests/test_mem_eff_attention.py::test_forward
#> pytest tests/test_mem_eff_attention.py::test_mqa_forward
#> pytest tests/test_mem_eff_attention.py::test_decoder
#> pytest tests/test_mem_eff_attention.py::test_splitk_decoder
#> pytest tests/test_mem_eff_attention.py::test_splitk_reference
#> pytest tests/test_mem_eff_attention.py::test_triton_splitk_decoder

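Beyond the pytest entry points above, a minimal hand-rolled correctness check (illustrative only, not one of the PR's tests) could compare the ck forward output against PyTorch's reference scaled-dot-product attention; the transposes convert between the Xformers [batch, seqlen, heads, head_dim] layout and PyTorch's [batch, heads, seqlen, head_dim].

    # Illustrative correctness sketch; the fp16 tolerances are assumptions
    import torch
    import xformers.ops as xops
    from xformers.ops import fmha

    q, k, v = (torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))

    out_ck = xops.memory_efficient_attention_forward(q, k, v, op=fmha.ck.FwOp)
    ref = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
    ).transpose(1, 2)
    torch.testing.assert_close(out_ck, ref, atol=2e-2, rtol=2e-2)
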
The following scripts are used to benchmark the performance of the implementation (a manual timing sketch follows the list)

#> python xformers/benchmarks/benchmark_mem_eff_attention.py
#> python xformers/benchmarks/benchmark_mem_eff_attention_mqa.py
#> python xformers/benchmarks/benchmark_mem_eff_attn_decoder.py
#> python xformers/benchmarks/benchmark_attn_decoding.py

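For a quick manual throughput check (purely illustrative, separate from the benchmark scripts above), CUDA/HIP events can time the ck forward operator directly.

    # Illustrative timing sketch; shapes and iteration counts are arbitrary
    import torch
    import xformers.ops as xops
    from xformers.ops import fmha

    q, k, v = (torch.randn(8, 2048, 16, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))

    def run():
        return xops.memory_efficient_attention_forward(q, k, v, op=fmha.ck.FwOp)

    for _ in range(10):  # warm-up iterations
        run()
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        run()
    end.record()
    torch.cuda.synchronize()
    print(f"avg forward time: {start.elapsed_time(end) / 100:.3f} ms")
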
qianfengz and others added 30 commits November 20, 2023 19:01
…mark-attn-decoding

add benchmark_attn_decoding from upstream xformers; run ck fw ops for decoding
@facebook-github-bot added the CLA Signed and module: rocm labels Feb 7, 2024
@@ -0,0 +1,12 @@
/*
A Contributor commented on this diff:

rename this file?

@@ -3,63 +3,430 @@
# This source code is licensed under the BSD license found in the
# LICENSE file in the root directory of this source tree.

"""
A Contributor commented on this diff:

IIUC, this file implements flash attention here rather than importing it from third_party/flash-attention/flash_attn/flash_attn_triton.py, is that right? Is there any AMD-specific optimization here (by the way, it's great to support more masks than just LowerTriangular), or is it similar?

The Contributor Author replied:

Yes, this is from a patch submitted as a PR to http://github.com/ROCmSoftwarePlatform/xformers by [email protected].

But we did not keep the PR records because we just re-built the repo.

I will re-submit the PR from a http://github.com/ROCmSoftwarePlatform/xformers branch rather than from my personal repo.

Thanks

@qianfengz qianfengz closed this Feb 8, 2024
@qianfengz qianfengz deleted the dev_to_upstream branch February 8, 2024 05:19
@HinaHyugaHime

was this scrapped or something?
