[Relax] Integrate cuDNN attention #17157
Conversation
Thank you @vinx13 for the new addition!
Overall, looks good to me.
Would you describe the high-level strategy for attention somewhere? (e.g., when to offload to cuDNN, CUTLASS, TIR, etc.)
If this PR is about landing the machinery rather than making such offloading decisions, I would appreciate it if you could provide some recommendations.
The new attention can be applied via the cudnn BYOC backend. The decision of which BYOC backend (cudnn or cutlass) to use is left to the users. cuDNN is likely to perform better on H100 as it has specific optimizations.
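A minimal sketch of how this offloading would be driven from the Relax side, assuming the `partition_for_cudnn` helper exposed by the cuDNN BYOC backend (the cutlass flow would use `partition_for_cutlass` instead); this is an illustration of the intended usage, not code from the PR:

```python
# Sketch only: assumes tvm.relax.backend.contrib.cudnn.partition_for_cudnn
# is available, as with the other Relax BYOC backends.
import tvm
from tvm import relax
from tvm.relax.backend.contrib.cudnn import partition_for_cudnn


def offload_attention_to_cudnn(mod: tvm.IRModule) -> tvm.IRModule:
    # Group supported patterns (e.g. attention) into composite functions
    # marked for the "cudnn" external codegen.
    mod = partition_for_cudnn(mod)
    # Lower the partitioned functions through the cuDNN BYOC codegen.
    mod = relax.transform.RunCodegen()(mod)
    return mod


# The backend choice stays with the user: swap in the cutlass partitioner
# here to target CUTLASS instead of cuDNN.
# ex = tvm.relax.build(offload_attention_to_cudnn(mod), target="cuda")
```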
I remember cuDNN attention supports fp8; it would be interesting to support that too.
This integrates cuDNN attention kernels into BYOC.
A dependency on cudnn_frontend is added.
The cuDNN attention kernel supports fused qkv in BS3NH and SBN3H layouts.
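For illustration, a hedged reading of the two fused-qkv layout strings (B = batch, S = sequence length, N = number of heads, H = head size, and 3 the packed q/k/v axis), shown in plain NumPy rather than the TVM API:

```python
# Sketch only: demonstrates the tensor shapes the layout strings denote,
# under the assumption that each letter maps to one axis in order.
import numpy as np

b, s, n, h = 2, 128, 12, 64

# BS3NH: q/k/v packed per token, shape (B, S, 3, N, H).
qkv_bs3nh = np.random.randn(b, s, 3, n, h).astype("float16")
q, k, v = qkv_bs3nh[:, :, 0], qkv_bs3nh[:, :, 1], qkv_bs3nh[:, :, 2]

# SBN3H: sequence-major, packed axis after the head axis, shape (S, B, N, 3, H).
qkv_sbn3h = np.random.randn(s, b, n, 3, h).astype("float16")
q2, k2, v2 = qkv_sbn3h[..., 0, :], qkv_sbn3h[..., 1, :], qkv_sbn3h[..., 2, :]
```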
cc @sunggg @masahi @yongwww @tqchen