
support advanced attention implementations (FA3, FlashInfer, xformers, etc.) #319

Open
feifeibear opened this issue Oct 25, 2024 · 1 comment
Labels
help wanted Extra attention is needed

Comments

feifeibear (Collaborator) commented Oct 25, 2024

When the parallel degree is > 2, xDiT executes attention at the line below under unified sequence parallelism (USP). The exact place where attention runs is inside the ring_attention implementation, because USP applies Ulysses outside of Ring.
```python
block_out, _, _, _, _, block_lse, _, _ = _flash_attn_forward(
```
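For context, a minimal sketch of how USP composes the two schemes, with Ulysses on the outside and Ring on the inside. The helpers `all_to_all_heads_to_seq`, `all_to_all_seq_to_heads`, and `ring_flash_attn` are hypothetical placeholders, not xDiT's actual API:

```python
def usp_attention(q, k, v, ulysses_group, ring_group):
    # Sketch of unified sequence parallelism (USP). Tensors follow the
    # usual (batch, seq_shard, heads, head_dim) layout. The helpers below
    # are hypothetical placeholders, not xDiT's real API.

    # Ulysses (outer): an all-to-all trades the head dimension for the
    # sequence dimension, so each rank ends up holding the full sequence
    # for a subset of heads.
    q = all_to_all_heads_to_seq(q, group=ulysses_group)
    k = all_to_all_heads_to_seq(k, group=ulysses_group)
    v = all_to_all_heads_to_seq(v, group=ulysses_group)

    # Ring (inner): K/V shards circulate around the ring group, and each
    # step computes one block of attention. This is where the quoted
    # _flash_attn_forward call is reached.
    out = ring_flash_attn(q, k, v, group=ring_group)

    # Inverse all-to-all restores the original head/sequence sharding.
    return all_to_all_seq_to_heads(out, group=ulysses_group)
```

Since the kernel invocation lives inside the ring step, supporting FA3, FlashInfer, or xformers means swapping the backend within the ring attention implementation rather than in the Ulysses wrapper.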

@antferdom

Tritonbench: [performance] Torch SDPA cuDNN backend vs FlashAttention v3 #41
