[Feature]: Cannot use FlashAttention backend for Volta and Turing GPUs. (but FlashAttention v1.0.9 supports Turing GPU.) #4246

Closed
tutu329 opened this issue Apr 21, 2024 · 5 comments · Fixed by #4368

Comments


tutu329 commented Apr 21, 2024

🚀 The feature, motivation and pitch

Turing GPUs can use FlashAttention v1.0.9, which significantly reduces VRAM usage.

The FlashAttention project has no plans to support Turing GPUs in FlashAttention v2, so please add support for FlashAttention v1.0.9. Thanks a lot!

Many users running 8×2080 Ti setups need this.

Alternatives

No response

Additional context

No response
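For context, here is a minimal sketch (not vLLM's actual backend-selection code) of how an attention backend could be chosen from the CUDA compute capability: FlashAttention v2 targets compute capability 8.0+ (Ampere and newer), while FlashAttention v1.0.9 also runs on Turing (7.5). The backend names below are placeholders for illustration only.

```python
# Illustrative sketch only -- backend names are hypothetical placeholders.
# Shows how an attention backend could be picked from the CUDA compute
# capability: FlashAttention v2 needs compute capability >= 8.0 (Ampere+),
# while FlashAttention v1.0.9 also runs on Turing (7.5).
import torch


def pick_attention_backend() -> str:
    if not torch.cuda.is_available():
        return "cpu-fallback"
    major, minor = torch.cuda.get_device_capability()
    if (major, minor) >= (8, 0):
        return "flash-attn-v2"      # Ampere, Ada, Hopper
    if (major, minor) >= (7, 5):
        return "flash-attn-v1"      # Turing (e.g. 2080 Ti), per this request
    return "xformers-or-naive"      # Older GPUs: no FlashAttention support


if __name__ == "__main__":
    print(f"Selected backend: {pick_attention_backend()}")
```

A check along these lines would let the engine fall back to FlashAttention v1 on Turing hardware instead of rejecting the FlashAttention backend outright.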

@K-Mistele
Contributor

+1 - it would be great to have FlashAttention support for Volta GPUs.

@JayLiu7319

+1, support for GPUs based on the Turing architecture would be great.

@epochaudio

+1


Print1n commented May 22, 2024

+1, support for GPUs based on the Turing architecture would be great.

@HelloCard

Any update?
