[Feature]: Cannot use FlashAttention backend for Volta and Turing GPUs. (but FlashAttention v1.0.9 supports Turing GPU.) #4246

Closed
tutu329 opened this issue Apr 21, 2024 · 5 comments · Fixed by #4368

Comments


tutu329 commented Apr 21, 2024

🚀 The feature, motivation and pitch

Turing GPUs can use FlashAttention v1.0.9, which significantly reduces VRAM usage.

The FlashAttention project has no plans to support Turing GPUs in FlashAttention v2, so please add support for FlashAttention v1.0.9. Thanks a lot!

Many users running 8×2080 Ti setups need this.

Alternatives

No response

Additional context

No response
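For context, here is a minimal sketch (not vLLM's actual backend-selection code) of how an attention backend could be chosen from the CUDA compute capability: FlashAttention v2 targets compute capability 8.0+ (Ampere and newer), while FlashAttention v1.0.9 also runs on Turing (7.5). The backend names below are placeholders for illustration only.

```python
# Illustrative sketch only -- backend names are hypothetical placeholders.
# Shows how an attention backend could be picked from the CUDA compute
# capability: FlashAttention v2 needs compute capability >= 8.0 (Ampere+),
# while FlashAttention v1.0.9 also runs on Turing (7.5).
import torch


def pick_attention_backend() -> str:
    if not torch.cuda.is_available():
        return "cpu-fallback"
    major, minor = torch.cuda.get_device_capability()
    if (major, minor) >= (8, 0):
        return "flash-attn-v2"      # Ampere, Ada, Hopper
    if (major, minor) >= (7, 5):
        return "flash-attn-v1"      # Turing (e.g. 2080 Ti), per this request
    return "xformers-or-naive"      # Older GPUs: no FlashAttention support


if __name__ == "__main__":
    print(f"Selected backend: {pick_attention_backend()}")
```

A check along these lines would let the engine fall back to FlashAttention v1 on Turing hardware instead of rejecting the FlashAttention backend outright.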

@K-Mistele
Contributor

+1 - it would be great to have FlashAttention support for Volta GPUs.

@JayLiu7319

+1, support for GPUs based on the Turing architecture would be great.

@epochaudio

+1


Print1n commented May 22, 2024

+1, support for GPUs based on the Turing architecture would be great.

@HelloCard

Any update?
