🚀 The feature, motivation and pitch

Turing GPUs can run FlashAttention v1.0.9, which significantly reduces VRAM usage. The FlashAttention project currently has no plans to support Turing GPUs in FlashAttention v2, so please add support for FlashAttention v1.0.9. Thanks a lot!
Many users running 8x RTX 2080 Ti setups would benefit from this (a rough compute-capability check is sketched below).

Alternatives

No response

Additional context

No response
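The distinction the request hinges on is compute capability: FlashAttention v2 targets Ampere and newer (sm_80+), while the v1.x releases also ran on Turing (sm_75). The following is only a minimal sketch of that check, not code from FlashAttention or from this project; the function name `flash_attention_version_for_current_gpu` is made up for illustration.

```python
import torch

def flash_attention_version_for_current_gpu() -> int:
    """Return 2 if FlashAttention v2 should work on the current GPU,
    1 if only the v1.x series (e.g. v1.0.9) applies, and 0 if neither."""
    if not torch.cuda.is_available():
        return 0
    major, minor = torch.cuda.get_device_capability()
    if major >= 8:
        return 2   # Ampere / Ada / Hopper (sm_80+): FlashAttention v2
    if (major, minor) == (7, 5):
        return 1   # Turing (sm_75): only the FlashAttention v1.x kernels run here
    return 0       # Volta (sm_70) and older: not supported by either version
```

A caller could use the returned value to import the matching entry point; the exact v1.x vs. v2 function names differ between releases, so check the docs of the flash-attn version actually installed.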
+1 - it would be great to have FlashAttention support for Volta GPUs
+1, if it can support GPUs based on Turing architecture, that would be great.
+1
+1, if it can support GPUs based on Turing architecture, that would be great
any update?