Does vLLM support flash attention? #425
Answered by zhyncs

zhaoyang-star asked this question in Q&A
Flash attention is an important optimization technique, but I found no flash attention implementation in the vLLM code base. So does vLLM support flash attention?
Answered by zhyncs on Jul 11, 2023
Replies: 1 comment
vLLM uses xFormers' memory_efficient_attention_forward, so it makes indirect use of FlashAttention.
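
For reference, a minimal sketch of what such a call looks like; this is not vLLM's actual code, and the tensor shapes, the causal LowerTriangularMask, and the explicit scale are illustrative assumptions:

```python
import torch
import xformers.ops as xops

batch, seq_len, num_heads, head_dim = 1, 128, 8, 64

# xFormers expects (batch, seq_len, num_heads, head_dim) tensors;
# fp16 on GPU lets it pick the fastest fused kernels.
query = torch.randn(batch, seq_len, num_heads, head_dim,
                    dtype=torch.float16, device="cuda")
key = torch.randn_like(query)
value = torch.randn_like(query)

# memory_efficient_attention_forward dispatches to the fastest available
# backend for these inputs, which can include FlashAttention kernels.
out = xops.memory_efficient_attention_forward(
    query, key, value,
    attn_bias=xops.LowerTriangularMask(),  # causal mask for decoder-only models (assumed here)
    scale=head_dim ** -0.5,
)
print(out.shape)  # (batch, seq_len, num_heads, head_dim)
```

Because the backend is chosen at dispatch time, whether the FlashAttention kernel is actually used depends on the hardware, dtype, and shapes of the inputs.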
Answer selected by zhaoyang-star