xFormers attention op arg #2049
Conversation
The documentation is not available anymore as the PR was closed or merged.
Very cool, and looks good to me. Thanks a lot for working on this!
I agree with Patrick, a couple of examples would be nice.
Co-authored-by: Patrick von Platen <[email protected]>
Thanks @patrickvonplaten! I've merged the examples.
Hmm... I've got the following docstring error.
Hey @takuma104, thanks for adding the examples. The docstring error was because of incorrect syntax.
Co-authored-by: Suraj Patil <[email protected]>
Thanks for your help, @patil-suraj! Wow, that's a very strict code-style checker. Nice.
I ran the example code to check: Flash Attention does not work with SD 1.4, but it does with SD 2.1. Can I change "CompVis/stable-diffusion-v1-4" to "stabilityai/stable-diffusion-2-1"?
Which one would be more appropriate as a code example?
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp
model_id = "CompVis/stable-diffusion-v1-4"
pipe = DiffusionPipeline.from_pretrained(model_id).to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
import torch
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16).to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
# Workaround: the VAE attention shape is not accepted by Flash Attention, so keep the default op for the VAE
pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
Hey @takuma104, the second example looks great; it's always better to have fully functional examples in docs so readers can just copy-paste them to try. Also, out of curiosity, what do you mean by "Flash Attention does not work well with SD 1.4"? I think it works, no?
Hi @patil-suraj, OK, thanks, got it. I'll update it very soon.
The latest xFormers shows the error in detail; when I run U-Net inference with SD 1.x, I get the following.
In SD 1.x, it seems that the K dimension of some of the U-Net attention tensors [B, M, K] doesn't meet the requirement that it be at most 128. In SD 2.x, I don't know whether this is intentional, but K is 128 or less in all cases.
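A rough way to check this per model is to walk the U-Net modules and compute the per-head dimension; here is a minimal sketch, assuming the diffusers attention modules expose a heads attribute and a to_q projection:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
# Report any attention layers whose per-head dim exceeds the K <= 128 limit required by Flash Attention
for name, module in pipe.unet.named_modules():
    if hasattr(module, "heads") and hasattr(module, "to_q"):
        head_dim = module.to_q.out_features // module.heads
        if head_dim > 128:
            print(name, head_dim)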
Thanks for updating the doc!
I am getting this same error. text2img works, but I get this error on img2img. I even tried disabling memory-efficient attention with pipe.disable_xformers_memory_efficient_attention(), but it still doesn't work on img2img.
@adhikjoshi Feel free to open an issue with a reproducible code snippet for this :) |
I also encountered the same error (max(query.shape[-1] != value.shape[-1]) > 128) on a T4 GPU with xformers 0.0.16 or 0.0.17.dev444 and the Stable Diffusion 1.5 model. Is it supported only on Ampere or later GPUs (with CUDA compute capability >= 80)?
Hi @tianleiwu, do you want to use Flash Attention? As mentioned above, SD 1.x models are not supported with Flash Attention because they don't meet the dim <= 128 requirement, while SD 2.x models seem to be supported. According to the xFormers code, Tesla T4 (sm_75) meets the minimum requirement, so the T4 is supported by Flash Attention.
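If you want to check your GPU's compute capability yourself, a minimal sketch using PyTorch:

import torch

major, minor = torch.cuda.get_device_capability()
# Tesla T4 reports sm_75, the stated minimum for Flash Attention in xFormers
print(f"sm_{major}{minor}")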
Actually, it works with Flash Attention. I tried all models, including 1.x and 2.x, and it works great on a 3090. Only img2img and inpainting don't work, so I had to drop it. But maybe the solution is here; I'm going to need to try it.
* allow passing op to xFormers attention
  original code by @patil-suraj huggingface/diffusers@ae0cc0b
* correct style by `make style`
* add attention_op arg documents
* add usage example to docstring
  Co-authored-by: Patrick von Platen <[email protected]>
* add usage example to docstring
  Co-authored-by: Patrick von Platen <[email protected]>
* code style correction by `make style`
* Update docstring code to a valid python example
  Co-authored-by: Suraj Patil <[email protected]>
* Update docstring code to a valid python example
  Co-authored-by: Suraj Patil <[email protected]>
* style correction by `make style`
* Update code example to fully functional

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Suraj Patil <[email protected]>
What does this PR do?
This PR adds the `attention_op` argument to `enable_xformers_memory_efficient_attention()`. This argument can override the `op` argument of `memory_efficient_attention()` in xFormers. It was originally written by @patil-suraj on the `xformers-attention-op-arg` branch, and I have added some tweaks so that it can be merged into the current main branch. Short documentation has also been added.

Usage Example:
As an example application of this PR, using Flash Attention improves the reproducibility of Stable Diffusion image generation thanks to its deterministic behavior. Discussed in #1997.
I am confirming this PR using the following code.
https://gist.github.com/takuma104/acc9ff3809e4b259bf24b8130c021823
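For reference, a minimal sketch of what such a reproducibility check could look like (the prompt and the pixel-level comparison are illustrative assumptions, not part of the PR; the gist above contains the actual confirmation code):

import numpy as np
import torch
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16).to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
# Keep the default op for the VAE, whose attention shape is not accepted by Flash Attention
pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)

def generate(seed):
    generator = torch.Generator("cuda").manual_seed(seed)
    return np.array(pipe("a photo of an astronaut", generator=generator).images[0])

# With the deterministic Flash Attention op, two runs with the same seed should produce identical pixels
print(np.array_equal(generate(0), generate(0)))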