xFormers attention op arg #2049
Conversation
The documentation is not available anymore as the PR was closed or merged.
Very cool, and looks good to me. Thanks a lot for working on this!
I agree with Patrick, a couple of examples would be nice.
Co-authored-by: Patrick von Platen <[email protected]>
Thanks @patrickvonplaten! I've merged the examples.
Hmm... I've got the following docstring error.
Hey @takuma104, thanks for adding the examples. The docstring error was because of incorrect syntax.
Co-authored-by: Suraj Patil <[email protected]>
Thanks for your help, @patil-suraj! Wow, that's a very strict code-style checker. Nice.
I ran the example code to check: Flash Attention does not work with SD 1.4, but it does with SD 2.1. Can I change "CompVis/stable-diffusion-v1-4" to "stabilityai/stable-diffusion-2-1"?
Which one would be more appropriate as a code example?
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp
model_id = "CompVis/stable-diffusion-v1-4"
pipe = DiffusionPipeline.from_pretrained(model_id).to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
import torch
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16).to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
# Workaround: the VAE attention shape is not accepted by Flash Attention, so keep the default op for the VAE
pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
Hey @takuma104, the second example looks great; it's always better to have fully functional examples in docs so readers can just copy-paste them to try. Also, out of curiosity, what do you mean by "Flash Attention does not work well with SD 1.4"? I think it works, no?
Hi @patil-suraj, OK, thanks, got it. I'll update it very soon.
The latest xFormers shows the error in detail; when I run U-Net inference with SD 1.x, I get the following.
In SD 1.x, it seems that the K dimension of some of the U-Net attention tensors [B, M, K] doesn't meet the requirement that it be at most 128. In SD 2.x, I don't know whether this is intentional, but K is 128 or less in all cases.
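A rough way to check this per model is to walk the U-Net modules and compute the per-head dimension; here is a minimal sketch, assuming the diffusers attention modules expose a heads attribute and a to_q projection:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
# Report any attention layers whose per-head dim exceeds the K <= 128 limit required by Flash Attention
for name, module in pipe.unet.named_modules():
    if hasattr(module, "heads") and hasattr(module, "to_q"):
        head_dim = module.to_q.out_features // module.heads
        if head_dim > 128:
            print(name, head_dim)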
Thanks for updating the doc!
I am getting this same error. text2img works, but I get this error on img2img. I even tried disabling memory-efficient attention with pipe.disable_xformers_memory_efficient_attention(), but it still doesn't work on img2img.
@adhikjoshi Feel free to open an issue with a reproducible code snippet for this :) |
I also encountered the same error (max(query.shape[-1] != value.shape[-1]) > 128) on a T4 GPU with xformers 0.0.16 or 0.0.17.dev444 and the Stable Diffusion 1.5 model. Is it supported only on Ampere or later GPUs (with CUDA compute capability >= 80)?
Hi @tianleiwu, do you want to use Flash Attention? As mentioned above, SD 1.x models are not supported with Flash Attention because they don't meet the dim <= 128 requirement, while SD 2.x models seem to be supported. According to the xFormers code, Tesla T4 (sm_75) meets the minimum requirement, so the T4 is supported by Flash Attention.
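If you want to check your GPU's compute capability yourself, a minimal sketch using PyTorch:

import torch

major, minor = torch.cuda.get_device_capability()
# Tesla T4 reports sm_75, the stated minimum for Flash Attention in xFormers
print(f"sm_{major}{minor}")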
Actually, it works with Flash Attention. I tried all models, including 1.x and 2.x, and it works great on a 3090. Only img2img and inpainting don't work, so I had to drop it. But maybe the solution is here; I'm going to need to try it.
* allow passing op to xFormers attention
  original code by @patil-suraj huggingface/diffusers@ae0cc0b
* correct style by `make style`
* add attention_op arg documents
* add usage example to docstring
  Co-authored-by: Patrick von Platen <[email protected]>
* add usage example to docstring
  Co-authored-by: Patrick von Platen <[email protected]>
* code style correction by `make style`
* Update docstring code to a valid python example
  Co-authored-by: Suraj Patil <[email protected]>
* Update docstring code to a valid python example
  Co-authored-by: Suraj Patil <[email protected]>
* style correction by `make style`
* Update code example to fully functional

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Suraj Patil <[email protected]>
What does this PR do?
This PR adds the `attention_op` argument to `enable_xformers_memory_efficient_attention()`. This argument can override the `op` argument of `memory_efficient_attention()` in xFormers. It was originally written by @patil-suraj on the `xformers-attention-op-arg` branch, and I have added some tweaks so that it can be merged into the current main branch. Short documentation has also been added.

Usage Example:
As an example application of this PR, using Flash Attention improves the reproducibility of Stable Diffusion image generation thanks to its deterministic behavior. Discussed in #1997.
I am confirming this PR using the following code.
https://gist.github.com/takuma104/acc9ff3809e4b259bf24b8130c021823
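For reference, a minimal sketch of what such a reproducibility check could look like (the prompt and the pixel-level comparison are illustrative assumptions, not part of the PR; the gist above contains the actual confirmation code):

import numpy as np
import torch
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16).to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
# Keep the default op for the VAE, whose attention shape is not accepted by Flash Attention
pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)

def generate(seed):
    generator = torch.Generator("cuda").manual_seed(seed)
    return np.array(pipe("a photo of an astronaut", generator=generator).images[0])

# With the deterministic Flash Attention op, two runs with the same seed should produce identical pixels
print(np.array_equal(generate(0), generate(0)))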