flux does not work on MPS devices #9047
Comments
@pcuenca i've been looking at workarounds and there's really nothing, and this model is too big to run on CPU, it just never really completes the first step. |
No idea how to get around this problem :( |
it also doesn't work on ROCm, as the dimensions of the operations overflow the ROCm kernel limits, so, it has to run layer-wise and takes about 2 minutes for one image |
Great investigation @bghira! It's a bit surprising that it degrades so much. |
Nevermind, I had a proper look at the code now and can say that's a load of rubbish :-) |
In my case: any idea on how to run this on a Mac M2? |
You cannot run it at all on any Mac series, and especially not M1, as it doesn't even have bf16 in hardware. |
i am on 2.4 for bf16 and its fixes |
If only there was a way you could have two Python environments at once <joke>, but it shows that the noisy image from float32 isn't a fundamental issue with Flux. |
Interesting, if you monkey patch the rope function to move pos to the CPU you still get noise with 2.4.0 with both float32 and float64; the fp64 run is fine on CPU with 2.3.1.

import torch
import diffusers
from diffusers import FluxPipeline

# Keep a reference to the original rope and wrap it so that on MPS the
# computation runs on CPU, then the result is moved back to the MPS device.
_flux_rope = diffusers.models.transformers.transformer_flux.rope

def new_flux_rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    assert dim % 2 == 0, "The dimension must be even."
    if pos.device.type == "mps":
        return _flux_rope(pos.to("cpu"), dim, theta).to(device=pos.device)
    else:
        return _flux_rope(pos, dim, theta)

diffusers.models.transformers.transformer_flux.rope = new_flux_rope

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", revision="refs/pr/1", torch_dtype=torch.bfloat16
).to("mps")

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.0,
    height=1024,
    width=1024,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("flux_image.png")
:/ the trainer i'm using requires torch 2.4, and the model is basically useless to Apple users without torch 2.4 as that is also required to run it in lower precision levels |
If you need 2.4.0 to work, raise an issue over on the PyTorch repo. |
then it will never be solved as the pytorch team often simply ignores mps issues. thanks for the suggestion. |
It will be supported by the WIP DiffusionKit (argmaxinc/DiffusionKit#11); upvote the topic if interested. |
that is totally different - they are making use of MLX, not Pytorch+MPS. |
You are correct, but it will ultimately run on Apple devices and, more importantly, efficiently. |
yes... but it has nothing to do with Diffusers and remains to be seen whether it works with the correct outputs on this platform - once it's in DiffusionKit, we still don't have any way to use it in the Diffusers pipeline ecosystem. |
CoreML may be efficient on the surface, but last I tried it only supported square images, and every custom model has to be converted manually, which will take hours for Flux.D or Flux.S; it already took hours to convert SDXL models on a 128G M3 Max. It's going to need a lot of system memory to quantise Flux using CoreML. I just don't see it as a very useful thing; it's more like a toy. |
On the ComfyUI equivalent issue, someone is suggesting that it works with the torch nightlies on the beta version of macOS 15: comfyanonymous/ComfyUI#4165 (comment). I'm not running it so can't confirm whether it's true, but I do know there is at least some macOS 15 code in torch (or was, assuming it's not been rolled back), so there is hope. |
On M3 Macbook pro using the Saw this PR on |
What version of pytorch are you using ? |
@AaronWard I've tried the fp16 fix, and while it fixes running the model for inference on float16, it still gives a noisy image as the end result in torch 2.4.0. It would be nice if someone with proper torch skills could get to the bottom of this, but I suspect we're going to have to wait for macOS 15. |
With a PyTorch nightly ? |
yes |
also, pytorch 2.3.1 has about a 30% speed reduction vs 2.4.1 on MPS |
The decorator syntax added for quantize_symmetric and quantize_affine is only supported on 2.4+, but that version is not working for MPS, producing grainy images. Use the old syntax and allow lower versions of torch, so MPS users can install torch 2.3.1 and get models like FLUX working. huggingface/diffusers#9047
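A minimal sketch of how such a version gate could look. Note that torch_at_least and NEW_DECORATOR_SYNTAX are hypothetical names introduced here for illustration, and the registration bodies are placeholders, not the real quantize_symmetric/quantize_affine implementations:

```python
def torch_at_least(version_str: str, major: int, minor: int) -> bool:
    """True if a torch version string like '2.4.1+cu121' is >= (major, minor)."""
    release = version_str.split("+")[0].split(".")
    return (int(release[0]), int(release[1])) >= (major, minor)

# At import time one would check the installed version, e.g.:
#   import torch
#   NEW_DECORATOR_SYNTAX = torch_at_least(torch.__version__, 2, 4)
# and register quantize_symmetric / quantize_affine with the old functional
# syntax when NEW_DECORATOR_SYNTAX is False, so torch 2.3.1 on MPS keeps working.
```

Comparing version components as integer tuples (rather than strings) keeps the check correct once minor versions reach two digits.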
float64 is not supported on MPS. #9133 proposes to fix that issue. The noisy output image is a separate bug in PyTorch. Follow pytorch/pytorch#133520 for updates. |
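The fix amounts to selecting the frequency dtype by device. Below is a sketch of a rope along the lines of the transformer_flux implementation, with float32 substituted on MPS; this mirrors what #9133 proposes, though the exact upstream code may differ:

```python
import torch

def rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    # Sketch of the diffusers Flux rope with a device-aware dtype:
    # MPS has no float64 support, so fall back to float32 there.
    assert dim % 2 == 0, "The dimension must be even."
    freq_dtype = torch.float32 if pos.device.type == "mps" else torch.float64
    scale = torch.arange(0, dim, 2, dtype=freq_dtype, device=pos.device) / dim
    omega = 1.0 / (theta ** scale)
    out = torch.einsum("...n,d->...nd", pos.to(freq_dtype), omega)
    cos_out, sin_out = torch.cos(out), torch.sin(out)
    # Pack the 2x2 rotation matrices [[cos, -sin], [sin, cos]] per frequency.
    stacked = torch.stack([cos_out, -sin_out, sin_out, cos_out], dim=-1)
    batch_size, seq_len = pos.shape
    return stacked.view(batch_size, seq_len, dim // 2, 2, 2).float()
```

Since the result is cast back to float32 at the end anyway, computing the frequencies in float32 on MPS only changes intermediate precision, not the output dtype.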
is fp64 a hw limitation? or just MPS limitation? i guess i could check metal docs.. |
Seems to be working with diffusers from git main and a torch nightly (from today at least, torch-2.5.0.dev20240821) and macOS 14.6.1 without any hacks. Performance has tanked for me though, from 80 s/i to 130 s/i; that could be partly because I'm swapping a GB or two of memory once macOS has paged out the T5 model. |
torch nightly has some real performance regressions since they refactored the sdpa backends for mps |
agree, I am seeing a fairly big performance degradation on nightly torch as well (on MPS) |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
We recently added support for this, so inference should at least work. Closing, hence. Feel free to reopen. |
Release 0.30.3 seems to have the old version of the Flux code with the torch.float64 reference in rope; was that expected?
File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/models/transformers/transformer_flux.py", line 65, in <listcomp>
[rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/models/transformers/transformer_flux.py", line 41, in rope
scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead. |
Cc: @yiyixuxu @a-r-r-o-w ^. |
Describe the bug
Reproduction
it also doesn't work with cpu offload.
Logs
scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
System Info
Git master
Who can help?
@sayakpaul