
flux does not work on MPS devices #9047

Closed

bghira opened this issue Aug 2, 2024 · 41 comments
Labels: bug (Something isn't working), stale (Issues that haven't received updates)

Comments

@bghira (Contributor) commented Aug 2, 2024

Describe the bug

    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
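For context, the restriction is easy to reproduce outside of diffusers; a minimal sketch, assuming a machine where the MPS backend is available:

import torch

assert torch.backends.mps.is_available()

# float32 arange works on the MPS backend...
print(torch.arange(0, 8, 2, dtype=torch.float32, device="mps"))

# ...but float64 raises the TypeError above, because MPS has no float64 support:
# torch.arange(0, 8, 2, dtype=torch.float64, device="mps")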

Reproduction

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16, revision='refs/pr/1')
#pipe.enable_model_cpu_offload()
pipe.to(device='mps')

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt, 
    guidance_scale=0., 
    height=768, 
    width=1360, 
    num_inference_steps=4, 
    max_sequence_length=256,
).images[0]
out.save("image.png")

it also doesn't work with cpu offload.

Logs

scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

System Info

Git master

Who can help?

@sayakpaul

@bghira added the bug label Aug 2, 2024

@bghira (Contributor, Author) commented Aug 2, 2024

[image]
can't switch to fp32 😢

@bghira (Contributor, Author) commented Aug 2, 2024

@pcuenca i've been looking at workarounds and there's really nothing; this model is too big to run on CPU, it just never really completes the first step.

@sayakpaul (Member) commented:
No idea how to get around this problem :(

@bghira (Contributor, Author) commented Aug 2, 2024

it also doesn't work on ROCm, as the dimensions of the operations overflow the ROCm kernel limits, so it has to run layer-wise and takes about 2 minutes for one image

@bghira (Contributor, Author) commented Aug 2, 2024

maybe @rromb or @pesser have some ideas

@pcuenca (Member) commented Aug 2, 2024

Great investigation @bghira! It's a bit surprising that it degrades so much with float32. Also unfortunate:

RuntimeError: "arange_mps" not implemented for 'BFloat16'

@Vargol commented Aug 2, 2024

Has anyone tried running just the arange on the CPU for MPS, if it's supported, and pushing the results back to the GPU? @bghira, when you said Flux didn't run on the CPU, is that what you meant, or were you referring to running the whole model?

Never mind, I've had a proper look at the code now and can see that's a load of rubbish :-)

@mgierschdev commented:
In my case:

RuntimeError: MPS backend out of memory (MPS allocated: 81.54 GB, other allocations: 384.00 KB, max allowed: 81.60 GB). Tried to allocate 72.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

Any idea how to run this on a Mac M2?
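A minimal sketch of the usual diffusers memory-reduction options; the environment variable is the one the error message itself suggests, while the offload and VAE tiling calls are standard diffusers APIs. Whether they are enough for Flux on an M2 is not confirmed in this thread, and bghira reported above that CPU offload did not work for him.

import os
# Suggested by the error message; removes the MPS allocation ceiling (may cause system instability).
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16, revision="refs/pr/1"
)
# Keep submodules on the CPU and move them to the MPS device only while they run.
pipe.enable_model_cpu_offload(device="mps")
# Decode the latents tile by tile to lower peak memory in the VAE.
pipe.vae.enable_tiling()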

@bghira (Contributor, Author) commented Aug 3, 2024

you cannot run it at all on any Mac series, but especially not the M1, as it doesn't even have bf16 in hardware

@Vargol commented Aug 4, 2024

> [image] can't switch to fp32 😢

Try it with torch 2.3.1

[image: flux_image]

@bghira (Contributor, Author) commented Aug 4, 2024

i am on 2.4 for bf16 and its fixes

@Vargol commented Aug 4, 2024

If only there were a way you could have two Python environments at once <joke>. But it shows that the noisy image from float32 isn't a fundamental issue with Flux.

@Vargol commented Aug 4, 2024

Interesting: if you monkey-patch the rope function to move pos to the CPU, you still get noise with 2.4.0 with both float32 and float64; the fp64 path runs fine on CPU with 2.3.1.

import torch
from diffusers import FluxPipeline
import diffusers

# Keep a reference to the original rope implementation.
_flux_rope = diffusers.models.transformers.transformer_flux.rope

def new_flux_rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    assert dim % 2 == 0, "The dimension must be even."
    if pos.device.type == "mps":
        print("I got called")
        # Compute the rotary embedding on the CPU (where float64 is supported),
        # then move the result back to the MPS device.
        return _flux_rope(pos.to("cpu"), dim, theta).to(device=pos.device)
    else:
        print("I should not be called")
        return _flux_rope(pos, dim, theta)

diffusers.models.transformers.transformer_flux.rope = new_flux_rope

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", revision='refs/pr/1', torch_dtype=torch.bfloat16).to("mps")

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=1024,
    width=1024,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("flux_image.png")

@bghira (Contributor, Author) commented Aug 4, 2024

:/ the trainer i'm using requires torch 2.4, and the model is basically useless to Apple users without torch 2.4, as that is also required to run it at lower precision levels

@Vargol commented Aug 5, 2024

If you need 2.4.0 to work, raise an issue over on the PyTorch repo.

@bghira (Contributor, Author) commented Aug 5, 2024

then it will never be solved as the pytorch team often simply ignores mps issues. thanks for the suggestion.

@mgierschdev commented Aug 5, 2024

It will be supported by the WIP DiffusionKit (argmaxinc/DiffusionKit#11); upvote the issue if interested.

@bghira (Contributor, Author) commented Aug 5, 2024

that is totally different - they are making use of MLX, not Pytorch+MPS.

@mgierschdev commented Aug 5, 2024

You are correct, but it will ultimately run on Apple devices and, more importantly, efficiently.

@bghira (Contributor, Author) commented Aug 5, 2024

yes... but it has nothing to do with Diffusers, and it remains to be seen whether it works with the correct outputs on this platform - once it's in DiffusionKit, we still don't have any way to use it in the Diffusers pipeline ecosystem.

@bghira (Contributor, Author) commented Aug 5, 2024

CoreML may be efficient on the surface, but last I tried it only supported square images, and every custom model has to be converted manually, which will take hours for Flux.D or Flux.S - it already took hours to convert SDXL models on a 128G M3 Max. It's going to need a lot of system memory to quantise Flux using CoreML.

i just don't see it as a very useful thing - it's more like a toy

@Vargol commented Aug 6, 2024

On the ComfyUI equivalent issue, someone is suggesting that it works with the torch nightlies on the beta version of macOS 15.

comfyanonymous/ComfyUI#4165 (comment)

I'm not running it, so I can't confirm whether it's true or not. I do know there is at least some macOS 15 code in Torch (or was, assuming it hasn't been rolled back), so there is hope.

@AaronWard commented:
[image]

On an M3 MacBook Pro using the _flux_rope hack with bfloat16, the model is returning only grainy results.

Saw this PR on diffusers with a potential fix; haven't tried it myself yet.

@Vargol commented Aug 6, 2024

> [image]
>
> On an M3 MacBook Pro using the _flux_rope hack with bfloat16, the model is returning only grainy results.
>
> Saw this PR on diffusers with a potential fix; haven't tried it myself yet.

What version of PyTorch are you using?
If you read the rest of the issue you'll see that the noisy image is due to issues with PyTorch 2.4 on macOS 14.

@bghira (Contributor, Author) commented Aug 7, 2024

[image]

well the same issue occurs with pytorch nightly on macOS 14. i don't really think upgrading to a beta OS release is the way to resolve it, but that's good to know.

@Vargol commented Aug 7, 2024

@AaronWard I've tried the fp16 fix, and while it fixes running the model for inference on float16, it still gives a noisy image as the end result in torch 2.4.0.

It would be nice if someone with proper torch skills could get to the bottom of this, but I suspect we're going to have to wait for macOS 15.

@bghira (Contributor, Author) commented Aug 9, 2024

i've updated to macos 15:

[image]

@Vargol commented Aug 9, 2024

With a PyTorch nightly?

@bghira (Contributor, Author) commented Aug 9, 2024

yes

@bghira (Contributor, Author) commented Aug 9, 2024

also, pytorch 2.3.1 has about a 30% speed reduction vs 2.4.1 on MPS

cocktailpeanut added a commit to peanutcocktail/optimum-quanto that referenced this issue Aug 9, 2024
the decorator syntax added for quantize_symmetric and quantize_affine is only supported on torch 2.4+, but that version is not working for MPS, producing grainy images. Use the old syntax and allow lower versions of torch, so MPS users can install torch 2.3.1 and get models like FLUX working. huggingface/diffusers#9047
@hvaara (Contributor) commented Aug 15, 2024

float64 is not supported on MPS. #9133 proposes to fix that issue.

The noisy output image is a separate bug in PyTorch. Follow pytorch/pytorch#133520 for updates.
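The general direction of such a fix is to avoid float64 when the positions live on MPS; a minimal sketch of that idea follows (the helper name and return layout are illustrative, not the diffusers implementation or the exact change in #9133):

import torch

def rope_freqs_mps_safe(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    # Illustrative helper, not the diffusers code.
    assert dim % 2 == 0, "The dimension must be even."
    # MPS has no float64 support, so compute the frequency table in float32 there.
    dtype = torch.float32 if pos.device.type == "mps" else torch.float64
    scale = torch.arange(0, dim, 2, dtype=dtype, device=pos.device) / dim
    omega = 1.0 / (theta ** scale)
    angles = pos.unsqueeze(-1).to(dtype) * omega  # (..., dim / 2) rotation angles
    # Return cos/sin pairs in float32, which every backend supports.
    return torch.stack([angles.cos(), angles.sin()], dim=-1).float()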

@bghira (Contributor, Author) commented Aug 15, 2024

is fp64 a hw limitation, or just an MPS limitation? i guess i could check the Metal docs...

@bghira (Contributor, Author) commented Aug 15, 2024

@Vargol commented Aug 22, 2024

Seems to be working with diffusers from git main and a torch nightly (from today, at least: torch-2.5.0.dev20240821) on macOS 14.6.1, without any hacks.

[image: flux_image]

Performance has tanked for me though, from 80 s/i to 130 s/i; that could be partly because I'm swapping a GB or two of memory once macOS has swapped out the T5 model.

@bghira (Contributor, Author) commented Aug 22, 2024

torch nightly has some real performance regressions since they refactored the sdpa backends for mps

@bauerwer commented:
agree, I am seeing a fairly big performance degradation on nightly torch as well (on MPS)


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label Sep 16, 2024
@sayakpaul (Member) commented:
We recently added support for this, so inference should at least work. Closing; feel free to reopen.

@hvaara (Contributor) commented Sep 16, 2024

xref #9133 #9074

@Vargol commented Sep 20, 2024

Release 0.30.3 seems to still have the old version of the Flux code, with the torch.float64 reference in the rope function. Was that expected?

(Diffusers) M3iMac:Diffusers davidburnett$ pip show Diffusers
Name: diffusers
Version: 0.30.3
Summary: State-of-the-art diffusion in PyTorch and JAX.
Home-page: https://github.com/huggingface/diffusers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/diffusers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages
Requires: filelock, huggingface-hub, importlib-metadata, numpy, Pillow, regex, requests, safetensors
Required-by: 
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/models/transformers/transformer_flux.py", line 65, in <listcomp>
    [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/models/transformers/transformer_flux.py", line 41, in rope
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

@sayakpaul (Member) commented:
Cc: @yiyixuxu @a-r-r-o-w ^.
