
FLUX Issue | MPS framework doesn't support float64 #4165

Open
alexgenovese opened this issue Aug 1, 2024 · 76 comments
Labels
MacOS (MPS device related issues), Potential Bug (User is reporting a bug. This should be tested.)

Comments

@alexgenovese

Expected Behavior

Run the inference

Actual Behavior

After 273.31 seconds, it throws an exception

Steps to Reproduce

Load the example workflow for the DEV version from https://comfyanonymous.github.io/ComfyUI_examples/flux/

Debug Logs

!!! Exception during processing!!! Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
Traceback (most recent call last):
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy_extras/nodes_custom_sampler.py", line 612, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/k_diffusion/sampling.py", line 143, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 682, in __call__
    return self.predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 685, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/custom_nodes/ComfyUI-TiledDiffusion/.patches.py", line 4, in calc_cond_batch
    return calc_cond_batch_original_tiled_diffusion_91e66834(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 228, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py", line 64, in apply_model_uncond_cleanup_wrapper
    return orig_apply_model(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/model_base.py", line 121, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/model.py", line 135, in forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/model.py", line 112, in forward_orig
    pe = self.pe_embedder(ids)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/layers.py", line 21, in forward
    [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/math.py", line 16, in rope
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Other

No response

alexgenovese added the Potential Bug (User is reporting a bug. This should be tested.) label Aug 1, 2024
@comfyanonymous
Owner

48eb139

Can you check if this fixes it?
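For anyone curious what kind of change avoids this error, here is a minimal sketch modelled on the rope() call from the traceback in comfy/ldm/flux/math.py (the actual commit may differ in its details): compute the RoPE frequencies in float32 when the tensor lives on an MPS device, since MPS has no float64 support.

    import torch

    def rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
        # MPS has no float64 support, so fall back to float32 there.
        dtype = torch.float32 if pos.device.type == "mps" else torch.float64
        scale = torch.arange(0, dim, 2, dtype=dtype, device=pos.device) / dim
        omega = 1.0 / (theta ** scale)
        out = torch.einsum("...n,d->...nd", pos.to(dtype), omega)
        out = torch.stack([torch.cos(out), -torch.sin(out),
                           torch.sin(out), torch.cos(out)], dim=-1)
        out = out.reshape(*out.shape[:-1], 2, 2)
        return out.float()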

@mhale1

mhale1 commented Aug 1, 2024

On Mac it seems to run with default settings, but just gets a black image output. If I change it to fp8 as mentioned above, then the Mac says MPS doesn't support that.

@tombearx

tombearx commented Aug 1, 2024

On Mac it seems to run with default settings, but just gets a black image output. If I change it to fp8 as mentioned above, then the Mac says MPS doesn't support that.

How much RAM do you have?
For some reason, both the original and fp8 models are taking around 40+ GB. Is it the same for you?

@mhale1

mhale1 commented Aug 2, 2024

@tombearx I have a 64 GB M1 Mac and a 16 GB 3080 on my Windows machine. I use the Mac more at work, so I was trying there first.

@ghogan42

ghogan42 commented Aug 2, 2024

This probably won't help fix it, but when I enable preview I can see that, as the image is generating, new stripes are added to the top of the image and the actual image may be shifting down by a corresponding amount.

[attached image: flux_on_mac_m3_max]

I am also running on an M3 Max with 128 GB RAM. Flux won't run at 8-bit at all; Comfy gives an error. The T5 model runs at 8 or 16, but that doesn't help with this issue. I updated PyTorch to the current daily build of 2.5.0, which also did not help.

@brkirch

brkirch commented Aug 2, 2024

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation. Here is what I get with the unmodified example workflow on a 64GB M1 Max with torch 2.3.1, using the latest ComfyUI commit as of this post and the Flux Dev model (with the fp16 T5 text encoder, t5xxl_fp16.safetensors):
[attached image: ComfyUI_00104_]

@twalderman

This workflow is working on my M3/128 GB:
https://civitai.com/models/617060/comfyui-workflow-for-flux-simple

@QueryType

QueryType commented Aug 2, 2024

OK guys, I pruned the weights; they're now 11 GB with no quality loss. It loads up faster and takes far less space in VRAM... Not sure why they were not released pruned this way. They are still loaded in 8-bit, though; I believe it should be 16. Can fp16 be enabled in the loader as well? Because when I tried to add fp16 on my own, I think it loaded as default and generation was very slow compared to 8-bit.

class UNETLoader:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"unet_name": (folder_paths.get_filename_list("unet"), ),
                             "weight_dtype": (["default", "fp16", "fp8_e4m3fn", "fp8_e5m2"],)}}
    RETURN_TYPES = ("MODEL",)
    FUNCTION = "load_unet"

    CATEGORY = "advanced/loaders"

    def load_unet(self, unet_name, weight_dtype):
        # Map the dropdown choice to a torch dtype (None keeps the checkpoint's dtype).
        weight_dtype = {"default": None,
                        "fp16": torch.float16,
                        "fp8_e4m3fn": torch.float8_e4m3fn,
                        "fp8_e5m2": torch.float8_e5m2}[weight_dtype]
        unet_path = folder_paths.get_full_path("unet", unet_name)
        model = comfy.sd.load_unet(unet_path, dtype=weight_dtype)
        return (model,)

Can you explain how to prune it? Where do I add this? Sorry if it's a noob question.

** Sorry, I git pulled and checked the code. All clear. Thanks!

@QueryType

QueryType commented Aug 2, 2024

Well, the first attempt did not work. I am on torch 2.3.1, Mac M2, 24 GB. I loaded the schnell model as fp8_e4m3fn. As can be seen, it does not use MPS and it triggered a 5 GB swap. I think I will wait for fixes to flow in.

Requested to load Flux
Loading 1 new model
python(4803) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
0%| | 0/4 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
0%| | 0/4 [00:04<?, ?it/s]
!!! Exception during processing!!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.
Traceback (most recent call last):
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 612, in sample
samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 716, in sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 695, in inner_sample
samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 600, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/utils/contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/k_diffusion/sampling.py", line 143, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 299, in call
out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 682, in call
return self.predict_noise(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 685, in predict_noise
return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 279, in sampling_function
out = calc_cond_batch(model, conds, x, timestep, model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 228, in calc_cond_batch
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/model_base.py", line 121, in apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ldm/flux/model.py", line 143, in forward
out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ldm/flux/model.py", line 101, in forward_orig
img = self.img_in(img)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 63, in forward
return self.forward_comfy_cast_weights(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 58, in forward_comfy_cast_weights
weight, bias = cast_bias_weight(self, input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 39, in cast_bias_weight
bias = cast_to(s.bias, dtype, device, non_blocking=non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 24, in cast_to
return weight.to(device=device, dtype=dtype, non_blocking=non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

Prompt executed in 218.28 seconds

@ghogan42

ghogan42 commented Aug 2, 2024

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation.

Yep. This is the way. Downgrading to these versions fixes generation for me on my M3 Max-based MacBook.

@mhale1

mhale1 commented Aug 2, 2024

Still no luck yet on my M1 Max, even after the torch downgrades.
I take that back. I just pulled the latest from this morning (just the clip_l encoder change?), and that, combined with the earlier torch downgrade, did fix it.

@twalderman

the latest MPS nightly is working for me.

@QueryType

QueryType commented Aug 3, 2024

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

@RefractAI

the latest MPS nightly is working for me.

Nightly is still broken for me. 2.3 downgrade works.

@Adreitz

Adreitz commented Aug 3, 2024

I tried the latest nightly. It "works" when using the normal CFGGuider node, but the output is extremely blurry. Using the BasicGuider + FluxGuidance nodes leads to noise.
[attached images: ComfyUI_00002_, ComfyUI_00004_, ComfyUI_00005_]

[Edit]
Confirmed that the downgraded torch does work, though you need the BasicGuider + FluxGuidance nodes. The CFGGuider node still produces blurry output.
[attached images: ComfyUI_00013_, ComfyUI_00014_, ComfyUI_00015_, ComfyUI_00016_]
(Image pairs differ in sampler between euler and bosh3, a custom ODE sampler.)

@twalderman

Has anyone seen value in the new guider for Flux? If so, I will downgrade to try it. With the nightly I'm getting nice output with a guidance of 1.

@tombearx

tombearx commented Aug 3, 2024

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

Can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

@Adreitz

Adreitz commented Aug 3, 2024

@twalderman I just tested and there might be something wrong with the guidance. I'm not seeing any difference between scale 1.0 and scale 4.5. Literally zero when I subtract one image from the other. Never mind, Comfy messed up somehow. How exactly did you get things working with the torch nightlies?

@twalderman

I didn't do anything unusual. I tested with the nightly and had no issues, so I didn't revert. I have been generating images all day.

@Adreitz

Adreitz commented Aug 3, 2024

@twalderman Weird. What OS version are you using?

Here is an example of the differences you could expect from changing the guidance scale (1.0 - 4.0 in steps of 0.5; 4.5 is above; all using bosh3 sampler).

[attached images: ComfyUI_00021_, ComfyUI_00023_, ComfyUI_00025_, ComfyUI_00027_, ComfyUI_00029_, ComfyUI_00031_, ComfyUI_00033_]

@RainbowBull

RainbowBull commented Aug 4, 2024

Can you share your workflow? On my M1 Max it runs for 10 minutes and the picture is noisy.

@tombearx

tombearx commented Aug 4, 2024

Can you share your workflow? On my M1 Max it runs for 10 minutes and the picture is noisy.

I used the workflow from the previous picture. I get around 90-100 s/it, probably because bf16 is not supported directly and the model is using much more RAM (and swap) than it should.

@QueryType

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

Can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

It is a bit of a bad situation for us. I am at 24 GB, so I cannot even dream of it.

@twalderman

@Adreitz I am using the latest Sequoia beta.

@tombearx

tombearx commented Aug 4, 2024

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

Can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

It is a bit of a bad situation for us. I am at 24 GB, so I cannot even dream of it.

Looks like the RAM issue arises because the text encoders aren't unloaded from RAM on MPS. I opened an issue: #4201

@dreamrec

dreamrec commented Aug 4, 2024

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation. Here is what I get with the unmodified example workflow on a 64GB M1 Max with torch 2.3.1, using the latest ComfyUI commit as of this post and the Flux Dev model (with the fp16 T5 text encoder, t5xxl_fp16.safetensors): [attached image: ComfyUI_00104_]

Perfect solution!

@RainbowBull

How long does it take to generate one image? Mine takes 10 minutes.

@zwqjoy

zwqjoy commented Aug 15, 2024

downgrade torch as temp fix: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1

That did not work for me: macOS Sonoma 14.6.1 on a MacBook Pro M1 Max, 64 GB.

Still a noisy image.

@Adreitz

Adreitz commented Aug 15, 2024

@rhvaara has made strides in fixing the issue and found it in pytorch.

Do you have a link to a bug or PR? I couldn't find anything.

@bghira

bghira commented Aug 15, 2024

pytorch/pytorch#133520 (comment)

Sorry, I meant to come back and add a reference earlier, but I was on mobile and couldn't find it either.

@hvaara

hvaara commented Aug 15, 2024

Float64 is not supported in MPS. The same issue exists in the HF diffusers library. huggingface/diffusers#9133 proposes a fix.
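For anyone who wants to confirm the limitation locally, a minimal reproduction mirroring the failing arange call from the traceback (assuming a Mac with an MPS build of PyTorch) is:

    import torch

    if torch.backends.mps.is_available():
        try:
            # Same call shape as rope() in comfy/ldm/flux/math.py
            torch.arange(0, 128, 2, dtype=torch.float64, device="mps")
        except TypeError as err:
            print(err)  # "Cannot convert a MPS Tensor to float64 dtype ..."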

@godisboy0

Well, M2 32 GB. I downgraded PyTorch and changed the setting shown in the attached screenshot to default, and everything runs as expected now... one picture takes me ten minutes...

@godisboy0

Well, M2 32 GB. I downgraded PyTorch and changed the setting shown in the attached screenshot to default, and everything runs as expected now... one picture takes me ten minutes...

Well... 2400 seconds actually... I can take a snap for each generation...

@Homemaderobot

I have Flux dev fp16 working on an M1 Max 32 GB. I followed these steps:

• Ensured Comfy was up to date
• Downgraded torch: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1
• Followed these instructions https://comfyanonymous.github.io/ComfyUI_examples/flux/ and used the example workflow.

Worked beautifully, but for a 1024px 20-step image it took an excruciating 1824.25 seconds.

@bauerwer

works: downgrade torch as temp fix: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1

works: torch nightly (starting with torch-2.5.0.dev20240821 from today): pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
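(A quick sanity check to confirm which build you actually ended up with after either install:)

    import torch
    print(torch.__version__)                  # e.g. 2.3.1 or 2.5.0.dev20240821
    print(torch.backends.mps.is_available())  # should be True on Apple Silicon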

@joneavila
Contributor

joneavila commented Aug 22, 2024

works: downgrade torch as temp fix: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1

works: torch nightly (starting with torch-2.5.0.dev20240821 from today): pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Ah, it's working now. Using the example workflow, Flux Dev fp16 took 29.26 minutes and the Q8_0 quantized version took 6.13 minutes (M2 Max, 32 GB).

@blixt

blixt commented Aug 31, 2024

I've tried the latest stable torch, torch 2.3.1, and the nightly torch (20240821), but they all give this error:

TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

I'm trying to run a Flux dev FP8 model in the latest ComfyUI. Is there anything else I can try?

@blixt

blixt commented Aug 31, 2024

Based on what I could find from other sources, MPS (used by my Apple MacBook Pro M3 Max) simply does not support FP8 in any version, so BF16 or FP16 must be used.
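If you want to check which dtypes your local MPS build will accept before picking a weight_dtype, here is a small probe I'd use (illustrative sketch; the dtype list is not exhaustive, and torch.float8_e4m3fn requires torch >= 2.1):

    import torch

    def mps_accepts(dtype: torch.dtype) -> bool:
        # True if a small CPU tensor of this dtype can be moved to the MPS device.
        try:
            torch.zeros(1, dtype=dtype).to("mps")
            return True
        except (TypeError, RuntimeError):
            return False

    if torch.backends.mps.is_available():
        for dt in (torch.float16, torch.bfloat16, torch.float32,
                   torch.float64, torch.float8_e4m3fn):
            print(dt, mps_accepts(dt))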

@bghira

bghira commented Sep 1, 2024

or int8, or GGUF

@rovo79

rovo79 commented Sep 23, 2024

the mac application "drawthing" has found a solution, I don't know how they did it but flux works on my 16 gig MBP

I was extremely skeptical but finally gave it a try. It works amazingly on my 16 GB Mac mini.
I don't think I'll ever be able to get used to the interface, but I can easily generate Flux.1 Dev there. The resulting model file is flux_1_dev_q5p.ckpt (9.37 GB). I'm not sure it ends up as a safetensors file at any point.

Also, I've been able to import custom models into Draw Things.

@joneavila
Contributor

joneavila commented Sep 26, 2024

or int8, or GGUF

GGUF works fine, e.g., using ComfyUI-GGUF custom nodes and flux1-dev-Q8_0.gguf.

@joneavila
Contributor

the mac application "drawthing" has found a solution, I don't know how they did it but flux works on my 16 gig MBP

I was extremely skeptical but finally gave it a try. It works amazingly on my 16 GB Mac mini. I don't think I'll ever be able to get used to the interface, but I can easily generate Flux.1 Dev there. The resulting model file is flux_1_dev_q5p.ckpt (9.37 GB). I'm not sure it ends up as a safetensors file at any point.

Also, I've been able to import custom models into Draw Things.

I've been using Draw Things to generate images with FLUX since it's significantly faster than ComfyUI. For example, using similar settings ComfyUI might take 388 seconds whereas Draw Things takes 180 seconds. I think it's due to Draw Things' "Speed-up w/ Guidance Embed" setting, which nearly doubles the speed (like the setting description mentions). Enabling this setting disables the "Guidance Embed" setting, so I wonder if something like this can be done in ComfyUI.

@bghira

bghira commented Sep 26, 2024

ComfyUI uses PyTorch for MPS support to access the GPU via shaders, which is honestly a pretty awful design.

tinygrad, for example, uses Metal directly.

And Draw Things uses Metal directly.

Using Metal gives Draw Things access to Philip Turner's metal-flash-attention implementation, which accelerates things and reduces memory consumption substantially! It's the equivalent of the flash-attention SDPA backend in PyTorch.

MPS will probably never work as well as Metal, and it's not like I don't want it to succeed. The PyTorch memory model is not built for a unified-memory architecture, so we have no zero-copy support or any of the other cool stuff you can access with the 25-step ceremony of instantiating a Metal kernel to run an add op.
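(For reference, here is what the SDPA entry point looks like on the PyTorch side; which fused backend actually runs depends on the device, dtype, and build, and on MPS it may simply fall back to the math implementation:)

    import torch
    import torch.nn.functional as F

    q = torch.randn(1, 8, 128, 64)  # (batch, heads, tokens, head_dim)
    k = torch.randn(1, 8, 128, 64)
    v = torch.randn(1, 8, 128, 64)
    out = F.scaled_dot_product_attention(q, k, v)  # backend chosen internally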

@azrahello

azrahello commented Sep 26, 2024

I'm just throwing it out there... there are some interesting projects that use the MLX framework and work quite well with Flux, such as https://github.com/filipstrand/mflux and https://github.com/argmaxinc/DiffusionKit, which are even more efficient than Draw Things (I think). To me, Draw Things is too chaotic, and I can't be as creative as I am with ComfyUI. Those two solutions are more complex, but I'm able to generate images more efficiently. Unfortunately, though, they're command-line based, or used via Diffusers, which I just can't figure out because it's too complex for me. But I'm starting to think they're the only more or less viable option after ComfyUI, which, from my point of view, would be perfect if it had better support for Apple. I'm not sure if that depends on the developers.

@twalderman

twalderman commented Sep 26, 2024

check out https://github.com/filipstrand/mflux and build some shell scripts. It is really fast.

@bghira

bghira commented Sep 26, 2024

ComfyUI relies too heavily on PyTorch. This part of things is definitely not abstracted away enough to integrate any other backend. It's a tragedy, because other backends could be fairly drop-in, but they would never have a chance to work with extensions, and it would essentially require a rewrite of all the core nodes.

You can't pass torch Tensors through different methods if you're not using torch. You'd have to have support for MLX tensors and tinygrad tensors, which have different arguments and, quite honestly, a different usage flow entirely.

For example, between tinygrad and PyTorch, torch.cat([tensor1, tensor2]) becomes tinygrad.Tensor.cat(tensor1, tensor2), and from this pain you can imagine how the rest goes.
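A tiny side-by-side of that cat example (the tinygrad half is from memory and only illustrative; check the tinygrad docs before relying on it):

    import torch

    a = torch.ones(2, 3)
    b = torch.zeros(2, 3)
    torch_out = torch.cat([a, b], dim=0)  # PyTorch: free function over a list of tensors

    # tinygrad (illustrative): cat is a Tensor method taking the other tensors
    # as positional arguments, roughly:
    #   from tinygrad import Tensor
    #   tiny_out = Tensor.ones(2, 3).cat(Tensor.zeros(2, 3), dim=0)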

and @twalderman maybe don't spam your own projects.

@twalderman

twalderman commented Sep 30, 2024

"> and @twalderman maybe don't spam your own projects."

I wish I was able to write this. Its not my project. I have enjoyed using a fast flux on my m3 and is purely a testimonial.

@hvaara

hvaara commented Oct 2, 2024

The issue reported by OP was fixed in 48eb139 as @comfyanonymous pointed out already in the second comment. In fact it was fixed before the issue was opened.

There are a lot of other great projects that can run FLUX models, but this issue is about ComfyUI. If you need to run your FLUX workflows in ComfyUI, please update. If the error goes away, but you get noisy images, please update your PyTorch version. No other workarounds are needed. This issue should be closed since it's fixed.

If you see novel errors, please open a new issue. If you see the same error as OP, please check that you have updated your ComfyUI version. If that doesn't work, feel free to comment on this issue.

@highfiiv

highfiiv commented Oct 5, 2024

I believe I'm seeing the same type of error, but I could be wrong.
Everything is up to date as of today, Oct 5, 2024.

# ComfyUI Error Report
## Error Details
- **Node Type:** SamplerCustomAdvanced
- **Exception Type:** TypeError
- **Exception Message:** Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

@bghira

bghira commented Oct 5, 2024

It just doesn't support fp8 and never will. This issue isn't about fp8 at all, but about fp64 for RoPE.

@YakDriver

I'm new, so apologies in advance. Is the issue that ComfyUI is Windows-centric, with the bulk of the community in that world, and will never support macOS well? My perception is that it's either slow or doesn't work. As for my own dabbling, after fighting with it for several hours (Flux and Kijai's cool Hunyuan), I'm giving up on ComfyUI permanently.

Apple M3 Max, 64 GB, Sonoma 14.7.1

TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

I've done all the upgrades, downgrades, re-grades.

pip install torch==2.3.1 torchaudio==2.3.1 torchsde==0.2.6 torchvision==0.18.1
pip install tensorflow==2.17.0 tensorflow-metal==1.1.0 numpy==1.24.3
PYTORCH_ENABLE_MPS_FALLBACK=1 python main.py --force-fp32 --fp32-unet --fp32-vae --fp32-text-enc --reserve-vram 2

@hvaara

hvaara commented Dec 16, 2024

@YakDriver This specific error should have been fixed already. Can you please try upgrading ComfyUI? If it still doesn't work, do you mind also providing your ComfyUI workflow? I can take a quick look.

@jurgenwerk

jurgenwerk commented Jan 2, 2025

@hvaara I am also experiencing TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype. on an Apple M3 with 18 GB. I have a fresh installation of ComfyUI (v3.10). Here's my workflow:

[attached image]

@bghira

bghira commented Jan 2, 2025

There's zero fp8 support on Mac. You will need int8 instead.

@bghira

bghira commented Jan 2, 2025

Also, 18 GB probably isn't enough :[

ltdrdata added the MacOS (MPS device related issues) label Jan 2, 2025