
FLUX Issue | MPS framework doesn't support float64 #4165

Open
alexgenovese opened this issue Aug 1, 2024 · 76 comments
Labels
MacOS (MPS device related issues), Potential Bug (User is reporting a bug. This should be tested.)

Comments

@alexgenovese

Expected Behavior

Run the inference

Actual Behavior

After 273.31 seconds, it throws an exception

Steps to Reproduce

Load the example workflow for the DEV version from https://comfyanonymous.github.io/ComfyUI_examples/flux/

Debug Logs

!!! Exception during processing!!! Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
Traceback (most recent call last):
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy_extras/nodes_custom_sampler.py", line 612, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/k_diffusion/sampling.py", line 143, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 682, in __call__
    return self.predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 685, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/custom_nodes/ComfyUI-TiledDiffusion/.patches.py", line 4, in calc_cond_batch
    return calc_cond_batch_original_tiled_diffusion_91e66834(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 228, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py", line 64, in apply_model_uncond_cleanup_wrapper
    return orig_apply_model(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/model_base.py", line 121, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/model.py", line 135, in forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/model.py", line 112, in forward_orig
    pe = self.pe_embedder(ids)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/layers.py", line 21, in forward
    [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/math.py", line 16, in rope
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Other

No response

alexgenovese added the Potential Bug (User is reporting a bug. This should be tested.) label Aug 1, 2024
@comfyanonymous
Owner

48eb139

Can you check if this fixes it?
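For anyone curious what kind of change avoids this error, here is a minimal sketch modelled on the rope() call from the traceback in comfy/ldm/flux/math.py (the actual commit may differ in its details): compute the RoPE frequencies in float32 when the tensor lives on an MPS device, since MPS has no float64 support.

    import torch

    def rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
        # MPS has no float64 support, so fall back to float32 there.
        dtype = torch.float32 if pos.device.type == "mps" else torch.float64
        scale = torch.arange(0, dim, 2, dtype=dtype, device=pos.device) / dim
        omega = 1.0 / (theta ** scale)
        out = torch.einsum("...n,d->...nd", pos.to(dtype), omega)
        out = torch.stack([torch.cos(out), -torch.sin(out),
                           torch.sin(out), torch.cos(out)], dim=-1)
        out = out.reshape(*out.shape[:-1], 2, 2)
        return out.float()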

@mhale1

mhale1 commented Aug 1, 2024

On Mac it seems to run with default settings, but just gets a black image output. If I change it to fp8 as mentioned above, then the Mac says MPS doesn't support that.

@tombearx

tombearx commented Aug 1, 2024

On Mac it seems to run with default settings, but just gets a black image output. If I change it to fp8 as mentioned above, then the Mac says MPS doesn't support that.

How much RAM do you have?
For some reason, both the original and fp8 models are taking around 40+ GB. Is it the same for you?

@mhale1

mhale1 commented Aug 2, 2024

@tombearx I have a 64 GB M1 Mac and a 16 GB 3080 on my Windows machine. I use the Mac more at work, so I was trying there first.

@ghogan42

ghogan42 commented Aug 2, 2024

This probably won't help fix it, but when I enable preview I can see that, as the image is generating, new stripes are added to the top of the image and the actual image may be shifting down by a corresponding amount.

[attached image: flux_on_mac_m3_max]

I am also running on an M3 Max with 128 GB RAM. Flux won't run at 8-bit at all; Comfy gives an error. The T5 model runs at 8 or 16, but that doesn't help with this issue. I updated PyTorch to the current daily build of 2.5.0, which also did not help.

@brkirch

brkirch commented Aug 2, 2024

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation. Here is what I get with the unmodified example workflow on a 64GB M1 Max with torch 2.3.1, using the latest ComfyUI commit as of this post and the Flux Dev model (with the fp16 T5 text encoder, t5xxl_fp16.safetensors):
[attached image: ComfyUI_00104_]

@twalderman

This workflow is working on my M3/128 GB:
https://civitai.com/models/617060/comfyui-workflow-for-flux-simple

@QueryType

QueryType commented Aug 2, 2024

OK guys, I pruned the weights; they're now 11 GB with no quality loss. It loads up faster and takes far less space in VRAM... Not sure why they were not released pruned this way. They are still loaded in 8-bit, though; I believe it should be 16. Can fp16 be enabled in the loader as well? Because when I tried to add fp16 on my own, I think it loaded as default and generation was very slow compared to 8-bit.

class UNETLoader:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"unet_name": (folder_paths.get_filename_list("unet"), ),
                             "weight_dtype": (["default", "fp16", "fp8_e4m3fn", "fp8_e5m2"],)}}
    RETURN_TYPES = ("MODEL",)
    FUNCTION = "load_unet"

    CATEGORY = "advanced/loaders"

    def load_unet(self, unet_name, weight_dtype):
        # Map the dropdown choice to a torch dtype (None keeps the checkpoint's dtype).
        weight_dtype = {"default": None,
                        "fp16": torch.float16,
                        "fp8_e4m3fn": torch.float8_e4m3fn,
                        "fp8_e5m2": torch.float8_e5m2}[weight_dtype]
        unet_path = folder_paths.get_full_path("unet", unet_name)
        model = comfy.sd.load_unet(unet_path, dtype=weight_dtype)
        return (model,)

Can you explain how to prune it? Where do I add this? Sorry if it's a noob question.

** Sorry, I git pulled and checked the code. All clear. Thanks!

@QueryType

QueryType commented Aug 2, 2024

Well, the first attempt did not work. I am on torch 2.3.1, Mac M2, 24 GB. I loaded the schnell model as fp8_e4m3fn. As can be seen, it does not use MPS and it triggered a 5 GB swap. I think I will wait for fixes to flow in.

Requested to load Flux
Loading 1 new model
python(4803) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
0%| | 0/4 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
0%| | 0/4 [00:04<?, ?it/s]
!!! Exception during processing!!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.
Traceback (most recent call last):
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 612, in sample
samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 716, in sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 695, in inner_sample
samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 600, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/utils/contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/k_diffusion/sampling.py", line 143, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 299, in call
out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 682, in call
return self.predict_noise(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 685, in predict_noise
return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 279, in sampling_function
out = calc_cond_batch(model, conds, x, timestep, model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/samplers.py", line 228, in calc_cond_batch
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/model_base.py", line 121, in apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ldm/flux/model.py", line 143, in forward
out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ldm/flux/model.py", line 101, in forward_orig
img = self.img_in(img)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/comfyui/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 63, in forward
return self.forward_comfy_cast_weights(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 58, in forward_comfy_cast_weights
weight, bias = cast_bias_weight(self, input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 39, in cast_bias_weight
bias = cast_to(s.bias, dtype, device, non_blocking=non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/d/apps/sdxl/comfi/ComfyUI/comfy/ops.py", line 24, in cast_to
return weight.to(device=device, dtype=dtype, non_blocking=non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

Prompt executed in 218.28 seconds

@ghogan42

ghogan42 commented Aug 2, 2024

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation.

Yep. This is the way. Downgrading to these versions fixes generation for me on my M3 Max-based MacBook.

@mhale1

mhale1 commented Aug 2, 2024

Still no luck yet on my M1 Max, even after the torch downgrades.
I take that back. I just pulled the latest from this morning (just the clip_l encoder change?), and that, combined with the earlier torch downgrade, did fix it.

@twalderman

the latest MPS nightly is working for me.

@QueryType

QueryType commented Aug 3, 2024

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

@RefractAI

the latest MPS nightly is working for me.

Nightly is still broken for me. 2.3 downgrade works.

@Adreitz

Adreitz commented Aug 3, 2024

I tried the latest nightly. It "works" when using the normal CFGGuider node, but the output is extremely blurry. Using the BasicGuider + FluxGuidance nodes leads to noise.
[attached images: ComfyUI_00002_, ComfyUI_00004_, ComfyUI_00005_]

[Edit]
Confirmed that the downgraded torch does work, though you need the BasicGuider + FluxGuidance nodes. The CFGGuider node still produces blurry output.
[attached images: ComfyUI_00013_, ComfyUI_00014_, ComfyUI_00015_, ComfyUI_00016_]
(Image pairs differ in sampler between euler and bosh3, a custom ODE sampler.)

@twalderman

Has anyone seen value in the new guider for Flux? If so, I will downgrade to try it. With the nightly I'm getting nice output with a guidance of 1.

@tombearx

tombearx commented Aug 3, 2024

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

Can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

@Adreitz

Adreitz commented Aug 3, 2024

@twalderman I just tested and there might be something wrong with the guidance. I'm not seeing any difference between scale 1.0 and scale 4.5. Literally zero when I subtract one image from the other. Never mind, Comfy messed up somehow. How exactly did you get things working with the torch nightlies?

@twalderman

I didn't do anything unusual. I tested with the nightly and had no issues, so I didn't revert. I have been generating images all day.

@Adreitz

Adreitz commented Aug 3, 2024

@twalderman Weird. What OS version are you using?

Here is an example of the differences you could expect from changing the guidance scale (1.0 - 4.0 in steps of 0.5; 4.5 is above; all using bosh3 sampler).

[attached images: ComfyUI_00021_, ComfyUI_00023_, ComfyUI_00025_, ComfyUI_00027_, ComfyUI_00029_, ComfyUI_00031_, ComfyUI_00033_]

@RainbowBull

RainbowBull commented Aug 4, 2024

Can you share your workflow? On my M1 Max it runs for 10 minutes and the picture is noisy.

@tombearx

tombearx commented Aug 4, 2024

Can you share your workflow? On my M1 Max it runs for 10 minutes and the picture is noisy.

I used the workflow from the previous picture. I get around 90-100 s/it, probably because bf16 is not supported directly and the model is using much more RAM (and swap) than it should.

@QueryType

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

Can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

It is a bit of a bad situation for us. I am at 24 GB, so I cannot even dream of it.

@twalderman

@Adreitz I am using the latest Sequoia beta.

@tombearx

tombearx commented Aug 4, 2024

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

Can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

It is a bit of a bad situation for us. I am at 24 GB, so I cannot even dream of it.

Looks like the RAM issue arises because the text encoders aren't unloaded from RAM on MPS. I opened an issue: #4201

@dreamrec

dreamrec commented Aug 4, 2024

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 as it seems that the latest stable version of torch has some bugs that break image generation. Here is what I get with the unmodified example workflow on a 64GB M1 Max with torch 2.3.1, using the latest ComfyUI commit as of this post and the Flux Dev model (with the fp16 T5 text encoder, t5xxl_fp16.safetensors): [attached image: ComfyUI_00104_]

Perfect solution!

@RainbowBull

How long does it take to generate one image? Mine takes 10 minutes.

@zwqjoy

zwqjoy commented Aug 15, 2024

downgrade torch as temp fix: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1

That did not work for me: macOS Sonoma 14.6.1 on a MacBook Pro M1 Max, 64 GB.

Still a noisy image.

@Adreitz

Adreitz commented Aug 15, 2024

@rhvaara has made strides in fixing the issue and found it in pytorch.

Do you have a link to a bug or PR? I couldn't find anything.

@bghira

bghira commented Aug 15, 2024

pytorch/pytorch#133520 (comment)

Sorry, I meant to come back and add a reference earlier, but I was on mobile and couldn't find it either.

@hvaara

hvaara commented Aug 15, 2024

Float64 is not supported in MPS. The same issue exists in the HF diffusers library. huggingface/diffusers#9133 proposes a fix.
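For anyone who wants to confirm the limitation locally, a minimal reproduction mirroring the failing arange call from the traceback (assuming a Mac with an MPS build of PyTorch) is:

    import torch

    if torch.backends.mps.is_available():
        try:
            # Same call shape as rope() in comfy/ldm/flux/math.py
            torch.arange(0, 128, 2, dtype=torch.float64, device="mps")
        except TypeError as err:
            print(err)  # "Cannot convert a MPS Tensor to float64 dtype ..."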

@godisboy0

Well, M2 32 GB. I downgraded PyTorch and changed the setting shown in the attached screenshot to default, and everything runs as expected now... one picture takes me ten minutes...

@godisboy0

Well, M2 32 GB. I downgraded PyTorch and changed the setting shown in the attached screenshot to default, and everything runs as expected now... one picture takes me ten minutes...

Well... 2400 seconds actually... I can take a snap for each generation...

@Homemaderobot

I have Flux dev fp16 working on an M1 Max 32 GB. I followed these steps:

• Ensured Comfy was up to date
• Downgraded torch: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1
• Followed these instructions https://comfyanonymous.github.io/ComfyUI_examples/flux/ and used the example workflow.

Worked beautifully, but for a 1024px 20-step image it took an excruciating 1824.25 seconds.

@bauerwer

works: downgrade torch as temp fix: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1

works: torch nightly (starting with torch-2.5.0.dev20240821 from today): pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
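(A quick sanity check to confirm which build you actually ended up with after either install:)

    import torch
    print(torch.__version__)                  # e.g. 2.3.1 or 2.5.0.dev20240821
    print(torch.backends.mps.is_available())  # should be True on Apple Silicon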

@joneavila
Contributor

joneavila commented Aug 22, 2024

works: downgrade torch as temp fix: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1

works: torch nightly (starting with torch-2.5.0.dev20240821 from today): pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Ah, it's working now. Using the example workflow, Flux Dev fp16 took 29.26 minutes and the Q8_0 quantized version took 6.13 minutes (M2 Max, 32 GB).

@blixt

blixt commented Aug 31, 2024

I've tried the latest stable torch, torch 2.3.1, and the nightly torch (20240821), but they all give this error:

TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

I'm trying to run a Flux dev FP8 model in the latest ComfyUI. Is there anything else I can try?

@blixt

blixt commented Aug 31, 2024

Based on what I could find from other sources, MPS (used by my Apple MacBook Pro M3 Max) simply does not support FP8 in any version, so BF16 or FP16 must be used.
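If you want to check which dtypes your local MPS build will accept before picking a weight_dtype, here is a small probe I'd use (illustrative sketch; the dtype list is not exhaustive, and torch.float8_e4m3fn requires torch >= 2.1):

    import torch

    def mps_accepts(dtype: torch.dtype) -> bool:
        # True if a small CPU tensor of this dtype can be moved to the MPS device.
        try:
            torch.zeros(1, dtype=dtype).to("mps")
            return True
        except (TypeError, RuntimeError):
            return False

    if torch.backends.mps.is_available():
        for dt in (torch.float16, torch.bfloat16, torch.float32,
                   torch.float64, torch.float8_e4m3fn):
            print(dt, mps_accepts(dt))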

@bghira

bghira commented Sep 1, 2024

or int8, or GGUF

@rovo79

rovo79 commented Sep 23, 2024

the mac application "drawthing" has found a solution, I don't know how they did it but flux works on my 16 gig MBP

I was extremely skeptical but finally gave it a try. It works amazingly on my 16 GB Mac mini.
I don't think I'll ever be able to get used to the interface, but I can easily generate Flux.1 Dev there. The resulting model file is flux_1_dev_q5p.ckpt (9.37 GB). I'm not sure it ends up as a safetensors file at any point.

Also, I've been able to import custom models into Draw Things.

@joneavila
Contributor

joneavila commented Sep 26, 2024

or int8, or GGUF

GGUF works fine, e.g., using ComfyUI-GGUF custom nodes and flux1-dev-Q8_0.gguf.

@joneavila
Contributor

the mac application "drawthing" has found a solution, I don't know how they did it but flux works on my 16 gig MBP

I was extremely skeptical but finally gave it a try. It works amazingly on my 16 GB Mac mini. I don't think I'll ever be able to get used to the interface, but I can easily generate Flux.1 Dev there. The resulting model file is flux_1_dev_q5p.ckpt (9.37 GB). I'm not sure it ends up as a safetensors file at any point.

Also, I've been able to import custom models into Draw Things.

I've been using Draw Things to generate images with FLUX since it's significantly faster than ComfyUI. For example, using similar settings ComfyUI might take 388 seconds whereas Draw Things takes 180 seconds. I think it's due to Draw Things' "Speed-up w/ Guidance Embed" setting, which nearly doubles the speed (like the setting description mentions). Enabling this setting disables the "Guidance Embed" setting, so I wonder if something like this can be done in ComfyUI.

@bghira

bghira commented Sep 26, 2024

ComfyUI uses PyTorch for MPS support to access the GPU via shaders, which is honestly a pretty awful design.

tinygrad, for example, uses Metal directly.

And Draw Things uses Metal directly.

Using Metal gives Draw Things access to Philip Turner's metal-flash-attention implementation, which accelerates things and reduces memory consumption substantially! It's the equivalent of the flash-attention SDPA backend in PyTorch.

MPS will probably never work as well as Metal, and it's not like I don't want it to succeed. The PyTorch memory model is not built for a unified-memory architecture, so we have no zero-copy support or any of the other cool stuff you can access with the 25-step ceremony of instantiating a Metal kernel to run an add op.
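(For reference, here is what the SDPA entry point looks like on the PyTorch side; which fused backend actually runs depends on the device, dtype, and build, and on MPS it may simply fall back to the math implementation:)

    import torch
    import torch.nn.functional as F

    q = torch.randn(1, 8, 128, 64)  # (batch, heads, tokens, head_dim)
    k = torch.randn(1, 8, 128, 64)
    v = torch.randn(1, 8, 128, 64)
    out = F.scaled_dot_product_attention(q, k, v)  # backend chosen internally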

@azrahello

azrahello commented Sep 26, 2024

I'm just throwing it out there... there are some interesting projects that use the MLX framework and work quite well with Flux, such as https://github.com/filipstrand/mflux and https://github.com/argmaxinc/DiffusionKit, which are even more efficient than Draw Things (I think). To me, Draw Things is too chaotic, and I can't be as creative as I am with ComfyUI. Those two solutions are more complex, but I'm able to generate images more efficiently. Unfortunately, though, they're command-line based, or used via Diffusers, which I just can't figure out because it's too complex for me. But I'm starting to think they're the only more or less viable option after ComfyUI, which, from my point of view, would be perfect if it had better support for Apple. I'm not sure if that depends on the developers.

@twalderman

twalderman commented Sep 26, 2024

check out https://github.com/filipstrand/mflux and build some shell scripts. It is really fast.

@bghira

bghira commented Sep 26, 2024

ComfyUI relies too heavily on PyTorch. This part of things is definitely not abstracted away enough to integrate any other backend. It's a tragedy, because other backends could be fairly drop-in, but they would never have a chance to work with extensions, and it would essentially require a rewrite of all the core nodes.

You can't pass torch Tensors through different methods if you're not using torch. You'd have to have support for MLX tensors and tinygrad tensors, which have different arguments and, quite honestly, a different usage flow entirely.

For example, between tinygrad and PyTorch, torch.cat([tensor1, tensor2]) becomes tinygrad.Tensor.cat(tensor1, tensor2), and from this pain you can imagine how the rest goes.
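A tiny side-by-side of that cat example (the tinygrad half is from memory and only illustrative; check the tinygrad docs before relying on it):

    import torch

    a = torch.ones(2, 3)
    b = torch.zeros(2, 3)
    torch_out = torch.cat([a, b], dim=0)  # PyTorch: free function over a list of tensors

    # tinygrad (illustrative): cat is a Tensor method taking the other tensors
    # as positional arguments, roughly:
    #   from tinygrad import Tensor
    #   tiny_out = Tensor.ones(2, 3).cat(Tensor.zeros(2, 3), dim=0)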

and @twalderman maybe don't spam your own projects.

@twalderman

twalderman commented Sep 30, 2024

"> and @twalderman maybe don't spam your own projects."

I wish I was able to write this. Its not my project. I have enjoyed using a fast flux on my m3 and is purely a testimonial.

@hvaara

hvaara commented Oct 2, 2024

The issue reported by OP was fixed in 48eb139 as @comfyanonymous pointed out already in the second comment. In fact it was fixed before the issue was opened.

There are a lot of other great projects that can run FLUX models, but this issue is about ComfyUI. If you need to run your FLUX workflows in ComfyUI, please update. If the error goes away, but you get noisy images, please update your PyTorch version. No other workarounds are needed. This issue should be closed since it's fixed.

If you see novel errors, please open a new issue. If you see the same error as OP, please check that you have updated your ComfyUI version. If that doesn't work, feel free to comment on this issue.

@highfiiv

highfiiv commented Oct 5, 2024

I believe I'm seeing the same type of error, but I could be wrong.
Everything is up to date as of today, Oct 5, 2024.

# ComfyUI Error Report
## Error Details
- **Node Type:** SamplerCustomAdvanced
- **Exception Type:** TypeError
- **Exception Message:** Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

@bghira

bghira commented Oct 5, 2024

It just doesn't support fp8 and never will. This issue isn't about fp8 at all, but about fp64 for RoPE.

@YakDriver

I'm new, so apologies in advance. Is the issue that ComfyUI is Windows-centric, with the bulk of the community in that world, and will never support macOS well? My perception is that it's either slow or doesn't work. As for my own dabbling, after fighting with it for several hours (Flux and Kijai's cool Hunyuan), I'm giving up on ComfyUI permanently.

Apple M3 Max, 64 GB, Sonoma 14.7.1

TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

I've done all the upgrades, downgrades, re-grades.

pip install torch==2.3.1 torchaudio==2.3.1 torchsde==0.2.6 torchvision==0.18.1
pip install tensorflow==2.17.0 tensorflow-metal==1.1.0 numpy==1.24.3
PYTORCH_ENABLE_MPS_FALLBACK=1 python main.py --force-fp32 --fp32-unet --fp32-vae --fp32-text-enc --reserve-vram 2

@hvaara

hvaara commented Dec 16, 2024

@YakDriver This specific error should have been fixed already. Can you please try upgrading ComfyUI? If it still doesn't work, do you mind also providing your ComfyUI workflow? I can take a quick look.

@jurgenwerk

jurgenwerk commented Jan 2, 2025

@hvaara I am also experiencing TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype. on an Apple M3 with 18 GB. I have a fresh installation of ComfyUI (v3.10). Here's my workflow:

[attached image]

@bghira

bghira commented Jan 2, 2025

There's zero fp8 support on Mac. You will need int8 instead.

@bghira

bghira commented Jan 2, 2025

Also, 18 GB probably isn't enough :[

ltdrdata added the MacOS (MPS device related issues) label Jan 2, 2025