
Update readme.md with macOS installation instructions #129

Closed
wants to merge 1 commit into from

Conversation

jorge-campo

Adds specific Mac M1/M2 installation instructions for Fooocus.

The same instructions as for Linux work on the Mac. The only extra prerequisite is the PyTorch installation, as described in the procedure.

On my M1 Mac, the installation and first run completed without errors, and I could start generating images without any additional configuration.

[screenshot]

@ponymushama commented Aug 16, 2023

macOS does not have the conda command by default.
You should install conda first.

@dreamscapeai

How many it/s do you get on the M1 Pro?

@jorge-campo (Author)

macOS does not have the conda command by default.
You should install conda first.

If you follow the Apple technical document linked in the procedure, it will install Miniconda3 from the Anaconda repo.
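For reference, those steps boil down to roughly the following (a sketch from memory of Apple's "Accelerated PyTorch training on Mac" guide; the installer filename and package list are assumptions that may have changed since):

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh
# then, inside the conda environment, install PyTorch with MPS support
pip3 install torch torchvision torchaudio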

@jorge-campo (Author)

How many it/s do you get on the M1 Pro?

In my case, these 👇 are the statistics for the Base model + Refiner for two images using the same prompt. My computer has many other processes running in the background competing for RAM, so your mileage may vary.

[screenshot of the generation statistics]

@lllyasviel (Owner)

wow 8 s/it - that is quite a bit of waiting

@jorge-campo (Author)

wow 8 s/it - that is quite a bit of waiting

Yes, it is, but it's similar to what I get with ComfyUI or Automatic1111 using SDXL; SD 1.5 is faster, though. I don't think you can do better with an M1 + SDXL(?)

I don't know what optimizations you included in Fooocus, but the image quality is vastly superior to ComfyUI or Automatic1111. Thanks for giving us the chance to play with this project! 😄

@thinkyhead commented Aug 19, 2023

Note that once Miniconda3 is installed and activated in the shell, the Linux instructions work perfectly on macOS.
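For anyone landing here, those Linux steps were roughly the following at the time (a sketch based on the repository README; verify against the current README before relying on it):

git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus
conda env create -f environment.yaml
conda activate fooocus
pip install -r requirements_versions.txt
python entry_with_update.py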

@guo2048 commented Aug 29, 2023

After installing it locally with your steps, I got the same issue as described in #286:
the two result pictures are just empty.

@yaroslav-ads

After installing it locally with your steps, I got the same issue as described in #286: the two result pictures are just empty.

You just need to restart your computer

@eyaeya commented Sep 2, 2023

It looks like I'm running into a problem with the environment. How can I get past this? I didn't see instructions for this part in your README.

File "/Users/xiaoxiao/Pictures/Fooocus/modules/anisotropic.py", line 132, in adaptive_anisotropic_filter s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True) NotImplementedError: The operator 'aten::std_mean.correction' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variablePYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

@jorge-campo

@kjslag commented Sep 6, 2023

I had to launch with PYTORCH_ENABLE_MPS_FALLBACK=1 python launch.py instead of python launch.py to deal with the same issue that @eyaeya had. After that, it ran at about 2.7 s/it on an M1 Max MacBook Pro. But the output was just a blank color. I used the latest (as of now) Fooocus commit 09e0d1c from Sept 2, 2023.

[screenshots of the blank outputs]
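If you want the fallback to apply to every launch in the current shell session, exporting the variable once is equivalent to prefixing each command (a sketch; same effect as above):

export PYTORCH_ENABLE_MPS_FALLBACK=1
python launch.py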

@iwnfubb commented Sep 15, 2023

Works like a charm! Is it normal that the program uses so much RAM? ~20 GB is in use.

@jorge-campo
Copy link
Author

Please refer to the macOS installation guide in the README.md file.

@huameiwei-vc

Last login: Sat Oct 14 11:02:20 on ttys000
(base) songchao@SongChaodeMacBook-Pro ~ % cd Fooocus

conda activate fooocus

python entry_with_update.py
Fast-forward merge
Update succeeded.
Python 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ]
Fooocus version: 2.1.60
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Total VRAM 16384 MB, total RAM 16384 MB
Set vram state to: SHARED
Device: mps
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
[Fooocus Smart Memory] Disabling smart memory, vram_inadequate = True, is_old_gpu_arch = True.
model_type EPS
adm 2560
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Refiner model loaded: /Users/songchao/Fooocus/models/checkpoints/sd_xl_refiner_1.0_0.9vae.safetensors
model_type EPS
adm 2816
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: /Users/songchao/Fooocus/models/checkpoints/sd_xl_base_1.0_0.9vae.safetensors
Exception in thread Thread-2 (worker):
Traceback (most recent call last):
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/Users/songchao/Fooocus/modules/async_worker.py", line 18, in worker
import modules.default_pipeline as pipeline
File "/Users/songchao/Fooocus/modules/default_pipeline.py", line 252, in
refresh_everything(
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/songchao/Fooocus/modules/default_pipeline.py", line 226, in refresh_everything
refresh_loras(loras)
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/songchao/Fooocus/modules/default_pipeline.py", line 153, in refresh_loras
model = core.load_sd_lora(model, filename, strength_model=weight, strength_clip=weight)
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/songchao/Fooocus/modules/core.py", line 82, in load_sd_lora
lora = fcbh.utils.load_torch_file(lora_filename, safe_load=False)
File "/Users/songchao/Fooocus/backend/headless/fcbh/utils.py", line 13, in load_torch_file
sd = safetensors.torch.load_file(ckpt, device=device.type)
File "/Users/songchao/miniconda3/envs/fooocus/lib/python3.10/site-packages/safetensors/torch.py", line 259, in load_file
with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

No matter how many times I re-download the model, it doesn't help.

@lllyasviel (Owner)

MetadataIncompleteBuffer means corrupted files.

@huameiwei-vc

[screenshot] Why does this take so long? Do I need additional settings?

@huameiwei-vc

Mac M1 Pro, 16 GB.

@LeapLu commented Oct 24, 2023

[screenshot of the TypeError]

While starting it with the command I got this TypeError; then it worked, but it keeps showing "Initializing..." Can anyone help?
MacBook M1 Pro

Update succeeded.
Python 3.9.16 (main, Oct 18 2023, 16:24:00) [Clang 15.0.0 (clang-1500.0.40.1)]
Fooocus version: 2.1.728
Running on local URL: http://127.0.0.1:7860
[screenshot]

@Southmelon-Allen

How many it/s do you get on the M1 Pro?

122 s/it, MacBook Pro M2... so slow.

@tiancool commented Nov 3, 2023

[Fooocus Model Management] Moving model(s) has taken 63.73 seconds. It moves the model once before each generation; too slow.

@omioki23 commented Nov 4, 2023

[screenshot] Hi! I'm a complete noob at this. I'm using a MacBook Pro M1 Pro with 16 GB, and it takes around an hour to generate one image. Am I doing something wrong, or is it just because I'm using a Mac? It also shows an error that has been discussed here previously (The operator 'aten::std_mean.correction' is not currently supported on the MPS backend and will fall back to run on the CPU.). Does it affect the waiting time?

@mashres15

@omioki23 I also have the same issue. It seems like a Mac thing.

@xplfly commented Nov 16, 2023

Machine information: M1, 16 GB
[screenshot]

I'm getting an error:
[screenshot of the error]

@jorge-campo

@colingoodman

Setup was a breeze, but as others have mentioned, generation is extremely slow. Unfortunate.

@badaramoni

I did everything and got the URL. When I gave a prompt and tapped Generate, it completed, but I can't see any of the images. Any help?

@Shuffls commented Nov 29, 2023

When trying to generate an image on my MacBook Air M1, it gave the following error:

RuntimeError: MPS backend out of memory (MPS allocated: 8.83 GB, other allocations: 231.34 MB, max allowed: 9.07 GB). Tried to allocate 25.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)

Clearly it's implying that I don't have enough memory; has anyone figured out how to rectify this? Thanks.
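The quickest thing to try is the knob the error message itself suggests, set for a single launch (with the same caveat about possible system instability):

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python entry_with_update.py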

@ezzio-salas

@Shuffls I tried updating launch.py: I added

os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

to disable the upper limit for memory allocations, and it fixed my problem with a similar issue.
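In context, that edit looks roughly like this (a sketch; the exact placement in launch.py is an assumption, but it must run before torch initializes the MPS backend):

import os
# 0.0 removes the MPS allocation cap entirely; PyTorch's warning about
# possible system failure under memory pressure still applies.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"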

@badaramoni

RuntimeError: MPS backend out of memory (MPS allocated: 6.34 GB, other allocations: 430.54 MB, max allowed: 6.77 GB). Tried to allocate 10.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

@Deniffler commented Mar 15, 2024

python entry_with_update.py --all-in-fp16 --attention-split

Thanks, bro!
I've tried all the variations of the startup commands. At first I had 100-110 s/it, but your command sped up the process to 5-7 s/it!!!
I have a MacBook Pro, M1 Pro, 16 GB, from 2021.

@Deniffler

To optimize the execution of your command and potentially speed up the process, we can focus on the parameters that most affect performance. However, given that your command already contains many parameters for optimizing memory and computational resource usage, the main directions for improvement involve more efficient GPU usage and reducing the amount of data processed in each iteration.

Here's a modified version of your command considering possible optimizations:

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --unet-in-fp16 --vae-in-fp16 --clip-in-fp16

Explanation of changes:

Removed unsupported parameters: Parameters that caused an error due to their absence in the list of supported script parameters (--num-workers, --batch-size, --optimizer, --learning-rate, --precision-backend, --gradient-accumulation-steps) have been removed.

Clarification of FP16 usage: Explicit indications for using FP16 for different parts of the model (--unet-in-fp16, --vae-in-fp16, --clip-in-fp16) have been added. This suggests that your model may include components like U-Net, VAE (Variational Autoencoder), and CLIP. Using FP16 can speed up computations and reduce memory consumption, although it may also slightly affect the accuracy of the results.

Using asynchronous CUDA memory allocation: The --async-cuda-allocation parameter implies that the script will use asynchronous memory allocation, which can speed up data loading and the start of computations.

Additional tips:

Performance analysis: Use profiling tools to analyze CPU and GPU usage to identify bottlenecks.
Data loading optimization: If possible, optimize data loading to reduce IO wait times. This can include using faster data formats or buffering data in memory.
Library version checks: Ensure you are using the latest versions of all dependencies. Sometimes updates contain significant performance improvements.
Experiments with batch size: Although the --batch-size parameter is not supported by your current command, if there's an opportunity to adjust the batch size in the code, it can significantly impact performance. Larger batch sizes can increase performance at the expense of increased memory usage.
Remember, performance optimization often requires experimentation and fine-tuning, as the impact of changes can greatly depend on the specific details of your task and hardware.

@Deniffler

[screenshot]

@Dkray commented Mar 17, 2024

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --unet-in-fp16 --vae-in-fp16 --clip-in-fp16

I tried to use this and got this message:

/anisotropic.py:132: UserWarning: The operator 'aten::std_mean.correction' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)

@Ibarton5317

wow 8 s/it - that is quite a bit of waiting

I have 161.37 s/it. Can someone help me figure out why, and how I can make my Mac faster? It's a 2022 model, so it has the M1 chip. Why is it this slow?

@cootshk commented May 15, 2024

Is it just quitting when trying to generate an image for anyone else? (M2 MacBook Air)

@Infiexe commented May 16, 2024

python entry_with_update.py
Update failed. No module named 'pygit2'
Update succeeded.
[System ARGV] ['entry_with_update.py']
Traceback (most recent call last):
File "/Users/zeeshan/Fooocus/entry_with_update.py", line 46, in <module>
from launch import *
File "/Users/zeeshan/Fooocus/launch.py", line 22, in <module>
from modules.launch_util import is_installed, run, python, run_pip, requirements_met
File "/Users/zeeshan/Fooocus/modules/launch_util.py", line 9, in <module>
import packaging.version
ModuleNotFoundError: No module named 'packaging'
(fooocus)
Apple M1 Pro Sonoma 14.1.12

@Zeeshan-2k1 did you solve your issue? I had the exact same error. In my case, I already had a newer version of Python (3.12) installed and linked, so any "python" command pointed to that newer version. However, when you follow the steps in the readme, it installs Python 3.11 for you, and you have to use that Python as well, since libraries such as pygit2 are installed in that environment (within conda). Hope this helps!

Did you guys run cd Fooocus and conda activate fooocus before python entry_with_update.py?

@Infiexe commented May 16, 2024

wow 8 s/it - that is quite a bit of waiting

I have 161.37 s/it. Can someone help me figure out why, and how I can make my Mac faster? It's a 2022 model, so it has the M1 chip. Why is it this slow?

You're probably not using the optimization parameters mentioned right above your post.

@Infiexe commented May 16, 2024

Here's a modified version of your command considering possible optimizations:

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --unet-in-fp16 --vae-in-fp16 --clip-in-fp16

Thank you, got it down to around 13-14 s/it on a 2020 M1 MacBook Air 16 GB. It starts at 10.5, though, and slows down after a couple of steps. Fooocus still runs a bit slower than A1111 (7-8 s/it), but IMO it's still usable. I think it could be faster if it used both CPU and GPU cores. For now, it sits at about 96% GPU with frequent dips to 80%, and only 10-17% CPU. Any way to change that? I want my whole machine to generate.

@nicksaintjohn

Great work @Deniffler, you clearly spent more time and effort than I have; I was just glad to get it running fully off the GPU. Very glad that I helped set you on the right path, as you've now got us all running as fast as possible. I'm much more productive now, many thanks.

@tjex commented May 23, 2024

Posting here, since that's how it was described this should be done. This can be converted to an issue later if necessary.

The image input checkbox does not reveal the image input tabs/parameters for me.
And the prompt text box is huge...

On my fork I found the rows for the prompt box were set to 1024 and image_input_panel was hidden. The hidden panel could make sense, considering the checkbox is probably what opens the panel, but the prompt box set to 1024 rows is odd.

It would also seem that image prompting is not working at all for me. I check "Image Prompt", place two images (a landscape and an animal), and click Generate. Fooocus then generates a random portrait image of a man/woman.

However, if I put an image into the Describe tab and click Describe, it will indeed create a prompt from the image. So the tab/image handling seems to be working, at least?

Anyone else having a similar problem?

@JaimeBulaGooten

Here's a modified version of your command considering possible optimizations:
python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --unet-in-fp16 --vae-in-fp16 --clip-in-fp16

Thank you, got it down to around 13-14 s/it on a 2020 M1 MacBook Air 16 GB. It starts at 10.5, though, and slows down after a couple of steps. Fooocus still runs a bit slower than A1111 (7-8 s/it), but IMO it's still usable. I think it could be faster if it used both CPU and GPU cores. For now, it sits at about 96% GPU with frequent dips to 80%, and only 10-17% CPU. Any way to change that? I want my whole machine to generate.

3.51 s/it on a MacBook Pro M3 with 36 GB. Also, half the memory consumption after using this command.

[screenshot]

@ElPatr0n75

I get this when I try to install the requirements file:
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

I got two errors like that. I don't know how to solve this, because then it doesn't run at all.

@jpyudhi commented Aug 1, 2024

python entry_with_update.py

Update failed. No module named 'pygit2'
Update succeeded.
[System ARGV] ['entry_with_update.py']
Traceback (most recent call last):
File "/Users/zeeshan/Fooocus/entry_with_update.py", line 46, in <module>
from launch import *
File "/Users/zeeshan/Fooocus/launch.py", line 22, in <module>
from modules.launch_util import is_installed, run, python, run_pip, requirements_met
File "/Users/zeeshan/Fooocus/modules/launch_util.py", line 9, in <module>
import packaging.version
ModuleNotFoundError: No module named 'packaging'
(fooocus)

Apple M1 Pro Sonoma 14.1.12

Follow this, it works: https://youtu.be/IebiL16lFyo?si=GSaczBlUuzjnP9TM

@ilgrandeanonimo commented Aug 4, 2024

I've tested some of the commands above

Results:
🐢
python entry_with_update.py --attention-pytorch
This is the worst: 120 s/it, 60 s to load the model.

🐱
python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2
20-25 s/it, 45 s to load the model.

🐆
python entry_with_update.py --unet-in-fp16 --attention-split
Initially 20 s/it, then 14-15 s/it; 40 s to load the model.

🌟
python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --unet-in-fp16 --vae-in-fp16 --clip-in-fp16
This is the best: 10 s/it and only 24 s to load the model.

My Configuration:
MacBook Air 13", M1 2020, 16GB, 8-Core CPU & 8 Core GPU with macOS Sonoma 14.6

ATTENTION
Ambient temperature: 35°C.
I used a cooling system (ice bricks) to keep the Mac from melting. Obviously I recommend finding a more suitable solution for cooling a computer, or not cooling it at all, which in my case meant a maximum slowdown to 10 s/it.

@samerGMTM22

I installed it successfully. Do I need to use the terminal every time to run it, or is there a way to create an executable file?

@deathblade287

@samerGMTM22

Step 1: Create a .sh file with the command python entry_with_update.py (or whatever variant of it you use).

Step 2 (Option 1): Follow this answer to convert it into a .app that you can then run as you would run any other app on macOS.

Step 2 (Option 2): Select that file as one of the "login items" in your settings. Note that this way the server will always run in the background.
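For Step 1, a minimal sketch of such a .sh file (the paths and environment name are assumptions; adjust them to your install, and make the file executable with chmod +x):

#!/bin/zsh
# activate the conda environment Fooocus was installed into, then launch it
source ~/miniconda3/etc/profile.d/conda.sh
conda activate fooocus
cd ~/Fooocus
python entry_with_update.py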

@kp9sunny

Getting the error below. Can someone please help? @lllyasviel @jorge-campo @huameiwei-vc

Set vram state to: SHARED
Always offload VRAM
Device: mps
VAE dtype: torch.float32
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
Exception in thread Thread-3 (worker):
Traceback (most recent call last):
File "/Users/sunnydahiya/Fooocus/modules/patch.py", line 465, in loader
result = original_loader(*args, **kwargs)
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/site-packages/safetensors/torch.py", line 311, in load_file
with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/Users/sunnydahiya/Fooocus/modules/async_worker.py", line 181, in worker
import modules.default_pipeline as pipeline
File "/Users/sunnydahiya/Fooocus/modules/default_pipeline.py", line 270, in
refresh_everything(
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/sunnydahiya/Fooocus/modules/default_pipeline.py", line 250, in refresh_everything
refresh_base_model(base_model_name, vae_name)
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/sunnydahiya/Fooocus/modules/default_pipeline.py", line 74, in refresh_base_model
model_base = core.load_model(filename, vae_filename)
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/sunnydahiya/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/sunnydahiya/Fooocus/modules/core.py", line 147, in load_model
unet, clip, vae, vae_filename, clip_vision = load_checkpoint_guess_config(ckpt_filename, embedding_directory=path_embeddings,
File "/Users/sunnydahiya/Fooocus/ldm_patched/modules/sd.py", line 431, in load_checkpoint_guess_config
sd = ldm_patched.modules.utils.load_torch_file(ckpt_path)
File "/Users/sunnydahiya/Fooocus/ldm_patched/modules/utils.py", line 13, in load_torch_file
sd = safetensors.torch.load_file(ckpt, device=device.type)
File "/Users/sunnydahiya/Fooocus/modules/patch.py", line 481, in loader
raise ValueError(exp)
ValueError: Error while deserializing header: MetadataIncompleteBuffer
File corrupted: /Users/sunnydahiya/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors

@IPv6 commented Aug 15, 2024

realisticStockPhoto_v20
ValueError: Error while deserializing header: MetadataIncompleteBuffer

Your checkpoint is broken or has a format unknown to Fooocus. Re-download it or try another checkpoint.
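A sketch of the usual recovery (the download URL is a placeholder; use wherever you originally obtained the checkpoint):

# resume the likely-truncated download in place, or delete the file first for a fresh copy
curl -L -C - -o ~/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors '<model download URL>'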

@medienbueroleipzig

The operator is simply not yet supported by Apple; that's all. You can tinker with it as much as you want.
--disable-offload-from-vram only shortens the loading of the model, but it has no effect on the missing support for the PyTorch operator aten::std_mean.correction.
Read more here: http://medienbüro-leipzig.de/index.php?action=faq&cat=10&id=24&artlang=en


@lderugeriis

With all the optimisation flags in place I'm still at 70 s/it on an M2 MacBook Pro with 16 GB of RAM.

@rellumgeluk

After following all the installation steps, I got:

import packaging.version
ModuleNotFoundError: No module named 'packaging'

How should I fix it? Thanks in advance!

@medienbueroleipzig

There is just a package missing from your Fooocus Python environment. You can resolve this by installing the packaging module, but you have to do it inside that environment. (If you installed Fooocus with helper programs such as Pinokio or Stability Matrix, and you don't know what this means: every common AI installation via such helpers uses its own Python environment per application (Stable Diffusion, SD-Forge, Fooocus, etc.) to ensure that only packages compatible with the required base Python version are installed. You can usually find that Python installation in the application folder, mostly under /Fooocus/venv.)
This means you have to install the missing package in your environment (or in the main Python installation, if you don't use a dedicated environment).
If you're using a virtual environment for Fooocus, make sure to activate it first. For example: open Terminal, type "cd " (with a trailing space), drag the Fooocus folder from Finder into the Terminal window so its path appears, and press Enter. Then activate the Python environment:
source venv/bin/activate
The prompt changes to
(venv) user@MacBook-Pro Fooocus %
and you are in the environment.
Then install packaging:
pip install packaging
After installation, check that the module is installed correctly:
python -c "import packaging; print(packaging.__version__)"
Run Fooocus again and verify that the error is resolved.
If you still encounter issues, ensure that:
You are using the correct Python environment.
Your pip is updated:
pip install --upgrade pip

@medienbueroleipzig

With all the optimisation flags in place I'm still at 70 s/it on an M2 MacBook Pro with 16 GB of RAM.

Apple regularly optimizes MPS support in PyTorch. Ensure you have the latest version of PyTorch for Metal support (a so-called nightly build). After activating the environment (see the reply below):
pip install torch torchvision --upgrade --pre --index-url https://download.pytorch.org/whl/nightly/cpu
Or try altering the code in some of the Python files, as described here:
http://medienbüro-leipzig.de/index.php?action=faq&cat=9&id=23&artlang=de

@medienbueroleipzig

Adjustment in supported_models.py
In the file supported_models.py, torch.std is used as follows:

if torch.std(out, unbiased=False) > 0.09: # not sure how well this will actually work. I guess we will find out.
Solution
Here too, the standard deviation can be replaced by separate mean and variance calculations:

mean_out = torch.mean(out)
var_out = torch.mean((out - mean_out) ** 2)
std_out = torch.sqrt(var_out)
if std_out > 0.09:

@medienbueroleipzig commented Jan 8, 2025

Here you can find some changed files for Fooocus:
https://github.com/medienbueroleipzig/fooocus-optimized

On a MacBook Pro M3 Pro with 36 GB shared RAM I get 5.2 s/it with SDXL 1.5 at 1024x1024 with LoRAs.
512x512 runs at 1.7 s/it, with no error messages anymore. But... Fooocus is not the best solution for quick image generation on a Mac, because of the weak implementation of the necessary libraries and packages on all sides: Apple, Torch, and the developers of Python AI apps, who prefer the easy way with CUDA on Windows. On the Mac I also use Diffusion Bee, with all models on an external USB-C drive.

What I've done there:
Open the file anisotropic.py (in fooocus/modules/).
Find the line where torch.std_mean is used. In your output it looks roughly like this:
s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
Replace torch.std_mean with separate torch.std and torch.mean calculations, so the line becomes two calls:
s = torch.std(g, dim=(1, 2, 3), keepdim=True)
m = torch.mean(g, dim=(1, 2, 3), keepdim=True)
This computes the standard deviation and the mean separately and thus avoids the CPU fallback, since torch.std and torch.mean are both supported on MPS.
Save and test:
Save the changes in anisotropic.py and start Fooocus again.
Check whether the warning for aten::std_mean.correction disappears, and observe whether the inference speed improves.

Here are the specific adjustments you can make at the places using torch.std and torch.std_mean to avoid the CPU fallbacks and optimize MPS performance.

1. Adjustment in anisotropic.py
In the function adaptive_anisotropic_filter, torch.std_mean is used:

s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
Solution
Replace torch.std_mean with separate mean and variance calculations to obtain the standard deviation:

m = torch.mean(g, dim=(1, 2, 3), keepdim=True)
var = torch.mean((g - m) ** 2, dim=(1, 2, 3), keepdim=True)
s = torch.sqrt(var + 1e-5) # add a small constant to avoid division by zero
This change should keep the operation entirely on the MPS GPU.

2. Adjustment in external_model_advanced.py (in /Fooocus/ldm_patched/contrib/)
Here there are two uses of torch.std in the class RescaleCFG:

ro_pos = torch.std(cond, dim=(1, 2, 3), keepdim=True)
ro_cfg = torch.std(x_cfg, dim=(1, 2, 3), keepdim=True)
Solution
Here too, replace torch.std with the combination of mean and variance:

Calculation for ro_pos

mean_pos = torch.mean(cond, dim=(1, 2, 3), keepdim=True)
var_pos = torch.mean((cond - mean_pos) ** 2, dim=(1, 2, 3), keepdim=True)
ro_pos = torch.sqrt(var_pos)

Calculation for ro_cfg

mean_cfg = torch.mean(x_cfg, dim=(1, 2, 3), keepdim=True)
var_cfg = torch.mean((x_cfg - mean_cfg) ** 2, dim=(1, 2, 3), keepdim=True)
ro_cfg = torch.sqrt(var_cfg)

3. Adjustment in supported_models.py (in /Fooocus/ldm_patched/modules)
In the file supported_models.py, torch.std is used as follows:

if torch.std(out, unbiased=False) > 0.09: # not sure how well this will actually work. I guess we will find out.
Solution
Here too, the standard deviation can be replaced by separate mean and variance calculations:

mean_out = torch.mean(out)
var_out = torch.mean((out - mean_out) ** 2)
std_out = torch.sqrt(var_out)
if std_out > 0.09:
