High Memory Usage When Loading Flux Model in ComfyUI #4480
Comments
Same for me. I hope they can improve this. The Q8_0 format looks almost as good as FP16, loads faster, and requires less than 32 GB RAM while loading. The downside could be less compatibility with other features and fewer model finetunes, if the format does not gain popularity. |
So even users with 64 GB RAM need to use the page file! By the way, is this with FP16? |
Reverting to commit 3e52e03 seems to have resolved the issue for me. |
How much RAM do you need now? I needed above 32 GB, even before this commit. |
I've not done exact measurements, but I have a 16 GB GeForce RTX 3060, and at that commit I am able to run the Flux dev FP8 with at least 1 Lora with no issues. Edit: I have 32 GB RAM (realised you were asking about RAM, not VRAM). |
Yes, ComfyUI will use most of your available VRAM while generating (8 GB in my case), and the rest is offloaded to RAM. The problem is that when the Flux model is loading, it uses a lot of RAM. You can monitor RAM / VRAM usage in the Task Manager while generating an image. |
It's also working fine for me at 83f3431. |
Okay. I've done some checks at different commits. For me, the last commit which works is 14af129. Commits beyond this (starting at bb222ce) cause out of memory issues (torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory). If I'm understanding correctly, the issue may be caused by the changes to the memory management code here: comfy/model_patcher.py. However, I'm not familiar with the codebase, so I might be looking at this wrong. |
So, you have a specific memory error and you can't generate images? I think @Govee-Chan was just talking about Flux requiring a lot of RAM while loading, but still being able to generate images. |
Yes, beyond the commit I mentioned I get a standard CUDA out of memory error every other generation when using Flux. For full transparency, I'm using ComfyUI via SwarmUI. |
I'm on the latest commit, with ComfyUI portable, and I don't have any errors. Also, you mention your error happens "every other generation", so that means the model was already loaded. But I think @Govee-Chan refers to when the model is loaded into RAM for the first time (on the first generation). |
I have OOMs too after yesterday's commits. Not only with Flux but, strangely, even using SD 1.5 checkpoints. I have an RTX 3060 with 12 GB VRAM and 80 GB system RAM, Linux. |
Same here. I use flux only. First generation is successful. Second fails even for 512x512 images. Third is successful again and so on. RTX 4090, 64GB RAM. |
Getting OOM now after a few generations using Q8 quant, worked just fine a few days ago, 64GB RAM, 4060Ti 16GB. |
Just updated and tested, but I'm always getting OOMs (tested only Flux). Returned to 14af129, which is working well. |
Can you check if you still have those OOM issues on the latest commit? |
I still have OOM problems every 2-3 generations. It happens mostly when I change the prompt: it becomes very slow, like I'm loading the checkpoint for the first time, then OOM. (flux schnell, RTX 3060 12 GB, 64 GB RAM) |
If you add the --reserve-vram 0.6 argument, does it fix it? If it doesn't, try increasing it by 0.1 until it works, then tell me what the value is. |
The latest commit seems to solve my problem; the Comfy thread occupied 20% of RAM at the peak (I've got 64 GB in total, so 13 GB seems normal), but I haven't tried it on my AWS instance where I found the OOM originally. I suspect that the issue is due to my instance having too little memory (16 GB), but theoretically, 16 GB should be sufficient to run it, right? Thanks anyway, I will try --reserve-vram 0.6 on my instance and see if it works. |
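For anyone unsure where that flag goes: a minimal sketch, assuming ComfyUI is launched from its repository root with `python main.py` (portable builds pass the same flags through their launcher .bat file). As I understand it, the value is the number of GB of VRAM kept free for the OS and other software.

```bash
# Keep 0.6 GB of VRAM reserved for the OS/driver; raise in 0.1 steps if OOMs persist.
python main.py --reserve-vram 0.6

# Values of 0.7-0.8 were reported to work in this thread when LoRAs are involved.
python main.py --reserve-vram 0.8
```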
I got no problem with the VRAM; I suspect there might be an issue during the transfer from memory to GPU memory while loading the model. |
Is it related to PyTorch 2.4? |
I'm using PyTorch 2.4. RAM usage when loading FP8 spikes to 38 GB, and switching models after this goes up to 58 GB, so maybe there is something that can be done to improve model switching. The latest updates seem to have fixed the OOM issue, although I find it interesting how the VRAM usage creeps up with the first few runs of the text encoder; maybe something is not getting unloaded properly, or maybe this is intended behaviour. |
When you do switch, is it to the Flux FP16 version? |
I've tried switching between FP8 and the Q8 quant which are fairly similar on VRAM usage, Q8 is very slightly higher. |
When I use Q8, I don't have RAM spikes while loading. But I never tried to use it after FP8. |
OOM errors resolved at 0.7, NVIDIA GeForce RTX 3080 Laptop GPU, 16GB, Linux, normal VRAM mode |
The latest updates mostly worked fine for me, but after trying to use Flux with >= 1 Lora, I was receiving OOM errors. Setting |
OK, for me: commit d1a6bd6, at 0.6 I can generate using the full flux-dev model, but get an OOM using a lora (realism lora), and the same at 0.7. At 0.8 I can generate everything. I made a little stress test, generating several times with flux, then with XL, back to flux, alternating generation with the full model and the Q8, and so on, and had no OOMs. |
Updated Comfy and now getting OOM with Lora as well today, worked fine yesterday. First generation works fine, then OOM after. |
Q6_K is not even 0.4% worse than Q8 (for perplexity of 13B LLMs) |
How to return to an older commit? OK, I found how! |
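For others asking the same thing: a minimal sketch of rolling a git clone of ComfyUI back to an older commit, using the 14af129 hash mentioned earlier in this thread as the example and assuming the default master branch.

```bash
cd ComfyUI
git log --oneline        # find the hash of the commit you want
git checkout 14af129     # detached HEAD at that older commit
# to get back to the latest code later:
git checkout master
git pull
```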
I can confirm reserving a portion of VRAM (0.7-1.0) helps; after 20 generations with 3 loras, no more OOMs on a 3060. |
I had the same issue and the |
try --disable-smart-memory |
I also got OOM with |
--disable-smart-memory fixed it for me. Thanks! |
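For later readers, a sketch of combining the two workarounds mentioned in this thread, again assuming the `python main.py` entry point. As I understand it, `--disable-smart-memory` makes ComfyUI offload models to system RAM instead of keeping them cached in VRAM between generations.

```bash
# Aggressively offload models to system RAM and keep a VRAM safety margin.
python main.py --disable-smart-memory --reserve-vram 0.7
```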
Same issue. I have 64 GB of RAM, which ought to be plenty, and as of the recent updates the RAM usage has skyrocketed to the point where ComfyUI uses up to 70-80% of my RAM and I have to shut off the app to prevent issues. |
I think the full model is being upcast to FP32 while loading, so this would be about 45 GB (without the T5 encoder). Could it be possible to upcast the Flux model block by block (instead of all at once), keeping RAM usage lower? |
Why is this marked as "feature" and not "bug"? I had to revert to an earlier commit, and can now use ComfyUI. I can't use current versions due to the absurd RAM usage. |
I actually opened this as a bug a while back, but it's still not fixed. |
As far as I know, the high RAM usage (above 32 GB) while loading has been a problem since the beginning. @CasualDev242, you may have a different issue. Do you always have high RAM usage, or only while loading the Flux model? |
Like I mentioned, the bug only occurs with recent commits, and yes, it's with Flux. I did not have high RAM usage prior to these commits. It hasn't been a problem since the beginning for me, since an earlier commit fixes it and it didn't use to occur. Loading the same Flux model and Loras with an earlier commit doesn't cause the absurd RAM issue (remember, I have 64 GB of RAM, and ComfyUI is using 70%+ of it; how is that not an issue with the code?). |
If you are on Windows, it's perfectly normal for it to use up to 2x the memory of your largest safetensors file when loading. If you use the 22 GB file + 10 GB T5, it might peak at 22 * 2 + 10, so 54 GB RAM usage when loading, then drop down to 32 GB RAM usage. For the fp8 checkpoint it's going to peak at 17.2 * 2, so ~35 GB. That's an issue with the safetensors library on Windows; Linux doesn't have this issue. |
So this is an issue with the safetensors file type. I'm on Windows 10. But there is also a full quality version there: EDIT: Apparently not. I also tried a native FP8 Flux model (11 GB), but it also requires above 32 GB RAM while loading. |
Thanks, --reserve-vram 0.6 worked for me, but I can't figure out why, since I was able to run these workflows without issues before. I even generated 200 images for over an hour on 24 GB VRAM, and then it broke in the middle of it. To be fair, I was putting all those images into 1 large image; it would have been 400 1024x1024 images, which I think could have been the main cause. What I can't understand is how doing that broke the entire installation of ComfyUI, so that now I can't even generate 1 image with Flux. I also installed Crystool in the middle of that large image generation, but hadn't restarted the server and was waiting to restart once the task was over. |
Feature Idea
Hello,
I am experiencing a significant memory usage issue when using the Flux model in ComfyUI. During the model loading phase, the memory consumption spikes to approximately 70GB. This seems excessively high and may not be feasible for many users.
Existing Solutions
No response
Other
No response