High Memory Usage When Loading Flux Model in ComfyUI #4480
Comments
Same for me. I hope they can improve this. The Q8_0 format looks almost as good as FP16, loads faster, and requires less than 32 GB RAM while loading. The downside could be less compatibility with other features and fewer model finetunes, if the format does not gain popularity. |
So even users with 64 GB RAM need to use the page file! By the way, is this with FP16? |
Reverting to commit 3e52e03 seems to have resolved the issue for me. |
How much RAM do you need now? I needed above 32 GB, even before this commit. |
I've not done exact measurements, but I have a 16 GB GeForce RTX 3060, and at that commit I am able to run the Flux dev FP8 with at least 1 Lora with no issues. Edit: I have 32 GB RAM (realised you were asking about RAM, not VRAM). |
Yes, ComfyUI will use most of your available VRAM while generating (8 GB in my case), and the rest is offloaded to RAM. The problem is that when the Flux model is loading, it uses a lot of RAM. You can monitor RAM / VRAM usage in the Task Manager while generating an image. |
It's also working fine for me at 83f3431. |
Okay. I've done some checks at different commits. For me, the last commit which works is 14af129. Commits beyond this (starting at bb222ce) cause out of memory issues (torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory). If I'm understanding correctly, the issue may be caused by the changes to the memory management code here: comfy/model_patcher.py. However, I'm not familiar with the codebase, so I might be looking at this wrong. |
So, you have a specific memory error and you can't generate images? I think @Govee-Chan was just talking about Flux requiring a lot of RAM while loading, but still being able to generate images. |
Yes, beyond the commit I mentioned I get a standard CUDA out of memory error every other generation when using Flux. For full transparency, I'm using ComfyUI via SwarmUI. |
I'm on the latest commit, with ComfyUI portable, and I don't have any errors. Also, you mention your error happens "every other generation", so that means the model was already loaded. But I think @Govee-Chan refers to when the model is loaded into RAM for the first time (on the first generation). |
I have OOMs too after yesterday's commits. Not only with Flux but, strangely, even using SD 1.5 checkpoints. I have an RTX 3060 with 12 GB VRAM and 80 GB system RAM, Linux. |
Same here. I use flux only. First generation is successful. Second fails even for 512x512 images. Third is successful again and so on. RTX 4090, 64GB RAM. |
Getting OOM now after a few generations using Q8 quant, worked just fine a few days ago, 64GB RAM, 4060Ti 16GB. |
Just updated and tested, but I'm always getting OOMs (tested only Flux). Returned to 14af129, which is working well. |
Can you check if you still have those OOM issues on the latest commit? |
I still have OOM problems every 2-3 generations. It happens mostly when I change the prompt: it becomes very slow, like I'm loading the checkpoint for the first time, then OOM. (flux schnell, RTX 3060 12 GB, 64 GB RAM) |
If you add the --reserve-vram 0.6 argument, does it fix it? If it doesn't, try increasing it by 0.1 until it works, then tell me what the value is. |
The latest commit seems to solve my problem; the Comfy thread occupied 20% of RAM at the peak (I've got 64 GB in total, so 13 GB seems normal), but I haven't tried it on my AWS instance where I found the OOM originally. I suspect that the issue is due to my instance having too little memory (16 GB), but theoretically, 16 GB should be sufficient to run it, right? Thanks anyway, I will try --reserve-vram 0.6 on my instance and see if it works. |
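For anyone unsure where that flag goes: a minimal sketch, assuming ComfyUI is launched from its repository root with `python main.py` (portable builds pass the same flags through their launcher .bat file). As I understand it, the value is the number of GB of VRAM kept free for the OS and other software.

```bash
# Keep 0.6 GB of VRAM reserved for the OS/driver; raise in 0.1 steps if OOMs persist.
python main.py --reserve-vram 0.6

# Values of 0.7-0.8 were reported to work in this thread when LoRAs are involved.
python main.py --reserve-vram 0.8
```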
I got no problem with the VRAM; I suspect there might be an issue during the transfer from memory to GPU memory while loading the model. |
Is it related to PyTorch 2.4? |
I'm using PyTorch 2.4. RAM usage when loading FP8 spikes to 38 GB, and switching models after this goes up to 58 GB, so maybe there is something that can be done to improve model switching. The latest updates seem to have fixed the OOM issue, although I find it interesting how the VRAM usage creeps up with the first few runs of the text encoder; maybe something is not getting unloaded properly, or maybe this is intended behaviour. |
When you do switch, is it to the Flux FP16 version? |
I've tried switching between FP8 and the Q8 quant which are fairly similar on VRAM usage, Q8 is very slightly higher. |
When I use Q8, I don't have RAM spikes while loading. But I never tried to use it after FP8. |
OOM errors resolved at 0.7, NVIDIA GeForce RTX 3080 Laptop GPU, 16GB, Linux, normal VRAM mode |
The latest updates mostly worked fine for me, but after trying to use Flux with >= 1 Lora, I was receiving OOM errors. Setting |
OK, for me: commit d1a6bd6, at 0.6 I can generate using the full flux-dev model, but get an OOM using a lora (realism lora), and the same at 0.7. At 0.8 I can generate everything. I made a little stress test, generating several times with flux, then with XL, back to flux, alternating generation with the full model and the Q8, and so on, and had no OOMs. |
Updated Comfy and now getting OOM with Lora as well today, worked fine yesterday. First generation works fine, then OOM after. |
Q6_K is not even 0.4% worse than Q8 (for perplexity of 13B LLMs) |
How to return to an older commit? OK, I found how! |
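For others asking the same thing: a minimal sketch of rolling a git clone of ComfyUI back to an older commit, using the 14af129 hash mentioned earlier in this thread as the example and assuming the default master branch.

```bash
cd ComfyUI
git log --oneline        # find the hash of the commit you want
git checkout 14af129     # detached HEAD at that older commit
# to get back to the latest code later:
git checkout master
git pull
```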
I can confirm reserving a portion of VRAM (0.7-1.0) helps; after 20 generations with 3 loras, no more OOMs on a 3060. |
I had the same issue and the |
try --disable-smart-memory |
I also got OOM with |
--disable-smart-memory fixed it for me. Thanks! |
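For later readers, a sketch of combining the two workarounds mentioned in this thread, again assuming the `python main.py` entry point. As I understand it, `--disable-smart-memory` makes ComfyUI offload models to system RAM instead of keeping them cached in VRAM between generations.

```bash
# Aggressively offload models to system RAM and keep a VRAM safety margin.
python main.py --disable-smart-memory --reserve-vram 0.7
```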
Same issue. I have 64 GB of RAM, which ought to be plenty, and as of the recent updates the RAM usage has skyrocketed to the point where ComfyUI uses up to 70-80% of my RAM and I have to shut off the app to prevent issues. |
I think the full model is being upcast to FP32 while loading, so this would be about 45 GB (without the T5 encoder). Could it be possible to upcast the Flux model block by block (instead of all at once), keeping RAM usage lower? |
Why is this marked as "feature" and not "bug"? I had to revert to an earlier commit, and can now use ComfyUI. I can't use current versions due to the absurd RAM usage. |
I actually opened this as a bug a while back, but it's still not fixed. |
As far as I know, the high RAM usage (above 32 GB) while loading has been a problem since the beginning. @CasualDev242, you may have a different issue. Do you always have high RAM usage, or only while loading the Flux model? |
Like I mentioned, the bug only occurs with recent commits, and yes, it's with Flux. I did not have high RAM usage prior to these commits. It hasn't been a problem since the beginning for me, since an earlier commit fixes it and it didn't use to occur. Loading the same Flux model and Loras with an earlier commit doesn't cause the absurd RAM issue (remember, I have 64 GB of RAM, and ComfyUI is using 70%+ of it; how is that not an issue with the code?). |
If you are on Windows, it's perfectly normal for it to use up to 2x the memory of your largest safetensors file when loading. If you use the 22 GB file + 10 GB T5, it might peak at 22 * 2 + 10, so 54 GB RAM usage when loading, then drop down to 32 GB RAM usage. For the fp8 checkpoint it's going to peak at 17.2 * 2, so ~35 GB. That's an issue with the safetensors library on Windows; Linux doesn't have this issue. |
So this is an issue with the safetensors file type. I'm on Windows 10. But there is also a full quality version there: EDIT: Apparently not. I also tried a native FP8 Flux model (11 GB), but it also requires above 32 GB RAM while loading. |
Thanks, --reserve-vram 0.6 worked for me, but I can't figure out why, since I was able to run these workflows without issues before. I even generated 200 images for over an hour on 24 GB VRAM, and then it broke in the middle of it. To be fair, I was putting all those images into 1 large image; it would have been 400 1024x1024 images, which I think could have been the main cause. What I can't understand is how doing that broke the entire installation of ComfyUI, so that now I can't even generate 1 image with Flux. I also installed Crystool in the middle of that large image generation, but hadn't restarted the server and was waiting to restart once the task was over. |
Feature Idea
Hello,
I am experiencing a significant memory usage issue when using the Flux model in ComfyUI. During the model loading phase, the memory consumption spikes to approximately 70GB. This seems excessively high and may not be feasible for many users.
Existing Solutions
No response
Other
No response