
Models caching does not work (sd_checkpoints_limit) #2176

Open
psydok opened this issue Oct 24, 2024 · 12 comments

Comments

@psydok (Contributor) commented Oct 24, 2024

Tested setting sd_checkpoints_keep_in_cpu: false, sd_checkpoints_limit: 3, sd_checkpoint_cache: 3. Nothing worked. Every request for a new model is long.

@psydok (Contributor, Author) commented Oct 24, 2024

In automatic1111 it worked. Why was it deleted here, but the fields were left?

@Hugs288 commented Oct 24, 2024

I think one of the updates broke model caching. It used to work perfectly, but now, somewhat randomly, if I don't generate for a couple of minutes, or run generate after a hires fix pass, it loads the whole model from disk again.

@s4130 commented Oct 25, 2024

I suspect switching models causes RAM usage to keep increasing, probably because these settings aren't taking effect.

@psydok (Contributor, Author) commented Oct 25, 2024

I also noticed that if you send {"override_settings":{"sd_model_checkpoint": "flux1-dev-bnb-nf4-v2.safetensors", "forge_preset": "flux", "forge_additional_modules": []}} when this model is already the default, Forge still reloads the checkpoint, so inference takes longer than expected.

@altoiddealer (Contributor)

> I also noticed that if you send {"override_settings":{"sd_model_checkpoint": "flux1-dev-bnb-nf4-v2.safetensors", "forge_preset": "flux", "forge_additional_modules": []}} when this model is already the default, Forge still reloads the checkpoint, so inference takes longer than expected.

The way override_settings works is that if a provided value is identical to the currently stored value, it is ignored.

With sd_model_checkpoint, you can "set" the value using a wide variety of accepted "checkpoint aliases". I'm not quite sure at what point this happens, but the value is subsequently changed to the "title" returned by the sd-models API endpoint.

So what is happening is that you are passing the model_name value, which is valid but not equal to the stored title; it is therefore not ignored but applied, and the model params refresh, etc.

I found a way to resolve this... will be pushing a PR soon.
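The normalization described above also suggests a client-side workaround while waiting for a fix: resolve whatever alias you have to the canonical "title" from the sd-models endpoint, and only include sd_model_checkpoint in override_settings when it actually differs from the current value. A minimal sketch, assuming the title/model_name/hash/filename fields of the /sdapi/v1/sd-models response shape; the helper names here are hypothetical:

```python
def resolve_title(requested, sd_models):
    """Map an accepted checkpoint alias (model_name, filename, hash, ...)
    to the canonical 'title' that gets stored in sd_model_checkpoint."""
    for m in sd_models:
        aliases = {m.get("title"), m.get("model_name"),
                   m.get("filename"), m.get("hash")}
        if requested in aliases:
            return m.get("title")
    return requested  # unknown alias: pass through unchanged


def build_override(requested, current_title, sd_models):
    """Include sd_model_checkpoint only when it would actually change,
    so an identical value cannot trigger a needless checkpoint reload."""
    title = resolve_title(requested, sd_models)
    if title == current_title:
        return {}
    return {"sd_model_checkpoint": title}
```

In practice the sd_models list would come from /sdapi/v1/sd-models and current_title from the stored sd_model_checkpoint option before each request.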

@altoiddealer (Contributor) commented Oct 25, 2024

@psydok please check out this PR, which resolves the issue you mentioned in your comment above (not your "main" issue). It works for me; if you get a chance to try it out, please leave a comment there. Thank you.

#2181

@psydok (Contributor, Author) commented Oct 25, 2024

@altoiddealer Okay, I'll look at the PR and test it tomorrow. Thank you for the fix!

UPD: It works! Thanks! But this issue should stay open: I would still like these parameters restored in Forge: sd_checkpoints_keep_in_cpu: false, sd_checkpoints_limit: 3, sd_checkpoint_cache: 3.

@psydok psydok changed the title Model caching does not work Model caching does not work (sd_checkpoints_limit) Oct 27, 2024
@psydok psydok changed the title Model caching does not work (sd_checkpoints_limit) Models caching does not work (sd_checkpoints_limit) Oct 27, 2024
@psydok (Contributor, Author) commented Oct 28, 2024

I found the commit where the breaking changes were made, but the commit message gives no information about why it was done.
@lllyasviel @DenOfEquity Does anyone know whether this was a leftover from debugging, or a mistake that broke something?

@DenOfEquity (Collaborator)

That's a very old commit, from before I was using Forge, possibly even before Forge was public. It probably caused (or had high potential to cause) issues after the backend reworks by complicating memory management, but that's just speculation. Since then the backend has been reworked again, with the Flux update.
There are quite a few relics in the code. A good way to check whether settings are used is to search the repo: sd_checkpoints_keep_in_cpu, sd_checkpoints_limit, and sd_checkpoint_cache are not referenced anywhere, not even in commented-out code.
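The repo search described above can be scripted. This is an illustrative sketch (the function name is mine), equivalent to grepping the repository's Python sources for each dead setting name:

```python
from pathlib import Path

# Settings that appear in the UI but may no longer be read by the backend.
DEAD_SETTINGS = ["sd_checkpoints_keep_in_cpu", "sd_checkpoints_limit",
                 "sd_checkpoint_cache"]


def find_references(repo_root, names):
    """Return {setting_name: [files mentioning it]} for all .py files."""
    hits = {name: [] for name in names}
    for path in Path(repo_root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        for name in names:
            if name in text:
                hits[name].append(str(path))
    return hits
```

An empty list for a setting (outside the file that defines the options UI) is a strong hint that it is a relic.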

@psydok (Contributor, Author) commented Nov 13, 2024

@DenOfEquity Thanks for the explanation!

Another question has formed in my mind. I'm trying to reconstruct the logic, but things have changed a lot in Forge and there are a lot of wrapper classes.
Could you tell me which class should be kept in memory to hold both Flux (~12 GB) and some version of SDXL (~8 GB), for example? The goal is to switch between models quickly. I thought I needed to add the --sd-checkpoint-limit logic to memory_management.py, but I got confused by the number of class reinitializations. They seem to be reinitialized all the time, even when model_data.forge_hash matches (setting it to False doesn't affect anything).
Or maybe the problem is that I'm debugging on a very weak GPU (2 GB).

Which class should be kept, and can it be moved to CPU and back gracefully somehow?
I don't think it will work without global changes...
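For reference, the behaviour being asked about is essentially an LRU cache keyed by checkpoint name. This is a hypothetical sketch of the idea, not Forge code: the to_gpu/to_cpu callbacks stand in for whatever device moves Forge's memory_management.py would actually have to perform:

```python
from collections import OrderedDict


class CheckpointCache:
    """LRU cache in the spirit of the old sd_checkpoints_limit behaviour:
    keep up to `limit` loaded models resident, moving the least recently
    used one to CPU on eviction instead of reloading it from disk later."""

    def __init__(self, limit=3, to_gpu=None, to_cpu=None):
        self.limit = limit
        self.to_gpu = to_gpu or (lambda m: m)  # e.g. model.to("cuda")
        self.to_cpu = to_cpu or (lambda m: m)  # e.g. model.to("cpu")
        self._cache = OrderedDict()  # name -> model, most recent last

    def get(self, name, loader):
        if name in self._cache:
            self._cache.move_to_end(name)      # cache hit: no disk I/O
            return self.to_gpu(self._cache[name])
        model = loader(name)                   # cache miss: load from disk
        self._cache[name] = model
        if len(self._cache) > self.limit:
            _, evicted = self._cache.popitem(last=False)  # drop LRU entry
            self.to_cpu(evicted)
        return self.to_gpu(model)
```

The hard part in Forge would not be the cache itself but identifying which objects (per the comment below, JointTextEncoder, KModel, IntegratedAutoencoderKL) can survive a round trip to CPU without being reinitialized.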

@psydok (Contributor, Author) commented Nov 13, 2024

I noticed that if you add --always-gpu when starting Forge, checkpoint changes don't seem to take as long. I don't understand why, though; the memory manager seems to clear everything anyway.

@DenOfEquity (Collaborator)

I only know what I know from poking around, so my understanding could be completely wrong.
Models are stored in three classes: JointTextEncoder, KModel, and IntegratedAutoencoderKL. The latter two seem to be reused/reinitialised when a new model is loaded. The first doesn't get reused, potentially leading to the memory leak / excess committed-memory problem some users have reported.
I'd say Forge is fundamentally no longer designed to keep multiple models loaded. (With modern models barely fitting into typical consumer hardware anyway, it's likely too much extra complexity for too little value.)
