T5 hasn't offloaded on MPS #4201

Closed
tombearx opened this issue Aug 4, 2024 · 5 comments
Labels: Potential Bug (User is reporting a bug. This should be tested.)

Comments


tombearx commented Aug 4, 2024

Expected Behavior

When running SD3 or FLUX inference on MPS, I expected the T5 text encoder to be offloaded out of RAM after the prompt was encoded.

Actual Behavior

The T5 encoder stays in memory the whole time, both during and after inference.

Steps to Reproduce

Used the official FLUX workflow: https://comfyanonymous.github.io/ComfyUI_examples/flux/

Debug Logs

During inference of the fp16 model (24 GB) with fp16 T5, the --gpu-only flag, and all CLIP devices set to mps, with PYTORCH_DEBUG_MPS_ALLOCATOR=1:

Attempting to release cached buffers (MPS allocated: 34.20 GB, other allocations: 1.60 GB)
Attempting to release cached buffers (MPS allocated: 33.36 GB, other allocations: 2.87 GB)
Attempting to release cached buffers (MPS allocated: 32.34 GB, other allocations: 3.53 GB)

If T5 is not used for SD3 inference, allocated RAM is lower by exactly 10 GB.
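
The same figures can also be read programmatically. A minimal sketch using PyTorch 2.x's MPS introspection helpers (how these counters map onto the log's "MPS allocated" / "other allocations" fields is my assumption):

```python
import torch

# Minimal sketch, assuming PyTorch 2.x on Apple Silicon. These counters
# should roughly track the allocator totals printed when
# PYTORCH_DEBUG_MPS_ALLOCATOR=1 is set; the exact field mapping is an
# assumption, not documented behavior.
if torch.backends.mps.is_available():
    gib = 1024 ** 3
    print(f"current_allocated_memory: {torch.mps.current_allocated_memory() / gib:.2f} GB")
    print(f"driver_allocated_memory:  {torch.mps.driver_allocated_memory() / gib:.2f} GB")
```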

Other

I'm not good at programming, but the problem looks connected to the fact that text encoders are offloaded to CPU when the --gpu-only flag is not used (on MPS, that is the same unified memory the GPU uses), rather than being fully unloaded.
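
To illustrate the distinction in plain PyTorch (not ComfyUI's actual code; `t5` below is a placeholder module): offloading to `cpu` keeps the weights resident in the same unified memory, while truly unloading means dropping the last reference and flushing the allocator cache.

```python
import gc
import torch

# Placeholder standing in for the real T5 encoder (illustrative only).
t5 = torch.nn.Linear(4096, 4096).to("mps")

# "Offloading": the module leaves the mps device, but on a unified-memory
# Mac the weights still occupy the same physical RAM.
t5 = t5.to("cpu")

# "Unloading": drop the last reference so the tensors can be collected,
# then release the MPS allocator's cached buffers back to the system.
del t5
gc.collect()
torch.mps.empty_cache()  # PyTorch 2.x
```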

tombearx added the Potential Bug label on Aug 4, 2024

Adreitz commented Aug 5, 2024

MPS only has unified memory, unlike PCs with separate RAM and VRAM. If T5 is unloaded on MPS, it is flushed from memory entirely. When you send a new prompt, it has to be reloaded from disk, which is slow. I think it is better to keep it in memory.

@tombearx (Author)

> MPS only has unified memory, unlike PCs with separate RAM and VRAM. If T5 is unloaded on MPS, it is flushed from memory entirely. When you send a new prompt, it has to be reloaded from disk, which is slow. I think it is better to keep it in memory.

That's the point: right now I'm unable to run FLUX because the T5 encoder stays loaded in memory. If there were a way to unload it, it would at least be possible to run FLUX on MPS. Running slowly is better than not running at all.

@tombearx (Author)

@comfyanonymous is it possible to unload T5 from memory (not offload it to CPU) on MPS when --lowvram is used?

@ltdrdata (Collaborator)

> @comfyanonymous is it possible to unload T5 from memory (not offload it to CPU) on MPS when --lowvram is used?

Are you trying to perform text encoding only once and not change the prompt afterwards?

In that case, you can set up a separate workflow that caches the result of CLIPTextEncode through the Backend Cache nodes of the Inspire Pack, so your main workflow doesn't need to use the CLIP loader.
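
Outside the node graph, the caching idea looks roughly like this. A minimal sketch with illustrative names (`encode_fn` and the cache path are assumptions, not Inspire Pack API):

```python
import torch

CACHE_PATH = "prompt_cond.pt"  # hypothetical cache file

def encode_and_cache(encode_fn, prompt: str) -> None:
    # Run once while T5 is loaded: encode the prompt and persist the
    # conditioning tensors to disk.
    torch.save(encode_fn(prompt), CACHE_PATH)

def load_cached_cond(device: str = "mps"):
    # Later runs reload the cached conditioning and skip encoding, so the
    # T5 encoder never needs to be resident in memory at all.
    return torch.load(CACHE_PATH, map_location=device)
```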

@tombearx (Author)

> > @comfyanonymous is it possible to unload T5 from memory (not offload it to CPU) on MPS when --lowvram is used?
>
> Are you trying to perform text encoding only once and not change the prompt afterwards?
>
> In that case, you can set up a separate workflow that caches the result of CLIPTextEncode through the Backend Cache nodes of the Inspire Pack, so your main workflow doesn't need to use the CLIP loader.

Cool! Thanks, looks like that solves the problem!
