T5 hasn't offloaded on MPS #4201

Closed
tombearx opened this issue Aug 4, 2024 · 5 comments
Labels: Potential Bug (User is reporting a bug. This should be tested.)

Comments


tombearx commented Aug 4, 2024

Expected Behavior

When running SD3 or FLUX inference on MPS, I expected the T5 text encoder to be offloaded out of RAM after the prompt was encoded.

Actual Behavior

The T5 encoder stays in memory the whole time, both during and after inference.

Steps to Reproduce

Used the official FLUX workflow: https://comfyanonymous.github.io/ComfyUI_examples/flux/

Debug Logs

During inference of the fp16 model (24 GB) with fp16 T5, the --gpu-only flag, and all CLIP devices set to mps, with PYTORCH_DEBUG_MPS_ALLOCATOR=1:

Attempting to release cached buffers (MPS allocated: 34.20 GB, other allocations: 1.60 GB)
Attempting to release cached buffers (MPS allocated: 33.36 GB, other allocations: 2.87 GB)
Attempting to release cached buffers (MPS allocated: 32.34 GB, other allocations: 3.53 GB)

If T5 is not used for SD3 inference, allocated RAM is lower by exactly 10 GB.
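
The same figures can also be read programmatically. A minimal sketch using PyTorch 2.x's MPS introspection helpers (how these counters map onto the log's "MPS allocated" / "other allocations" fields is my assumption):

```python
import torch

# Minimal sketch, assuming PyTorch 2.x on Apple Silicon. These counters
# should roughly track the allocator totals printed when
# PYTORCH_DEBUG_MPS_ALLOCATOR=1 is set; the exact field mapping is an
# assumption, not documented behavior.
if torch.backends.mps.is_available():
    gib = 1024 ** 3
    print(f"current_allocated_memory: {torch.mps.current_allocated_memory() / gib:.2f} GB")
    print(f"driver_allocated_memory:  {torch.mps.driver_allocated_memory() / gib:.2f} GB")
```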

Other

I'm not good at programming, but the problem looks connected to the fact that text encoders are offloaded to CPU when the --gpu-only flag is not used (on MPS, that is the same unified memory the GPU uses), rather than being fully unloaded.
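
To illustrate the distinction in plain PyTorch (not ComfyUI's actual code; `t5` below is a placeholder module): offloading to `cpu` keeps the weights resident in the same unified memory, while truly unloading means dropping the last reference and flushing the allocator cache.

```python
import gc
import torch

# Placeholder standing in for the real T5 encoder (illustrative only).
t5 = torch.nn.Linear(4096, 4096).to("mps")

# "Offloading": the module leaves the mps device, but on a unified-memory
# Mac the weights still occupy the same physical RAM.
t5 = t5.to("cpu")

# "Unloading": drop the last reference so the tensors can be collected,
# then release the MPS allocator's cached buffers back to the system.
del t5
gc.collect()
torch.mps.empty_cache()  # PyTorch 2.x
```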

tombearx added the Potential Bug label on Aug 4, 2024

Adreitz commented Aug 5, 2024

MPS only has unified memory, unlike PCs with separate RAM and VRAM. If T5 is unloaded on MPS, it is flushed from memory entirely. When you send a new prompt, it has to be reloaded from disk, which is slow. I think it is better to keep it in memory.

@tombearx (Author)

> MPS only has unified memory, unlike PCs with separate RAM and VRAM. If T5 is unloaded on MPS, it is flushed from memory entirely. When you send a new prompt, it has to be reloaded from disk, which is slow. I think it is better to keep it in memory.

That's the point: right now I'm unable to run FLUX because the T5 encoder stays loaded in memory. If there were a way to unload it, it would at least be possible to run FLUX on MPS. Running slowly is better than not running at all.

@tombearx (Author)

@comfyanonymous is it possible to unload T5 from memory (not offload it to CPU) on MPS when --lowvram is used?

@ltdrdata (Collaborator)

> @comfyanonymous is it possible to unload T5 from memory (not offload it to CPU) on MPS when --lowvram is used?

Are you trying to perform text encoding only once and not change the prompt afterwards?

In that case, you can set up a separate workflow that caches the result of CLIPTextEncode through the Backend Cache nodes of the Inspire Pack, so your main workflow doesn't need to use the CLIP loader.
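
Outside the node graph, the caching idea looks roughly like this. A minimal sketch with illustrative names (`encode_fn` and the cache path are assumptions, not Inspire Pack API):

```python
import torch

CACHE_PATH = "prompt_cond.pt"  # hypothetical cache file

def encode_and_cache(encode_fn, prompt: str) -> None:
    # Run once while T5 is loaded: encode the prompt and persist the
    # conditioning tensors to disk.
    torch.save(encode_fn(prompt), CACHE_PATH)

def load_cached_cond(device: str = "mps"):
    # Later runs reload the cached conditioning and skip encoding, so the
    # T5 encoder never needs to be resident in memory at all.
    return torch.load(CACHE_PATH, map_location=device)
```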

@tombearx (Author)

> > @comfyanonymous is it possible to unload T5 from memory (not offload it to CPU) on MPS when --lowvram is used?
>
> Are you trying to perform text encoding only once and not change the prompt afterwards?
>
> In that case, you can set up a separate workflow that caches the result of CLIPTextEncode through the Backend Cache nodes of the Inspire Pack, so your main workflow doesn't need to use the CLIP loader.

Cool! Thanks, looks like that solves the problem!
