
Possibility of running text encoder on CPU? #23

Closed
congdm opened this issue Apr 10, 2024 · 2 comments
congdm commented Apr 10, 2024

From inference.py I can see that the T5 encoder is loaded onto the GPU in float16:
t5_encoder = T5TextEmbedder().to(pipe.device, dtype=torch.float16)
During the inference step, the output embeddings from the T5 encoder are converted to the same device and dtype as the SD pipeline:
prompt_embeds = t5_encoder(prompt, max_length=128).to(pipe.device, pipe.dtype)

So, to save VRAM, I experimented with letting the T5 model stay on the CPU by changing the model-loading line:
t5_encoder = T5TextEmbedder()

It ran fine, but the result was totally different and the prompt wasn't followed well. So it turns out that running the model in FP32 and then converting the embeddings to FP16 is not the same as running the model directly in FP16.
Also, when I tried loading the pipeline in BF16 while keeping the text encoder in FP16, the result was different as well.

So, to use the ella-sd1.5-tsc-t5xl model properly, both the SD model and the T5 encoder must be in FP16; am I understanding that right?
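Here is a minimal sketch of the comparison I'm describing (the prompt string and variable names are only illustrative; the T5TextEmbedder calls follow inference.py):

    import torch
    # T5TextEmbedder is imported from this repo, as in inference.py

    prompt = "a corgi wearing a red hat"  # illustrative prompt

    # VRAM-saving attempt: fp32 T5 on the CPU, embeddings cast to fp16 afterwards
    t5_cpu = T5TextEmbedder()
    emb_cpu = t5_cpu(prompt, max_length=128).to("cuda", torch.float16)

    # setup from inference.py: fp16 T5 on the GPU
    t5_gpu = T5TextEmbedder().to("cuda", dtype=torch.float16)
    emb_gpu = t5_gpu(prompt, max_length=128)

    # the embeddings differ, which matches the degraded prompt following I saw
    print((emb_cpu.float() - emb_gpu.float()).abs().max())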

budui (Collaborator) commented Apr 11, 2024

Yes. We conducted the vast majority of our experiments on V100s, which do not support bf16, so we had to use the fp16 T5 for training. I tested and found that the output difference between the fp16 T5 and the bf16 T5 is not negligible, resulting in obvious differences in the generated images. It may be a reasonable strategy to put T5 on the GPU first and move it back to the CPU after the embeddings are generated.
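A rough sketch of that offload strategy, reusing the lines from inference.py (prompt and pipe are assumed to already be set up there; everything else is illustrative):

    import torch

    # move the fp16 T5 to the GPU only for encoding
    t5_encoder = T5TextEmbedder().to("cuda", dtype=torch.float16)
    prompt_embeds = t5_encoder(prompt, max_length=128).to(pipe.device, pipe.dtype)

    # then send the encoder back to the CPU to free VRAM for the diffusion model
    t5_encoder.to("cpu")
    torch.cuda.empty_cache()

    # ... run the SD pipeline with prompt_embeds as usual ...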

congdm (Author) commented Apr 11, 2024

I see, thanks a lot.

> It may be a reasonable strategy to put T5 on the GPU first and move it back to the CPU after the embeddings are generated.

Yes, that is what I'm doing to cope when generating at high resolution. Another strategy would be to run the encoder on a second GPU (dual-GPU setup).
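For the dual-GPU variant, roughly the following (assuming the SD pipeline sits on cuda:0 and a second GPU is visible as cuda:1; device names are illustrative):

    import torch

    # keep the fp16 T5 permanently on the second GPU
    t5_encoder = T5TextEmbedder().to("cuda:1", dtype=torch.float16)

    # only the small embedding tensor is moved over to the pipeline's device
    prompt_embeds = t5_encoder(prompt, max_length=128).to(pipe.device, pipe.dtype)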

budui closed this as completed Apr 11, 2024