
Features: float16 if GPU available, float32 if CPU only #30

Closed
woctezuma opened this issue Jan 31, 2021 · 3 comments
woctezuma commented Jan 31, 2021

Hello,

I have noticed a discrepancy between the dtype of the features (both for images and texts) depending on the availability of a GPU.

  • If I run the code with CPU only on Colab, image_features.dtype returns torch.float32.
    This happens if I do not install the package properly, and then do not notice that device is set to cpu:
    %pip install git+https://github.com/openai/CLIP.git
  • If I run the code with GPU on Colab, image_features.dtype returns torch.float16.
    This happens if I follow the installation process properly and install the proper versions of PyTorch (1.7.1+cu101) for Colab:
    torch==1.7.1+cu101
    torchvision==0.8.2+cu101

Q1: Is there a good reason why both versions do not return the same dtype? Is it due to AMP with GPU?

Moreover, if I wanted to store normalized features, float16 would let me halve the file size, so I would like to make sure that casting the float32 results (obtained with CPU only) to float16 would not lead to a loss of precision.

Q2: Would casting the results to float16 be totally safe? Or would it be safer to cast to float32 instead?
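One way to check this empirically (a sketch of my own, not from the thread; shapes are made up) is to measure the round-trip error of casting L2-normalized float32 features down to float16 and back:

```python
import torch

# Sketch: quantify the error introduced by storing L2-normalized
# float32 features as float16. Shapes (1000 vectors of dim 512)
# are illustrative, not the actual CLIP feature dimensions.
features = torch.randn(1000, 512)
features = features / features.norm(dim=-1, keepdim=True)

roundtrip = features.half().float()  # float32 -> float16 -> float32

max_abs_err = (features - roundtrip).abs().max().item()
print(max_abs_err)  # small: fp16 keeps roughly 3 significant decimal digits
```

For unit-norm entries the worst-case error stays well below 1e-3, which is usually negligible for similarity search, though whether it is "totally safe" depends on the downstream use.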

Finally, the discrepancy can be slightly confusing for people who pre-compute features on a machine with a GPU, and then use those pre-computed features alongside features computed on the fly in a CPU-only web app. This is how I noticed the discrepancy, when running this line:

logits = 100. * image_features @ zeroshot_weights

where image_features were computed on the fly (float32) and zeroshot_weights had been pre-computed (float16).
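For reference, the mismatch can be reproduced with plain tensors, and casting one operand fixes it (a sketch with stand-in shapes, not the actual CLIP features):

```python
import torch

# Stand-ins for the situation above: float32 features computed on the
# fly on CPU vs. float16 weights pre-computed on a GPU machine.
image_features = torch.randn(1, 512)            # torch.float32
zeroshot_weights = torch.randn(512, 10).half()  # torch.float16

# image_features @ zeroshot_weights fails with a dtype-mismatch
# RuntimeError on CPU in the PyTorch versions discussed here;
# casting one side makes the dtypes agree.
logits = 100. * image_features @ zeroshot_weights.float()
print(logits.dtype)  # torch.float32
```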

woctezuma added a commit to woctezuma/match-steam-banners that referenced this issue Jan 31, 2021
jongwook (Collaborator) commented Feb 1, 2021

Q1: This part of the loader code patches all fp16 operations to fp32 when loading on CPU. The main reason is that PyTorch does not support all fp16 operations in CPU mode.

Q2: In general, there can be a slight loss of accuracy compared to fp32, and training with fp16 weights can become unstable. Empirically, we have been doing CLIP inference in fp16 without much problem, and that's how the model was trained anyway.

AFAIK, there's no runtime performance benefit to using fp16 operations on CPUs even where they are supported, because they are cast to fp32 anyway. BF16 SIMD was introduced in Cooper Lake last year, so this might change in the future.

woctezuma (Author)

Thank you for the clarification! :)

ProGamerGov (Contributor) commented Feb 3, 2022

For anyone else who comes across this issue and wants to use torch.float32 with the GPU: you can load a CLIP model with torch.device("cpu") for the device argument, so that it stays in float32, and then move the model onto the GPU after it has been downloaded and loaded:

import clip
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
clip_model = clip.load('RN50x4', jit=False, device=torch.device("cpu"))[0].to(device)
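The key point is that .to(device) moves parameters without changing their dtype, so the float32 weights obtained by loading on CPU remain float32 on the GPU. A torch-only sketch of that behavior (a hypothetical small module, not CLIP itself):

```python
import torch

# A toy module standing in for the CLIP model: parameters
# are created as torch.float32 by default.
layer = torch.nn.Linear(4, 4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = layer.to(device)  # changes the device, not the dtype

print(next(layer.parameters()).dtype)  # torch.float32 on CPU or GPU
```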
