RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16 #30914
Comments
Hi @mosama1994, use this context manager before your training block in the file you mentioned. Cheers!
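(The code snippet from this comment did not survive extraction. Based on the follow-up comments, it refers to `torch.autocast`. A minimal sketch, assuming a CUDA device and bfloat16 as the target dtype:)

```python
import torch

# Hedged sketch: run the forward/loss computation under autocast so that
# mixed-dtype matmuls (float32 vs. bfloat16) are cast consistently.
# `model` and `batch` are assumed to come from your existing training loop.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    outputs = model(**batch)
    loss = outputs.loss

# backward() is called outside the context, as noted in the next comment.
loss.backward()
```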
I had seen this already. How do I use it? Do I just wrap the model loading code? Also, why does this issue still exist? The PEFT repo says it should have been solved a year ago.
@mosama1994 Basically, the context manager wraps the forward() call and is exited before backward().
I just checked and found that the solution wasn't merged into the main branch (look here in the commit history), so that's why this error still exists. If you want to use autocast, you have to do so in the trainer, or it would be simpler to use the PR solution mentioned in the previous message. Modify your …
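(One way to apply autocast "in the trainer", as the comment suggests, is to override `compute_loss` in a `Trainer` subclass. A minimal sketch; the subclass name and the hard-coded bfloat16 dtype are illustrative assumptions, not the PR's actual fix:)

```python
import torch
from transformers import Trainer

class AutocastTrainer(Trainer):  # hypothetical subclass name
    def compute_loss(self, model, inputs, return_outputs=False):
        # Only the forward pass runs under autocast; Trainer calls
        # backward() later, after this context has already exited.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            return super().compute_loss(model, inputs, return_outputs)
```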
Hi @mosama1994
Hi @younesbelkada, the solution does work. I wrapped the code in autocast and it is working. I already have bf16=True in the TrainingArguments; that is why the issue arises. When I set fp16=True and bf16=False, the code runs, but with bf16=True the error appears. Since the workaround works I am closing this, but it should be mentioned in the documentation or fixed. Thanks
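(For reference, the two configurations this comment contrasts look like this; a minimal sketch, with `output_dir` as an illustrative placeholder:)

```python
from transformers import TrainingArguments

# Triggers "expected mat1 and mat2 to have the same dtype" in this setup:
args_bf16 = TrainingArguments(output_dir="out", bf16=True, fp16=False)

# Runs without the error:
args_fp16 = TrainingArguments(output_dir="out", fp16=True, bf16=False)
```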
System Info
transformers version = 4.40.0
python = 3.10.2
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
The code I am running to train Llama 3 is at this link: https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/scripts/run_fsdp_qlora.py
line 101: torch_dtype = torch.bfloat16
line 102: quant_storage_dtype = torch.bfloat16
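(For context, in that script these dtypes feed into the quantization config and model loading. A minimal sketch of the relevant part, reconstructed from the linked script rather than quoted verbatim; the model id is an illustrative assumption:)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

torch_dtype = torch.bfloat16          # line 101 in the linked script
quant_storage_dtype = torch.bfloat16  # line 102 in the linked script

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_quant_storage=quant_storage_dtype,
)

# "meta-llama/Meta-Llama-3-8B" is an illustrative model id, not from the issue.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=quantization_config,
    torch_dtype=quant_storage_dtype,
)
```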
When I use just float16, it runs fine, but when I use bfloat16 it gives me this error:
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
Expected behavior
Loading with bfloat16 should work the same as float16, but it causes the issue in this code and I am not sure why. Please help ASAP.