[BUG] deepspeed-inference seems not working correctly with torch.half on Pascal GPU #2194
Comments
This might be related to the weird output I see with
@zcrypt0 the 350m model is one you shouldn't use. I believe it's not fully trained and will therefore output garbage. The model page was recently updated and HF suggests you use the 560m version:
@wkkautas Thanks for reporting your issue. I'll try to reproduce what you're seeing and report back.
@mrwyattii I saw weird output with other models too, including the inference tutorial. I switched over to Volta now and everything is working well.
@wkkautas I am able to reproduce this error on a P40. I also noted that MP>1 with FP16 is breaking here (but FP32 seems to work). We are working on a solution.
Just removing these #if guards seems to fix it. Is there any reason to set the CUDA arch requirement higher than 700? (P40 would be 610.) Thank you!
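As a side note for anyone hitting this: you can confirm a card's compute capability from PyTorch. A quick check (an illustration added here, not from the original thread):

```python
import torch

# Tesla P40 (Pascal) reports (6, 1) -> sm_61; Volta is (7, 0) -> sm_70.
# Kernels guarded by "#if __CUDA_ARCH__ >= 700" skip their body on Pascal,
# which is consistent with the garbage fp16 output reported above.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: sm_{major}{minor}")
```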
If you are still seeing this issue, please reopen.
Describe the bug
Thanks for releasing deepspeed-inference.
I'm following the tutorial at https://www.deepspeed.ai/tutorials/inference-tutorial/#end-to-end-gpt-neo-27b-inference and want to run inference in half precision by setting dtype=torch.half. However, when using a Tesla P40, it does not seem to work correctly, generating meaningless text such as:
[{'generated_text': 'DeepSpeed is that one one S\'s of more it his B in B I it a I and an- two The an high B it all.. or old in a D of B T the,\n F and the " S S The'}]
As a side note, when I switched the GPU to a Tesla T4 with the same environment settings and script, this issue was not observed (log attached in Additional context).
Are Pascal GPUs not supported by deepspeed-inference?
To Reproduce
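The exact reproduction script isn't included above; a minimal sketch following the linked tutorial might look like this (the model name, prompt, and generation arguments are assumptions, and the init_inference keywords follow the DeepSpeed API from the era of this issue):

```python
# Minimal sketch, assuming the setup from the linked GPT-Neo tutorial;
# everything except dtype=torch.half is an assumption, not the reporter's script.
import os
import torch
import deepspeed
from transformers import pipeline

local_rank = int(os.getenv("LOCAL_RANK", "0"))

generator = pipeline("text-generation",
                     model="EleutherAI/gpt-neo-2.7B",
                     device=local_rank)

# dtype=torch.half is the setting that misbehaves on the Tesla P40.
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=1,
                                           dtype=torch.half,
                                           replace_with_kernel_inject=True)

print(generator("DeepSpeed is", do_sample=True, min_length=50))
```

Launched with something like deepspeed --num_gpus 1 infer.py.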
Expected behavior
generated_text should contain meaningful text.
ds_report output
System info (please complete the following information):
Additional context
When I switched GPU to Tesla T4, this issue was not observed.