DeepSpeed inference support for int8 parameters on BLOOM? #330
Recently, HuggingFace `transformers` added a new feature: int8 quantization for all HuggingFace models. This feature can reduce the size of large models by up to 2× without a significant loss in performance. Is it possible for DeepSpeed inference to support int8 quantization for BLOOM? According to the DeepSpeed inference tutorial, DeepSpeed inference supports fp32, fp16, and int8 parameters. But when I tried BLOOM with the inference script and changed `dtype=torch.int8` on line 194, an error was raised. Any chance DeepSpeed inference could support int8 quantization for BLOOM?
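For reference, a minimal sketch of the kind of call the change above amounts to; the checkpoint name, `mp_size` value, and surrounding setup are illustrative, not the exact inference script:

```python
# Minimal sketch, not the actual inference script: load BLOOM and hand it
# to DeepSpeed-inference with the dtype switched from fp16 to int8.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom",
                                             torch_dtype=torch.float16)

# This mirrors the single-line change described above: the tutorial lists
# int8 among the supported dtypes, but passing it for an fp16 checkpoint
# is what raises the error reported in this issue.
model = deepspeed.init_inference(
    model,
    mp_size=8,                        # tensor-parallel degree (illustrative)
    dtype=torch.int8,                 # was torch.float16 / torch.half
    replace_with_kernel_inject=True,  # use DeepSpeed's fused kernels
)
```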
Comments

@pai4451 #328 (comment)
As an alternative, you can use it in HuggingFace too. I haven't tried it either, though.
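For concreteness, a sketch of the `transformers` int8 path being referred to (the bitsandbytes LLM.int8() integration); untested in this thread, and it needs `bitsandbytes` and `accelerate` installed:

```python
# Hedged sketch of loading BLOOM in int8 through HuggingFace transformers.
# Requires: pip install bitsandbytes accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",   # shard layers across the available GPUs
    load_in_8bit=True,   # int8 weights, roughly halving memory vs fp16
)

inputs = tokenizer("BLOOM is", return_tensors="pt").to(0)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```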
@mayank31398 I am running my server without Internet available, so I can’t use …
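A hedged aside on the no-Internet constraint: `transformers` can load entirely from disk if the checkpoint was copied over beforehand; the local path below is hypothetical:

```python
# Hypothetical offline-loading sketch: assumes the BLOOM checkpoint was
# already copied to the server at /data/bloom (path is made up).
import os
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # set before importing transformers

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/data/bloom")  # no network calls
```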
Quantization with int8 requires knowledge distillation and might need significant compute. |
Also, can you provide the ds config you use to run on 16 GPUs?
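The config asked about above isn't shown in this thread. As a rough sketch only: for DeepSpeed-inference the per-run settings usually live in the launcher command and the `init_inference()` call rather than a JSON ds config; the hostfile contents and script name below are assumptions:

```python
# Hedged sketch of a 16-GPU (2 nodes x 8 GPUs) launch, e.g.:
#
#   deepspeed --num_nodes 2 --num_gpus 8 --hostfile hostfile bloom-ds-inference.py
#
# with the script reading the world size the launcher exports:
import os

world_size = int(os.getenv("WORLD_SIZE", "1"))  # 16 when launched as above
print(f"tensor-parallel degree: {world_size}")  # passed as mp_size to init_inference
```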