Added 8-bit weight compression for OVModel #415
Conversation
NNCF PR openvinotoolkit/nncf#2059

### Changes
Extended the data-free int8 weight compression algorithm to the OpenVINO backend.

Example (WeightsModel):
![image](https://github.com/openvinotoolkit/nncf/assets/22346860/02138cce-290a-40aa-b997-f83815400a6c)

PR to optimum: huggingface/optimum-intel#415

### Reason for changes
Optimize the model footprint and performance of large models where the size of the weights is relatively larger than the size of the activations.

### Related tickets
117412

### Tests
`tests/openvino/native/quantization/test_weights_compression.py`
Swin transformer support verified.

### Results
Task: lambada_openai

| Model | Metric | Value | | Stderr |
|--------------|------|------:|---|------:|
| dolly-v2-3b_original | ppl | 5.0144 | ± | 0.1510 |
| | acc | 0.6297 | ± | 0.0067 |
| dolly-v2-3b_compressed | ppl | 4.9868 | ± | 0.1498 |
| | acc | 0.6313 | ± | 0.0067 |
| Llama-2-7b-chat-hf_original | ppl | 3.2788 | ± | 0.0866 |
| | acc | 0.7058 | ± | 0.0063 |
| Llama-2-7b-chat-hf_compressed | ppl | 3.2856 | ± | 0.0869 |
| | acc | 0.7054 | ± | 0.0064 |
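As a rough sketch of the data-free flow described above (not code from this PR), int8 weight compression can be applied to an OpenVINO IR with NNCF's `compress_weights`; the file names below are placeholders.

```python
# Hedged sketch: data-free int8 weight compression of an OpenVINO model
# with NNCF. The IR paths are placeholders, not artifacts from this PR.
import nncf
from openvino.runtime import Core, serialize

model = Core().read_model("model.xml")      # FP32/FP16 IR exported beforehand
compressed = nncf.compress_weights(model)   # no calibration data needed
serialize(compressed, "model_int8.xml")     # weights of linear/embedding layers stored as int8
```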
I am OK with the PR, but the failed tests look strange.
Force-pushed from f919ccc to 2338c7f
LGTM, thanks for the addition @l-bat
Force-pushed from 2338c7f to 6cd4a85
@echarlaix there are some failed tests that are not related to my changes.
(OVModelForSequenceClassification, "hf-internal-testing/tiny-random-bert", 70, 35),
(OVModelForCausalLM, "hf-internal-testing/tiny-random-gpt2", 45, 22),
What is the difference in the quantization applied to the two models (depending on whether this is a PyTorch or an OpenVINO model)?
Thanks for iterating on it. The PR looks good, so I will merge; I also added a question on the changes applied in 6cd4a85.
What does this PR do?
Introduced data-free weight compression to 8 bits for `OVBaseModel` and `OVBaseDecoderModel`.
PR to NNCF openvinotoolkit/nncf#2059 was merged.
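For illustration only, a minimal sketch of exercising the compressed path from optimum-intel; calling `nncf.compress_weights` directly on the exported `openvino.runtime.Model` held by the OVModel is an assumption made for this sketch, not necessarily the user-facing option this PR adds.

```python
# Hedged sketch, not the exact API surfaced by this PR: compress the weights
# of an exported OVModel to int8 with NNCF and save the result.
import nncf
from optimum.intel import OVModelForSequenceClassification

model_id = "hf-internal-testing/tiny-random-bert"  # model also used in the tests above
ov_model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)

# OVBaseModel keeps the graph in `ov_model.model`; replace it with the
# weight-compressed variant (data-free, no calibration set required).
ov_model.model = nncf.compress_weights(ov_model.model)
ov_model.save_pretrained("tiny-random-bert-int8")
```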
Before submitting