Added 8-bit weight compression for OVModel #415
Conversation
NNCF PR openvinotoolkit/nncf#2059

### Changes
Extended the data-free int8 weight compression algorithm to the OpenVINO backend.

Example (WeightsModel):
![image](https://github.com/openvinotoolkit/nncf/assets/22346860/02138cce-290a-40aa-b997-f83815400a6c)

PR to optimum: huggingface/optimum-intel#415

### Reason for changes
Optimize the model footprint and performance of large models where the size of the weights is relatively larger than the size of the activations.

### Related tickets
117412

### Tests
`tests/openvino/native/quantization/test_weights_compression.py`
Swin transformer support verified.

### Results
Task: lambada_openai

| Model | Metric | Value | | Stderr |
|--------------|------|------:|---|------:|
| dolly-v2-3b_original | ppl | 5.0144 | ± | 0.1510 |
| | acc | 0.6297 | ± | 0.0067 |
| dolly-v2-3b_compressed | ppl | 4.9868 | ± | 0.1498 |
| | acc | 0.6313 | ± | 0.0067 |
| Llama-2-7b-chat-hf_original | ppl | 3.2788 | ± | 0.0866 |
| | acc | 0.7058 | ± | 0.0063 |
| Llama-2-7b-chat-hf_compressed | ppl | 3.2856 | ± | 0.0869 |
| | acc | 0.7054 | ± | 0.0064 |
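As a rough sketch of the data-free flow described above (not code from this PR), int8 weight compression can be applied to an OpenVINO IR with NNCF's `compress_weights`; the file names below are placeholders.

```python
# Hedged sketch: data-free int8 weight compression of an OpenVINO model
# with NNCF. The IR paths are placeholders, not artifacts from this PR.
import nncf
from openvino.runtime import Core, serialize

model = Core().read_model("model.xml")      # FP32/FP16 IR exported beforehand
compressed = nncf.compress_weights(model)   # no calibration data needed
serialize(compressed, "model_int8.xml")     # weights of linear/embedding layers stored as int8
```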
I am OK with the PR, but the failed tests look strange.
Force-pushed from f919ccc to 2338c7f
LGTM, thanks for the addition @l-bat
Force-pushed from 2338c7f to 6cd4a85
@echarlaix there are some failed tests that are not related to my changes.
(OVModelForSequenceClassification, "hf-internal-testing/tiny-random-bert", 70, 35),
(OVModelForCausalLM, "hf-internal-testing/tiny-random-gpt2", 45, 22),
What is the difference in the quantization applied to the two models (depending on whether this is a PyTorch or an OpenVINO model)?
Thanks for iterating on it. The PR looks good, so I will merge; I also added a question on the changes applied in 6cd4a85.
What does this PR do?
Introduced data-free weight compression to 8 bits for `OVBaseModel` and `OVBaseDecoderModel`.
PR to NNCF openvinotoolkit/nncf#2059 was merged.
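For illustration only, a minimal sketch of exercising the compressed path from optimum-intel; calling `nncf.compress_weights` directly on the exported `openvino.runtime.Model` held by the OVModel is an assumption made for this sketch, not necessarily the user-facing option this PR adds.

```python
# Hedged sketch, not the exact API surfaced by this PR: compress the weights
# of an exported OVModel to int8 with NNCF and save the result.
import nncf
from optimum.intel import OVModelForSequenceClassification

model_id = "hf-internal-testing/tiny-random-bert"  # model also used in the tests above
ov_model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)

# OVBaseModel keeps the graph in `ov_model.model`; replace it with the
# weight-compressed variant (data-free, no calibration set required).
ov_model.model = nncf.compress_weights(ov_model.model)
ov_model.save_pretrained("tiny-random-bert-int8")
```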
Before submitting