Added 8-bit weight compression for OVModel #415

Merged
2 commits merged on Sep 20, 2023

Conversation

@l-bat (Contributor) commented Aug 24, 2023

What does this PR do?

Introduced data-free 8-bit weight compression for OVBaseModel and OVBaseDecoderModel.
The corresponding NNCF PR openvinotoolkit/nncf#2059 has been merged.
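
For reference, here is a minimal sketch of the underlying NNCF call used for data-free 8-bit weight compression of an exported OpenVINO IR model. The file paths are placeholders, and the exact way this is wired into OVModel in the PR may differ from the snippet.

```python
import nncf
from openvino.runtime import Core, serialize

# Load an exported OpenVINO IR model (placeholder path).
core = Core()
ov_model = core.read_model("model.xml")

# Data-free 8-bit weight compression: weights are stored as int8;
# no calibration dataset is required.
compressed_model = nncf.compress_weights(ov_model)

# Save the compressed model back to IR (placeholder path).
serialize(compressed_model, "model_int8.xml")
```

Because the compression is data-free, no calibration dataset or dataloader is needed, which is what makes it possible to apply at model load or export time.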

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev commented Aug 29, 2023

The documentation is not available anymore as the PR was closed or merged.

alexsu52 pushed a commit to openvinotoolkit/nncf that referenced this pull request Sep 1, 2023
### Changes

Extended the data-free int8 weight compression algorithm to the OpenVINO
backend

Example (WeightsModel):

![image](https://github.com/openvinotoolkit/nncf/assets/22346860/02138cce-290a-40aa-b997-f83815400a6c)

PR to optimum huggingface/optimum-intel#415

### Reason for changes

Optimize the footprint and performance of large models whose weights are
significantly larger than their activations
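
For example, a 7B-parameter model holds roughly 28 GB of weights in fp32 (4 bytes per parameter); compressing the weights to int8 (1 byte per parameter) brings that down to roughly 7 GB, while activations keep their original precision.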

### Related tickets

117412

### Tests

`tests/openvino/native/quantization/test_weights_compression.py`
Swin Transformer support verified

Results
Task: lambada_openai
| Model | Metric | Value | Stderr |
|-------|--------|------:|-------:|
| dolly-v2-3b_original | ppl | 5.0144 | 0.1510 |
| dolly-v2-3b_original | acc | 0.6297 | 0.0067 |
| dolly-v2-3b_compressed | ppl | 4.9868 | 0.1498 |
| dolly-v2-3b_compressed | acc | 0.6313 | 0.0067 |
| Llama-2-7b-chat-hf_original | ppl | 3.2788 | 0.0866 |
| Llama-2-7b-chat-hf_original | acc | 0.7058 | 0.0063 |
| Llama-2-7b-chat-hf_compressed | ppl | 3.2856 | 0.0869 |
| Llama-2-7b-chat-hf_compressed | acc | 0.7054 | 0.0064 |
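
In both cases, the change in perplexity and accuracy after compression is well within the reported standard error.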
l-bat added a commit to l-bat/nncf that referenced this pull request Sep 1, 2023
@AlexKoff88 (Collaborator)

I am OK with the PR, but the failed tests look strange.

@echarlaix (Collaborator) left a comment

LGTM, thanks for the addition @l-bat

@l-bat force-pushed the lt/ov_compress_weights branch from 2338c7f to 6cd4a85 on September 20, 2023 at 09:56
@l-bat (Contributor, Author) commented Sep 20, 2023

@echarlaix there are some failed tests that are not related to my changes.

Comment on lines +149 to +150
(OVModelForSequenceClassification, "hf-internal-testing/tiny-random-bert", 70, 35),
(OVModelForCausalLM, "hf-internal-testing/tiny-random-gpt2", 45, 22),
What is the difference in the quantization applied to the two models (depending on whether it is a PyTorch or an OpenVINO model)?

@echarlaix (Collaborator) left a comment

Thanks for iterating on it. The PR looks good, so I will merge. I also added a question on the changes applied in 6cd4a85.

@echarlaix merged commit 673484b into huggingface:main on Sep 20, 2023