[DOC]: Added INT4 weight compression description #20812

Merged: 23 commits, Nov 8, 2023

@AlexKoff88 AlexKoff88 commented Nov 1, 2023

  • Added docs for INT4

@AlexKoff88 AlexKoff88 requested a review from a team as a code owner November 1, 2023 10:35
@AlexKoff88 AlexKoff88 requested review from bstankix and removed request for a team November 1, 2023 10:35
@github-actions github-actions bot added the category: docs OpenVINO documentation label Nov 1, 2023
@yury-gorbachev

LGTM

docs/articles_en/openvino_workflow/gen_ai.md (3 outdated review threads, resolved)

OpenVINO also supports models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__ library optimized
with `GPTQ <https://github.com/PanQiWei/AutoGPTQ>`__. There is no need to do an extra step of model optimization in this case because
model conversion will ensure that int4 optimization results are preserved and model inference will benefit from it.

Contributor:

How does this work? By loading the model with OVModelForCausalLM and everything else happens automagically? Is that documented somewhere, and if so perhaps a link to there from here?

Contributor Author:

This works as described, i.e. with `OVModelForCausalLM`. Happy to see a proposal if it is not clear here.

@AlexKoff88 AlexKoff88 requested a review from tsavina November 2, 2023 09:07
Comment on lines 54 to 56

Contributor:

Suggested change

Original:
OpenVINO also supports models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__ library optimized
with `GPTQ <https://github.com/PanQiWei/AutoGPTQ>`__. There is no need to do an extra step of model optimization in this case because
model conversion will ensure that int4 optimization results are preserved and model inference will benefit from it.

Suggested:
OpenVINO also supports models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__ library optimized
with `GPTQ <https://github.com/PanQiWei/AutoGPTQ>`__. Those models can be loaded and converted directly with the `from_pretrained()` methods of the `Optimum Intel <https://huggingface.co/docs/optimum/main/en/intel/inference>`__ wrappers for Hugging Face models. Model conversion will ensure that int4 optimization results are preserved and model inference will benefit from it.
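
For illustration, the loading path discussed in this thread (`OVModelForCausalLM` with `from_pretrained()`) might look like the sketch below. It is not taken from the PR. The helper names `load_gptq_model` and `generate_text` are hypothetical, the model ID is left to the caller, and the sketch assumes that `optimum-intel`, `transformers`, and the GPTQ dependencies are installed.

```python
def load_gptq_model(model_id: str):
    """Load a GPTQ-optimized Hugging Face checkpoint as an OpenVINO model.

    Hypothetical helper: assumes optimum-intel, transformers, and the
    GPTQ dependencies of the chosen checkpoint are installed.
    """
    # Imported lazily so the heavy dependencies are only needed when this runs.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # export=True asks Optimum Intel to convert the checkpoint to OpenVINO
    # format on load; no separate weight-compression step is applied here.
    model = OVModelForCausalLM.from_pretrained(model_id, export=True)
    return model, tokenizer


def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    """Run a short generation with the loaded model (hypothetical helper)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call would pass the ID of a GPTQ checkpoint of your choice to `load_gptq_model`, then hand the returned model and tokenizer to `generate_text` together with a prompt.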

@AlexKoff88

I think we should proceed with the merge. @yury-gorbachev, please vote if you agree.

@AlexKoff88 AlexKoff88 merged commit 0f260c2 into openvinotoolkit:master Nov 8, 2023
11 checks passed
@AlexKoff88

@tsavina, we need to have this in the release branch as well.

allnes pushed a commit to allnes/openvino that referenced this pull request Nov 23, 2023
* Added INT4 information into weight compression doc

* Added GPTQ info. Fixed comments

* Fixed list

* Fixed issues. Updated Gen.AI doc

* Applied comments

* Added additional info about GPTQ support

* Fixed typos

* Update docs/articles_en/openvino_workflow/gen_ai.md

Co-authored-by: Nico Galoppo

* Update docs/articles_en/openvino_workflow/gen_ai.md

Co-authored-by: Nico Galoppo

* Update docs/optimization_guide/nncf/code/weight_compression_openvino.py

Co-authored-by: Nico Galoppo

* Applied changes

* Update docs/articles_en/openvino_workflow/gen_ai.md

Co-authored-by: Tatiana Savina

* Update docs/articles_en/openvino_workflow/gen_ai.md

Co-authored-by: Tatiana Savina

* Update docs/articles_en/openvino_workflow/gen_ai.md

Co-authored-by: Tatiana Savina

* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md

Co-authored-by: Tatiana Savina

* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md

Co-authored-by: Tatiana Savina

* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md

Co-authored-by: Tatiana Savina

* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md

Co-authored-by: Tatiana Savina

* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md

Co-authored-by: Tatiana Savina

* Added table with results

* One more comment

---------

Co-authored-by: Nico Galoppo
Co-authored-by: Tatiana Savina