[DOC]: Added INT4 weight compression description #20812
AlexKoff88 commented Nov 1, 2023
- Added docs for INT4
docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md
LGTM
docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md
OpenVINO also supports models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__ library optimized
with `GPTQ <https://github.com/PanQiWei/AutoGPTQ>`__. There is no need to do an extra step of model optimization in this case because
model conversion will ensure that int4 optimization results are preserved and model inference will benefit from it.
How does this work? By loading the model with `OVModelForCausalLM` and everything else happens automagically? Is that documented somewhere, and if so, perhaps a link to there from here?
This works as it is described, i.e. with `OVModelForCausalLM`. Happy to see a proposal if it is not clear here.
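For readers following this thread, here is a minimal sketch of what loading a GPTQ-optimized checkpoint through `OVModelForCausalLM` could look like. It assumes `optimum-intel` (with OpenVINO support) and `transformers` are installed, along with whatever GPTQ dependencies the checkpoint itself requires. The helper names `load_gptq_model` and `generate` are hypothetical and not part of this PR, and the model id is left to the caller.

```python
def load_gptq_model(model_id: str):
    """Load a GPTQ-optimized Hugging Face checkpoint through Optimum Intel.

    Hypothetical helper for illustration; `model_id` is supplied by the caller.
    """
    # Imported lazily so this module can be imported even when
    # optimum-intel / transformers are not installed.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # export=True asks Optimum Intel to convert the checkpoint to
    # OpenVINO format while loading; no separate optimization step is called here.
    model = OVModelForCausalLM.from_pretrained(model_id, export=True)
    return tokenizer, model


def generate(model_id: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Run a short text generation with the converted model."""
    tokenizer, model = load_gptq_model(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("<your-gptq-model-id>", "What is OpenVINO?")` would download the checkpoint, convert it, and run inference, so it needs network access and enough memory for the chosen model.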
docs/optimization_guide/nncf/code/weight_compression_openvino.py
Co-authored-by: Nico Galoppo <[email protected]>
docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md
OpenVINO also supports models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__ library optimized
with `GPTQ <https://github.com/PanQiWei/AutoGPTQ>`__. There is no need to do an extra step of model optimization in this case because
model conversion will ensure that int4 optimization results are preserved and model inference will benefit from it.
Suggested change:
OpenVINO also supports models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__ library optimized
with `GPTQ <https://github.com/PanQiWei/AutoGPTQ>`__. Those models can be loaded and converted directly with the `from_pretrained()` methods of the `Optimum Intel <https://huggingface.co/docs/optimum/main/en/intel/inference>`__ wrappers for Hugging Face models. Model conversion will ensure that int4 optimization results are preserved and model inference will benefit from it.
Co-authored-by: Tatiana Savina <[email protected]>
I think we should proceed with the merge. @yury-gorbachev, please vote if you agree.
@tsavina, we need to have this in the release branch as well.
* Added INT4 information into weight compression doc
* Added GPTQ info. Fixed comments
* Fixed list
* Fixed issues. Updated Gen.AI doc
* Applied comments
* Added additional infor about GPTQ support
* Fixed typos
* Update docs/articles_en/openvino_workflow/gen_ai.md
  Co-authored-by: Nico Galoppo <[email protected]>
* Update docs/articles_en/openvino_workflow/gen_ai.md
  Co-authored-by: Nico Galoppo <[email protected]>
* Update docs/optimization_guide/nncf/code/weight_compression_openvino.py
  Co-authored-by: Nico Galoppo <[email protected]>
* Applied changes
* Update docs/articles_en/openvino_workflow/gen_ai.md
  Co-authored-by: Tatiana Savina <[email protected]>
* Update docs/articles_en/openvino_workflow/gen_ai.md
  Co-authored-by: Tatiana Savina <[email protected]>
* Update docs/articles_en/openvino_workflow/gen_ai.md
  Co-authored-by: Tatiana Savina <[email protected]>
* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md
  Co-authored-by: Tatiana Savina <[email protected]>
* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md
  Co-authored-by: Tatiana Savina <[email protected]>
* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md
  Co-authored-by: Tatiana Savina <[email protected]>
* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md
  Co-authored-by: Tatiana Savina <[email protected]>
* Update docs/articles_en/openvino_workflow/model_optimization_guide/weight_compression.md
  Co-authored-by: Tatiana Savina <[email protected]>
* Added table with results
* One more comment

---------

Co-authored-by: Nico Galoppo <[email protected]>
Co-authored-by: Tatiana Savina <[email protected]>