[DOCS] Fixing code snippet in Weight Compression (openvinotoolkit#24306)
Fixing code snippet for 4-bit compression in `Weight Compression`
article.
sgolebiewski-intel authored Apr 29, 2024
1 parent 4a1363d commit abc40f3
Showing 1 changed file with 12 additions and 2 deletions.
@@ -70,11 +70,21 @@ where INT4 is considered as the primary precision and INT8 is the backup one.
It usually results in a smaller model size and lower inference latency, although the accuracy
degradation could be higher, depending on the model.

The code snippet below shows how to perform 4-bit quantization of the weights of a model represented in OpenVINO IR, using NNCF:

.. tab-set::

   .. tab-item:: OpenVINO
      :sync: openvino

      .. doxygensnippet:: docs/optimization_guide/nncf/code/weight_compression_openvino.py
         :language: python
         :fragment: [compression_4bit]


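The referenced ``compression_4bit`` fragment is not rendered in this diff view. Below is a minimal sketch of what 4-bit weight compression with the NNCF Python API looks like; the model paths and the ``ratio`` and ``group_size`` values are illustrative assumptions, not taken from the referenced fragment.

.. code-block:: python

   import openvino as ov
   import nncf

   # Read an OpenVINO IR model (paths are illustrative).
   core = ov.Core()
   model = core.read_model("model.xml")

   # Compress weights to 4-bit: INT4 is the primary precision and
   # INT8 serves as the backup precision for the remaining layers.
   compressed_model = nncf.compress_weights(
       model,
       mode=nncf.CompressWeightsMode.INT4_SYM,
       ratio=0.9,        # share of weights compressed to INT4 (assumed value)
       group_size=128,   # group size for group-wise quantization (assumed value)
   )

   # Save the compressed model back to OpenVINO IR.
   ov.save_model(compressed_model, "compressed_model.xml")

Here ``ratio`` controls how much of the model is kept in INT4 versus the INT8 backup precision, which is the size/accuracy trade-off described above.
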
The table below summarizes the benefits and trade-offs for each compression type in terms of
memory reduction, speed gain, and accuracy loss.

.. list-table::
   :widths: 25 20 20 20
   :header-rows: 1
