
Commit

[Docs][torch.compile] NNCF quantization/compression
daniil-lyakhov committed Dec 5, 2024
1 parent 9559b42 commit 212a24a
Showing 1 changed file with 21 additions and 28 deletions.
docs/articles_en/openvino-workflow/torch-compile.rst (49 changes: 21 additions & 28 deletions)
@@ -310,42 +310,35 @@ officially. However, it can be accessed by running the following instructions:
 
     if sys.version_info >= (3, 11):
         raise RuntimeError("Python 3.11+ not yet supported for torch.compile")
 
-Support for PyTorch 2 export quantization (Preview)
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
-PyTorch 2 export quantization is supported by the OpenVINO backend in ``torch.compile``. To
-access this feature, follow the steps provided in
-`PyTorch 2 Export Post Training Quantization with X86 Backend through Inductor <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_x86_inductor.html>`__
-and update the provided sample as explained below.
-
-1. If you are using PyTorch version 2.3.0 or later, disable constant folding during
-   quantization so that the OpenVINO backend can apply its optimizations. This can be done
-   by passing the ``fold_quantize=False`` parameter to the ``convert_pt2e`` function.
-   To do so, change this line:
-
-   .. code-block:: python
-
-      converted_model = convert_pt2e(prepared_model)
-
-   to the following:
-
-   .. code-block:: python
-
-      converted_model = convert_pt2e(prepared_model, fold_quantize=False)
-
-2. Set the ``torch.compile`` backend to OpenVINO and execute the model. Update this line:
-
-   .. code-block:: python
-
-      optimized_model = torch.compile(converted_model)
-
-   as follows:
-
-   .. code-block:: python
-
-      optimized_model = torch.compile(converted_model, backend="openvino")
+NNCF Model Optimization Support (Preview)
++++++++++++++++++++++++++++++++++++++++++++++
+
+The Neural Network Compression Framework (`NNCF <https://github.com/daniil-lyakhov/nncf/tree/develop/examples/post_training_quantization/torch_fx/resnet18>`__)
+implements advanced quantization and compression algorithms, which can be applied to a
+``torch.fx.GraphModule`` to speed up inference and decrease memory consumption:
+
+.. code-block:: python
+
+   import nncf
+   import openvino.torch
+   import torch
+
+   calibration_loader = torch.utils.data.DataLoader(...)
+
+   def transform_fn(data_item):
+       images, _ = data_item
+       return images
+
+   calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
+
+   # Model quantization
+   quantized_model = nncf.quantize(model, calibration_dataset)
+   # or weight compression
+   # compressed_model = nncf.compress_weights(model)
+
+   quantized_model = torch.compile(quantized_model, backend="openvino")
+   # compressed_model = torch.compile(compressed_model, backend="openvino")
+
+For further details, see the `documentation <https://docs.openvino.ai/2024/openvino-workflow/model-optimization.html>`__
+and a `tutorial <https://github.com/daniil-lyakhov/nncf/tree/develop/examples/post_training_quantization/torch_fx/resnet18>`__.
 
 TorchServe Integration
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
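The two edits in the removed section are fragments of the larger PT2E flow described in the linked PyTorch tutorial. For context, below is a minimal sketch of that end-to-end flow using the tutorial's prototype APIs (``capture_pre_autograd_graph``, ``X86InductorQuantizer``, ``prepare_pt2e``, ``convert_pt2e``); the model, inputs, and single-batch calibration are illustrative assumptions, not part of the original documentation.

.. code-block:: python

   import torch
   import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
   from torch._export import capture_pre_autograd_graph
   from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

   import openvino.torch  # registers the "openvino" backend with torch.compile

   # Illustrative model and inputs; any eval-mode model capturable by PT2E works.
   model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
   example_inputs = (torch.randn(1, 3, 224, 224),)

   # 1. Capture the model into an FX graph.
   exported_model = capture_pre_autograd_graph(model, example_inputs)

   # 2. Insert observers using the default X86 Inductor quantization config.
   quantizer = xiq.X86InductorQuantizer()
   quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
   prepared_model = prepare_pt2e(exported_model, quantizer)

   # 3. Calibrate so the observers collect statistics (one batch shown).
   prepared_model(*example_inputs)

   # 4. Convert; fold_quantize=False (PyTorch 2.3.0+) keeps the quantize ops
   #    that the OpenVINO backend relies on for its optimizations.
   converted_model = convert_pt2e(prepared_model, fold_quantize=False)

   # 5. Compile and execute with the OpenVINO backend.
   optimized_model = torch.compile(converted_model, backend="openvino")
   optimized_model(*example_inputs)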
Expand Down
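The commented-out compression path in the added snippet needs no calibration step, since weight compression operates directly on the model weights. A minimal sketch, assuming NNCF's ``nncf.compress_weights`` entry point in its default 8-bit mode on a plain ``torch.nn.Module``; the toy model here is an illustrative assumption.

.. code-block:: python

   import nncf
   import openvino.torch  # registers the "openvino" backend with torch.compile
   import torch

   # Illustrative model; compress_weights targets Linear/Embedding weights.
   model = torch.nn.Sequential(
       torch.nn.Embedding(1000, 64),
       torch.nn.Linear(64, 64),
   ).eval()

   # 8-bit weight compression; unlike nncf.quantize, no calibration
   # dataset is required.
   compressed_model = nncf.compress_weights(model)

   # Compile and execute with the OpenVINO backend.
   compressed_model = torch.compile(compressed_model, backend="openvino")
   output = compressed_model(torch.randint(0, 1000, (1, 16)))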
