From 212a24a0f8e7e4104ddc146f1212aa24424e381b Mon Sep 17 00:00:00 2001
From: dlyakhov
Date: Thu, 5 Dec 2024 13:27:49 +0100
Subject: [PATCH] [Docs][torch.compile] NNCF quantization/compression

---
 .../openvino-workflow/torch-compile.rst | 50 ++++++++++++++++++++++----------------------------
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/docs/articles_en/openvino-workflow/torch-compile.rst b/docs/articles_en/openvino-workflow/torch-compile.rst
index e5bc0ca901a5aa..3cf602e557efb9 100644
--- a/docs/articles_en/openvino-workflow/torch-compile.rst
+++ b/docs/articles_en/openvino-workflow/torch-compile.rst
@@ -310,42 +310,36 @@ officially. However, it can be accessed by running the following instructions:
     if sys.version_info >= (3, 11):
         raise RuntimeError("Python 3.11+ not yet supported for torch.compile")
 
-Support for PyTorch 2 export quantization (Preview)
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
-PyTorch 2 export quantization is supported by OpenVINO backend in ``torch.compile``. To be able
-to access this feature, follow the steps provided in
-`PyTorch 2 Export Post Training Quantization with X86 Backend through Inductor `__
-and update the provided sample as explained below.
-
-1. If you are using PyTorch version 2.3.0 or later, disable constant folding in quantization to
-   be able to benefit from the optimization in the OpenVINO backend. This can be done by passing
-   ``fold_quantize=False`` parameter into the ``convert_pt2e`` function. To do so, change this
-   line:
-
-   .. code-block:: python
-
-      converted_model = convert_pt2e(prepared_model)
+NNCF Model Optimization Support (Preview)
++++++++++++++++++++++++++++++++++++++++++
 
-   to the following:
+The Neural Network Compression Framework (`NNCF `__) implements
+advanced quantization and compression algorithms that can be applied to a
+``torch.fx.GraphModule`` to speed up inference and decrease memory consumption:
 
-   .. code-block:: python
-
-      converted_model = convert_pt2e(prepared_model, fold_quantize=False)
-
-2. Set ``torch.compile`` backend as OpenVINO and execute the model.
+.. code-block:: python
 
-   Update this line below:
+   import nncf
+   import openvino.torch
+   import torch
 
-   .. code-block:: python
+   calibration_loader = torch.utils.data.DataLoader(...)
 
-      optimized_model = torch.compile(converted_model)
+   def transform_fn(data_item):
+       images, _ = data_item
+       return images
 
-   As below:
+   calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
+   # Model quantization
+   quantized_model = nncf.quantize(model, calibration_dataset)
+   # or weight compression
+   # compressed_model = nncf.compress_weights(model)
 
-   .. code-block:: python
+   quantized_model = torch.compile(quantized_model, backend="openvino")
+   # compressed_model = torch.compile(compressed_model, backend="openvino")
 
-      optimized_model = torch.compile(converted_model, backend="openvino")
+For further details, please see the `documentation `__
+and a `tutorial `__.
 
 TorchServe Integration
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
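
As a sanity check for the snippet this patch adds, here is a minimal, self-contained sketch of the documented workflow. The ResNet-18 model, the random calibration data, and the input shape are illustrative assumptions, not part of the patch; only the ``nncf.Dataset``, ``nncf.quantize``, and ``torch.compile(..., backend="openvino")`` calls mirror the documented API.

.. code-block:: python

   # Minimal end-to-end sketch of the workflow documented in the patch:
   # NNCF post-training quantization followed by torch.compile with the
   # OpenVINO backend. ResNet-18 and random calibration data are stand-ins
   # for a real model and DataLoader.
   import nncf
   import openvino.torch  # noqa: F401  # registers the "openvino" backend
   import torch
   import torchvision

   model = torchvision.models.resnet18(weights=None).eval()

   # Random (image, label) pairs stand in for a real calibration DataLoader.
   calibration_data = [(torch.rand(1, 3, 224, 224), 0) for _ in range(10)]

   def transform_fn(data_item):
       images, _ = data_item
       return images

   # nncf.Dataset wraps any iterable plus a transform that yields model inputs.
   calibration_dataset = nncf.Dataset(calibration_data, transform_fn)

   quantized_model = nncf.quantize(model, calibration_dataset)
   quantized_model = torch.compile(quantized_model, backend="openvino")

   with torch.no_grad():
       out = quantized_model(torch.rand(1, 3, 224, 224))
   print(out.shape)  # torch.Size([1, 1000])

The data-free weight-compression path would instead call ``compressed_model = nncf.compress_weights(model)`` and skip the calibration dataset entirely.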