From 212a24a0f8e7e4104ddc146f1212aa24424e381b Mon Sep 17 00:00:00 2001
From: dlyakhov
Date: Thu, 5 Dec 2024 13:27:49 +0100
Subject: [PATCH] [Docs][torch.compile] NNCF quantization/compression

---
 .../openvino-workflow/torch-compile.rst | 50 ++++++++++++++++++++++----------------------------
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/docs/articles_en/openvino-workflow/torch-compile.rst b/docs/articles_en/openvino-workflow/torch-compile.rst
index e5bc0ca901a5aa..3cf602e557efb9 100644
--- a/docs/articles_en/openvino-workflow/torch-compile.rst
+++ b/docs/articles_en/openvino-workflow/torch-compile.rst
@@ -310,42 +310,36 @@ officially. However, it can be accessed by running the following instructions:
     if sys.version_info >= (3, 11):
         raise RuntimeError("Python 3.11+ not yet supported for torch.compile")
 
-Support for PyTorch 2 export quantization (Preview)
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
-PyTorch 2 export quantization is supported by OpenVINO backend in ``torch.compile``. To be able
-to access this feature, follow the steps provided in
-`PyTorch 2 Export Post Training Quantization with X86 Backend through Inductor `__
-and update the provided sample as explained below.
-
-1. If you are using PyTorch version 2.3.0 or later, disable constant folding in quantization to
-   be able to benefit from the optimization in the OpenVINO backend. This can be done by passing
-   ``fold_quantize=False`` parameter into the ``convert_pt2e`` function. To do so, change this
-   line:
-
-   .. code-block:: python
-
-      converted_model = convert_pt2e(prepared_model)
+NNCF Model Optimization Support (Preview)
++++++++++++++++++++++++++++++++++++++++++
 
-   to the following:
+The Neural Network Compression Framework (`NNCF `__) implements
+advanced quantization and compression algorithms that can be applied to a
+``torch.fx.GraphModule`` to speed up inference and decrease memory consumption:
 
-   .. code-block:: python
-
-      converted_model = convert_pt2e(prepared_model, fold_quantize=False)
-
-2. Set ``torch.compile`` backend as OpenVINO and execute the model.
+.. code-block:: python
 
-   Update this line below:
+   import nncf
+   import openvino.torch
+   import torch
 
-   .. code-block:: python
+   calibration_loader = torch.utils.data.DataLoader(...)
 
-      optimized_model = torch.compile(converted_model)
+   def transform_fn(data_item):
+       images, _ = data_item
+       return images
 
-   As below:
+   calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
+   # Model quantization
+   quantized_model = nncf.quantize(model, calibration_dataset)
+   # or weight compression
+   # compressed_model = nncf.compress_weights(model)
 
-   .. code-block:: python
+   quantized_model = torch.compile(quantized_model, backend="openvino")
+   # compressed_model = torch.compile(compressed_model, backend="openvino")
 
-      optimized_model = torch.compile(converted_model, backend="openvino")
+For further details, please see the `documentation `__
+and a `tutorial `__.
 
 TorchServe Integration
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
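
As a sanity check for the snippet this patch adds, here is a minimal, self-contained sketch of the documented workflow. The ResNet-18 model, the random calibration data, and the input shape are illustrative assumptions, not part of the patch; only the ``nncf.Dataset``, ``nncf.quantize``, and ``torch.compile(..., backend="openvino")`` calls mirror the documented API.

.. code-block:: python

   # Minimal end-to-end sketch of the workflow documented in the patch:
   # NNCF post-training quantization followed by torch.compile with the
   # OpenVINO backend. ResNet-18 and random calibration data are stand-ins
   # for a real model and DataLoader.
   import nncf
   import openvino.torch  # noqa: F401  # registers the "openvino" backend
   import torch
   import torchvision

   model = torchvision.models.resnet18(weights=None).eval()

   # Random (image, label) pairs stand in for a real calibration DataLoader.
   calibration_data = [(torch.rand(1, 3, 224, 224), 0) for _ in range(10)]

   def transform_fn(data_item):
       images, _ = data_item
       return images

   # nncf.Dataset wraps any iterable plus a transform that yields model inputs.
   calibration_dataset = nncf.Dataset(calibration_data, transform_fn)

   quantized_model = nncf.quantize(model, calibration_dataset)
   quantized_model = torch.compile(quantized_model, backend="openvino")

   with torch.no_grad():
       out = quantized_model(torch.rand(1, 3, 224, 224))
   print(out.shape)  # torch.Size([1, 1000])

The data-free weight-compression path would instead call ``compressed_model = nncf.compress_weights(model)`` and skip the calibration dataset entirely.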