[Docs][TorchFX] NNCF quantization/compression

daniil-lyakhov committed Dec 17, 2024
1 parent 5ce6157 commit 7cb0446
Showing 1 changed file with 75 additions and 19 deletions: docs/articles_en/openvino-workflow/torch-compile.rst
officially. However, it can be accessed by running the following instructions:
.. code-block:: python

   if sys.version_info >= (3, 11):
       raise RuntimeError("Python 3.11+ not yet supported for torch.compile")
TorchServe Integration
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

TorchServe is a performant, flexible, and easy-to-use tool for serving PyTorch models in production. For more information on TorchServe,
refer to the `TorchServe GitHub repository <https://github.com/pytorch/serve>`__. With the OpenVINO ``torch.compile`` integration in TorchServe, you can serve
PyTorch models in production and accelerate them with OpenVINO on various Intel hardware. Detailed instructions on how to use OpenVINO with TorchServe are
available in the `TorchServe examples <https://github.com/pytorch/serve/tree/master/examples/pt2/torch_compile_openvino>`__ and in a `use case app <https://github.com/pytorch/serve/tree/master/examples/usecases/llm_diffusion_serving_app>`__.

Support for Automatic1111 Stable Diffusion WebUI
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Automatic1111 Stable Diffusion WebUI is an open-source repository that hosts a browser-based interface for Stable Diffusion-based
image generation. It allows users to create realistic and creative images from text prompts.
Stable Diffusion WebUI is supported on Intel CPUs, Intel integrated GPUs, and Intel discrete GPUs by leveraging the OpenVINO
``torch.compile`` capability. Detailed instructions are available in the
`Stable Diffusion WebUI repository <https://github.com/openvinotoolkit/stable-diffusion-webui/wiki/Installation-on-Intel-Silicon>`__.


Model Quantization and Weights Compression
#############################################

Model quantization and compression are effective methods for accelerating model inference and reducing memory consumption, with minimal impact on model accuracy. The ``torch.compile`` OpenVINO backend supports two key model optimization APIs:

1. Neural Network Compression Framework (`NNCF <https://docs.openvino.ai/2024/openvino-workflow/model-optimization.html>`__). NNCF offers advanced algorithms for post-training quantization and weights compression in the OpenVINO toolkit.

2. PyTorch 2 export quantization. A general-purpose API designed for quantizing models captured by ``torch.export``.

NNCF is the recommended approach for model quantization and weights compression. NNCF specifically optimizes models for the OpenVINO backend, providing optimal results in terms of inference speed and accuracy.


NNCF Model Optimization Support (Preview)
+++++++++++++++++++++++++++++++++++++++++++++

The Neural Network Compression Framework (`NNCF <https://docs.openvino.ai/2024/openvino-workflow/model-optimization.html>`__) implements advanced quantization and weights compression algorithms, which can be applied to a ``torch.fx.GraphModule`` to speed up inference
and decrease memory consumption.

Model quantization example:

.. code-block:: python

   import nncf
   import openvino.torch
   import torch

   calibration_loader = torch.utils.data.DataLoader(...)

   def transform_fn(data_item):
       images, _ = data_item
       return images

   # Wrap the loader into an NNCF dataset; transform_fn extracts the model input
   calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)

   # Model quantization
   quantized_model = nncf.quantize(model, calibration_dataset)

   quantized_model = torch.compile(quantized_model, backend="openvino")
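
As with any ``torch.compile`` workflow, the first call to the quantized model triggers graph capture and OpenVINO compilation, while subsequent calls reuse the compiled artifacts. A minimal usage sketch (``example_input`` is a placeholder matching the model's expected input):

.. code-block:: python

   # The first inference compiles the model with the OpenVINO backend;
   # later calls with the same input shapes reuse the compiled model.
   example_input = torch.ones((1, 3, 224, 224))
   output = quantized_model(example_input)
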
Model weights compression example:

.. code-block:: python

   import nncf
   import openvino.torch
   import torch

   # Weights compression
   compressed_model = nncf.compress_weights(model)

   compressed_model = torch.compile(compressed_model, backend="openvino")
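
By default, ``nncf.compress_weights`` applies 8-bit weight compression. For large language models, lower-precision modes can be requested; below is a sketch of the commonly used options (``mode``, ``group_size``, and ``ratio`` are parameters of ``nncf.compress_weights``; the exact set available depends on the installed NNCF version):

.. code-block:: python

   import nncf

   # 4-bit symmetric weight compression: group_size sets the quantization
   # granularity, ratio the share of layers compressed to INT4 (the rest stay INT8)
   compressed_model = nncf.compress_weights(
       model,
       mode=nncf.CompressWeightsMode.INT4_SYM,
       group_size=128,
       ratio=0.8,
   )
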
NNCF unlocks the full potential of low-precision OpenVINO kernels thanks to quantizer placement designed specifically for OpenVINO.
Advanced algorithms like ``SmoothQuant`` or ``BiasCorrection`` allow further accuracy improvements while minimizing output discrepancies between the original and compressed models.
For further details, see the `documentation <https://docs.openvino.ai/2024/openvino-workflow/model-optimization.html>`__
and a `tutorial <https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/torch_fx/resnet18>`__.
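
For example, ``SmoothQuant`` is enabled when a transformer model is quantized with the ``model_type`` hint. A short sketch, assuming ``model`` and ``calibration_dataset`` are defined as in the quantization example above:

.. code-block:: python

   import nncf

   # The TRANSFORMER model type turns on transformer-specific algorithms,
   # including SmoothQuant, during post-training quantization
   quantized_model = nncf.quantize(
       model,
       calibration_dataset,
       model_type=nncf.ModelType.TRANSFORMER,
   )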

Support for PyTorch 2 export quantization (Preview)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

NNCF is the default way to compress models for the OpenVINO backend; however,
PyTorch 2 export quantization is supported by the OpenVINO backend in ``torch.compile`` as well. To be able
to access this feature, follow the steps provided in
`PyTorch 2 Export Post Training Quantization with X86 Backend through Inductor <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_x86_inductor.html>`__
and update the provided sample as explained below.
The key change relative to the tutorial is compiling the converted model with the ``openvino`` backend instead of the default Inductor backend:

.. code-block:: python

   optimized_model = torch.compile(converted_model, backend="openvino")
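
A condensed sketch of that flow is shown below, assuming the APIs from the linked tutorial; the model-capture step (``torch.export.export`` here) differs between PyTorch releases, and ``model`` and ``example_input`` are placeholders, so treat the exact calls as illustrative:

.. code-block:: python

   import torch
   from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
   from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
       X86InductorQuantizer,
       get_default_x86_inductor_quantization_config,
   )

   # Capture the model graph (the capture API varies across PyTorch versions)
   exported_model = torch.export.export(model, (example_input,)).module()

   # Configure the quantizer as in the linked tutorial
   quantizer = X86InductorQuantizer()
   quantizer.set_global(get_default_x86_inductor_quantization_config())

   # Insert observers, run a calibration pass, and convert to a quantized model
   prepared_model = prepare_pt2e(exported_model, quantizer)
   prepared_model(example_input)
   converted_model = convert_pt2e(prepared_model)

   # Compile with the OpenVINO backend instead of the default Inductor backend
   optimized_model = torch.compile(converted_model, backend="openvino")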

Architecture
#################

