# Release Notes

## New in Release 2.14.0

Post-training Quantization:

- Features:
  - Introduced the optional `backup_mode` parameter in `nncf.compress_weights()` to specify the data type for embeddings, convolutions, and last linear layers during 4-bit weight compression. Available options are `INT8_ASYM` (the default), `INT8_SYM`, and `NONE`, which retains the original floating-point precision of the model weights (a usage sketch follows this list).
  - Added the `quantizer_propagation_rule` parameter, providing fine-grained control over quantizer propagation. This advanced option is designed to improve accuracy for models in which quantizers of different granularity would otherwise be merged into a single per-tensor quantizer, potentially degrading accuracy (a hedged sketch follows this list).
  - Introduced the `nncf.data.generate_text_data` API method, which uses an LLM to generate data for subsequent data-aware optimization. See the [example](examples/llm_compression/openvino/tiny_llama_synthetic_data/) for details and the sketch after this list.
  - (OpenVINO) Extended the data-free and data-aware weight compression methods in `nncf.compress_weights()` with NF4 per-channel quantization, which makes compressed LLMs more accurate and faster on NPU (an NF4 sketch follows this list).
  - (OpenVINO) Introduced a new option, `statistics_path`, to cache and reuse statistics for `nncf.compress_weights()`, reducing the time required to find optimal compression configurations. See the [TinyLlama example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama_find_hyperparams) for details; the NF4 sketch after this list also shows statistics caching.
  - (TorchFX, Experimental) Added support for quantization and weight compression of [Torch FX](https://pytorch.org/docs/stable/fx.html) models. The compressed models can be executed directly via `torch.compile(compressed_model, backend="openvino")` (see details [here](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html) and the sketch after this list). Added an [INT8 quantization example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/torch_fx/resnet18). The list of supported features:
    - INT8 quantization with the SmoothQuant, MinMax, FastBiasCorrection, and BiasCorrection algorithms via `nncf.quantize()`.
    - Data-free INT8, INT4, and mixed-precision weight compression with `nncf.compress_weights()`.
  - (PyTorch, Experimental) Added model tracing and execution pre/post hooks based on `TorchFunctionMode`.
- Fixes:
  - Resolved an issue with redundant quantizer insertion before elementwise operations, reducing the noise introduced by quantization.
  - Fixed a type mismatch issue for `nncf.quantize_with_accuracy_control()`.
  - Fixed the BiasCorrection algorithm for specific branching cases.
  - (OpenVINO) Fixed the GPTQ weight compression method for Stable Diffusion models.
  - (OpenVINO) Fixed an issue with variational statistics processing for `nncf.compress_weights()`.
  - (PyTorch, ONNX) Aligned the scaled dot product attention pattern quantization setup with OpenVINO.
- Improvements:
  - Reduced peak memory usage by 30-50% for data-aware `nncf.compress_weights()` with the AWQ, ScaleEstimation, LoRA, and mixed-precision algorithms.
  - Reduced compression time by 10-20% for `nncf.compress_weights()` with the AWQ algorithm.
  - Aligned the behavior of ignored subgraphs across different `networkx` versions.
  - Extended the ignored patterns with a RoPE block for the `nncf.ModelType.TRANSFORMER` scheme.
  - (OpenVINO) Extended the ignored scope for the `nncf.ModelType.TRANSFORMER` scheme with the GroupNorm metatype.
  - (ONNX) Extended the SE-block ignored pattern variant for `torchvision` mobilenet_v3.
- Tutorials:
  - [Post-Training Optimization of Llama-3.2-11B-Vision Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/mllama-3.2/mllama-3.2.ipynb)
  - [Post-Training Optimization of YOLOv11 Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov11-optimization/yolov11-object-detection.ipynb)
  - [Post-Training Optimization of Whisper in Automatic Speech Recognition with OpenVINO Generate API](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/whisper-asr-genai/whisper-asr-genai.ipynb)
  - [Post-Training Optimization of Pixtral Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/pixtral/pixtral.ipynb)
  - [Post-Training Optimization of LLM ReAct Agent Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-agent-react/llm-agent-react.ipynb)
  - [Post-Training Optimization of CatVTON Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/catvton/catvton.ipynb)
- Known issues:
  - (ONNX) The `nncf.quantize()` method can generate inaccurate INT8 results for MobileNet models with the BiasCorrection algorithm.
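
A minimal sketch of the new `backup_mode` option during 4-bit weight compression; `model` is a placeholder for an OpenVINO or PyTorch model:

```python
import nncf

# Compress weights to 4 bits; embeddings, convolutions, and the last linear
# layers fall back to the backup precision. INT8_ASYM is the default;
# INT8_SYM and NONE (keep original floating-point precision) are also valid.
compressed_model = nncf.compress_weights(
    model,  # placeholder model object
    mode=nncf.CompressWeightsMode.INT4_SYM,
    backup_mode=nncf.BackupMode.INT8_ASYM,
)
```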
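A hedged sketch for `quantizer_propagation_rule`, assuming it is exposed through `nncf.AdvancedQuantizationParameters`; the enum import path and the member name used here are assumptions, so consult the API reference:

```python
import nncf

# Assumption: the propagation-rule enum lives at this path and has a member
# that merges branch quantizers into a single per-tensor quantizer.
from nncf.common.quantization.quantizer_propagation.structs import QuantizerPropagationRule

quantized_model = nncf.quantize(
    model,  # placeholder model
    calibration_dataset,  # placeholder nncf.Dataset
    advanced_parameters=nncf.AdvancedQuantizationParameters(
        quantizer_propagation_rule=QuantizerPropagationRule.MERGE_ALL_IN_ONE,
    ),
)
```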
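A sketch of synthetic-data generation for data-aware compression, following the linked example; `hf_model`, `tokenizer`, `transform_fn`, and `model` are placeholders:

```python
import nncf
from nncf.data import generate_text_data

# Let the LLM produce its own calibration texts, then wrap them in an
# nncf.Dataset for data-aware 4-bit weight compression.
synthetic_texts = generate_text_data(hf_model, tokenizer)  # placeholders: HF model + tokenizer
calibration_dataset = nncf.Dataset(synthetic_texts, transform_fn)  # transform_fn: text -> model inputs
compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    dataset=calibration_dataset,
)
```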
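One sketch covers both the NF4 mode and statistics caching, assuming `statistics_path` is passed via `nncf.AdvancedCompressionParameters` as in the linked TinyLlama example:

```python
import nncf

# NF4 weight compression; statistics gathered on the first run are cached
# under "statistics" and reused on later runs, which shortens the search
# for a good compression configuration.
compressed_model = nncf.compress_weights(
    model,  # placeholder OpenVINO model
    mode=nncf.CompressWeightsMode.NF4,
    group_size=-1,  # per-channel quantization
    dataset=calibration_dataset,  # placeholder nncf.Dataset
    advanced_parameters=nncf.AdvancedCompressionParameters(
        statistics_path="statistics",
    ),
)
```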
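A sketch of the experimental Torch FX flow; the capture step via `torch.export` is only one way to obtain a `torch.fx.GraphModule` (the linked INT8 example shows the exact tracing), while the `torch.compile` call is quoted from the note above:

```python
import nncf
import torch

# Capture the model as a torch.fx.GraphModule (placeholder capture path).
fx_model = torch.export.export(model, (example_input,)).module()

# INT8 post-training quantization of the FX graph, then direct execution
# through the OpenVINO backend of torch.compile.
quantized_model = nncf.quantize(fx_model, calibration_dataset)
compiled_model = torch.compile(quantized_model, backend="openvino")
```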

Deprecations/Removals:

- The main installation method was changed from `setup.py` to the `pyproject.toml` approach. Installation through `setup.py` no longer works; installation from PyPI and Conda is unaffected.
- Removed support for Python 3.8.
- (PyTorch) The `nncf.torch.create_compressed_model()` function has been deprecated.

Requirements:

- Updated ONNX (1.17.0) and ONNXRuntime (1.19.2) versions.
- Updated PyTorch (2.5.1) and Torchvision (0.20.1) versions.
- Updated NumPy version (<2.2.0).
- Updated Ultralytics (8.3.22).

## New in Release 2.13.0

Post-training Quantization:
  - [Post-Training Optimization of Qwen2Audio Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/qwen2-audio/qwen2-audio.ipynb)
  - [Post-Training Optimization of NuExtract Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/nuextract-structure-extraction/nuextract-structure-extraction.ipynb)
  - [Post-Training Optimization of MiniCPM-V2 Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/minicpm-v-multimodal-chatbot/minicpm-v-multimodal-chatbot.ipynb)
  - [Post-Training Optimization of Stable Diffusion v3 Model in Torch FX format](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/stable-diffusion-v3/stable-diffusion-v3-torch-fx.ipynb)

Compression-aware training:
