# Release Notes

## New in Release 2.14.0

Post-training Quantization:

- Features:
  - Introduced the optional `backup_mode` parameter in `nncf.compress_weights()` to specify the data type for embeddings, convolutions, and last linear layers during 4-bit weight compression. Available options are `INT8_ASYM` (the default), `INT8_SYM`, and `NONE`, which retains the original floating-point precision of the model weights (a usage sketch follows this list).
  - Added the `quantizer_propagation_rule` parameter, providing fine-grained control over quantizer propagation. This advanced option is designed to improve accuracy for models in which quantizers of different granularity would otherwise be merged into a single per-tensor quantizer, potentially degrading accuracy (a hedged sketch follows this list).
  - Introduced the `nncf.data.generate_text_data` API method, which uses an LLM to generate data for subsequent data-aware optimization. See the [example](examples/llm_compression/openvino/tiny_llama_synthetic_data/) for details and the sketch after this list.
  - (OpenVINO) Extended the data-free and data-aware weight compression methods in `nncf.compress_weights()` with NF4 per-channel quantization, which makes compressed LLMs more accurate and faster on NPU (an NF4 sketch follows this list).
  - (OpenVINO) Introduced a new option, `statistics_path`, to cache and reuse statistics for `nncf.compress_weights()`, reducing the time required to find optimal compression configurations. See the [TinyLlama example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama_find_hyperparams) for details; the NF4 sketch after this list also shows statistics caching.
  - (TorchFX, Experimental) Added support for quantization and weight compression of [Torch FX](https://pytorch.org/docs/stable/fx.html) models. The compressed models can be executed directly via `torch.compile(compressed_model, backend="openvino")` (see details [here](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html) and the sketch after this list). Added an [INT8 quantization example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/torch_fx/resnet18). The list of supported features:
    - INT8 quantization with the SmoothQuant, MinMax, FastBiasCorrection, and BiasCorrection algorithms via `nncf.quantize()`.
    - Data-free INT8, INT4, and mixed-precision weight compression with `nncf.compress_weights()`.
  - (PyTorch, Experimental) Added model tracing and execution pre/post hooks based on `TorchFunctionMode`.
- Fixes:
  - Resolved an issue with redundant quantizer insertion before elementwise operations, reducing the noise introduced by quantization.
  - Fixed a type mismatch issue for `nncf.quantize_with_accuracy_control()`.
  - Fixed the BiasCorrection algorithm for specific branching cases.
  - (OpenVINO) Fixed the GPTQ weight compression method for Stable Diffusion models.
  - (OpenVINO) Fixed an issue with variational statistics processing for `nncf.compress_weights()`.
  - (PyTorch, ONNX) Aligned the scaled dot product attention pattern quantization setup with OpenVINO.
- Improvements:
  - Reduced peak memory usage by 30-50% for data-aware `nncf.compress_weights()` with the AWQ, ScaleEstimation, LoRA, and mixed-precision algorithms.
  - Reduced compression time by 10-20% for `nncf.compress_weights()` with the AWQ algorithm.
  - Aligned the behavior of ignored subgraphs across different `networkx` versions.
  - Extended the ignored patterns with a RoPE block for the `nncf.ModelType.TRANSFORMER` scheme.
  - (OpenVINO) Extended the ignored scope for the `nncf.ModelType.TRANSFORMER` scheme with the GroupNorm metatype.
  - (ONNX) Extended the SE-block ignored pattern variant for `torchvision` mobilenet_v3.
- Tutorials:
  - [Post-Training Optimization of Llama-3.2-11B-Vision Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/mllama-3.2/mllama-3.2.ipynb)
  - [Post-Training Optimization of YOLOv11 Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov11-optimization/yolov11-object-detection.ipynb)
  - [Post-Training Optimization of Whisper in Automatic Speech Recognition with OpenVINO Generate API](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/whisper-asr-genai/whisper-asr-genai.ipynb)
  - [Post-Training Optimization of Pixtral Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/pixtral/pixtral.ipynb)
  - [Post-Training Optimization of LLM ReAct Agent Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-agent-react/llm-agent-react.ipynb)
  - [Post-Training Optimization of CatVTON Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/catvton/catvton.ipynb)
- Known issues:
  - (ONNX) The `nncf.quantize()` method can generate inaccurate INT8 results for MobileNet models with the BiasCorrection algorithm.
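
A minimal sketch of the new `backup_mode` option during 4-bit weight compression; `model` is a placeholder for an OpenVINO or PyTorch model:

```python
import nncf

# Compress weights to 4 bits; embeddings, convolutions, and the last linear
# layers fall back to the backup precision. INT8_ASYM is the default;
# INT8_SYM and NONE (keep original floating-point precision) are also valid.
compressed_model = nncf.compress_weights(
    model,  # placeholder model object
    mode=nncf.CompressWeightsMode.INT4_SYM,
    backup_mode=nncf.BackupMode.INT8_ASYM,
)
```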
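A hedged sketch for `quantizer_propagation_rule`, assuming it is exposed through `nncf.AdvancedQuantizationParameters`; the enum import path and the member name used here are assumptions, so consult the API reference:

```python
import nncf

# Assumption: the propagation-rule enum lives at this path and has a member
# that merges branch quantizers into a single per-tensor quantizer.
from nncf.common.quantization.quantizer_propagation.structs import QuantizerPropagationRule

quantized_model = nncf.quantize(
    model,  # placeholder model
    calibration_dataset,  # placeholder nncf.Dataset
    advanced_parameters=nncf.AdvancedQuantizationParameters(
        quantizer_propagation_rule=QuantizerPropagationRule.MERGE_ALL_IN_ONE,
    ),
)
```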
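A sketch of synthetic-data generation for data-aware compression, following the linked example; `hf_model`, `tokenizer`, `transform_fn`, and `model` are placeholders:

```python
import nncf
from nncf.data import generate_text_data

# Let the LLM produce its own calibration texts, then wrap them in an
# nncf.Dataset for data-aware 4-bit weight compression.
synthetic_texts = generate_text_data(hf_model, tokenizer)  # placeholders: HF model + tokenizer
calibration_dataset = nncf.Dataset(synthetic_texts, transform_fn)  # transform_fn: text -> model inputs
compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    dataset=calibration_dataset,
)
```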
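One sketch covers both the NF4 mode and statistics caching, assuming `statistics_path` is passed via `nncf.AdvancedCompressionParameters` as in the linked TinyLlama example:

```python
import nncf

# NF4 weight compression; statistics gathered on the first run are cached
# under "statistics" and reused on later runs, which shortens the search
# for a good compression configuration.
compressed_model = nncf.compress_weights(
    model,  # placeholder OpenVINO model
    mode=nncf.CompressWeightsMode.NF4,
    group_size=-1,  # per-channel quantization
    dataset=calibration_dataset,  # placeholder nncf.Dataset
    advanced_parameters=nncf.AdvancedCompressionParameters(
        statistics_path="statistics",
    ),
)
```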
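A sketch of the experimental Torch FX flow; the capture step via `torch.export` is only one way to obtain a `torch.fx.GraphModule` (the linked INT8 example shows the exact tracing), while the `torch.compile` call is quoted from the note above:

```python
import nncf
import torch

# Capture the model as a torch.fx.GraphModule (placeholder capture path).
fx_model = torch.export.export(model, (example_input,)).module()

# INT8 post-training quantization of the FX graph, then direct execution
# through the OpenVINO backend of torch.compile.
quantized_model = nncf.quantize(fx_model, calibration_dataset)
compiled_model = torch.compile(quantized_model, backend="openvino")
```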

Deprecations/Removals:

- The main installation method was changed from `setup.py` to the `pyproject.toml` approach. Installation through `setup.py` no longer works; installation from PyPI and Conda is unaffected.
- Removed support for Python 3.8.
- (PyTorch) The `nncf.torch.create_compressed_model()` function has been deprecated.

Requirements:

- Updated ONNX (1.17.0) and ONNXRuntime (1.19.2) versions.
- Updated PyTorch (2.5.1) and Torchvision (0.20.1) versions.
- Updated NumPy version (<2.2.0).
- Updated Ultralytics (8.3.22).

## New in Release 2.13.0

Post-training Quantization:
  - [Post-Training Optimization of Qwen2Audio Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/qwen2-audio/qwen2-audio.ipynb)
  - [Post-Training Optimization of NuExtract Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/nuextract-structure-extraction/nuextract-structure-extraction.ipynb)
  - [Post-Training Optimization of MiniCPM-V2 Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/minicpm-v-multimodal-chatbot/minicpm-v-multimodal-chatbot.ipynb)
  - [Post-Training Optimization of Stable Diffusion v3 Model in Torch FX format](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/stable-diffusion-v3/stable-diffusion-v3-torch-fx.ipynb)

Compression-aware training:
