[Release_v2140] Update ReleaseNotes.md #3071

Merged
Changes from 7 commits
59 changes: 59 additions & 0 deletions ReleaseNotes.md
@@ -1,5 +1,64 @@
# Release Notes

## New in Release 2.14.0

Post-training Quantization:

- Breaking changes:
- ...
- General:
- ...
- Features:
- (OpenVINO) Extended the data-free and data-aware weight compression methods ([nncf.compress_weights()](docs/usage/post_training_compression/weights_compression/Usage.md#user-guide) API) with NF4 per-channel quantization, which makes compressed LLMs more accurate and faster on NPU.
- Introduced the optional `backup_mode` parameter in `nncf.compress_weights()` to specify the data type for embeddings, convolutions and last linear layers during 4-bit weight compression. Available options are INT8_ASYM (the default), INT8_SYM, and NONE, which retains the original floating-point precision of the model weights.
- Added preview support for optimizing models in [Torch FX](https://pytorch.org/docs/stable/fx.html) format with the `nncf.quantize()` and `nncf.compress_weights()` methods. After optimization, such models can be executed directly via [torch.compile()](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html). See the [INT8 quantization example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/torch_fx/resnet18) for more details.
- ...
- Fixes:
- (OpenVINO) Fixed the GPTQ weight compression method for Stable Diffusion models.
- (Torch, ONNX) Aligned the quantization setup for the scaled dot product attention pattern with OpenVINO.
- ...
- Improvements:
- The `ultralytics` version has been updated to 8.3.22.
- Reduced peak memory by 30-50% for data-aware weight compression with AWQ, SE, LoRA and mixed precision algorithms.
- Reduced compression time by 10-20% for weight compression with the AWQ algorithm.
- ...
- Tutorials:
- [Post-Training Optimization of Llama-3.2-11B-Vision Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/mllama-3.2/mllama-3.2.ipynb)
- [Post-Training Optimization of YOLOv11 Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov11-optimization/yolov11-object-detection.ipynb)
- [Post-Training Optimization of Whisper Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/whisper-asr-genai/whisper-asr-genai.ipynb)
- [Post-Training Optimization of Pixtral Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/pixtral/pixtral.ipynb)
- [Post-Training Optimization of LLM ReAct Agent Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-agent-react/llm-agent-react.ipynb)
- Known issues:
- ...
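
The `backup_mode` parameter listed under Features above could be used roughly as follows. This is a minimal sketch, not the project's reference usage: `build_compress_kwargs` and `compress` are hypothetical helper names introduced here, and the sketch assumes an installed NNCF version that exposes the `nncf.BackupMode` and `nncf.CompressWeightsMode` enums. NNCF is imported lazily, so the option check works without it.

```python
from typing import Any, Dict

# Option names quoted from the release note: INT8_ASYM is the default,
# NONE keeps the original floating-point precision of the weights.
BACKUP_MODES = ("INT8_ASYM", "INT8_SYM", "NONE")


def build_compress_kwargs(
    mode: str = "INT4_SYM", backup_mode: str = "INT8_ASYM"
) -> Dict[str, str]:
    """Validate and collect option names for 4-bit weight compression.

    Hypothetical helper, not part of NNCF.
    """
    if backup_mode not in BACKUP_MODES:
        raise ValueError(f"unknown backup_mode: {backup_mode!r}")
    return {"mode": mode, "backup_mode": backup_mode}


def compress(model: Any, **names: str) -> Any:
    """Resolve option names to NNCF enums and compress the model.

    Assumption: nncf.CompressWeightsMode and nncf.BackupMode are available
    in the installed NNCF version.
    """
    import nncf  # lazy import: only needed when a model is actually compressed

    options = build_compress_kwargs(**names)
    return nncf.compress_weights(
        model,
        mode=getattr(nncf.CompressWeightsMode, options["mode"]),
        backup_mode=getattr(nncf.BackupMode, options["backup_mode"]),
    )
```

For example, `compress(model, backup_mode="NONE")` would request 4-bit compression while leaving embeddings, convolutions and last linear layers in their original precision, assuming `model` is a model object supported by `nncf.compress_weights()`.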

Compression-aware training:

- Breaking changes:
- ...
- General:
- ...
- Features:
- ...
- Fixes:
- ...
- Improvements:
- ...
- Tutorials:
- ...
- Known issues:
- ...

Deprecations/Removals:

- The `nncf.torch.create_compressed_model()` function has been deprecated for the PyTorch backend.
- Removed support for Python 3.8.
- `tensorflow_addons` has been removed from the dependencies.
- ...

Requirements:

- ...

## New in Release 2.13.0

Post-training Quantization: