diff --git a/ReleaseNotes.md b/ReleaseNotes.md index 4d9f3d83b6a..f90fff64e6d 100644 --- a/ReleaseNotes.md +++ b/ReleaseNotes.md @@ -1,5 +1,52 @@ # Release Notes +## New in Release 2.7.0 + +Post-training Quantization: + +- Features: + - (OpenVINO) Added support for data-free 4-bit weights compression through NF4 and INT4 data types (`compress_weights(…)` pipeline). + - (OpenVINO) Added support for [IF operation](https://docs.openvino.ai/latest/openvino_docs_ops_infrastructure_If_8.html) quantization. + - (OpenVINO) Added `dump_intermediate_model` parameter support for AccuracyAwareAlgorithm (`quantize_with_accuracy_control(…)` pipeline). + - (OpenVINO) Added support for SmoothQuant and ChannelAlignment algorithms for HyperparameterTuner algorithm (`quantize_with_tune_hyperparams(…)` pipeline). + - (PyTorch) Post-training Quantization is now supported with `quantize(…)` pipeline and the common implementation of quantization algorithms. Deprecated `create_compressed_model()` method for Post-training Quantization. + - Added new types (AvgPool, GroupNorm, LayerNorm) to the ignored scope for `ModelType.Transformer` scheme. + - `QuantizationPreset.Mixed` was set as the default for `ModelType.Transformer` scheme. +- Fixes: + - (OpenVINO, ONNX, PyTorch) Aligned/added patterns between backends (SE block, MVN layer, multiple activations, etc.) to restore performance/metrics. + - Fixed patterns for `ModelType.Transformer` to align with the [quantization scheme](https://docs.openvino.ai/latest/openvino_docs_OV_UG_lpt.html). +- Improvements: + - Improved UX with the new progress bar for pipeline, new exceptions, and .dot graph visualization updates. + - (OpenVINO) Optimized WeightsCompression algorithm (`compress_weights(…)` pipeline) execution time for LLM's quantization, added ignored scope support. + - (OpenVINO) Optimized AccuracyAwareQuantization algorithm execution time with multi-threaded approach while calculating ranking score (`quantize_with_accuracy_control(…)` pipeline). + - (OpenVINO) Added [extract_ov_subgraph tool](tools/extract_ov_subgraph.py) for large IR subgraph extraction. + - (ONNX) Optimized quantization pipeline (up to 1.15x speed up). +- Tutorials: + - [Post-Training Optimization of BLIP Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/233-blip-visual-language-processing) + - [Post-Training Optimization of DeepFloyd IF Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/238-deepfloyd-if) + - [Post-Training Optimization of Grammatical Error Correction Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/214-grammar-correction) + - [Post-Training Optimization of Dolly 2.0 Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/240-dolly-2-instruction-following) + - [Post-Training Optimization of Massively Multilingual Speech Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/255-mms-massively-multilingual-speech) + - [Post-Training Optimization of OneFormer Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/249-oneformer-segmentation) + - [Post-Training Optimization of InstructPix2Pix Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/231-instruct-pix2pix-image-editing) + - [Post-Training Optimization of LLaVA Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/257-llava-multimodal-chatbot) + - [Post-Training Optimization of Latent Consistency Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/263-latent-consistency-models-image-generation) + - [Post-Training Optimization of Distil-Whisper Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/267-distil-whisper-asr) + - [Post-Training Optimization of FastSAM Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/261-fast-segment-anything) +- Known issues: + - (ONNX) `quantize(...)` method can generate inaccurate int8 results for models with the BatchNormalization layer that contains biases. To get the best accuracy, use the `do_constant_folding=True` option during export from PyTorch to ONNX. + +Compression-aware training: + +- Fixes: + - (PyTorch) Fixed Hessian trace calculation to solve [#2155](https://github.com/openvinotoolkit/nncf/issues/2155) issue. +- Requirements: + - Updated PyTorch version (2.1.0). + - Updated numpy version (<1.27). +- Deprecations/Removals: + - (PyTorch) Removed legacy external quantizer storage names. + - (PyTorch) Removed torch < 2.0 version support. + ## New in Release 2.6.0 Post-training Quantization: diff --git a/docs/Installation.md b/docs/Installation.md index 17063a177f0..e1a9624ce4f 100644 --- a/docs/Installation.md +++ b/docs/Installation.md @@ -69,7 +69,8 @@ as well as the supported versions of Python: | NNCF | OpenVINO | PyTorch | ONNX | TensorFlow | Python | |-----------|------------|----------|----------|------------|--------| -| `develop` | `2023.1.0` | `2.1` | `1.13.1` | `2.12.0` | `3.8` | +| `develop` | `2023.2.0` | `2.1` | `1.13.1` | `2.12.0` | `3.8` | +| `2.7.0` | `2023.2.0` | `2.1` | `1.13.1` | `2.12.0` | `3.8` | | `2.6.0` | `2023.1.0` | `2.0.1` | `1.13.1` | `2.12.0` | `3.8` | | `2.5.0` | `2023.0.0` | `1.13.1` | `1.13.1` | `2.11.1` | `3.8` | | `2.4.0` | `2022.1.0` | `1.12.1` | `1.12.0` | `2.8.2` | `3.8` | diff --git a/nncf/version.py b/nncf/version.py index 7e34bdba420..fd30864ff0e 100644 --- a/nncf/version.py +++ b/nncf/version.py @@ -9,7 +9,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -__version__ = "2.6.0" +__version__ = "2.7.0" BKC_TORCH_VERSION = "2.1.0" BKC_TORCHVISION_VERSION = "0.16.0"