v2.9.0
Post-training Quantization:
Features:
- (OpenVINO) Added a modified AWQ algorithm for 4-bit data-aware weight compression. The algorithm is applied only to MatMul->Multiply->MatMul patterns. To enable it, the optional awq parameter has been added to nncf.compress_weights(); it can be used to minimize accuracy degradation of compressed models (note that this option increases the compression time).
- (ONNX) Introduced support for the ONNX backend in the nncf.quantize_with_accuracy_control() method. Users can now perform quantization with accuracy control for onnx.ModelProto. By leveraging this feature, users can enhance the accuracy of quantized models while minimizing performance impact.
- (ONNX) Added an example based on the YOLOv8n-seg model demonstrating the usage of quantization with accuracy control for the ONNX backend.
- (PT) Added the SmoothQuant algorithm for the PyTorch backend in nncf.quantize().
- (OpenVINO) Added an example with hyperparameter tuning for the TinyLlama model.
- Introduced the nncf.AdvancedAccuracyRestorerParameters.
- Introduced the subset_size option for nncf.compress_weights().
- Introduced TargetDevice.NPU as the replacement for TargetDevice.VPU.
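
As a rough illustration of the awq and subset_size options listed above, the following sketch compresses an OpenVINO model to 4-bit weights with AWQ enabled. The helper name compress_int4_with_awq, the choice of the INT4_SYM mode, and the subset_size value of 128 are assumptions made for the example, not recommendations from these notes; calibration_items stands for any iterable of model inputs.

```python
def compress_int4_with_awq(ov_model, calibration_items, subset_size=128):
    """Compress model weights to 4 bits with the data-aware AWQ option enabled.

    Hypothetical helper for illustration; only the nncf.* calls are NNCF API.
    """
    # Imported lazily so the sketch can be loaded without NNCF installed.
    import nncf

    # AWQ is data-aware, so calibration data is wrapped in an nncf.Dataset.
    dataset = nncf.Dataset(calibration_items)

    return nncf.compress_weights(
        ov_model,
        mode=nncf.CompressWeightsMode.INT4_SYM,  # assumed 4-bit mode for the example
        dataset=dataset,
        awq=True,                 # enables AWQ; increases compression time
        subset_size=subset_size,  # number of calibration samples to use
    )
```

A call would look like compress_int4_with_awq(model, samples, subset_size=64), where model is an openvino.Model and samples is a list of input dictionaries. A smaller subset_size shortens calibration at the possible cost of accuracy.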
Fixes:
- Fixed API Enums serialization/deserialization issue.
- Fixed an issue with the required arguments of the revert_operations_to_floating_point_precision method.
Improvements:
- (ONNX) Aligned statistics collection with OpenVINO and PyTorch backends.
- Extended nncf.compress_weights() with Convolution and Embeddings compression in order to reduce the memory footprint.
Deprecations/Removals:
- (OpenVINO) Removed outdated examples with nncf.quantize() for BERT and YOLOv5 models.
- (OpenVINO) Removed outdated example with nncf.quantize_with_accuracy_control() for SSD MobileNetV1 FPN model.
- (PyTorch) Deprecated the binarization algorithm.
- Removed Post-training Optimization Tool as OpenVINO backend.
- Removed Dockerfiles.
- TargetDevice.VPU was replaced by TargetDevice.NPU.
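
For code that still passes TargetDevice.VPU, the migration amounts to switching the enum member. A minimal sketch, assuming the device is selected through the target_device argument of nncf.quantize(); the helper name quantize_for_npu and the calibration_items argument are hypothetical.

```python
def quantize_for_npu(model, calibration_items):
    """Quantize a model with the NPU target device.

    Hypothetical helper for illustration; only the nncf.* calls are NNCF API.
    """
    # Imported lazily so the sketch can be loaded without NNCF installed.
    import nncf

    calibration_dataset = nncf.Dataset(calibration_items)

    return nncf.quantize(
        model,
        calibration_dataset,
        target_device=nncf.TargetDevice.NPU,  # previously nncf.TargetDevice.VPU
    )
```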
Tutorials:
- Post-Training Optimization of Stable Diffusion v2 Model
- Post-Training Optimization of DeciDiffusion Model
- Post-Training Optimization of DepthAnything Model
- Post-Training Optimization of Stable Diffusion ControlNet Model
Compression-aware training:
Fixes:
- (PyTorch) Fixed an issue with missing arguments in NNCFNetworkInterface.get_clean_shallow_copy.
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@AishwaryaDekhane
@UsingtcNower
@Om-Doiphode