v2.10.0
Post-training Quantization:
Features:
- Introduced the subgraph defining functionality for the nncf.IgnoredScope() option.
- Introduced limited support for batch sizes greater than 1. The MobilenetV2 PyTorch example was updated with batch support.
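
To illustrate the idea behind defining a subgraph for an ignored scope, here is a minimal, library-independent sketch. It does not call NNCF and is not NNCF's implementation. The toy graph, the node names, and the helper `collect_subgraph` are hypothetical, and whether the boundary nodes themselves belong to the ignored set is an assumption made here. The sketch treats a subgraph as every node that lies on a path from one of the given input nodes to one of the given output nodes.

```python
from collections import deque


def collect_subgraph(edges, inputs, outputs):
    """Return the names of all nodes on a path from any input to any output.

    Boundary nodes (the inputs and outputs themselves) are included;
    this is an assumption of the sketch, not a statement about NNCF.
    """
    successors = {}
    predecessors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        predecessors.setdefault(dst, []).append(src)

    def reachable(starts, neighbours):
        # Plain breadth-first search over the given adjacency map.
        seen = set(starts)
        queue = deque(starts)
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    downstream = reachable(inputs, successors)    # nodes reachable from the inputs
    upstream = reachable(outputs, predecessors)   # nodes that can reach the outputs
    return downstream & upstream


# Hypothetical toy graph: a main branch and a side branch through "pool".
EDGES = [
    ("conv1", "relu1"),
    ("relu1", "conv2"),
    ("conv2", "relu2"),
    ("relu2", "fc"),
    ("conv1", "pool"),
    ("pool", "fc"),
]

ignored = collect_subgraph(EDGES, inputs=["relu1"], outputs=["relu2"])
print(sorted(ignored))
```

Naming only the two boundary nodes is enough to exclude the whole region between them, so the intermediate nodes do not have to be listed one by one.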
Fixes:
- Fixed an issue where the nncf.OverflowFix parameter was absent in some scenarios.
- Aligned the list of correctable layers for the FastBiasCorrection algorithm between PyTorch, OpenVINO and ONNX backends.
- Fixed an issue with the combination of nncf.QuantizationMode parameters.
- Fixed MobilenetV2 (PyTorch, ONNX, OpenVINO) examples for the Windows platform.
- (OpenVINO) Fixed Anomaly Classification example for the Windows platform.
- (PyTorch) Fixed bias shift magnitude calculation for fused layers.
- (OpenVINO) Fixed the removal of the ShapeOf graph, which led to an error in the nncf.quantize_with_accuracy_control() method.
Improvements:
- OverflowFix, AdvancedSmoothQuantParameters and AdvancedBiasCorrectionParameters are now exposed in the nncf.* namespace.
- (OpenVINO, PyTorch) Introduced scale compression to FP16 for weights in the nncf.compress_weights() method, regardless of model weights precision.
- (PyTorch) Modules inserted by NNCF are now excluded from parameter tracing.
- (OpenVINO) Extended the list of correctable layers for the BiasCorrection algorithm.
- (ONNX) Aligned BiasCorrection algorithm behaviour with OpenVINO in specific cases.
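
As a rough illustration of what storing weight scales in FP16 means for the nncf.compress_weights() item above, here is a stdlib-only sketch. It is not NNCF code. The helpers `to_fp16`, `compress_row` and `decompress_row`, the symmetric int8 scheme with one scale per row, and the sample values are all assumptions chosen for illustration. The scale is rounded to half precision before it is used, regardless of the precision of the incoming weights.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def compress_row(weights, scale_to_fp16=True):
    """Symmetric int8 quantization of one weight row with a single scale."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale_to_fp16:
        # Store the scale in FP16, whatever the precision of the weights was.
        scale = to_fp16(scale)
    if scale == 0.0:
        # All-zero row, or a scale that underflowed in FP16: avoid dividing by zero.
        scale = 1.0
    quantized = [max(-128, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def decompress_row(quantized, scale):
    """Reconstruct approximate weights from int8 values and the stored scale."""
    return [q * scale for q in quantized]


row = [0.5, -1.27, 0.031, 0.9]  # hypothetical weight values
q, scale = compress_row(row)
restored = decompress_row(q, scale)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(q, scale, max_err)
```

Because the rounded scale is the one used for both quantization and reconstruction, the rounding error of the scale itself does not add a separate mismatch. The reconstruction error stays within about half a quantization step.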
Tutorials:
- Post-Training Optimization of PhotoMaker Model
- Post-Training Optimization of Stable Diffusion XL Model
- Post-Training Optimization of KerasCV Stable Diffusion Model
- Post-Training Optimization of Paint By Example Model
- Post-Training Optimization of aMUSEd Model
- Post-Training Optimization of InstantID Model
- Post-Training Optimization of LLaVA Next Model
- Post-Training Optimization of AnimateAnyone Model
- Post-Training Optimization of YOLOv8-OBB Model
- Post-Training Optimization of LLM Agent
Compression-aware training:
Features:
- (PyTorch) The nncf.quantize method can now be used as quantization initialization for Quantization-Aware Training. Added a Resnet18-based example showing the transition from Post-Training Quantization to a Quantization-Aware Training algorithm.
- (PyTorch) Introduced extractors for the fused Convolution, Batch-/GroupNorm, and Linear functions.
Fixes:
- (PyTorch) Fixed an issue with the apply_args_defaults function.
- (PyTorch) Fixed dtype handling for the compressed torch.nn.Parameter.
- (PyTorch) Fixed is_shared parameter propagation.
Improvements:
- (PyTorch) Updated command creation behaviour to reduce the number of adapters.
- (PyTorch) Added an option to the insert point for models that are wrapped with replace_modules=False.
Deprecations/Removals:
- (PyTorch) Removed the binarization algorithm.
- Installing NNCF via the pip install nncf[] option is now deprecated.
Requirements:
- Updated PyTorch (2.2.1) and CUDA (12.1) versions.
- Updated ONNX (1.16.0) and ONNXRuntime (1.17.1) versions.
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@Candyzorua
@clinty
@UsingtcNower
@DaniAffCH