v2.13.0
Post-training Quantization:
Features:
- (OpenVINO) Added support for combining GPTQ with AWQ and Scale Estimation (SE) algorithms in nncf.compress_weights() for more accurate weight compression of LLMs. The following combinations with GPTQ are now supported: AWQ+GPTQ+SE, AWQ+GPTQ, GPTQ+SE, GPTQ.
- (OpenVINO) Added the LoRA Correction algorithm to further improve the accuracy of int4 compressed models on top of other algorithms (AWQ and Scale Estimation). It can be enabled via the optional lora_correction parameter of the nncf.compress_weights() API. The algorithm increases compression time and incurs a negligible model size overhead. Refer to accuracy/footprint trade-off for different int4 compression methods.
- (PyTorch) Added an implementation of the experimental Post-training Activation Pruning algorithm. Refer to Activation Sparsity for details.
- Added a memory monitoring tool for logging the memory allocated by a piece of Python code or a script. Refer to NNCF tools for details.
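The algorithm combinations listed above can be selected through boolean keyword arguments of nncf.compress_weights(). The following is a minimal sketch, not an official recipe: the helpers algorithm_flags and compress_int4 are hypothetical names introduced here for illustration, the gptq, awq and scale_estimation keyword names are assumed to match the installed NNCF version, and the choice of INT4_SYM mode is an assumption as well.

```python
from typing import Any, Dict

# Short names used in these notes, mapped to the assumed keyword arguments
# of nncf.compress_weights(). The mapping is an illustrative helper only.
_FLAG_BY_NAME = {"AWQ": "awq", "GPTQ": "gptq", "SE": "scale_estimation"}


def algorithm_flags(combination: str) -> Dict[str, bool]:
    """Turn a string such as "AWQ+GPTQ+SE" into enabled keyword flags."""
    flags: Dict[str, bool] = {}
    for name in combination.split("+"):
        key = name.strip().upper()
        if key not in _FLAG_BY_NAME:
            raise ValueError(f"Unknown algorithm name: {name!r}")
        flags[_FLAG_BY_NAME[key]] = True
    return flags


def compress_int4(model: Any, calibration_items: Any, combination: str,
                  lora_correction: bool = False) -> Any:
    """Hypothetical wrapper: int4 weight compression with the chosen algorithms."""
    import nncf  # imported lazily so the helper above works without NNCF installed

    kwargs: Dict[str, bool] = algorithm_flags(combination)
    if lora_correction:
        kwargs["lora_correction"] = True
    return nncf.compress_weights(
        model,
        mode=nncf.CompressWeightsMode.INT4_SYM,  # assumed int4 mode
        dataset=nncf.Dataset(calibration_items),  # data-aware algorithms need calibration data
        **kwargs,
    )
```

For example, compress_int4(ov_model, samples, "AWQ+GPTQ+SE") would request all three algorithms, while compress_int4(ov_model, samples, "AWQ+SE", lora_correction=True) would request LoRA Correction on top of AWQ and Scale Estimation. Here ov_model and samples stand for an OpenVINO model and an iterable of calibration inputs prepared by the caller.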
Fixes:
- (OpenVINO) Fixed the quantization of Convolution and LSTMSequence operations in cases where some inputs are part of a ShapeOf subgraph.
- (OpenVINO) Fixed an issue with FakeConvert duplication for FP8.
- Fixed an issue in the SmoothQuant algorithm in cases of incorrect shapes.
- Fixed non-deterministic layer-wise scheduling.
Improvements:
- (OpenVINO) Increased hardware-fused pattern coverage.
- Improved progress bar logic during weights compression for more accurate remaining time estimation.
- Extended the Scale Estimation bitness range support for nncf.compress_weights().
- Removed extra logging for the algorithm-generated ignored scope.
Tutorials:
- Post-Training Optimization of Flux.1 Model
- Post-Training Optimization of PixArt-α Model
- Post-Training Optimization of InternVL2 Model
- Post-Training Optimization of Qwen2Audio Model
- Post-Training Optimization of NuExtract Model
- Post-Training Optimization of MiniCPM-V2 Model
Compression-aware training:
Fixes:
- (PyTorch) Fixed some scenarios of NNCF patching interfering with torch.compile.
Requirements:
- Updated PyTorch (2.4.0) and Torchvision (0.19.0) versions.
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@rk119