v2.13.0

@KodiaqQ KodiaqQ released this 19 Sep 10:24

Post-training Quantization:

Features:

  • (OpenVINO) Added support for combining GPTQ with AWQ and Scale Estimation (SE) algorithms in nncf.compress_weights() for more accurate weight compression of LLMs. Thus, the following combinations with GPTQ are now supported: AWQ+GPTQ+SE, AWQ+GPTQ, GPTQ+SE, GPTQ.
  • (OpenVINO) Added the LoRA Correction algorithm to further improve the accuracy of int4 compressed models on top of other algorithms (AWQ and Scale Estimation). It can be enabled via the optional lora_correction parameter of the nncf.compress_weights() API. The algorithm increases compression time and incurs a negligible model size overhead. Refer to accuracy/footprint trade-off for different int4 compression methods.
  • (PyTorch) Added implementation of the experimental Post-training Activation Pruning algorithm. Refer to Activation Sparsity for details.
  • Added a memory monitoring tool for logging the memory that a piece of Python code or a script allocates. Refer to NNCF tools for details.
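
As a rough illustration of the weight compression features above, the sketch below maps each listed GPTQ combination to the boolean flags of nncf.compress_weights() and shows where lora_correction would be passed. The helper names (gptq_combination_flags, compress_int4, compress_int4_with_lora_correction), the ov_model and calibration_items arguments, and the group_size value are assumptions made for the example, not part of the release. NNCF is imported lazily so that the flag helper can be tried without NNCF installed.

```python
from typing import Any, Dict, Iterable

# The four GPTQ combinations named in the release notes.
SUPPORTED_GPTQ_COMBINATIONS = ("AWQ+GPTQ+SE", "AWQ+GPTQ", "GPTQ+SE", "GPTQ")


def gptq_combination_flags(combination: str) -> Dict[str, bool]:
    """Translate a combination name into nncf.compress_weights() keyword flags."""
    if combination not in SUPPORTED_GPTQ_COMBINATIONS:
        raise ValueError(f"Not a listed GPTQ combination: {combination!r}")
    parts = set(combination.split("+"))
    return {
        "awq": "AWQ" in parts,
        "gptq": "GPTQ" in parts,
        "scale_estimation": "SE" in parts,  # SE = Scale Estimation
    }


def compress_int4(ov_model: Any, calibration_items: Iterable[Any],
                  combination: str = "AWQ+GPTQ+SE") -> Any:
    """Sketch: int4 weight compression with one of the GPTQ combinations.

    Requires nncf and openvino; ov_model is assumed to be an openvino.Model.
    """
    import nncf  # lazy import, see the note above

    return nncf.compress_weights(
        ov_model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        group_size=128,  # illustrative value, not a recommendation
        dataset=nncf.Dataset(calibration_items),  # data-aware algorithms need a dataset
        **gptq_combination_flags(combination),
    )


def compress_int4_with_lora_correction(ov_model: Any,
                                       calibration_items: Iterable[Any]) -> Any:
    """Sketch: LoRA Correction on top of AWQ and Scale Estimation.

    GPTQ is deliberately left off here; check the NNCF documentation for
    which algorithms can be paired with lora_correction.
    """
    import nncf

    return nncf.compress_weights(
        ov_model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        dataset=nncf.Dataset(calibration_items),
        awq=True,
        scale_estimation=True,
        lora_correction=True,
    )
```

For instance, gptq_combination_flags("GPTQ+SE") enables gptq and scale_estimation and leaves awq disabled. Keeping the mapping in one small function makes it easy to loop over all four combinations when comparing accuracy and compression time.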

Fixes:

  • (OpenVINO) Fixed the quantization of Convolution and LSTMSequence operations in cases where some inputs are part of a ShapeOf subgraph.
  • (OpenVINO) Fixed an issue with FakeConvert duplication for FP8.
  • Fixed a SmoothQuant algorithm issue that occurred with incorrect shapes.
  • Fixed non-deterministic layer-wise scheduling.

Improvements:

  • (OpenVINO) Increased hardware-fused pattern coverage.
  • Improved progress bar logic during weights compression for more accurate remaining time estimation.
  • Extended the bitness range supported by Scale Estimation in nncf.compress_weights().
  • Removed extra logging for the algorithm-generated ignored scope.

Tutorials:

Compression-aware training:

Fixes:

  • (PyTorch) Fixed some scenarios of NNCF patching interfering with torch.compile.

Requirements:

  • Updated PyTorch (2.4.0) and Torchvision (0.19.0) versions.

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@rk119