(OpenVINO) Added support for data-free 4-bit weights compression through NF4 and INT4 data types (compress_weights(…) pipeline).
(OpenVINO) Added support for IF operation quantization.
(OpenVINO) Added dump_intermediate_model parameter support for AccuracyAwareAlgorithm (quantize_with_accuracy_control(…) pipeline).
(OpenVINO) Added SmoothQuant and ChannelAlignment algorithm support to the HyperparameterTuner algorithm (quantize_with_tune_hyperparams(…) pipeline).
(PyTorch) Post-training Quantization is now supported with the quantize(…) pipeline and the common implementation of quantization algorithms. The create_compressed_model() method is deprecated for Post-training Quantization.
Added new types (AvgPool, GroupNorm, LayerNorm) to the ignored scope for the ModelType.Transformer scheme.
QuantizationPreset.Mixed is now the default for the ModelType.Transformer scheme.
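To make the idea of data-free 4-bit weight compression concrete, here is a minimal sketch of group-wise symmetric INT4 quantization in plain Python. It is an illustration only, not NNCF's implementation: the function names, the flat-list weight layout, and the group size of 4 are assumptions chosen for readability. "Data-free" here means the scales are derived from the weights alone, with no calibration dataset.

```python
def quantize_int4_symmetric(weights, group_size=4):
    """Quantize a flat list of floats to signed 4-bit integers, one scale per group.

    Hypothetical helper for illustration; not part of the NNCF API.
    """
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7, the largest positive INT4 value.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
        quantized.extend(max(-8, min(7, round(w / scale))) for w in group)
    return quantized, scales


def dequantize_int4(quantized, scales, group_size=4):
    """Restore approximate float weights from 4-bit integers and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]


weights = [0.7, -0.35, 0.0, 0.1, 1.4, -1.4, 0.2, 0.6]
quantized, scales = quantize_int4_symmetric(weights)
restored = dequantize_int4(quantized, scales)
```

Each restored weight differs from the original by at most half of its group's scale, which is why smaller groups (more scales) generally preserve accuracy better at the cost of extra storage.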
Fixes:
(OpenVINO, ONNX, PyTorch) Aligned/added patterns between backends (SE block, MVN layer, multiple activations, etc.) to restore performance/metrics.
Fixed patterns for ModelType.Transformer to align with the quantization scheme.
Improvements:
Improved UX with a new progress bar for pipelines, new exceptions, and .dot graph visualization updates.
(OpenVINO) Optimized WeightsCompression algorithm (compress_weights(…) pipeline) execution time for LLM quantization and added ignored scope support.
(OpenVINO) Optimized AccuracyAwareQuantization algorithm execution time by calculating the ranking score with a multi-threaded approach (quantize_with_accuracy_control(…) pipeline).
Known issues:
(ONNX) The quantize(...) method can generate inaccurate int8 results for models whose BatchNormalization layers contain biases. To get the best accuracy, use the do_constant_folding=True option during export from PyTorch to ONNX.
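The ONNX note above can be applied at export time roughly as follows. This is a sketch, assuming a standard torch.onnx.export call; the helper name export_with_constant_folding and its parameters are hypothetical, and PyTorch is imported inside the function so the helper can be defined on a machine without it.

```python
def export_with_constant_folding(model, example_input, onnx_path):
    """Export a PyTorch model to ONNX with constant folding enabled.

    Hypothetical helper for illustration; only torch.onnx.export is a real API here.
    """
    import torch  # lazy import: PyTorch is needed only when the export actually runs

    model.eval()  # export in inference mode so BatchNormalization uses running statistics
    torch.onnx.export(
        model,
        example_input,
        onnx_path,
        do_constant_folding=True,  # the option named in the note above
    )
    return onnx_path


# Usage (requires PyTorch and a model of your own):
# export_with_constant_folding(my_model, torch.randn(1, 3, 224, 224), "model.onnx")
```

The exported file can then be loaded and passed to the quantize(...) pipeline as usual.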
Compression-aware training:
Fixes:
(PyTorch) Fixed Hessian trace calculation to resolve issue #2155.
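For readers unfamiliar with Hessian trace calculation, one common approach is Hutchinson's stochastic estimator, which averages v·(Hv) over random ±1 vectors v and needs only Hessian-vector products. The sketch below uses a small explicit matrix in place of a real Hessian. It illustrates the general technique only: whether it matches the code path affected by #2155 is an assumption, and all names are hypothetical.

```python
import random


def hutchinson_trace(matvec, dim, num_samples=100, seed=0):
    """Estimate trace(H) as the mean of v.(Hv) over random Rademacher vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Rademacher vector: each entry is -1 or +1 with equal probability.
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = matvec(v)
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_samples


def dense_matvec(matrix):
    """Wrap an explicit matrix as a matrix-vector product function."""
    def apply(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return apply


# Stand-in for a Hessian; in practice H is never materialized.
hessian = [[2.0, 1.0], [1.0, 3.0]]
estimate = hutchinson_trace(dense_matvec(hessian), dim=2)
```

For a diagonal matrix every sample equals the exact trace, while off-diagonal entries add zero-mean noise that averages out as num_samples grows.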