From f1df8c93e2a547a50fb417ba65d1ad9510512b37 Mon Sep 17 00:00:00 2001
From: Liubov Talamanova <liubov.talamanova@intel.com>
Date: Tue, 23 Jul 2024 14:24:53 +0100
Subject: [PATCH] Fix incorrect scales in SmoothQuant algo (#2830)

### Changes

Incorrect processing of

https://github.com/openvinotoolkit/nncf/blob/49e98205975e46fc81a227ca4b2e66ca043ef91e/nncf/quantization/algorithms/smooth_quant/algorithm.py#L141-L143

### Reason for changes

* To avoid error: operands could not be broadcast together with shapes,
when we try to apply scales from previous group to node. SQ should be
ignored for such nodes with an appropriate message: `DEBUG:nncf:Skipped
SmoothQuant for nodes after <node_name> because of the empty scale`.

### Related tickets

* 147043

### Tests

<!--- How was the correctness of changes tested and whether new tests
were added -->
---
 .../post_training_compression/weights_compression/Usage.md      | 2 +-
 nncf/quantization/algorithms/smooth_quant/algorithm.py          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/usage/post_training_compression/weights_compression/Usage.md b/docs/usage/post_training_compression/weights_compression/Usage.md
index faa8ad43473..f90cefbfe16 100644
--- a/docs/usage/post_training_compression/weights_compression/Usage.md
+++ b/docs/usage/post_training_compression/weights_compression/Usage.md
@@ -9,7 +9,7 @@ The Weights Compression algorithm is aimed at compressing the weights of the mod
 #### Supported modes
 
 By default, weights are compressed asymmetrically to 8-bit integer data type - "INT8_ASYM" mode.
-OpenVINO backend also supports 3 modes of mixed precision weight quantization with a 4-bit data type as a primary precision - INT4_SYM, INT4_ASYM, NF4, E2M1. The primary precision in case of INT4_SYM mode is signed 4-bit integer and weights are quantized to it [symmetrically](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#symmetric-quantization) without zero point. In case of INT4_ASYM mode - unsigned 4-bit integer and weight are quantized to it [asymmetrically](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#asymmetric-quantization) with a typical non-fixed zero point. In case of NF4 mode - [nf4](https://arxiv.org/pdf/2305.14314v1.pdf) data type without zero point. In case of E2M1 mode - [e2m1](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf) data type without zero point and has 8bit [E8M0](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf) scale.
+OpenVINO backend also supports 4 modes of mixed precision weight quantization with a 4-bit data type as a primary precision - INT4_SYM, INT4_ASYM, NF4, E2M1. The primary precision in case of INT4_SYM mode is signed 4-bit integer and weights are quantized to it [symmetrically](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#symmetric-quantization) without zero point. In case of INT4_ASYM mode - unsigned 4-bit integer and weight are quantized to it [asymmetrically](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#asymmetric-quantization) with a typical non-fixed zero point. In case of NF4 mode - [nf4](https://arxiv.org/pdf/2305.14314v1.pdf) data type without zero point. In case of E2M1 mode - [e2m1](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf) data type without zero point and has 8bit [E8M0](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf) scale.
 All 4-bit modes have a grouped quantization support, when small group of weights (e.g. 128) in the channel dimension share quantization parameters (scale).
 All embeddings, convolutions and last linear layers are always compressed to 8-bit integer data type. To quantize embeddings and last linear layers to 4-bit, use `all_layers=True`.
 Percent of the rest layers compressed to 4-bit can be configured by "ratio" parameter. E.g. ratio=0.9 means 90% of layers compressed to the corresponding 4-bit data type and the rest to 8-bit asymmetric integer data type.
diff --git a/nncf/quantization/algorithms/smooth_quant/algorithm.py b/nncf/quantization/algorithms/smooth_quant/algorithm.py
index 1702c203709..1176c82ac22 100644
--- a/nncf/quantization/algorithms/smooth_quant/algorithm.py
+++ b/nncf/quantization/algorithms/smooth_quant/algorithm.py
@@ -108,8 +108,8 @@ def apply(
 
         node_groups = self._group_nodes_by_source(nodes_to_smooth_data, graph)
 
-        best_scale = None
         for group_id, nodes in track(node_groups.items(), description="Applying Smooth Quant"):
+            best_scale = None
             best_ratio = 0.0
             empty_statistic = False
             for node_to_smooth in nodes: