Scale estimation/rectification for int4 compression #2549

Merged

Conversation

andreyanufr
Collaborator

Changes

Added scale estimation for weight compression, which minimizes the L2 error between the outputs of the original MatMul and the compressed one.
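A minimal sketch of the underlying idea (hypothetical and simplified: the actual algorithm minimizes the error of the MatMul output using collected activation statistics, whereas this toy version minimizes only the weight-reconstruction error):

```python
# Hypothetical illustration of scale estimation, not the NNCF implementation:
# for each weight group, search a small grid of scale multipliers and keep the
# scale that minimizes the L2 error of the quantize-dequantize round trip.
import numpy as np

def estimate_scale(w_group: np.ndarray, steps: int = 20) -> float:
    base_scale = np.abs(w_group).max() / 7.0  # initial scale for int4 levels [-8, 7]
    best_scale, best_err = base_scale, np.inf
    for factor in np.linspace(0.8, 1.2, steps):  # grid around the initial scale
        scale = base_scale * factor
        q = np.clip(np.round(w_group / scale), -8, 7)
        err = np.sum((w_group - q * scale) ** 2)  # L2 reconstruction error
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```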

Reason for changes

Increases the accuracy of models compressed to 4 bits.

Related tickets

CVS-129177

Tests

In progress

@github-actions github-actions bot added NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ labels Mar 6, 2024
@openvino-nncf-ci openvino-nncf-ci added the API Public API-impacting changes label Mar 6, 2024

codecov bot commented Mar 6, 2024

Codecov Report

Attention: Patch coverage is 8.36653%, with 230 lines in your changes missing coverage. Please review.

Project coverage is 29.95%. Comparing base (17a5b65) to head (f06095e).
Report is 5 commits behind head on develop.

Additional details and impacted files


@@             Coverage Diff              @@
##           develop    #2549       +/-   ##
============================================
- Coverage    91.19%   29.95%   -61.24%     
============================================
  Files          493      494        +1     
  Lines        45468    45775      +307     
============================================
- Hits         41464    13713    -27751     
- Misses        4004    32062    +28058     
| Files | Coverage Δ |
|---|---|
| nncf/quantization/advanced_parameters.py | 84.06% <100.00%> (-7.91%) ⬇️ |
| ...ntization/algorithms/weight_compression/backend.py | 0.00% <ø> (-100.00%) ⬇️ |
| nncf/openvino/quantization/quantize_model.py | 0.00% <0.00%> (-61.30%) ⬇️ |
| ...ion/algorithms/weight_compression/torch_backend.py | 0.00% <0.00%> (-84.11%) ⬇️ |
| nncf/torch/quantization/quantize_model.py | 0.00% <0.00%> (-92.50%) ⬇️ |
| nncf/quantization/quantize_model.py | 34.78% <12.50%> (-42.67%) ⬇️ |
| ...ization/algorithms/weight_compression/algorithm.py | 0.00% <0.00%> (-96.49%) ⬇️ |
| .../quantization/algorithms/weight_compression/awq.py | 0.00% <0.00%> (-93.34%) ⬇️ |
| ...n/algorithms/weight_compression/weight_lowering.py | 0.00% <0.00%> (-97.71%) ⬇️ |
| .../algorithms/weight_compression/openvino_backend.py | 0.00% <0.00%> (-98.34%) ⬇️ |
| ... and 1 more | |

... and 319 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| COMMON | ? |
| ONNX | ? |
| OPENVINO | ? |
| TENSORFLOW | 29.95% <8.36%> (-0.16%) ⬇️ |
| TORCH | ? |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| common | 76.35% <ø> (-17.42%) ⬇️ |
| torch | 0.01% <0.00%> (-93.59%) ⬇️ |
| tensorflow | 93.74% <ø> (ø) |
| onnx | 0.00% <ø> (-93.07%) ⬇️ |
| openvino | 0.00% <0.00%> (-94.19%) ⬇️ |
| ptq | 15.26% <8.43%> (-74.80%) ⬇️ |

@github-actions github-actions bot added the NNCF PT Pull requests that updates NNCF PyTorch label Mar 15, 2024
@andreyanufr andreyanufr changed the title from "Andreyan/scale estimation pr" to "Scale estimation/rectification for int4 compression" Mar 15, 2024
@andreyanufr andreyanufr marked this pull request as ready for review March 15, 2024 08:29
@andreyanufr andreyanufr requested a review from a team as a code owner March 15, 2024 08:29
@andreyanufr
Collaborator Author

lambada-openai results (acc = accuracy, ppl = perplexity):

| model | precision | acc | ppl |
|---|---|---|---|
| stabilityai_stablelm-2-zephyr-1_6b | fp32 | 0.5925 | 6.3024 |
| stabilityai_stablelm-2-zephyr-1_6b | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.5696 | 7.4355 |
| stabilityai_stablelm-2-zephyr-1_6b | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.5467 | 7.9706 |
| stabilityai_stablelm-2-zephyr-1_6b | int4_sym_r10_gs64_max_activation_variance | 0.5428 | 8.5844 |
| stabilityai_stablelm-3b-4e1t | fp16 | 0.7132 | 3.8192 |
| stabilityai_stablelm-3b-4e1t | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.6936 | 4.0961 |
| stabilityai_stablelm-3b-4e1t | int4_sym_r10_gs64_max_activation_variance | 0.685 | 4.324 |
| stabilityai_stablelm-3b-4e1t | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.6798 | 4.4316 |
| stable-zephyr-3b-dpo | fp16 | 0.6099 | 6.7151 |
| stable-zephyr-3b-dpo | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.5921 | 7.0513 |
| stable-zephyr-3b-dpo | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.5736 | 8.3502 |
| stable-zephyr-3b-dpo | int4_sym_r10_gs64_max_activation_variance | 0.5618 | 9.3011 |
| llama-2-7b-chat | fp16 | 0.7108 | 3.262 |
| llama-2-7b-chat | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.6911 | 3.5074 |
| llama-2-7b-chat | int4_sym_r10_gs128_max_activation_variance | 0.6885 | 3.5719 |
| llama-2-7b-chat | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.6798 | 3.6947 |
| zephyr-7b-beta | fp16 | 0.7345 | 3.1783 |
| zephyr-7b-beta | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.7297 | 3.2551 |
| zephyr-7b-beta | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.7074 | 3.4549 |
| zephyr-7b-beta | int4_sym_r10_gs128_max_activation_variance | 0.707 | 3.5021 |

Resolved review threads: nncf/quantization/quantize_model.py (outdated), nncf/quantization/quantize_model.py, nncf/torch/quantization/quantize_model.py, nncf/openvino/quantization/quantize_model.py, nncf/quantization/advanced_parameters.py
2) Added docstrings.
@daniil-lyakhov daniil-lyakhov self-requested a review March 20, 2024 10:31
@andreyanufr andreyanufr requested a review from alexsu52 April 17, 2024 12:58
Resolved review thread: nncf/quantization/advanced_parameters.py
Comment on lines +354 to 364:

```python
                model,
                self._backend_entity.name_to_node_mapping,
                all_weight_params,
                nodes_to_compress,
                activations,
                awq_params.subset_size,
                awq_params.percent_to_apply,
                awq_params.alpha_min,
                awq_params.alpha_max,
                awq_params.steps,
            )
```
Contributor

It looks like this is out of scope for the PR, but in my opinion the __init__ parameters should look like this:

Suggested change:

```diff
-                model,
-                self._backend_entity.name_to_node_mapping,
-                all_weight_params,
-                nodes_to_compress,
-                activations,
                 awq_params.subset_size,
                 awq_params.percent_to_apply,
                 awq_params.alpha_min,
                 awq_params.alpha_max,
                 awq_params.steps,
             )
```

This comment is something to think about.
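A hedged sketch of the design this suggestion points toward (names and signatures are hypothetical, not the NNCF code): the constructor would take only algorithm hyperparameters, while the model-specific objects move to apply().

```python
# Hypothetical sketch of the suggested split, not the actual NNCF API:
# __init__ keeps only algorithm hyperparameters; everything tied to a concrete
# model (graph mapping, weight parameters, activations) is passed to apply().
class AWQ:
    def __init__(self, subset_size, percent_to_apply, alpha_min, alpha_max, steps):
        self._subset_size = subset_size
        self._percent_to_apply = percent_to_apply
        self._alpha_min = alpha_min
        self._alpha_max = alpha_max
        self._steps = steps

    def apply(self, model, name_to_node_mapping, all_weight_params, nodes_to_compress, activations):
        ...  # run AWQ on the model-specific data using the stored hyperparameters
```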


```python
    @staticmethod
    def dump_parameters(
        model: ov.Model, parameters: Dict, algo_name: Optional[str] = "quantization", path: Optional[List] = None
    ) -> None:
        dump_parameters(model, parameters, algo_name, path)

    @staticmethod
    def get_compress_decompress_pipeline(
```
Contributor
IMHO:

Suggested change:

```diff
-    def get_compress_decompress_pipeline(
+    def create_compress_decompress_fn(
```

```python
        return lambda w, s, zp: compiled_model([w, s, zp])[0]

    @staticmethod
    def get_compress_pipeline(
```
Contributor

IMHO:

Suggested change:

```diff
-    def get_compress_pipeline(
+    def create_compress_fn(
```
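For context, a minimal numpy sketch (hypothetical; the PR builds this as a compiled OpenVINO model instead) of what the compress-decompress function computes for asymmetric int4 given weights w, scale s, and zero point zp:

```python
# Hypothetical numpy equivalent of the compiled compress-decompress function.
import numpy as np

def compress_decompress(w: np.ndarray, s: np.ndarray, zp: np.ndarray) -> np.ndarray:
    q = np.clip(np.round(w / s + zp), 0, 15)  # quantize to uint4 levels [0, 15]
    return (q - zp) * s                       # dequantize back to float
```

Scale estimation then compares MatMul outputs computed with such dequantized weights against the original ones to pick the scale with minimal L2 error.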

@ljaljushkin
Contributor

The scale estimation algorithm doesn't work for group_size=-1 and fails with no clear message:

[screenshot of the error]

In the short term, an error about the unsupported parameter for scale estimation would be enough.
BTW, AWQ works fine with group_size=-1.
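For reference, a hedged usage example (parameter names follow the current nncf.compress_weights API; the exact API at the time of this PR may differ):

```python
# Assumed usage; the scale_estimation/group_size parameter names are taken from
# the current nncf.compress_weights API and may not match this PR exactly.
import nncf

compressed = nncf.compress_weights(
    model,                                   # an ov.Model
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=64,                           # group_size=-1 is the unsupported case above
    scale_estimation=True,                   # enable the algorithm from this PR
    dataset=calibration_dataset,             # activations are required to estimate scales
)
```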

```python
    return compressed_weights, scale, zero_point


def do_integer_quantization_with_fixed_scale(
```
Contributor

Could you explain this refactoring? Let's discuss it offline.

Collaborator Author

Done

@alexsu52 alexsu52 dismissed their stale review April 25, 2024 12:32

I don't have blocking comments.

Comment on lines 99 to 106:

```python
    def apply(
        self,
        model: TModel,
        graph: NNCFGraph,
        statistic_points: Optional[StatisticPointsContainer] = None,
        dataset: Optional[Dataset] = None,
    ) -> TModel:
        """
```
Collaborator

This function is 180 lines long and implements several critical steps at once (weight reshaping/preparation, rectification of the initial scale, and rectification of the scale based on grid search). I believe separating the stages into private functions with minimal descriptions will not only improve readability but also allow the separate parts to be unit-tested independently.

Collaborator Author

@daniil-lyakhov Are you expecting something like this: andreyanufr@2c99248?

Collaborator

More like this: fb883db, but it looks like such refactoring is risky without comprehensive testing.

Collaborator

I suggest creating an issue for that.
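A hedged skeleton of the decomposition described above (method names and stage boundaries are hypothetical; the actual refactoring was deferred):

```python
# Illustrative-only decomposition of the long apply() into testable stages;
# names and signatures are assumptions, not the NNCF code.
class ScaleEstimation:
    def apply(self, model, graph, statistic_points=None, dataset=None):
        weights = self._prepare_weights(model, graph)       # reshape weights into groups
        scales = self._rectify_initial_scale(weights)       # refine the min/max-based scale
        scales = self._grid_search_scale(weights, scales)   # grid search minimizing L2 error
        return self._apply_scales(model, scales)            # write rectified scales back
```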

Collaborator

@daniil-lyakhov daniil-lyakhov left a comment

Is there a plan to add unit tests for this feature?

Resolved review thread (outdated): nncf/quantization/quantize_model.py

@alexsu52 alexsu52 merged commit 9c00000 into openvinotoolkit:develop Apr 30, 2024
12 checks passed