Represent symmetrically quantized weights in signed data type #2434

l-bat · 2024-01-29T12:31:51Z

Changes

Represent symmetrically quantized weights in signed data type with no zero point

Reason for changes

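To make the quantization type detectable without analyzing zero-point values.
A signed data type for symmetrically quantized weights also leads to a smaller footprint, because no zero-point tensor has to be stored. The saving is largest with grouped quantization, where each group would otherwise carry its own zero point.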
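
A minimal NumPy sketch of the idea, not NNCF's actual implementation: the helper names, the scale formula and the group size are assumptions made for illustration. It shows that group-wise symmetric weights stored as signed values with no zero point dequantize to exactly the same result as the unsigned representation with a constant zero point of 8, while the unsigned form has to carry one extra zero-point value per group.

```python
import numpy as np


def quantize_symmetric_signed(weight, group_size, num_bits=4):
    """Group-wise symmetric quantization to a signed range, no zero point.

    Hypothetical helper for illustration; values are held in int8 containers
    because NumPy has no 4-bit dtype.
    """
    level_low = -(2 ** (num_bits - 1))    # -8 for 4 bits
    level_high = 2 ** (num_bits - 1) - 1  # 7 for 4 bits
    groups = weight.reshape(-1, group_size)
    # One scale per group (assumed formula: largest magnitude maps to 8).
    scale = np.abs(groups).max(axis=1, keepdims=True) / abs(level_low)
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero groups
    q = np.clip(np.round(groups / scale), level_low, level_high).astype(np.int8)
    return q, scale


def dequantize(q, scale, zero_point=None):
    """Dequantize; the zero point is optional and skipped when absent."""
    q = q.astype(np.float32)
    if zero_point is not None:
        q = q - zero_point
    return q * scale


rng = np.random.default_rng(0)
weight = rng.standard_normal((4, 64)).astype(np.float32)

# Signed representation: quantized values + scales only.
q_signed, scale = quantize_symmetric_signed(weight, group_size=32)

# Equivalent unsigned representation: shift by 8 and store a zero point per group.
q_unsigned = (q_signed.astype(np.int16) + 8).astype(np.uint8)
zero_point = np.full(scale.shape, 8, dtype=np.uint8)

w_signed = dequantize(q_signed, scale)
w_unsigned = dequantize(q_unsigned, scale, zero_point)
```

Both paths reconstruct identical weights, so the per-group `zero_point` array in the unsigned form is pure overhead, and its absence is what lets a consumer recognize the symmetric case from the data type alone.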

Related tickets

130625

Tests

Updated: tests/torch/ptq/test_weights_compression.py and tests/openvino/native/quantization/test_weights_compression.py

Merge after: openvinotoolkit/openvino#24457

Model	Backend	Metric name	Metric value	Metric diff	Num int4	Num int8	RAM MiB	Compr. time	Total time
tinyllama_data_aware_awq_scale_estimation	OV	Similarity	0.84048	-0.15952	94	124	35560	0:06:31	0:08:36
tinyllama_data_aware_awq_scale_estimation_stateful	OV	Similarity	0.84048	-0.15952	94	124	36612	0:06:12	0:07:40
tinyllama_data_aware_awq_stateful	OV	Similarity	0.85259	-0.14741	94	124	34824	0:01:50	0:03:17
tinyllama_data_aware	OV	Similarity	0.83853	-0.16147	94	124	30604	0:01:25	0:03:30
tinyllama_data_aware_gptq	OV	Similarity	0.82187	-0.17813	94	124	39624	0:25:09	0:27:10
tinyllama_data_free	OV	Similarity	0.72057	-0.27943	114	84	6671	0:00:42	0:02:46
tinyllama_int8_data_free	TORCH	Similarity	0.95624	-0.04376	0	312	30161	0:00:09	0:02:54

codecov · 2024-01-29T12:34:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.18%. Comparing base (d06b174) to head (3e6c649).
Report is 2 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff              @@
##           develop    #2434       +/-   ##
============================================
+ Coverage    47.70%   91.18%   +43.47%     
============================================
  Files          483      483               
  Lines        46305    46363       +58     
============================================
+ Hits         22090    42274    +20184     
+ Misses       24215     4089    -20126

Files	Coverage Δ
nncf/parameters.py	`100.00% <ø> (ø)`
...ization/algorithms/weight_compression/algorithm.py	`97.68% <ø> (+1.38%)`	⬆️
...quantization/algorithms/weight_compression/gptq.py	`94.87% <100.00%> (-0.07%)`	⬇️
...n/algorithms/weight_compression/mixed_precision.py	`98.11% <100.00%> (+0.01%)`	⬆️
.../algorithms/weight_compression/openvino_backend.py	`98.80% <100.00%> (+0.10%)`	⬆️
.../algorithms/weight_compression/scale_estimation.py	`92.52% <100.00%> (+0.75%)`	⬆️
...ion/algorithms/weight_compression/torch_backend.py	`84.71% <100.00%> (+84.71%)`	⬆️
...n/algorithms/weight_compression/weight_lowering.py	`95.13% <100.00%> (+0.21%)`	⬆️
nncf/quantization/quantize_model.py	`80.55% <ø> (+11.80%)`	⬆️
nncf/torch/quantization/layers.py	`95.97% <100.00%> (+57.64%)`	⬆️
... and 1 more

... and 296 files with indirect coverage changes

Flag	Coverage Δ
COMMON	`42.02% <0.00%> (-0.13%)`	⬇️
ONNX	`34.19% <7.20%> (-0.03%)`	⬇️
OPENVINO	`40.85% <84.00%> (+0.03%)`	⬆️
TENSORFLOW	`29.42% <0.00%> (?)`
TORCH	`65.42% <41.60%> (?)`


Components	Coverage Δ
common	`93.55% <ø> (+24.22%)`	⬆️
torch	`93.65% <100.00%> (+60.59%)`	⬆️
tensorflow	`93.26% <ø> (+93.26%)`	⬆️
onnx	`93.06% <ø> (ø)`
openvino	`94.47% <100.00%> (+0.02%)`	⬆️
ptq	`90.44% <100.00%> (+9.47%)`	⬆️

xiao1228 · 2024-01-30T14:58:33Z

Thank you for this feature. Just wondering: will GPTQ models be automatically saved as i4 as well?
For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

l-bat · 2024-01-31T10:47:47Z

> Thank you for this feature. Just wondering: will GPTQ models be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

No, this feature will only be enabled for weight compression via NNCF.

xiao1228 · 2024-01-31T11:36:57Z

> > Thank you for this feature. Just wondering: will GPTQ models be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.
>
> No, this feature will only be enabled for weight compression via NNCF.

Are you planning to extend it and enable this for GPTQ models? It would be very helpful. Current GPTQ models have a per-tensor zero point and u4 weights, so it would make sense to save them as symmetric i4.

l-bat · 2024-02-01T14:06:07Z

> > > Thank you for this feature. Just wondering: will GPTQ models be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.
> >
> > No, this feature will only be enabled for weight compression via NNCF.
>
> Are you planning to extend it and enable this for GPTQ models? It would be very helpful. Current GPTQ models have a per-tensor zero point and u4 weights, so it would make sense to save them as symmetric i4.

I created ticket 131500 to support symmetrically quantized weights in a signed data type for GPTQ.
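
The conversion discussed in this exchange can be sketched as follows. This is only an illustration of the arithmetic under stated assumptions, not the design of ticket 131500 or any existing NNCF or OpenVINO API: the function name is hypothetical, and int8/uint8 containers stand in for i4/u4 because NumPy has no 4-bit dtype. If every zero point equals the mid-range value 8, subtracting 8 from the u4 values gives i4 values that need no zero point.

```python
import numpy as np


def u4_to_i4_if_symmetric(q_unsigned, zero_point, num_bits=4):
    """Return signed weights when every zero point is mid-range, else None.

    Hypothetical helper: None signals that the weights are genuinely
    asymmetric and must keep their unsigned values and zero point.
    """
    mid = 2 ** (num_bits - 1)  # 8 for 4-bit weights
    if not np.all(zero_point == mid):
        return None
    # Widen before subtracting so the uint8 arithmetic cannot wrap around.
    return (q_unsigned.astype(np.int16) - mid).astype(np.int8)


q_u4 = np.array([[0, 8, 15], [7, 8, 9]], dtype=np.uint8)

q_i4 = u4_to_i4_if_symmetric(q_u4, np.array(8))        # [[-8, 0, 7], [-1, 0, 1]]
not_symmetric = u4_to_i4_if_symmetric(q_u4, np.array(7))  # None
```

Since `(q - 8) * scale` is unchanged by the shift, dequantized values stay the same and only the stored representation differs.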

ljaljushkin

It would be great to see performance numbers similar to the ones from this comment:
#2537 (comment)

tests/post_training/data/wc_reference_data.yaml

l-bat · 2024-05-22T13:39:47Z

Develop

Model	Backend	Metric name	Metric value	Metric diff	Num int4	Num int8	RAM MiB	Compr. time	Total time
tinyllama_data_aware_awq_scale_estimation	OV	Similarity	0.8404	-0.1596	188	124	37513	0:05:50	0:07:52
tinyllama_data_aware_awq_scale_estimation_stateful	OV	Similarity	0.8404	-0.1596	188	124	39704	0:05:43	0:07:10
tinyllama_data_aware_awq_stateful	OV	Similarity	0.85259	-0.14741	188	124	29770	0:01:39	0:03:04
tinyllama_data_aware	OV	Similarity	0.83853	-0.16147	188	124	30731	0:01:09	0:03:11
tinyllama_data_free	OV	Similarity	0.72057	-0.27943	228	84	6588	0:00:31	0:02:33
tinyllama_int8_data_free	TORCH	Similarity	0.95624	-0.04376	0	312	33420	0:00:05	0:02:49

l-bat:lt/wc_sym_signed

Model	Backend	Metric name	Metric value	Metric diff	Num int4	Num int8	RAM MiB	Compr. time	Total time
tinyllama_data_aware_awq_scale_estimation	OV	Similarity	0.84048	-0.15952	94	124	37752	0:06:03	0:08:06
tinyllama_data_aware_awq_scale_estimation_stateful	OV	Similarity	0.84048	-0.15952	94	124	39638	0:05:52	0:07:19
tinyllama_data_aware_awq_stateful	OV	Similarity	0.85259	-0.14741	94	124	29995	0:01:47	0:03:13
tinyllama_data_aware	OV	Similarity	0.83853	-0.16147	94	124	30762	0:01:16	0:03:19
tinyllama_data_free	OV	Similarity	0.72057	-0.27943	114	84	6574	0:00:37	0:02:38
tinyllama_int8_data_free	TORCH	Similarity	0.95624	-0.04376	0	312	33338	0:00:05	0:02:47

nncf/quantization/algorithms/weight_compression/openvino_backend.py

nncf/quantization/algorithms/weight_compression/torch_backend.py

nncf/torch/quantization/layers.py

nncf/quantization/algorithms/weight_compression/weight_lowering.py

tests/post_training/data/wc_reference_data.yaml

l-bat · 2024-06-07T11:49:54Z

ci job: 23

nncf/torch/quantization/quantize_functions.py

AlexKoff88 · 2024-06-10T10:40:38Z

nncf/quantization/algorithms/weight_compression/scale_estimation.py

+    return target, zero_mask
+
+
+def get_near_to_ideal_scale(weight, target, zero_mask, importance):


Can we change this funny name to something more down-to-earth? For example, estimate_scales, tune_scales, etc.

andreyanufr · 2024-06-11T08:50:41Z

nncf/quantization/algorithms/weight_compression/scale_estimation.py

@@ -165,8 +165,6 @@ def apply(
            original_weight = fns.zeros_like(weight) + weight

            compressed_weights, scale, zp = do_integer_quantization(original_weight, reduction_axis, config)
-            zp = zp.astype(scale.dtype)


if zp is not None:
    zp = zp.astype(scale.dtype)

This conversion is important for performance.
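
A small sketch of the guarded cast suggested in this comment, written with NumPy arrays in place of NNCF tensors. The helper name is hypothetical. The comment does not say why the cast matters for performance; one plausible reading, stated here as an assumption, is that casting once up front avoids repeated dtype promotion each time the zero point is later combined with floating-point values.

```python
import numpy as np


def cast_zero_point(zp, scale):
    """Cast the zero point to the scale's dtype when present.

    In symmetric mode there is no zero point, so None is passed through.
    """
    if zp is not None:
        zp = zp.astype(scale.dtype)
    return zp


scale = np.array([[0.05], [0.1]], dtype=np.float32)

# Asymmetric case: an integer zero point is cast once to float32.
zp_asym = cast_zero_point(np.array([[8], [7]], dtype=np.uint8), scale)

# Symmetric case: nothing to cast.
zp_sym = cast_zero_point(None, scale)
```

The `None` check is what makes the line safe after this change, since symmetric quantization no longer returns a zero point at all.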

nncf/quantization/algorithms/weight_compression/scale_estimation.py

alexsu52

LGTM

l-bat requested a review from a team as a code owner January 29, 2024 12:31

l-bat marked this pull request as draft January 29, 2024 12:32

github-actions bot added the documentation, NNCF PT, NNCF OpenVINO, and NNCF PTQ labels Jan 29, 2024

l-bat marked this pull request as ready for review January 29, 2024 12:32

l-bat force-pushed the lt/wc_sym_signed branch from e892fec to 3b891ea January 30, 2024 08:28

l-bat force-pushed the lt/wc_sym_signed branch from 3b891ea to 8c6a339 April 24, 2024 12:24

openvino-nncf-ci added the API label Apr 24, 2024

l-bat force-pushed the lt/wc_sym_signed branch from 8c6a339 to 8eadc60 May 16, 2024 09:03

l-bat requested review from ljaljushkin, AlexKoff88 and andreyanufr May 16, 2024 09:09

l-bat added the do not merge label May 17, 2024

l-bat force-pushed the lt/wc_sym_signed branch from b78bc23 to a2badb2 May 20, 2024 08:28

ljaljushkin requested changes May 21, 2024

tests/post_training/data/wc_reference_data.yaml (resolved)

ljaljushkin approved these changes May 23, 2024

l-bat removed the do not merge label Jun 5, 2024

l-bat force-pushed the lt/wc_sym_signed branch from a2badb2 to ff98763 June 5, 2024 09:54

alexsu52 reviewed Jun 6, 2024

ljaljushkin requested changes Jun 6, 2024

tests/post_training/data/wc_reference_data.yaml (outdated, resolved)

tests/post_training/data/wc_reference_data.yaml (outdated, resolved)

l-bat requested review from alexsu52 and ljaljushkin June 6, 2024 13:00

ljaljushkin reviewed Jun 6, 2024

tests/post_training/data/wc_reference_data.yaml (outdated, resolved)

l-bat force-pushed the lt/wc_sym_signed branch from c8d2fb9 to cf5b843 June 7, 2024 11:56

alexsu52 reviewed Jun 10, 2024

nncf/torch/quantization/quantize_functions.py (outdated, resolved)

l-bat force-pushed the lt/wc_sym_signed branch from cf5b843 to fceef1b June 10, 2024 08:43

ljaljushkin approved these changes Jun 10, 2024

AlexKoff88 reviewed Jun 10, 2024

AlexKoff88 approved these changes Jun 10, 2024

l-bat requested a review from alexsu52 June 11, 2024 07:17

andreyanufr reviewed Jun 11, 2024

andreyanufr approved these changes Jun 11, 2024

alexsu52 reviewed Jun 12, 2024

nncf/quantization/algorithms/weight_compression/scale_estimation.py (resolved)

l-bat force-pushed the lt/wc_sym_signed branch from d754a97 to 3e6c649 June 13, 2024 07:05

alexsu52 approved these changes Jun 13, 2024

alexsu52 merged commit 85b3263 into openvinotoolkit:develop Jun 13, 2024
12 checks passed
l-bat added 10 commits June 13, 2024 09:58

Represent symmetrically quantized weights in signed data type

7090898

Fix tests

1d4be37

minor fixes

9fb5862

Update wc conformance tests

2e00a3a

fix test

38fd973

Apply comments

6cc5a06

update wc_reference_data.yaml

d15e460

rename function

ddaf50d

Reduce computation time for ASYM mode

f81cc85

docstring

3e6c649

l-bat mentioned this pull request Jul 4, 2024

Fix scale calculation #2788

Closed

l-bat mentioned this pull request Jul 26, 2024

Update ReleaseNotes.md #2837

Merged
