Represent symmetrically quantized weights in signed data type #2434

Merged
alexsu52 merged 10 commits into openvinotoolkit:develop from l-bat:lt/wc_sym_signed
Jun 13, 2024

Conversation

l-bat
Collaborator

@l-bat l-bat commented Jan 29, 2024

Changes

Represent symmetrically quantized weights in a signed data type with no zero point.
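
A minimal plain-Python sketch of what "signed data type with no zero point" means for 4-bit weights. The function names, the max-abs scale rule, and the sample weights are illustrative assumptions, not NNCF's actual implementation.

```python
def quantize_symmetric(weights, num_bits=4):
    """Return (signed integer levels, scale); no zero point is produced."""
    level_low = -(2 ** (num_bits - 1))      # -8 for 4 bits
    level_high = 2 ** (num_bits - 1) - 1    # 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    # Assumed rule: map the largest magnitude onto the highest positive level.
    scale = max_abs / level_high if max_abs > 0 else 1.0
    levels = [min(max(round(w / scale), level_low), level_high) for w in weights]
    return levels, scale


def dequantize_symmetric(levels, scale):
    """Recover approximate weights; only the scale is needed."""
    return [q * scale for q in levels]


weights = [-0.7, -0.2, 0.0, 0.3, 0.7]
levels, scale = quantize_symmetric(weights)
restored = dequantize_symmetric(levels, scale)
print(levels)
```

Because zero maps to level 0, dequantization is a single multiplication by the scale, and the stored type alone (signed versus unsigned) tells a reader which scheme was used.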

Reason for changes

  • The quantization type can be detected from the data type alone, without analyzing zero-point values.
  • A signed data type for symmetrically quantized weights needs no stored zero point, which leads to a smaller footprint, especially with grouped quantization.
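
A rough back-of-the-envelope sketch of the footprint argument. The layer shape, the group sizes, and the assumption that one 4-bit zero point is stored per group are hypothetical; the actual layout in the exported model may differ.

```python
def zero_point_bytes(out_features, in_features, group_size, zp_bits=4):
    """Bytes needed to store one zero point per quantization group."""
    groups_per_row = in_features // group_size
    num_groups = out_features * groups_per_row
    return num_groups * zp_bits // 8


# Hypothetical 4096 x 4096 layer; smaller groups mean more zero points to drop.
for group_size in (128, 64, 32):
    saved = zero_point_bytes(4096, 4096, group_size)
    print(f"group_size={group_size}: {saved} bytes of zero points avoided")
```

Under these assumptions the saving doubles each time the group size halves, which is why grouped quantization benefits most.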

Related tickets

130625

Tests

Updated: tests/torch/ptq/test_weights_compression.py and tests/openvino/native/quantization/test_weights_compression.py

Merge after: openvinotoolkit/openvino#24457

| Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
|---|---|---|---|---|---|---|---|---|---|
| tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 35560 | 0:06:31 | 0:08:36 |
| tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 36612 | 0:06:12 | 0:07:40 |
| tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 94 | 124 | 34824 | 0:01:50 | 0:03:17 |
| tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 94 | 124 | 30604 | 0:01:25 | 0:03:30 |
| tinyllama_data_aware_gptq | OV | Similarity | 0.82187 | -0.17813 | 94 | 124 | 39624 | 0:25:09 | 0:27:10 |
| tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 114 | 84 | 6671 | 0:00:42 | 0:02:46 |
| tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 30161 | 0:00:09 | 0:02:54 |

@l-bat l-bat requested a review from a team as a code owner January 29, 2024 12:31
@l-bat l-bat marked this pull request as draft January 29, 2024 12:32
@github-actions github-actions bot added the documentation, NNCF PT, NNCF OpenVINO, and NNCF PTQ labels Jan 29, 2024
@l-bat l-bat marked this pull request as ready for review January 29, 2024 12:32

codecov bot commented Jan 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.18%. Comparing base (d06b174) to head (3e6c649).
Report is 2 commits behind head on develop.

@@             Coverage Diff              @@
##           develop    #2434       +/-   ##
============================================
+ Coverage    47.70%   91.18%   +43.47%     
============================================
  Files          483      483               
  Lines        46305    46363       +58     
============================================
+ Hits         22090    42274    +20184     
+ Misses       24215     4089    -20126     
| Files | Coverage Δ |
|---|---|
| nncf/parameters.py | 100.00% <ø> (ø) |
| ...ization/algorithms/weight_compression/algorithm.py | 97.68% <ø> (+1.38%) ⬆️ |
| ...quantization/algorithms/weight_compression/gptq.py | 94.87% <100.00%> (-0.07%) ⬇️ |
| ...n/algorithms/weight_compression/mixed_precision.py | 98.11% <100.00%> (+0.01%) ⬆️ |
| .../algorithms/weight_compression/openvino_backend.py | 98.80% <100.00%> (+0.10%) ⬆️ |
| .../algorithms/weight_compression/scale_estimation.py | 92.52% <100.00%> (+0.75%) ⬆️ |
| ...ion/algorithms/weight_compression/torch_backend.py | 84.71% <100.00%> (+84.71%) ⬆️ |
| ...n/algorithms/weight_compression/weight_lowering.py | 95.13% <100.00%> (+0.21%) ⬆️ |
| nncf/quantization/quantize_model.py | 80.55% <ø> (+11.80%) ⬆️ |
| nncf/torch/quantization/layers.py | 95.97% <100.00%> (+57.64%) ⬆️ |
... and 1 more

... and 296 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| COMMON | 42.02% <0.00%> (-0.13%) ⬇️ |
| ONNX | 34.19% <7.20%> (-0.03%) ⬇️ |
| OPENVINO | 40.85% <84.00%> (+0.03%) ⬆️ |
| TENSORFLOW | 29.42% <0.00%> (?) |
| TORCH | 65.42% <41.60%> (?) |

| Components | Coverage Δ |
|---|---|
| common | 93.55% <ø> (+24.22%) ⬆️ |
| torch | 93.65% <100.00%> (+60.59%) ⬆️ |
| tensorflow | 93.26% <ø> (+93.26%) ⬆️ |
| onnx | 93.06% <ø> (ø) |
| openvino | 94.47% <100.00%> (+0.02%) ⬆️ |
| ptq | 90.44% <100.00%> (+9.47%) ⬆️ |

@xiao1228
Contributor

Thank you for this feature. Will GPTQ models also be saved as i4 automatically?
For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

@l-bat
Collaborator Author

l-bat commented Jan 31, 2024

No, this feature will only be enabled for weight compression via NNCF.

@xiao1228
Contributor

Are you planning to extend this and enable it for GPTQ models? It would be very helpful. The current GPTQ model has a per-tensor zero point and u4 weights, so it would make sense to save it as symmetric i4.

@l-bat
Collaborator Author

l-bat commented Feb 1, 2024

I created ticket 131500 to support symmetrically quantized weights in a signed data type for GPTQ.
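
The u4-to-i4 conversion discussed in this thread can be sketched in plain Python. This assumes the zero point is the midpoint value 8, which is an illustration and not a statement about any particular GPTQ checkpoint; the helper name and sample values are hypothetical.

```python
def shift_u4_to_i4(levels_u4, zero_point=8):
    """Re-express unsigned levels in [0, 15] as signed levels in [-8, 7]."""
    if zero_point != 8:
        # Only the midpoint zero point maps exactly onto the signed 4-bit range.
        raise ValueError("expected the symmetric midpoint zero point 8")
    return [u - zero_point for u in levels_u4]


scale = 0.05
levels_u4 = [0, 3, 8, 12, 15]
levels_i4 = shift_u4_to_i4(levels_u4)

# Both representations dequantize to the same values.
unsigned_result = [(u - 8) * scale for u in levels_u4]
signed_result = [i * scale for i in levels_i4]
print(levels_i4)
```

With the shift folded into the stored values, the zero point no longer has to be kept or subtracted at inference time.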

@l-bat l-bat force-pushed the lt/wc_sym_signed branch from 3b891ea to 8c6a339 Compare April 24, 2024 12:24
@openvino-nncf-ci openvino-nncf-ci added the API label Apr 24, 2024
@l-bat l-bat force-pushed the lt/wc_sym_signed branch from 8c6a339 to 8eadc60 Compare May 16, 2024 09:03
@l-bat l-bat added the do not merge label May 17, 2024
@l-bat l-bat force-pushed the lt/wc_sym_signed branch from b78bc23 to a2badb2 Compare May 20, 2024 08:28
Contributor

@ljaljushkin ljaljushkin left a comment

It would be great to see performance numbers similar to the ones in this comment:
#2537 (comment)

@l-bat
Collaborator Author

l-bat commented May 22, 2024

Develop

| Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
|---|---|---|---|---|---|---|---|---|---|
| tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.8404 | -0.1596 | 188 | 124 | 37513 | 0:05:50 | 0:07:52 |
| tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.8404 | -0.1596 | 188 | 124 | 39704 | 0:05:43 | 0:07:10 |
| tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 188 | 124 | 29770 | 0:01:39 | 0:03:04 |
| tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 188 | 124 | 30731 | 0:01:09 | 0:03:11 |
| tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 228 | 84 | 6588 | 0:00:31 | 0:02:33 |
| tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 33420 | 0:00:05 | 0:02:49 |

l-bat:lt/wc_sym_signed

| Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
|---|---|---|---|---|---|---|---|---|---|
| tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 37752 | 0:06:03 | 0:08:06 |
| tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 39638 | 0:05:52 | 0:07:19 |
| tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 94 | 124 | 29995 | 0:01:47 | 0:03:13 |
| tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 94 | 124 | 30762 | 0:01:16 | 0:03:19 |
| tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 114 | 84 | 6574 | 0:00:37 | 0:02:38 |
| tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 33338 | 0:00:05 | 0:02:47 |

@l-bat l-bat removed the do not merge label Jun 5, 2024
@l-bat l-bat force-pushed the lt/wc_sym_signed branch from a2badb2 to ff98763 Compare June 5, 2024 09:54
tests/post_training/data/wc_reference_data.yaml (two outdated review threads, both resolved)
@l-bat l-bat requested review from alexsu52 and ljaljushkin June 6, 2024 13:00
@l-bat
Collaborator Author

l-bat commented Jun 7, 2024

ci job: 23

@l-bat l-bat force-pushed the lt/wc_sym_signed branch from c8d2fb9 to cf5b843 Compare June 7, 2024 11:56
@l-bat l-bat force-pushed the lt/wc_sym_signed branch from cf5b843 to fceef1b Compare June 10, 2024 08:43
return target, zero_mask


def get_near_to_ideal_scale(weight, target, zero_mask, importance):
Contributor

Can we change this funny name to something more down-to-earth? For example, estimate_scales, tune_scales, etc.

@l-bat l-bat requested a review from alexsu52 June 11, 2024 07:17
@@ -165,8 +165,6 @@ def apply(
original_weight = fns.zeros_like(weight) + weight

compressed_weights, scale, zp = do_integer_quantization(original_weight, reduction_axis, config)
zp = zp.astype(scale.dtype)
Collaborator

if zp is not None:
    zp = zp.astype(scale.dtype)

This conversion is important for performance.
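
A framework-free sketch of the guard suggested above. In the PR the values are tensors and the cast is `astype`; here plain lists and `float()` stand in for them, and the helper name is hypothetical. The point is that symmetric mode now yields no zero point, so an unconditional cast would fail on `None`.

```python
from typing import List, Optional


def cast_zero_point(zp: Optional[List[int]]) -> Optional[List[float]]:
    """Cast zero points to the scale's element type (float here); keep None as None."""
    if zp is None:
        # Symmetric mode: there is no zero point, so there is nothing to cast.
        return None
    return [float(z) for z in zp]


asym_zp = cast_zero_point([7, 8, 9])   # asymmetric mode keeps a zero point
sym_zp = cast_zero_point(None)         # symmetric mode has none
print(asym_zp, sym_zp)
```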

@l-bat l-bat force-pushed the lt/wc_sym_signed branch from d754a97 to 3e6c649 Compare June 13, 2024 07:05
Contributor

@alexsu52 alexsu52 left a comment

LGTM

@alexsu52 alexsu52 merged commit 85b3263 into openvinotoolkit:develop Jun 13, 2024
12 checks passed
@l-bat l-bat mentioned this pull request Jul 4, 2024
@l-bat l-bat mentioned this pull request Jul 26, 2024
Labels
API, documentation, NNCF OpenVINO, NNCF PT, NNCF PTQ
7 participants