Scale estimation/rectification for int4 compression #2549

Merged

Conversation

andreyanufr
Collaborator

Changes

Added scale estimation for weight compression, which minimizes the L2 error between the outputs of the original MatMul and the compressed one.
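A minimal sketch of the underlying idea (hypothetical and simplified: the actual algorithm minimizes the error of the MatMul output using collected activation statistics, whereas this toy version minimizes only the weight-reconstruction error):

```python
# Hypothetical illustration of scale estimation, not the NNCF implementation:
# for each weight group, search a small grid of scale multipliers and keep the
# scale that minimizes the L2 error of the quantize-dequantize round trip.
import numpy as np

def estimate_scale(w_group: np.ndarray, steps: int = 20) -> float:
    base_scale = np.abs(w_group).max() / 7.0  # initial scale for int4 levels [-8, 7]
    best_scale, best_err = base_scale, np.inf
    for factor in np.linspace(0.8, 1.2, steps):  # grid around the initial scale
        scale = base_scale * factor
        q = np.clip(np.round(w_group / scale), -8, 7)
        err = np.sum((w_group - q * scale) ** 2)  # L2 reconstruction error
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```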

Reason for changes

Increases the accuracy of models compressed to 4 bits.

Related tickets

CVS-129177

Tests

In progress

@github-actions github-actions bot added NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ labels Mar 6, 2024
@openvino-nncf-ci openvino-nncf-ci added the API Public API-impacting changes label Mar 6, 2024

codecov bot commented Mar 6, 2024

Codecov Report

Attention: Patch coverage is 8.36653%, with 230 lines in your changes missing coverage. Please review.

Project coverage is 29.95%. Comparing base (17a5b65) to head (f06095e).
Report is 5 commits behind head on develop.

Additional details and impacted files


@@             Coverage Diff              @@
##           develop    #2549       +/-   ##
============================================
- Coverage    91.19%   29.95%   -61.24%     
============================================
  Files          493      494        +1     
  Lines        45468    45775      +307     
============================================
- Hits         41464    13713    -27751     
- Misses        4004    32062    +28058     
| Files | Coverage Δ |
|---|---|
| nncf/quantization/advanced_parameters.py | 84.06% <100.00%> (-7.91%) ⬇️ |
| ...ntization/algorithms/weight_compression/backend.py | 0.00% <ø> (-100.00%) ⬇️ |
| nncf/openvino/quantization/quantize_model.py | 0.00% <0.00%> (-61.30%) ⬇️ |
| ...ion/algorithms/weight_compression/torch_backend.py | 0.00% <0.00%> (-84.11%) ⬇️ |
| nncf/torch/quantization/quantize_model.py | 0.00% <0.00%> (-92.50%) ⬇️ |
| nncf/quantization/quantize_model.py | 34.78% <12.50%> (-42.67%) ⬇️ |
| ...ization/algorithms/weight_compression/algorithm.py | 0.00% <0.00%> (-96.49%) ⬇️ |
| .../quantization/algorithms/weight_compression/awq.py | 0.00% <0.00%> (-93.34%) ⬇️ |
| ...n/algorithms/weight_compression/weight_lowering.py | 0.00% <0.00%> (-97.71%) ⬇️ |
| .../algorithms/weight_compression/openvino_backend.py | 0.00% <0.00%> (-98.34%) ⬇️ |
| ... and 1 more | |

... and 319 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| COMMON | ? |
| ONNX | ? |
| OPENVINO | ? |
| TENSORFLOW | 29.95% <8.36%> (-0.16%) ⬇️ |
| TORCH | ? |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| common | 76.35% <ø> (-17.42%) ⬇️ |
| torch | 0.01% <0.00%> (-93.59%) ⬇️ |
| tensorflow | 93.74% <ø> (ø) |
| onnx | 0.00% <ø> (-93.07%) ⬇️ |
| openvino | 0.00% <0.00%> (-94.19%) ⬇️ |
| ptq | 15.26% <8.43%> (-74.80%) ⬇️ |

@github-actions github-actions bot added the NNCF PT Pull requests that updates NNCF PyTorch label Mar 15, 2024
@andreyanufr andreyanufr changed the title from "Andreyan/scale estimation pr" to "Scale estimation/rectification for int4 compression" Mar 15, 2024
@andreyanufr andreyanufr marked this pull request as ready for review March 15, 2024 08:29
@andreyanufr andreyanufr requested a review from a team as a code owner March 15, 2024 08:29
@andreyanufr
Collaborator Author

lambada-openai results (acc = accuracy, ppl = perplexity):

| model | precision | acc | ppl |
|---|---|---|---|
| stabilityai_stablelm-2-zephyr-1_6b | fp32 | 0.5925 | 6.3024 |
| stabilityai_stablelm-2-zephyr-1_6b | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.5696 | 7.4355 |
| stabilityai_stablelm-2-zephyr-1_6b | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.5467 | 7.9706 |
| stabilityai_stablelm-2-zephyr-1_6b | int4_sym_r10_gs64_max_activation_variance | 0.5428 | 8.5844 |
| stabilityai_stablelm-3b-4e1t | fp16 | 0.7132 | 3.8192 |
| stabilityai_stablelm-3b-4e1t | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.6936 | 4.0961 |
| stabilityai_stablelm-3b-4e1t | int4_sym_r10_gs64_max_activation_variance | 0.685 | 4.324 |
| stabilityai_stablelm-3b-4e1t | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.6798 | 4.4316 |
| stable-zephyr-3b-dpo | fp16 | 0.6099 | 6.7151 |
| stable-zephyr-3b-dpo | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.5921 | 7.0513 |
| stable-zephyr-3b-dpo | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.5736 | 8.3502 |
| stable-zephyr-3b-dpo | int4_sym_r10_gs64_max_activation_variance | 0.5618 | 9.3011 |
| llama-2-7b-chat | fp16 | 0.7108 | 3.262 |
| llama-2-7b-chat | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.6911 | 3.5074 |
| llama-2-7b-chat | int4_sym_r10_gs128_max_activation_variance | 0.6885 | 3.5719 |
| llama-2-7b-chat | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.6798 | 3.6947 |
| zephyr-7b-beta | fp16 | 0.7345 | 3.1783 |
| zephyr-7b-beta | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.7297 | 3.2551 |
| zephyr-7b-beta | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.7074 | 3.4549 |
| zephyr-7b-beta | int4_sym_r10_gs128_max_activation_variance | 0.707 | 3.5021 |

Resolved review threads: nncf/quantization/quantize_model.py (outdated), nncf/quantization/quantize_model.py, nncf/torch/quantization/quantize_model.py, nncf/openvino/quantization/quantize_model.py, nncf/quantization/advanced_parameters.py
2) Added docstrings.
@daniil-lyakhov daniil-lyakhov self-requested a review March 20, 2024 10:31
@andreyanufr andreyanufr requested a review from alexsu52 April 17, 2024 12:58
Resolved review thread: nncf/quantization/advanced_parameters.py
Comment on lines +354 to 364:

```python
                model,
                self._backend_entity.name_to_node_mapping,
                all_weight_params,
                nodes_to_compress,
                activations,
                awq_params.subset_size,
                awq_params.percent_to_apply,
                awq_params.alpha_min,
                awq_params.alpha_max,
                awq_params.steps,
            )
```
Contributor

It looks like this is out of scope for the PR, but in my opinion the __init__ parameters should look like this:

Suggested change:

```diff
-                model,
-                self._backend_entity.name_to_node_mapping,
-                all_weight_params,
-                nodes_to_compress,
-                activations,
                 awq_params.subset_size,
                 awq_params.percent_to_apply,
                 awq_params.alpha_min,
                 awq_params.alpha_max,
                 awq_params.steps,
             )
```

This comment is something to think about.
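A hedged sketch of the design this suggestion points toward (names and signatures are hypothetical, not the NNCF code): the constructor would take only algorithm hyperparameters, while the model-specific objects move to apply().

```python
# Hypothetical sketch of the suggested split, not the actual NNCF API:
# __init__ keeps only algorithm hyperparameters; everything tied to a concrete
# model (graph mapping, weight parameters, activations) is passed to apply().
class AWQ:
    def __init__(self, subset_size, percent_to_apply, alpha_min, alpha_max, steps):
        self._subset_size = subset_size
        self._percent_to_apply = percent_to_apply
        self._alpha_min = alpha_min
        self._alpha_max = alpha_max
        self._steps = steps

    def apply(self, model, name_to_node_mapping, all_weight_params, nodes_to_compress, activations):
        ...  # run AWQ on the model-specific data using the stored hyperparameters
```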


```python
    @staticmethod
    def dump_parameters(
        model: ov.Model, parameters: Dict, algo_name: Optional[str] = "quantization", path: Optional[List] = None
    ) -> None:
        dump_parameters(model, parameters, algo_name, path)

    @staticmethod
    def get_compress_decompress_pipeline(
```
Contributor
IMHO:

Suggested change:

```diff
-    def get_compress_decompress_pipeline(
+    def create_compress_decompress_fn(
```

```python
        return lambda w, s, zp: compiled_model([w, s, zp])[0]

    @staticmethod
    def get_compress_pipeline(
```
Contributor

IMHO:

Suggested change:

```diff
-    def get_compress_pipeline(
+    def create_compress_fn(
```
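For context, a minimal numpy sketch (hypothetical; the PR builds this as a compiled OpenVINO model instead) of what the compress-decompress function computes for asymmetric int4 given weights w, scale s, and zero point zp:

```python
# Hypothetical numpy equivalent of the compiled compress-decompress function.
import numpy as np

def compress_decompress(w: np.ndarray, s: np.ndarray, zp: np.ndarray) -> np.ndarray:
    q = np.clip(np.round(w / s + zp), 0, 15)  # quantize to uint4 levels [0, 15]
    return (q - zp) * s                       # dequantize back to float
```

Scale estimation then compares MatMul outputs computed with such dequantized weights against the original ones to pick the scale with minimal L2 error.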

@ljaljushkin
Contributor

The scale estimation algorithm doesn't work for group_size=-1 and fails with no clear message:

[screenshot of the error]

In the short term, an error about the unsupported parameter for scale estimation would be enough.
BTW, AWQ works fine with group_size=-1.
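For reference, a hedged usage example (parameter names follow the current nncf.compress_weights API; the exact API at the time of this PR may differ):

```python
# Assumed usage; the scale_estimation/group_size parameter names are taken from
# the current nncf.compress_weights API and may not match this PR exactly.
import nncf

compressed = nncf.compress_weights(
    model,                                   # an ov.Model
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=64,                           # group_size=-1 is the unsupported case above
    scale_estimation=True,                   # enable the algorithm from this PR
    dataset=calibration_dataset,             # activations are required to estimate scales
)
```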

```python
    return compressed_weights, scale, zero_point


def do_integer_quantization_with_fixed_scale(
```
Contributor

Could you explain this refactoring? Let's discuss it offline.

Collaborator Author

Done

@alexsu52 alexsu52 dismissed their stale review April 25, 2024 12:32

I don't have blocking comments.

Comment on lines 99 to 106:

```python
    def apply(
        self,
        model: TModel,
        graph: NNCFGraph,
        statistic_points: Optional[StatisticPointsContainer] = None,
        dataset: Optional[Dataset] = None,
    ) -> TModel:
        """
```
Collaborator

This function is 180 lines long and implements several critical steps at once (weight reshaping/preparation, rectification of the initial scale, and rectification of the scale based on grid search). I believe separating the stages into private functions with minimal descriptions will not only improve readability but also allow the separate parts to be unit-tested independently.

Collaborator Author

@daniil-lyakhov Are you expecting something like this: andreyanufr@2c99248?

Collaborator

More like this: fb883db, but it looks like such refactoring is risky without comprehensive testing.

Collaborator

I suggest creating an issue for that.
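A hedged skeleton of the decomposition described above (method names and stage boundaries are hypothetical; the actual refactoring was deferred):

```python
# Illustrative-only decomposition of the long apply() into testable stages;
# names and signatures are assumptions, not the NNCF code.
class ScaleEstimation:
    def apply(self, model, graph, statistic_points=None, dataset=None):
        weights = self._prepare_weights(model, graph)       # reshape weights into groups
        scales = self._rectify_initial_scale(weights)       # refine the min/max-based scale
        scales = self._grid_search_scale(weights, scales)   # grid search minimizing L2 error
        return self._apply_scales(model, scales)            # write rectified scales back
```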

Collaborator

@daniil-lyakhov daniil-lyakhov left a comment

Is there a plan to add unit tests for this feature?

Resolved review thread (outdated): nncf/quantization/quantize_model.py

@alexsu52 alexsu52 merged commit 9c00000 into openvinotoolkit:develop Apr 30, 2024
12 checks passed