-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scale estimation/rectification for int4 compression #2549
Scale estimation/rectification for int4 compression #2549
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #2549 +/- ##
============================================
- Coverage 91.19% 29.95% -61.24%
============================================
Files 493 494 +1
Lines 45468 45775 +307
============================================
- Hits 41464 13713 -27751
- Misses 4004 32062 +28058
... and 319 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more.
|
<style>
</style>
|
nncf/quantization/algorithms/weight_compression/openvino_backend.py
Outdated
Show resolved
Hide resolved
nncf/quantization/algorithms/weight_compression/scale_estimation.py
Outdated
Show resolved
Hide resolved
2) Added docstrings.
2) Updated OV compression/decompression pieline.
model, | ||
self._backend_entity.name_to_node_mapping, | ||
all_weight_params, | ||
nodes_to_compress, | ||
activations, | ||
awq_params.subset_size, | ||
awq_params.percent_to_apply, | ||
awq_params.alpha_min, | ||
awq_params.alpha_max, | ||
awq_params.steps, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is out of scope of the PR, but my opinion is __init__
parameters should be look like this:
model, | |
self._backend_entity.name_to_node_mapping, | |
all_weight_params, | |
nodes_to_compress, | |
activations, | |
awq_params.subset_size, | |
awq_params.percent_to_apply, | |
awq_params.alpha_min, | |
awq_params.alpha_max, | |
awq_params.steps, | |
) | |
awq_params.subset_size, | |
awq_params.percent_to_apply, | |
awq_params.alpha_min, | |
awq_params.alpha_max, | |
awq_params.steps, | |
) |
This comment is something to think about.
nncf/quantization/algorithms/weight_compression/scale_estimation.py
Outdated
Show resolved
Hide resolved
|
||
@staticmethod | ||
def dump_parameters( | ||
model: ov.Model, parameters: Dict, algo_name: Optional[str] = "quantization", path: Optional[List] = None | ||
) -> None: | ||
dump_parameters(model, parameters, algo_name, path) | ||
|
||
@staticmethod | ||
def get_compress_decompress_pipeline( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO:
def get_compress_decompress_pipeline( | |
def create_compress_decompress_fn( |
return lambda w, s, zp: compiled_model([w, s, zp])[0] | ||
|
||
@staticmethod | ||
def get_compress_pipeline( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO:
def get_compress_pipeline( | |
def create_compress_fn( |
nncf/quantization/algorithms/weight_compression/torch_backend.py
Outdated
Show resolved
Hide resolved
…stimation_pr_revert
return compressed_weights, scale, zero_point | ||
|
||
|
||
def do_integer_quantization_with_fixed_scale( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this refactoring? Let's discuss it offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
def apply( | ||
self, | ||
model: TModel, | ||
graph: NNCFGraph, | ||
statistic_points: Optional[StatisticPointsContainer] = None, | ||
dataset: Optional[Dataset] = None, | ||
) -> TModel: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is 180 lines long and implements several critical steps at once (weights reshaping/preparing, rectification of initial scale, rectification of scale based on grid search). I believe stages separation in private functions with minimal description will not only improve readability, but allow to unit test separate parts independently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daniil-lyakhov Are you expecting something like this andreyanufr@2c99248 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More like this fb883db, but looks like such refactoring is risky without comprehensive testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to create an issue for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a plan to add unit tests to this feature?
def apply( | ||
self, | ||
model: TModel, | ||
graph: NNCFGraph, | ||
statistic_points: Optional[StatisticPointsContainer] = None, | ||
dataset: Optional[Dataset] = None, | ||
) -> TModel: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More like this fb883db, but looks like such refactoring is risky without comprehensive testing
Co-authored-by: Daniil Lyakhov <[email protected]>
Changes
Added scale estimation for compression which minimizes L2 error between original MatMul and compressed one.
Reason for changes
Increases accuracy for compressed to 4 bit models.
Related tickets
CVS-129177
Tests
In process