Weight compression via Lora Correction Algorithm #2816

ljaljushkin · 2024-07-16T14:16:27Z

Changes

Adds the LoRA Correction algorithm for int4/nf4 weight compression.

Reason for changes

Improves the accuracy of compressed models by migrating quantization noise into “learnable” LoRA adapters.
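A minimal sketch of the general idea, not the code from this PR: the difference between the original and the quantized weight is approximated by a low-rank product, which plays the role of the adapters. The function names, the per-row symmetric 4-bit quantizer, and the plain truncated SVD are all assumptions made for illustration; the diff excerpts in this thread suggest the actual algorithm also uses activation statistics (`X`) and iterative least-squares refinement.

```python
import numpy as np


def quantize_dequantize_int4(w: np.ndarray) -> np.ndarray:
    """Per-row symmetric 4-bit fake quantization (illustrative only)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero for all-zero rows
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale


def low_rank_residual(w: np.ndarray, w_q: np.ndarray, rank: int):
    """Best rank-`rank` approximation of the residual W - W_q via truncated SVD."""
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (out_features, rank)
    b = vt[:rank, :]            # shape (rank, in_features)
    return a, b


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128)).astype(np.float32)
w_q = quantize_dequantize_int4(w)
a, b = low_rank_residual(w, w_q, rank=8)

# The adapters absorb part of the quantization noise.
err_plain = np.linalg.norm(w - w_q)
err_corrected = np.linalg.norm(w - (w_q + a @ b))
```

At inference time the layer output would then be computed as `W_q x + A (B x)`, so the extra cost is two thin matrix multiplications per corrected layer.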

Related tickets

135863

Tests

docstrings, proper names
results for phi3 and stablelm2-1.6b on lambada, wikitext
job/NNCF/job/manual/job/post_training_weight_compression/144/

nncf/quantization/algorithms/weight_compression/lora.py

nncf/quantization/algorithms/weight_compression/algorithm.py

nncf/quantization/algorithms/weight_compression/activation_stats.py

nncf/quantization/algorithms/weight_compression/openvino_backend.py

andreyanufr · 2024-07-16T15:58:04Z

nncf/quantization/algorithms/weight_compression/lora.py

+        for i in range(n_iters):
+            VX = Vr @ X
+            if not w_regularization:
+                sol = fns.linalg.lstsq(fns.transpose(VX), fns.transpose(dY))


Should this pass driver="gelsy"?

Thanks, corrected.
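For readers unfamiliar with the transposes in the snippet above: a least-squares solver finds `Z` minimizing `||A Z - B||`, so solving `U @ VX = dY` for `U` means transposing both sides first. The sketch below uses NumPy as a stand-in for NNCF's `fns.linalg.lstsq`; the shapes and random data are assumptions. NumPy's `lstsq` does not take a driver argument, whereas `scipy.linalg.lstsq` exposes `lapack_driver` and `torch.linalg.lstsq` exposes `driver`, both of which accept `"gelsy"`.

```python
import numpy as np

rng = np.random.default_rng(0)
rank, n_samples, out_features = 4, 32, 16  # hypothetical sizes

VX = rng.standard_normal((rank, n_samples))        # stands in for Vr @ X
U_true = rng.standard_normal((out_features, rank))
dY = U_true @ VX                                   # noise-free target for the demo

# U @ VX = dY  <=>  VX.T @ U.T = dY.T, which is the form lstsq expects.
sol, *_ = np.linalg.lstsq(VX.T, dY.T, rcond=None)
U_est = sol.T
```

With noise-free data and a full-rank `VX`, the recovered `U_est` matches `U_true` up to floating-point error.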

andreyanufr · 2024-07-23T12:58:07Z

nncf/quantization/algorithms/weight_compression/lora_correction.py

+            if not w_regularization:
+                sol = fns.linalg.lstsq(fns.transpose(X), fns.transpose(dYU), driver="gelsy")
+            else:
+                Ind = fns.eye(Vr.shape[1], backend=Vr.backend, dtype=Vr.dtype)


It may also be worth adding some math formulas in a comment.

Done. I also refactored the code to avoid too many transposes; please take a look at that as well.
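The excerpt does not show what the regularized branch does with the identity matrix `Ind`. One common pattern, shown here purely as an assumption, is Tikhonov regularization toward a previous estimate: minimize `||V X - D||^2 + lam * ||V - V0||^2` by appending a scaled identity to the system and solving a single ordinary least-squares problem. All names (`lam`, `D`, `V0`) and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
rank, in_features, n_samples = 4, 8, 6  # fewer samples than features: ill-posed without a penalty
lam = 0.1                               # hypothetical regularization strength

X = rng.standard_normal((in_features, n_samples))
D = rng.standard_normal((rank, n_samples))     # target, e.g. a projected residual
V0 = rng.standard_normal((rank, in_features))  # estimate to stay close to

# Stack the penalty rows under the data rows: [X.T; sqrt(lam) I] V.T ~ [D.T; sqrt(lam) V0.T]
eye = np.eye(in_features)
A = np.concatenate([X.T, np.sqrt(lam) * eye], axis=0)
B = np.concatenate([D.T, np.sqrt(lam) * V0.T], axis=0)
V_stacked = np.linalg.lstsq(A, B, rcond=None)[0].T

# Closed form from setting the gradient to zero: V (X X^T + lam I) = D X^T + lam V0
V_closed = (D @ X.T + lam * V0) @ np.linalg.inv(X @ X.T + lam * eye)
```

The stacked formulation avoids forming and inverting `X @ X.T` explicitly, which is usually better conditioned.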

nncf/quantization/quantize_model.py

nncf/quantization/algorithms/weight_compression/algorithm.py

nncf/quantization/algorithms/weight_compression/backend.py

nncf/quantization/algorithms/weight_compression/lora_correction.py

nncf/quantization/algorithms/weight_compression/openvino_backend.py

nncf/quantization/algorithms/weight_compression/weight_lowering.py

andreyanufr · 2024-08-23T12:06:10Z

nncf/quantization/algorithms/weight_compression/lora_correction.py

+            for i in range(n_gs):
+                offset = i * gs
+                denum = fns.sum(s[offset : offset + gs])
+                s[offset : offset + gs] = s[offset : offset + gs] / denum


I think we can skip the first normalization.
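For reference, the loop in the snippet above rescales each group of `gs` consecutive entries of `s` so that the group sums to 1. The sketch below reproduces it on a 1-D NumPy array (an assumption; the real `s` may have a different shape or backend) next to a vectorized equivalent. The function names are hypothetical.

```python
import numpy as np


def normalize_per_group(s, group_size):
    """Loop form, mirroring the diff: divide each group by its own sum."""
    s = np.asarray(s, dtype=np.float64).copy()
    n_groups = s.shape[0] // group_size
    for i in range(n_groups):
        offset = i * group_size
        denom = s[offset : offset + group_size].sum()
        s[offset : offset + group_size] = s[offset : offset + group_size] / denom
    return s


def normalize_per_group_vectorized(s, group_size):
    """Same result via reshape; assumes len(s) is a multiple of group_size."""
    groups = np.asarray(s, dtype=np.float64).reshape(-1, group_size)
    return (groups / groups.sum(axis=1, keepdims=True)).reshape(-1)


s = [1.0, 3.0, 2.0, 2.0, 5.0, 5.0]
looped = normalize_per_group(s, group_size=2)
vectorized = normalize_per_group_vectorized(s, group_size=2)
# Both give [0.25, 0.75, 0.5, 0.5, 0.5, 0.5]
```

Note that this normalization is idempotent: applying it to an already normalized vector leaves the values unchanged.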

alexsu52

LGTM. Please take my comments into consideration.

nncf/quantization/advanced_parameters.py

nncf/quantization/quantize_model.py

ljaljushkin · 2024-08-27T20:24:33Z

Latest build of conformance test with lora test case:
job/NNCF/job/manual/job/post_training_weight_compression/150

openvinotoolkit#2816

github-actions bot added the NNCF PT, NNCF OpenVINO, and NNCF PTQ labels Jul 16, 2024

openvino-nncf-ci added the API label Jul 16, 2024

ljaljushkin commented Jul 16, 2024

ljaljushkin force-pushed the nl/lora_correct_prod_squash branch 2 times, most recently from a355df6 to 24b1f9e July 22, 2024 15:45

github-actions bot added the documentation label Jul 22, 2024

ljaljushkin marked this pull request as ready for review July 22, 2024 15:50

ljaljushkin requested a review from a team as a code owner July 22, 2024 15:50

ljaljushkin requested review from alexsu52, AlexKoff88 and andreyanufr July 22, 2024 15:50

ljaljushkin added 2 commits July 22, 2024 17:59

Lora Correction Algorithm for int4/nf4 weight compression

24b1f9e

conformance test values

cd5d3de

andreyanufr reviewed Jul 23, 2024

alexsu52 reviewed Jul 24, 2024

ljaljushkin added 12 commits July 30, 2024 22:07

gelsy

42580d9

unsupported options

bff4e2b

renaming and comments in LoRA algorithm

c099e4b

less transpose, should be faster and less memory

4639097

renaming, typehint, gptq+lora error

d26540f

no wc_params in functions

4479bc3

rename

523861b

changed defaults

5b1e98e

test lora with mixed precision

4747ba4

Merge remote-tracking branch 'fork/nl/lora_comments_tmp' into nl/lora_comments_tmp

258e91a

removed copy-paste in nf4 quant/dequant

3cd49be

tests for unsupported options

85d9c8a

Merge remote-tracking branch 'origin/develop' into nl/lora_comments_tmp

f481f6c

ljaljushkin requested review from alexsu52 and andreyanufr August 19, 2024 08:09

ljaljushkin added 2 commits August 19, 2024 10:10

new reference for lora conformance test

af0c7f5

fixed pre-commit

470fbd8

ljaljushkin force-pushed the nl/lora_correct_prod_squash branch from 470fbd8 to a3b26c7 August 19, 2024 18:17

ljaljushkin added 9 commits August 19, 2024 20:30

fixed pre-commit

a3b26c7

Merge remote-tracking branch 'origin/develop' into nl/lora_correct_prod_squash

6395026

dump advanced, test for transpose_b=False, expose lora params

565e299

Merge remote-tracking branch 'origin/develop' into nl/lora_correct_prod_squash

6021a2b

Corrected debug output

c73b2d6

Merge remote-tracking branch 'origin/develop' into nl/lora_correct_prod_squash

6bc070f

Merge remote-tracking branch 'fork/nl/lora_correct_prod_squash' into nl/lora_correct_prod_squash

471f3cc

Merge remote-tracking branch 'origin/develop' into nl/lora_correct_prod_squash

4c38115

Merge remote-tracking branch 'origin/develop' into nl/lora_correct_prod_squash

eefdcda

andreyanufr reviewed Aug 23, 2024

andreyanufr approved these changes Aug 26, 2024

alexsu52 reviewed Aug 27, 2024

nncf/quantization/advanced_parameters.py (outdated)

nncf/quantization/quantize_model.py (outdated)

ljaljushkin requested a review from alexsu52 August 27, 2024 15:41

ljaljushkin added 2 commits August 27, 2024 17:52

renaming

a296518

Merge remote-tracking branch 'origin/develop' into nl/lora_correct_prod_squash

4282bd5

alexsu52 approved these changes Aug 28, 2024

alexsu52 merged commit 417c2a1 into openvinotoolkit:develop Aug 28, 2024
13 checks passed

ljaljushkin mentioned this pull request Aug 29, 2024

[TorchFX] INT8 Weights Compression Support #2891

Merged

ljaljushkin added a commit to KodiaqQ/nncf that referenced this pull request Sep 5, 2024

Added Lora Correction algorithm.

cff6fea

openvinotoolkit#2816

ljaljushkin mentioned this pull request Sep 5, 2024

[Release_v2130] Update ReleaseNotes.md #2940

Merged
