[CPU] Enable u8 kv cache by default #27454

luo-cheng2021 · 2024-11-07T11:56:01Z

Details:

Enable u8 kv cache by default
...

Tickets:

152621

...tel_cpu/tests/functional/custom/subgraph_tests/src/common/concat_transpose_sdp_transpose.cpp

zhangYiIntel · 2024-11-08T08:08:25Z

src/plugins/intel_cpu/src/config.cpp

@@ -411,6 +412,9 @@ void Config::readProperties(const ov::AnyMap& prop, const ModelType modelType) {
        if (!fcDynamicQuantizationGroupSizeSetExplicitly) {
            fcDynamicQuantizationGroupSize = 0;
        }
+        if (!kvCachePrecisionSetExplicitly) {


Why not set kvCachePrecision to u8 here? Is it to keep kvCachePrecision compatible on the ACL platform? If so, it would be better to leave a comment here explaining this.

Setting kvCachePrecision in the constructor is clearer and more consistent with other properties such as DynamicQuantizationGroup, so my suggestion is to keep them in the current style.
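The pattern discussed in this thread, a platform-dependent default set in the constructor plus a later adjustment that is skipped when the user set the property explicitly, can be sketched as below. This is a minimal standalone illustration, not the plugin's code: `SketchConfig`, `Precision`, `setKvCachePrecision`, and `overrideIfNotExplicit` are hypothetical names, and f16 as the non-x86 default is an assumption made only for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for an element type; not the OpenVINO type.
enum class Precision { f32, f16, u8 };

struct SketchConfig {
    Precision kvCachePrecision;
    bool kvCachePrecisionSetExplicitly = false;

    // Default chosen once, at construction: u8 on x86,
    // f16 elsewhere (the f16 fallback is an assumption of this sketch).
    explicit SketchConfig(bool isX86)
        : kvCachePrecision(isX86 ? Precision::u8 : Precision::f16) {}

    // A user-supplied value always wins and is remembered as explicit.
    void setKvCachePrecision(Precision p) {
        kvCachePrecision = p;
        kvCachePrecisionSetExplicitly = true;
    }

    // A later adjustment that must not clobber an explicit user choice,
    // mirroring the `if (!kvCachePrecisionSetExplicitly)` guard in the diff.
    void overrideIfNotExplicit(Precision p) {
        if (!kvCachePrecisionSetExplicitly) {
            kvCachePrecision = p;
        }
    }
};
```

Keeping the default in the constructor means the guard only has to answer one question: did the user set the value or not.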

src/plugins/intel_cpu/src/memory_state.cpp

zhangYiIntel

LGTM, thanks!

### Changes

Explicitly disable KV cache compression to u8; f16 precision is used instead.

### Reason for changes

The PTWC nightly reports different metrics (ticket 157594). This happens because, since openvinotoolkit/openvino#27454, the KV cache is compressed to u8 by default, which affects the accuracy of fp32 models (ticket 157571). The proposal is to use the KV cache in f16 so that the issue is handled in nncf rather than in ov (there is still an open issue with KV cache compression, and it may be modified in the near future).

### Related tickets

157571
157594

### Tests

- [x] openvino-nightly/job/post_training_weight_compression/56
  ![image](https://github.com/user-attachments/assets/0772a8e5-0f92-4f53-8ac0-e16841bd8193)
- [x] https://github.com/openvinotoolkit/nncf/actions/runs/11934079602
- [x] job/weekly/job/openvino-nightly/job/test_examples/77
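To illustrate why storing the KV cache in u8 can shift accuracy metrics, the sketch below quantizes one row of floats to u8 and measures the round-trip error. It assumes a simple per-row asymmetric scheme (one scale and one zero point per row); the source does not describe the scheme the CPU plugin actually uses, and `QuantizedRow`, `quantizeU8`, `dequantizeU8`, and `maxAbsError` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one quantized row; not an OpenVINO type.
struct QuantizedRow {
    std::vector<std::uint8_t> data;
    float scale = 1.0f;      // distance between adjacent u8 levels
    float zeroPoint = 0.0f;  // u8 level (kept as float) that maps to 0.0f
};

// Map the row's [min, max] range onto the 256 levels of u8.
QuantizedRow quantizeU8(const std::vector<float>& row) {
    QuantizedRow out;
    if (row.empty()) {
        return out;
    }
    const float lo = *std::min_element(row.begin(), row.end());
    const float hi = *std::max_element(row.begin(), row.end());
    out.scale = (hi > lo) ? (hi - lo) / 255.0f : 1.0f;  // avoid division by zero
    out.zeroPoint = -lo / out.scale;
    out.data.reserve(row.size());
    for (float x : row) {
        float q = std::round(x / out.scale + out.zeroPoint);
        q = std::min(255.0f, std::max(0.0f, q));  // clamp to the u8 range
        out.data.push_back(static_cast<std::uint8_t>(q));
    }
    return out;
}

// Reverse the mapping; the result differs from the input by rounding error.
std::vector<float> dequantizeU8(const QuantizedRow& q) {
    std::vector<float> out;
    out.reserve(q.data.size());
    for (std::uint8_t v : q.data) {
        out.push_back((static_cast<float>(v) - q.zeroPoint) * q.scale);
    }
    return out;
}

// Largest element-wise absolute difference over the common prefix.
float maxAbsError(const std::vector<float>& a, const std::vector<float>& b) {
    float err = 0.0f;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        err = std::max(err, std::fabs(a[i] - b[i]));
    }
    return err;
}
```

Under this scheme the round-trip error per element is bounded by about half of `scale`, so rows with a wide value range lose more precision than they would in f16.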

### Details:

- *Port for enabling u8 kv cache by default in #27454*

### Tickets:

- *[152621](https://jira.devtools.intel.com/browse/CVS-152621)*

---------

Co-authored-by: Vladislav Golubev

default enable u8 kv cache

0f67bd8

github-actions bot added the category: CPU OpenVINO CPU plugin label Nov 7, 2024

disable u8 kvcache default config on non-x86

f37b571

luo-cheng2021 marked this pull request as ready for review November 8, 2024 04:39

luo-cheng2021 requested review from a team as code owners November 8, 2024 04:39

yuxu42 requested a review from zhangYiIntel November 8, 2024 07:58

zhangYiIntel reviewed Nov 8, 2024

dmitry-gorokhov added this to the 2025.0 milestone Nov 8, 2024

dmitry-gorokhov assigned zhangYiIntel Nov 8, 2024

luo-cheng2021 added 2 commits November 11, 2024 09:50

apply review comment: fix u8 set_state failure

81d3411

Merge remote-tracking branch 'upstream/master' into luocheng/enable_int8_kvcache

16578fe

luo-cheng2021 force-pushed the luocheng/enable_int8_kvcache branch 2 times, most recently from 47823da to 16578fe Compare November 11, 2024 02:12

luo-cheng2021 requested a review from zhangYiIntel November 11, 2024 04:37

dmitry-gorokhov reviewed Nov 11, 2024

src/plugins/intel_cpu/src/memory_state.cpp

cover bf16/f32/u8 for get/set_state

45eb699

luo-cheng2021 requested a review from dmitry-gorokhov November 12, 2024 09:08

dmitry-gorokhov enabled auto-merge November 12, 2024 11:14

zhangYiIntel approved these changes Nov 13, 2024

dmitry-gorokhov added this pull request to the merge queue Nov 13, 2024

Merged via the queue into openvinotoolkit:master with commit 2d148ec Nov 13, 2024
161 checks passed

ljaljushkin mentioned this pull request Nov 15, 2024

New metrics for weight compression with dynamic quantization openvinotoolkit/nncf#2829

Closed

luo-cheng2021 mentioned this pull request Nov 18, 2024

[CPU] Port for enabling u8 kv cache by default #27583

Merged

ljaljushkin mentioned this pull request Nov 20, 2024

set kv cache in f16 openvinotoolkit/nncf#3101

Merged

3 tasks
