set kv cache in f16 #3101

Merged: 1 commit, merged Nov 22, 2024

@ljaljushkin (Contributor) commented Nov 20, 2024

Changes

Explicitly disable KV cache compression to u8; f16 precision is used instead.
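If the same pinning has to be done on the caller side, a minimal sketch could build the compile config like this. It assumes the OpenVINO property string `KV_CACHE_PRECISION` (the string form of the `ov::hint::kv_cache_precision` hint), and `with_f16_kv_cache` is a hypothetical helper; this is not the exact code changed in this PR.

```python
from typing import Dict, Optional

# Assumed OpenVINO property key: string form of ov::hint::kv_cache_precision.
KV_CACHE_PRECISION_KEY = "KV_CACHE_PRECISION"


def with_f16_kv_cache(config: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an OpenVINO compile config with the KV cache pinned to f16.

    Hypothetical helper: it only builds the dict and never calls OpenVINO.
    Any KV cache precision already present in ``config`` is overridden,
    because the intent is to disable the u8 default unconditionally.
    """
    merged = dict(config or {})  # copy so the caller's dict is not mutated
    merged[KV_CACHE_PRECISION_KEY] = "f16"
    return merged


if __name__ == "__main__":
    cfg = with_f16_kv_cache({"PERFORMANCE_HINT": "LATENCY"})
    print(cfg)
    # With OpenVINO installed, the dict could be passed as the compile config:
    #   import openvino as ov
    #   compiled = ov.Core().compile_model(model, "CPU", cfg)
```

Overriding rather than respecting an existing value keeps the metrics independent of whatever default the runtime ships, which is the point of handling this in NNCF instead of waiting for a change in OV.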

Reason for changes

The PTWC nightly run reports different metrics (ticket 157594).
This happens because, since openvinotoolkit/openvino#27454, the KV cache is compressed to u8 by default, which affects the accuracy of fp32 models (ticket 157571).
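To illustrate why u8 compression can shift fp32 accuracy metrics while f16 usually does not, the sketch below round-trips one vector through both formats and compares the worst-case error. The u8 path is a generic asymmetric min-max scheme with a single scale and zero point per vector; it is an assumption for illustration only, not OpenVINO's actual KV cache compression, which may use a different grouping and scale layout.

```python
import random
import struct
from typing import List


def to_f16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def u8_roundtrip(values: List[float]) -> List[float]:
    """Asymmetric min-max u8 quantization of the whole vector, then dequantization.

    Generic illustrative scheme (assumption), not OpenVINO's implementation.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant input
    zero_point = round(-lo / scale)
    out = []
    for v in values:
        q = min(255, max(0, round(v / scale) + zero_point))  # clamp to u8 range
        out.append((q - zero_point) * scale)
    return out


def max_abs_error(ref: List[float], approx: List[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(a - b) for a, b in zip(ref, approx))


if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 1.0) for _ in range(128)]  # stand-in for one K/V row
    print("f16 max error:", max_abs_error(row, [to_f16(v) for v in row]))
    print("u8  max error:", max_abs_error(row, u8_roundtrip(row)))
```

f16 keeps roughly three significant decimal digits per element regardless of the vector's range, while u8 spreads only 256 levels across the full min-max range, so its error grows with the spread of the cached values.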

We propose using an f16 KV cache so that the issue is handled in NNCF rather than in OpenVINO (OV): there is still an open issue with KV cache compression, and the default behavior may change in the near future.

Related tickets

157571
157594

Tests

@github-actions bot added the NNCF PTQ label Nov 20, 2024
@ljaljushkin marked this pull request as ready for review November 20, 2024 12:47
@ljaljushkin requested a review from a team as a code owner November 20, 2024 12:47
@ljaljushkin requested a review from alexsu52 November 20, 2024 12:47
@alexsu52 merged commit dc9f5cb into openvinotoolkit:develop Nov 22, 2024
14 checks passed