
[CPU] Fixed BF16 Matmul inference precision #22994

Conversation

@dmitry-gorokhov (Contributor) commented Feb 21, 2024

Details:

The CPU plugin uses the EnforceInferencePrecision routine for BF16 precision mark-up. Its logic assumes that only the activations' precision is changed before a Matmul op, while the weights' precision remains unchanged. Since dnnlFCTypeMapping is missing the optimized configuration for BF16 activations with FP32 weights, execution always happens in FP32 precision, even when the user manually sets infer_precision=bf16.
This bug is not visible on FP16 IRs (since a BF16+FP16 config is present), so only FP32 IRs are affected. Since save_model and ovc apply FP16 compression by default, the issue mostly applies to pipelines that use a model directly after a convert_model call.
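
For context, a minimal reproduction sketch of the affected pipeline using the public OpenVINO Python API (the model path is hypothetical). Because convert_model keeps weights in FP32 in-memory, before this fix the Matmul/FullyConnected nodes of such a model still executed in FP32 on CPU despite the BF16 hint:

```python
# Sketch only, assuming a hypothetical "model.onnx"; not part of this PR.
import openvino as ov

core = ov.Core()

# convert_model leaves weights in FP32 (no FP16 compression, unlike
# save_model/ovc), which is exactly the case this PR fixes.
model = ov.convert_model("model.onnx")  # hypothetical path

# Request BF16 inference explicitly via the inference precision hint.
compiled = core.compile_model(
    model,
    "CPU",
    {ov.properties.hint.inference_precision: ov.Type.bf16},
)
```

Before the fix, inspecting the executed graph (e.g. via compiled.get_runtime_model() or profiling counters) would show the FullyConnected primitives running with FP32 runtime precision despite the hint above.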

@dmitry-gorokhov dmitry-gorokhov added bug Something isn't working category: CPU OpenVINO CPU plugin labels Feb 21, 2024
@dmitry-gorokhov dmitry-gorokhov added this to the 2024.1 milestone Feb 21, 2024
@dmitry-gorokhov dmitry-gorokhov requested review from a team as code owners February 21, 2024 13:18
@dmitry-gorokhov dmitry-gorokhov force-pushed the fix/bf16_matmul_on_fp32_ir branch from eb634c4 to bbb231f February 22, 2024 08:28
github-merge-queue bot pushed a commit that referenced this pull request Feb 22, 2024

Cherry-picks: #22994
@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue Feb 23, 2024
Merged via the queue into openvinotoolkit:master with commit ac04160 Feb 23, 2024
99 checks passed
@dmitry-gorokhov dmitry-gorokhov deleted the fix/bf16_matmul_on_fp32_ir branch February 23, 2024 11:54