[CPU] Fixed BF16 Matmul inference precision #22994
Merged
Details:
The CPU plugin uses the EnforceInferencePrecision routine for BF16 precision mark-up. Its logic assumes that only the activations precision is changed before a MatMul op, while the weights precision stays unchanged. Since dnnlFCTypeMapping is missing an optimized configuration for BF16 activations with FP32 weights, execution always happens in FP32 precision even if the user manually sets infer_precision=bf16.
This bug is not visible on FP16 IRs (a BF16 activations + FP16 weights config is present), so only FP32 IRs are affected. Since save_model and ovc apply FP16 compression by default, the issue mostly affects pipelines that use a model directly after a convert_model call.
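A minimal sketch (not part of this PR) of how the user-visible effect can be checked from the OpenVINO Python API: build a tiny FP32 model with a MatMul on constant weights (mirroring an FP32 IR taken straight from convert_model, i.e. without FP16 compression), compile it for CPU with the bf16 inference precision hint, and inspect the runtime precision reported by the executed graph. The toy model and the rt_info keys used for inspection are illustrative assumptions, not code from this change.

```python
import numpy as np
import openvino as ov
import openvino.properties.hint as hints

# Toy FP32 model: MatMul with constant (FP32) weights, as produced by
# convert_model without FP16 weight compression.
param = ov.opset13.parameter([1, 64], ov.Type.f32, name="input")
weights = ov.opset13.constant(np.random.rand(64, 32).astype(np.float32))
matmul = ov.opset13.matmul(param, weights, False, False)
model = ov.Model([matmul], [param], "fp32_matmul")

core = ov.Core()
# Request BF16 inference precision explicitly.
compiled = core.compile_model(model, "CPU",
                              {hints.inference_precision: ov.Type.bf16})

# Walk the executed graph and print the runtime precision of each node.
# Before this fix, the MatMul/FullyConnected node was still reported as
# executing in FP32 despite the bf16 hint.
for node in compiled.get_runtime_model().get_ordered_ops():
    rt = node.get_rt_info()
    layer_type = str(rt["layerType"]) if "layerType" in rt else ""
    precision = str(rt["runtimePrecision"]) if "runtimePrecision" in rt else ""
    print(f"{node.get_friendly_name():30s} {layer_type:20s} {precision}")
```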