Enable u8i8 and bf16 MHA tokenization with transpose_b=true
v-Golubev committed Nov 19, 2024
1 parent 0fb5b64 commit 1de39e8
Showing 1 changed file with 6 additions and 3 deletions.
@@ -1018,9 +1018,12 @@ void Transformations::MainSnippets(void) {
         // Only FP32 dynamic MHA is supported
         if (matmul->is_dynamic())
             return false;
-        // [114487] brgemm kernel in oneDNN requires brgemm_copy_b kernel if MatMul node has transposed_b=True
-        // The current solution with ExtractExplicitMatMulTranspose pass is slower for non-f32 cases than using of brgemm_copy_b kernel
-        if (matmul->get_transpose_a() || matmul->get_transpose_b())
+        // Ticket 157340: repacking extraction is not supported for i8i8 case.
+        // If the repacking is performed inside the kernel, it may lead to performance degradation.
+        if (is_int8 && matmul->get_transpose_b())
             return false;
+
+        if (matmul->get_transpose_a())
+            return false;
         // [150842] The execution of Brgemm INT8/BF16 on AMX platforms depends on the value of "K % VNNIFactor".
         // For more details, please teake a look at the ticket 150842
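The behavioral change in this hunk can be summarized with a small C++ sketch that models only the non-FP32 branch visible above. `MatMulProps`, `tokenizable_before` and `tokenizable_after` are hypothetical names, not OpenVINO API. The sketch assumes that `is_int8` is true only for the i8i8 case, so u8i8 and bf16 MatMuls fall through to the new `transpose_a`-only check; the commit title suggests this, but the hunk does not show how `is_int8` is defined. Checks that follow in the real callback (such as the `K % VNNIFactor` condition) are not modelled.

```cpp
// Hypothetical model of the non-FP32 MatMul check in the MHA tokenization
// callback, before and after this commit. This is not OpenVINO API.
struct MatMulProps {
    bool is_dynamic;
    bool transpose_a;
    bool transpose_b;
    bool is_int8;  // assumption: true only for i8i8; u8i8 and bf16 -> false
};

// Before: any transposed input blocked tokenization.
constexpr bool tokenizable_before(const MatMulProps& m) {
    if (m.is_dynamic)
        return false;  // only FP32 dynamic MHA is supported
    if (m.transpose_a || m.transpose_b)
        return false;
    return true;  // later checks (e.g. K % VNNIFactor) are not modelled
}

// After: transpose_b only blocks the i8i8 case (ticket 157340).
constexpr bool tokenizable_after(const MatMulProps& m) {
    if (m.is_dynamic)
        return false;
    if (m.is_int8 && m.transpose_b)
        return false;  // repacking extraction is not supported for i8i8
    if (m.transpose_a)
        return false;
    return true;  // later checks (e.g. K % VNNIFactor) are not modelled
}

// A bf16 or u8i8 MatMul with transpose_b=true is rejected before, accepted after.
static_assert(!tokenizable_before(MatMulProps{false, false, true, false}), "");
static_assert(tokenizable_after(MatMulProps{false, false, true, false}), "");
```

Under these assumptions, only one case flips: a static bf16 or u8i8 MatMul with `transpose_b=true`. An i8i8 MatMul with `transpose_b=true`, any MatMul with `transpose_a=true`, and any dynamic non-FP32 MatMul are still rejected.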
