[Snippets] Added support of BF16/I8/U8 for MatMul #15063

a-sidorova · 2023-01-12T07:06:41Z

Details:

Added INT8 and BF16 support for MatMul on platforms with VNNI, BF16, and AMX support

Tickets:

102166

Blockers:

PR#14996

TODO:

After PR#14996 is merged, support for get_supported_precisions needs to be added for Brgemm

rkazants · 2023-01-12T07:09:53Z

Wow, tons of the code:)

a-sidorova · 2023-01-12T07:13:20Z

Wow, tons of the code:)

The main part of the code is here #14327 😋

src/common/snippets/include/snippets/op/buffer.hpp

IvanNovoselov · 2023-03-22T14:50:26Z

src/common/snippets/include/snippets/op/memory_access.hpp

+    const PortDescriptor& get_output_port_descriptor(const size_t i) const;
+
+    void set_input_count(size_t count, size_t idx);
+    void set_output_count(size_t count, size_t idx);


It makes sense to have a default idx = 0 value

We used to have only one count for both inputs and outputs, so I'm a little confused now. If I want to set the count for Load, should I use set_input_count or set_output_count? And if only set_input_count is legal for such operations, what happens if I call set_output_count? I've seen that you created set_count for Load and Store, and I think this is the right direction, but set_input_count can still be called on Load, which is a bit confusing. I don't insist, but maybe we should make a separate class for single-port operations?

Nice idea. Let's do it, thanks

Load works with data on its input and with a vector register on its output, so you should use set_input_count. The opposite holds for Store. That seemed logical to me, but we can discuss it
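The single-port idea from this thread can be sketched as follows. This is a minimal illustration and not the code from the PR: `PortDescriptor`, `MemoryAccess`, `Load`, and `Store` here are simplified, hypothetical stand-ins, and the private inheritance is just one assumed way to keep `set_input_count` / `set_output_count` from being called on a single-port operation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the per-port descriptor from the review.
struct PortDescriptor {
    std::size_t count = 0;   // number of elements accessed through this port
    std::size_t offset = 0;  // starting offset, in elements
};

// Hypothetical multi-port base: each memory-access port keeps its own count.
class MemoryAccess {
public:
    MemoryAccess(std::size_t inputs, std::size_t outputs) : m_inputs(inputs), m_outputs(outputs) {}

    // idx defaults to 0, as suggested in the review, so single-port callers can omit it.
    // std::vector::at throws std::out_of_range for a port that does not exist.
    void set_input_count(std::size_t count, std::size_t idx = 0) { m_inputs.at(idx).count = count; }
    void set_output_count(std::size_t count, std::size_t idx = 0) { m_outputs.at(idx).count = count; }
    std::size_t get_input_count(std::size_t idx = 0) const { return m_inputs.at(idx).count; }
    std::size_t get_output_count(std::size_t idx = 0) const { return m_outputs.at(idx).count; }

private:
    std::vector<PortDescriptor> m_inputs;
    std::vector<PortDescriptor> m_outputs;
};

// Load reads memory on its input (its output is a vector register),
// so it exposes a single set_count that forwards to the input port.
class Load : private MemoryAccess {
public:
    explicit Load(std::size_t count = 1) : MemoryAccess(1, 0) { set_count(count); }
    void set_count(std::size_t count) { set_input_count(count); }
    std::size_t get_count() const { return get_input_count(); }
};

// Store writes memory on its output, so set_count forwards to the output port.
class Store : private MemoryAccess {
public:
    explicit Store(std::size_t count = 1) : MemoryAccess(0, 1) { set_count(count); }
    void set_count(std::size_t count) { set_output_count(count); }
    std::size_t get_count() const { return get_output_count(); }
};
```

With this shape, `Load load(8); load.set_count(1);` is the only way to change the count on a Load, while a multi-port operation built directly on `MemoryAccess` still addresses each port by index.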

IvanNovoselov · 2023-03-22T15:00:10Z

src/common/snippets/src/pass/collapse_subgraph.cpp

+        const bool is_f32 = intype_0 == element::f32 && intype_1 == element::f32;
+        const bool is_int8 = (intype_0 == element::i8 || intype_0 == element::u8) && (intype_1 == element::i8);
+        const bool is_bf16 = intype_0 == element::bf16 && intype_1 == element::bf16;
+        return is_f32 || is_bf16 || is_int8;


Just to clarify, this check is no longer in the plugin callback, but in collapse subgraph

src/common/snippets/src/utils.cpp

src/common/snippets/src/pass/matmul_to_brgemm.cpp

IvanNovoselov · 2023-03-22T15:21:40Z

src/common/snippets/src/pass/vector_to_scalar.cpp

@@ -24,7 +24,8 @@ ngraph::snippets::pass::SetScalarCountForLoad::SetScalarCountForLoad() {
            if (!load)
                return false;

-            load->set_count(1lu);
+            auto& desc = load->get_input_port_descriptor(0);
+            desc.m_count = 1lu;


OK, so now we have a choice: set_input_count / set_output_count or simply set_count 🙃. Which one should I choose?

src/plugins/intel_cpu/src/snippets_transformations/op/brgemm_cpu.hpp

src/plugins/intel_cpu/src/emitters/jit_snippets_emitters.hpp

src/plugins/intel_cpu/src/emitters/jit_snippets_emitters.cpp

IvanNovoselov

Approve with minor leftovers

src/common/snippets/src/op/load.cpp

src/common/snippets/src/op/store.cpp

src/common/snippets/src/op/load.cpp

dmitry-gorokhov · 2023-03-27T05:44:09Z

src/common/snippets/src/op/buffer.cpp

+    if (m_type == Type::NewMemory) {
+        OPENVINO_ASSERT(get_input_size() == 0, "Buffer with new allocated memory must to not have arguments!");
+        output_shape = m_shape;
+        output_type = ov::element::u8;  // 1Byte


Won't it break precision propagation if a child operation expects a different precision on its input?
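One possible reading of the `u8` output type is that a newly allocated Buffer is treated as raw bytes, so its size has to be expressed in bytes and each consumer interprets the memory in its own precision. The sketch below shows only that size arithmetic; `byte_size` is a hypothetical helper, not a function from the PR, and whether the PR resolves the precision question this way is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of bytes needed to hold a tensor of the given
// shape whose elements are element_size bytes wide (e.g. 4 for f32, 2 for bf16).
// An empty shape is treated as a scalar and yields element_size.
std::size_t byte_size(const std::vector<std::size_t>& shape, std::size_t element_size) {
    std::size_t bytes = element_size;
    for (std::size_t dim : shape) {
        bytes *= dim;
    }
    return bytes;
}
```

For example, under this assumption a 2x3 tensor of f32 would need a byte buffer of `byte_size({2, 3}, 4)` elements of type `u8`.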

dmitry-gorokhov · 2023-03-27T05:54:57Z

src/common/snippets/src/pass/collapse_subgraph.cpp

+        const bool is_f32 = intype_0 == element::f32 && intype_1 == element::f32;
+        const bool is_int8 = (intype_0 == element::i8 || intype_0 == element::u8) && (intype_1 == element::i8);
+        const bool is_bf16 = intype_0 == element::bf16 && intype_1 == element::bf16;
+        return is_f32 || is_bf16 || is_int8;


It would be nice to avoid code duplication. It might be a static method of the Brgemm op that returns the undefined element type if the combination of inputs is invalid.
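A minimal sketch of that suggestion, using a local enum instead of the real `ov::element` types so that it compiles on its own. The function names are hypothetical, and the output types chosen for the valid combinations (f32 for f32 and bf16 inputs, i32 for int8 inputs) are an assumption, since the reviewed hunk only shows which input combinations are accepted.

```cpp
// Hypothetical stand-in for the element types involved in the check.
enum class ElementType { undefined, f32, bf16, i8, u8, i32 };

// Single place that knows the valid input combinations.
// Returns ElementType::undefined when the combination is not supported.
ElementType brgemm_output_type(ElementType in0, ElementType in1) {
    const bool is_f32 = in0 == ElementType::f32 && in1 == ElementType::f32;
    const bool is_int8 = (in0 == ElementType::i8 || in0 == ElementType::u8) && in1 == ElementType::i8;
    const bool is_bf16 = in0 == ElementType::bf16 && in1 == ElementType::bf16;
    if (is_f32 || is_bf16) {
        return ElementType::f32;  // assumed output precision for float inputs
    }
    if (is_int8) {
        return ElementType::i32;  // assumed accumulator precision for int8 inputs
    }
    return ElementType::undefined;
}

// Callers such as the tokenization check can reuse the helper
// instead of repeating the three boolean conditions.
bool is_supported_brgemm_precision(ElementType in0, ElementType in1) {
    return brgemm_output_type(in0, in1) != ElementType::undefined;
}
```

Both places that currently repeat the `is_f32 || is_bf16 || is_int8` condition could then call the same helper.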

github-actions bot added the category: build, category: CPU, category: IE Tests, category: inference, and category: ONNX FE labels Jan 12, 2023

a-sidorova added this to the 2023.0 milestone Jan 12, 2023

a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from 61c6b42 to fe72da7 Compare January 25, 2023 08:09

github-actions bot removed the category: inference, category: build, and category: ONNX FE labels Jan 25, 2023

a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch 4 times, most recently from f3b50fd to 40b569d Compare January 26, 2023 10:23

a-sidorova marked this pull request as ready for review January 30, 2023 11:36

a-sidorova requested review from a team as code owners January 30, 2023 11:36

a-sidorova assigned IvanNovoselov Jan 30, 2023

a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch 6 times, most recently from fa1bb56 to 80e945e Compare February 10, 2023 12:53

a-sidorova mentioned this pull request Feb 10, 2023

[Snippets] Add support of MHA Tokenization for different precisions #15647

Merged

6 tasks

a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from 80e945e to 255aa95 Compare February 20, 2023 08:54

a-sidorova added 3 commits March 17, 2023 19:33

BrgemmCopyB: updated classes

01d55cd

Revert "BrgemmCopyB: updated classes"

e9b7007

This reverts commit 255aa95.

Revert "Brgemm: new classes"

3f8bbcf

This reverts commit e883075.

a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch 4 times, most recently from 5c39a4e to b4364ee Compare March 18, 2023 06:31

Applied Ivan comments

5deb3c3

a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from b4364ee to 5deb3c3 Compare March 19, 2023 06:11

IvanNovoselov reviewed Mar 22, 2023

src/common/snippets/include/snippets/op/buffer.hpp (outdated, resolved)

src/common/snippets/include/snippets/op/buffer.hpp (outdated, resolved)

IvanNovoselov reviewed Mar 22, 2023

Merge remote-tracking branch 'upstream/master' into feature/snippets/matmul_i8_bf16

c76f658

a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from 586263e to c76f658 Compare March 23, 2023 11:57

a-sidorova added 2 commits March 23, 2023 17:26

Applied Ivan comments 2

4d97db2

Fixed BrgemmEmitter indexes, added custom Brgemm shape infer, updated MemoryAccess

6b80c5d

github-actions bot added the category: build label Mar 24, 2023

MemoryAcccess update

5a6ba4b

IvanNovoselov approved these changes Mar 24, 2023

src/common/snippets/src/op/load.cpp (resolved)

src/common/snippets/src/op/store.cpp (resolved)

src/common/snippets/src/op/load.cpp (outdated, resolved)

MemoryAccess update 2

5c31c6c

a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from 6b041f7 to 5c31c6c Compare March 24, 2023 16:01

a-sidorova assigned dmitry-gorokhov and unassigned IvanNovoselov Mar 27, 2023

dmitry-gorokhov approved these changes Mar 27, 2023

a-sidorova added 2 commits March 27, 2023 10:50

Fixed get_supported_presicions for Brgemm on AMX

9719a4e

Merge remote-tracking branch 'upstream/master' into feature/snippets/matmul_i8_bf16

3934896

IvanNovoselov mentioned this pull request Mar 27, 2023

[Snippets] matmul blocking support #16583

Closed

2 tasks

a-sidorova added 2 commits March 27, 2023 20:40

Merge remote-tracking branch 'upstream/master' into feature/snippets/matmul_i8_bf16

ea25c80

Merge remote-tracking branch 'upstream/master' into feature/snippets/matmul_i8_bf16

3b83287

dmitry-gorokhov merged commit 38c924a into openvinotoolkit:master Mar 28, 2023
