Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Snippets] Added support of BF16/I8/U8 for MatMul #15063

Conversation

a-sidorova
Copy link
Contributor

@a-sidorova a-sidorova commented Jan 12, 2023

Details:

  • Added support of INT8, BF16 for MatMul on platforms with VNNI, bf16 support and AMX

Tickets:

  • 102166

Blockers:

TODO:

  • After merging of PR#14996 need to add support of get_supported_precisions for Brgemm

@github-actions github-actions bot added category: build OpenVINO cmake script / infra category: CPU OpenVINO CPU plugin category: IE Tests OpenVINO Test: plugins and common category: inference OpenVINO Runtime library - Inference category: ONNX FE OpenVINO ONNX FrontEnd labels Jan 12, 2023
@a-sidorova a-sidorova added this to the 2023.0 milestone Jan 12, 2023
@rkazants
Copy link
Contributor

Wow, tons of the code:)

@a-sidorova
Copy link
Contributor Author

Wow, tons of the code:)

The main part of the code is here #14327 😋

@a-sidorova a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from 61c6b42 to fe72da7 Compare January 25, 2023 08:09
@github-actions github-actions bot removed category: inference OpenVINO Runtime library - Inference category: build OpenVINO cmake script / infra category: ONNX FE OpenVINO ONNX FrontEnd labels Jan 25, 2023
@a-sidorova a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch 4 times, most recently from f3b50fd to 40b569d Compare January 26, 2023 10:23
@a-sidorova a-sidorova marked this pull request as ready for review January 30, 2023 11:36
@a-sidorova a-sidorova requested review from a team as code owners January 30, 2023 11:36
@a-sidorova a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch 6 times, most recently from fa1bb56 to 80e945e Compare February 10, 2023 12:53
@a-sidorova a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from 80e945e to 255aa95 Compare February 20, 2023 08:54
@a-sidorova a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch 4 times, most recently from 5c39a4e to b4364ee Compare March 18, 2023 06:31
@a-sidorova a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from b4364ee to 5deb3c3 Compare March 19, 2023 06:11
const PortDescriptor& get_output_port_descriptor(const size_t i) const;

void set_input_count(size_t count, size_t idx);
void set_output_count(size_t count, size_t idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It makes sense to have a default idx = 0 value
  2. We used to have only one count for both inputs and outputs, so I'm a little confused here now. If I want to set count for Load should I use set_input_count or set_output_count? And if only set_input_count is legal for such operations, what happens if I set_output_count? I've seen that you created set_count for Load and Store, and I think this is a right direction, but still set_input_count could be called on load, which is a bit confusing. I don't insist, but should we make a separate class for single-port operations maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Nice idea. let's do it, thanks
  2. Load works with data on input and with vector register on output. So you should work with set_input_count. And opposite for Store. I thought that it seems like logical things. We can discuss it

const bool is_f32 = intype_0 == element::f32 && intype_1 == element::f32;
const bool is_int8 = (intype_0 == element::i8 || intype_0 == element::u8) && (intype_1 == element::i8);
const bool is_bf16 = intype_0 == element::bf16 && intype_1 == element::bf16;
return is_f32 || is_bf16 || is_int8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, it's not in plugin callback now, but in collapse subgraph

src/common/snippets/src/utils.cpp Show resolved Hide resolved
@@ -24,7 +24,8 @@ ngraph::snippets::pass::SetScalarCountForLoad::SetScalarCountForLoad() {
if (!load)
return false;

load->set_count(1lu);
auto& desc = load->get_input_port_descriptor(0);
desc.m_count = 1lu;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so now we have a choice set_input_count / set_output_count or simply set_count 🙃 . Which one should I choose?)

@a-sidorova a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from 586263e to c76f658 Compare March 23, 2023 11:57
@github-actions github-actions bot added the category: build OpenVINO cmake script / infra label Mar 24, 2023
Copy link
Contributor

@IvanNovoselov IvanNovoselov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve with minor leftovers

src/common/snippets/src/op/load.cpp Show resolved Hide resolved
src/common/snippets/src/op/store.cpp Show resolved Hide resolved
src/common/snippets/src/op/load.cpp Outdated Show resolved Hide resolved
@a-sidorova a-sidorova force-pushed the feature/snippets/matmul_i8_bf16 branch from 6b041f7 to 5c31c6c Compare March 24, 2023 16:01
if (m_type == Type::NewMemory) {
OPENVINO_ASSERT(get_input_size() == 0, "Buffer with new allocated memory must to not have arguments!");
output_shape = m_shape;
output_type = ov::element::u8; // 1Byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it break precision propogation in case child operation expects different precision on input?

const bool is_f32 = intype_0 == element::f32 && intype_1 == element::f32;
const bool is_int8 = (intype_0 == element::i8 || intype_0 == element::u8) && (intype_1 == element::i8);
const bool is_bf16 = intype_0 == element::bf16 && intype_1 == element::bf16;
return is_f32 || is_bf16 || is_int8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to avoid code duplication. It might static method of Brgemm op that returns undef element type if combination of iniputs is invalid.

@dmitry-gorokhov dmitry-gorokhov merged commit 38c924a into openvinotoolkit:master Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: build OpenVINO cmake script / infra category: CPU OpenVINO CPU plugin category: IE Tests OpenVINO Test: plugins and common
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants