[Snippets] Added Softmax support #57

Merged: 12 commits into feature/snippets/mha on Dec 21, 2022

Conversation

a-sidorova (Owner) commented on Nov 24, 2022:

Details:

  • Added partial Softmax support to Snippets: a new config parameter disables Softmax in the Snippets pipeline to avoid performance regressions, while tests enable it for validation
  • Added support for Reshape around Softmax via the SoftmaxReshapeElimination pass, which removes the surrounding Reshape ops (see the sketch after this list)
  • More details here
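
For illustration, here is a minimal sketch of the pattern that SoftmaxReshapeElimination targets, built with the same ops and shapes as the tests quoted later in this conversation. The pass registration line is an assumption about the namespace, not a verbatim API reference.

// Before the pass: Reshape -> Softmax(axis = -1) -> Reshape
auto data = std::make_shared<ov::op::v0::Parameter>(ov::element::f32, ov::Shape{1, 2, 340, 240});
auto shape0 = std::make_shared<ov::op::v0::Constant>(ov::element::i32, ov::Shape{2}, std::vector<int32_t>{2, 81600});
auto reshape0 = std::make_shared<ov::op::v1::Reshape>(data, shape0, false);
auto softmax = std::make_shared<ov::op::v8::Softmax>(reshape0, -1);
auto shape1 = std::make_shared<ov::op::v0::Constant>(ov::element::i32, ov::Shape{4}, std::vector<int32_t>{1, 2, 340, 240});
auto reshape1 = std::make_shared<ov::op::v1::Reshape>(softmax, shape1, false);

// After the pass both Reshapes are removed and Softmax reads `data` directly:
// the Reshapes here only fuse/split the leading dimensions, so a last-axis
// Softmax computes the same values on either shape.

// Assumed registration (namespace per the snippets sources):
// manager.register_pass<ngraph::snippets::pass::SoftmaxReshapeElimination>();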

Tickets:

  • 92363
  • 95636

Blockers:

a-sidorova force-pushed the feature/snippets/softmax branch 5 times, most recently from a025b91 to 026360c, on December 5, 2022 09:23
a-sidorova force-pushed the feature/snippets/softmax branch from 026360c to f916fe2 on December 5, 2022 10:58
a-sidorova force-pushed the feature/snippets/softmax branch from f916fe2 to 70a7e56 on December 6, 2022 10:34
a-sidorova force-pushed the feature/snippets/softmax branch from 70a7e56 to 53dc219 on December 13, 2022 12:47
Commits:

  • [Snippets] Added support for Reshape around Softmax
  • applied comment part
  • Added config parameter to disable MHA ops tokenization
  • Buffer 2D
  • Loops
a-sidorova force-pushed the feature/snippets/softmax branch from 53dc219 to 9d2d721 on December 13, 2022 12:50
dmitry-gorokhov (Collaborator) left a comment:

first part

  • src/common/snippets/include/snippets/generator.hpp (resolved)
  • src/common/snippets/include/snippets/op/fill.hpp (resolved)
  • src/common/snippets/include/snippets/op/fill.hpp (resolved)
  • src/common/snippets/src/op/buffer.cpp (resolved)
  • src/common/snippets/src/op/horizon_max.cpp (resolved)
  • src/common/snippets/include/snippets/op/subgraph.hpp (resolved)
dmitry-gorokhov (Collaborator):

@v-Golubev, could you please take a look at the transformations? Maybe you have some general comments.

dmitry-gorokhov (Collaborator):

@IvanNovoselov, do you have any additional comments on the latest version?

  • src/common/snippets/src/pass/insert_loops.cpp (resolved)
  • src/common/snippets/src/pass/insert_loops.cpp (resolved)
  • src/common/snippets/src/pass/reset_buffer.cpp (resolved)
  • src/common/snippets/src/pass/reset_buffer.cpp (resolved)
  • src/common/snippets/src/pass/reset_buffer.cpp (resolved)
  • src/common/snippets/src/pass/softmax_decomposition.cpp (resolved)
  • src/common/snippets/src/pass/softmax_decomposition.cpp (resolved)
  • src/common/snippets/src/pass/softmax_decomposition.cpp (resolved)
/* ====== ReduceMax decomposition ====== */

const auto vector_buffer_max = std::make_shared<ngraph::snippets::op::VectorBuffer>();
const auto loop_max_begin = ngraph::snippets::op::insertLoopBegin(ngraph::OutputVector{data, data});
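// Note (per the discussion below): `data` is most likely passed twice so that one copy
// feeds the ReduceMax loop body while the other is carried through the LoopBegin/LoopEnd
// pair to become the input of the following Sub + Exp + ReduceSum loop.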
Collaborator:

I believe `data` was added to the OutputVector intentionally, but could you please leave a comment explaining why we do that?

a-sidorova (Owner, Author):

Honestly, I don't understand your comment. Could you clarify what you mean? Why LoopBegin's input is `data`? Why we add `data` twice? Why we use ngraph::OutputVector? Or something else?

Collaborator:

Sorry, it looks like I lost a couple of words while translating my thoughts into a comment. I meant that it isn't clear why we add `data` twice.

a-sidorova (Owner, Author):

Oh, it's a long story... But I tried to explain it in a code comment. Thank you!
4a1f00b

a-sidorova force-pushed the feature/snippets/softmax branch from 339a1b8 to 1cac5d5 on December 15, 2022 06:06
a-sidorova requested a review from v-Golubev on December 15, 2022 07:09
auto reshape1 = std::make_shared<ov::op::v1::Reshape>(softmax_v1, shape1, false);
function = std::make_shared<Function>(NodeVector{reshape1}, ParameterVector{data});

manager.register_pass<pass::InitNodeInfo>();
v-Golubev (Collaborator) commented on Dec 15, 2022:

Minor: the InitNodeInfo pass is added by default in the TransformationTestsF::SetUp() method, so we can remove it here.

a-sidorova (Owner, Author) replied on Dec 15, 2022:

Removed, thanks!
4a1f00b

a-sidorova requested a review from v-Golubev on December 15, 2022 10:50
IvanNovoselov (Collaborator) left a comment:

First part

  • src/common/snippets/include/snippets/op/fill.hpp (resolved)
  • src/common/snippets/include/snippets/pass/insert_loops.hpp (resolved)
  • src/common/snippets/src/pass/insert_loops.cpp (resolved)
  • src/common/snippets/src/pass/insert_loops.cpp (resolved)
  • src/common/snippets/src/pass/insert_loops.cpp (resolved)
  • src/common/snippets/src/pass/insert_loops.cpp (resolved)
IvanNovoselov (Collaborator) left a comment:

Second part

// Propagate down to the Loads: a Buffer can have several Loads and Loops after it, so we should go through all target inputs
{
std::function<void(const Input<Node>&)> propagate_down;
propagate_down = [&](const Input<Node>& target_input) {
Collaborator:

I'm a bit concerned that we intend to change a node below the matched one. If we have another transformation (after the present one; let's call it Next) that matches on Loads, the following situation could arise:

  1. The present transformation matches on node n.
  2. Next matches on n's child.
  3. The present transformation updates n's child.
  4. Next overwrites n's child, ignoring the update from the present transformation, since the update was made after the match.

As far as I understand, the matcher-pass mechanics guarantee that it's safe to modify parent nodes and the present node (as long as you return true), but modification of child nodes could cause a transformation conflict as described.
Could you please make sure that we won't have such problems here?

a-sidorova (Owner, Author):

Thank you for the detailed example! But honestly, I don't see the possible problem. The pass matches on a Buffer node and updates the corresponding operations such as Store, Load, Brgemm; to find them we have to go over all LoopBase ops. What I mean is that each Buffer node has its own MemoryAccess nodes, and they're different for each Buffer node. Yes, Buffer nodes can share the same LoopBase nodes, but we don't touch those, only the MemoryAccess ones.

Collaborator:

OK, then a subsequent matcher pass that modifies MemoryAccess ops can overwrite your changes (it can be a completely different transformation, not connected to the Buffer).

a-sidorova (Owner, Author):

As we discussed offline, I left a comment.
8967b35
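
For readers following along, here is a minimal sketch of the pass shape being discussed: a MatcherPass that matches on a Buffer and then updates its consumers. The class name, include paths, and callback body are illustrative assumptions; only the "modify the children of the matched node" structure, the subject of this thread, mirrors the actual pass.

#include <ngraph/pass/graph_rewrite.hpp>
#include <ngraph/pattern/op/wrap_type.hpp>
#include <snippets/op/buffer.hpp>

// Hypothetical sketch, not the actual pass from this PR.
class BufferConsumersSketch : public ngraph::pass::MatcherPass {
public:
    BufferConsumersSketch() {
        const auto buffer_m = ngraph::pattern::wrap_type<ngraph::snippets::op::Buffer>();
        ngraph::matcher_pass_callback callback = [](ngraph::pattern::Matcher& m) {
            const auto buffer = m.get_match_root();
            // The hazard under discussion lives here: we update nodes *below* the match.
            // A subsequent matcher pass that already matched one of these consumers could
            // overwrite the update, so the pass relies on the invariant that each Buffer
            // owns its MemoryAccess consumers (see the code comment referenced above).
            for (const auto& target_input : buffer->output(0).get_target_inputs()) {
                // ... adjust the Load/Store-like consumer via target_input.get_node() ...
                (void)target_input;
            }
            return true;  // modifying the matched node itself (or its parents) is safe
        };
        register_matcher(std::make_shared<ngraph::pattern::Matcher>(buffer_m, "BufferConsumersSketch"), callback);
    }
};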

  • src/common/snippets/src/pass/insert_movebroadcast.cpp (resolved)
//  Maximum           |   /
//     /           LoopEnd
//  HorizonMax        |
//     \           LoopBegin[Sub + Exp + ReduceSum]
Collaborator:

This is a good example of why we need to move control-flow operations to a different IR.
The same applies to the add_control_dependency block on L157-164.

// Eliminate Reshape before Softmax
reshape0->output(0).replace(reshape0->input_value(0));
copy_runtime_info({reshape0->input_value(0).get_node_shared_ptr(), reshape0->output(0).get_node_shared_ptr()},
                  reshape0->input_value(0).get_node_shared_ptr());
Collaborator:

It's interesting that reshape0->input_value(0).get_node_shared_ptr() appears in both the from and to arguments. Why do we need to copy rt_info from a node to itself?

a-sidorova (Owner, Author):

For the second Reshape I call replace_output_update_name because we need to preserve the output names. When we remove the first Reshape we don't need to preserve the name, so I just call reshape0->output(0).replace(reshape0->input_value(0)). But to stay aligned with replace_output_update_name I still call copy_runtime_info, because the same line is present inside replace_output_update_name.
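
Condensed, the asymmetry described above looks like this (reshape0/reshape1 as in the quoted diffs; ngraph::replace_output_update_name is the standard helper, and the comments summarize this thread):

// First Reshape (before Softmax): no output names to preserve, a plain replace suffices;
// copy_runtime_info is called only to mirror what replace_output_update_name does internally.
reshape0->output(0).replace(reshape0->input_value(0));

// Second Reshape (after Softmax): the helper also transfers the output names to the
// producer, so the subgraph's output names survive the Reshape removal.
ngraph::replace_output_update_name(reshape1->output(0), reshape1->input_value(0));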

a-sidorova requested reviews from IvanNovoselov, v-Golubev and dmitry-gorokhov and removed the request for v-Golubev on December 20, 2022 15:06
dmitry-gorokhov (Collaborator) left a comment:

LGTM

dmitry-gorokhov (Collaborator):

@IvanNovoselov, @v-Golubev, any remaining comments?

v-Golubev (Collaborator) left a comment:

No comments from my side that could block the merge.

Comment on lines 71 to 77
auto data = std::make_shared<opset1::Parameter>(element::f32, Shape{1, 2, 340, 240});
auto shape0 = std::make_shared<ov::op::v0::Constant>(ov::element::i32, ov::Shape{2}, std::vector<int32_t>{2, 81600});
auto reshape0 = std::make_shared<ov::op::v1::Reshape>(data, shape0, false);
auto softmax_v1 = std::make_shared<ov::op::v8::Softmax>(reshape0, -1);
auto shape1 = std::make_shared<ov::op::v0::Constant>(ov::element::i32, ov::Shape{4}, std::vector<int32_t>{1, 2, 340, 240});
auto reshape1 = std::make_shared<ov::op::v1::Reshape>(softmax_v1, shape1, false);
function_ref = std::make_shared<Function>(NodeVector{reshape1}, ParameterVector{data});
Collaborator:

I completely forgot to say that for negative test cases (when the transformation changes nothing) we don't need to define the function_ref field of the TransformationTestsF class: this is handled automatically. Sorry about that.

a-sidorova (Owner, Author):

Don't worry! Thank you very much for the small introduction to transformation tests! 😄
Removed.
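
A sketch of the simplified negative test under that convention (test and pass names are assumed, and the model is trimmed from the quoted one):

TEST_F(TransformationTestsF, SoftmaxReshapeEliminationNegativeSketch) {
    auto data = std::make_shared<opset1::Parameter>(element::f32, Shape{2, 81600});
    auto softmax = std::make_shared<ov::op::v8::Softmax>(data, -1);
    function = std::make_shared<Function>(NodeVector{softmax}, ParameterVector{data});
    manager.register_pass<ngraph::snippets::pass::SoftmaxReshapeElimination>();
    // No function_ref: when the transformation changes nothing, TransformationTestsF
    // compares the transformed function against an unchanged copy automatically.
}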


void SoftmaxTests::SetUp() {
const size_t count = 10;
manager.register_pass<ngraph::pass::InitNodeInfo>();
Collaborator:

We don't need to register InitNodeInfo here or on L62, because it is registered in the base class.

a-sidorova (Owner, Author):

You're right that it's registered in the base class, but here we override that method (it doesn't work like a constructor, the base version isn't called automatically). Still, I take your point: I added a LoweringTests::SetUp() call instead of the explicit pass registration.
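
The resulting fix then presumably looks like this sketch:

void SoftmaxTests::SetUp() {
    LoweringTests::SetUp();  // the base class registers the common passes, e.g. InitNodeInfo
    const size_t count = 10;
    // ... build the test subgraph and register the remaining passes as before,
    // without re-registering InitNodeInfo ...
}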

IvanNovoselov (Collaborator) left a comment:

We just discussed a tiny comment offline, but it shouldn't block the merge. Everything is as good as it could be.

dmitry-gorokhov merged commit f86fd91 into feature/snippets/mha on Dec 21, 2022