[Snippets] Added Softmax Support #13449
Conversation
Force-pushed b819ce0 to f3760b3
Force-pushed 36ff257 to 3f3ff64
Force-pushed 250aa6f to 42b0f4d
Force-pushed 42b0f4d to a8ba02a
Force-pushed a0a3d7c to 7fe52a9
@IvanNovoselov could you please start the review?
Almost done, but there might be some more comments.
@@ -117,7 +117,7 @@ class Generator {
     * @param m model in canonical for for table-based code generation
     * @return pointer to generated code
     */
-    code generate(std::shared_ptr<ov::Model>& m, const void* compile_params = nullptr) const;
+    code generate(std::shared_ptr<ov::Model>& m, const void* compile_params = nullptr, const bool need_check_rt_info = false) const;
Do we really need this flag?
We discussed it offline before: for reduction ops we should insert a Fill op in the tail loop if needed, and to avoid a load-time regression we check rt_info.
I understand it's not very beautiful code, but we don't have an interface for passing a config to tail loop generation; this is one way I found.
Maybe we should introduce a common generation config, i.e. a common struct { bool, bool, ... }, for possible new checks in the future. What do you think?
Yes, I totally see your point.
We can just pass the whole snippets config; that would be a more extensible solution. Probably there is no need for a special generation config at this point.
The original reason for this remark is that we check both rt_info and the snippets config. Do we need both? Please also see the related comment in generator.cpp.
As discussed, we now pass the whole config. Thanks!
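For illustration, a minimal sketch of the direction discussed here, assuming a hypothetical GenerationConfig aggregate; the actual snippets config type that was finally passed is not shown in this thread:

// Sketch only: GenerationConfig and its field name are illustrative, not the actual snippets API;
// "code" and "ov::Model" are the types from the quoted Generator header.
struct GenerationConfig {
    // when true, the generator checks rt_info and inserts a Fill op into the tail loop for reduction ops
    bool check_rt_info_for_fill = false;
    // further per-generation options can be added here without changing the generate() signature again
};
// code generate(std::shared_ptr<ov::Model>& m, const GenerationConfig& config, const void* compile_params = nullptr) const;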
// we should always reset data ptr after this loop because in the next Loop this ptr is used
const auto finalization_offsets_max =
    std::vector<int64_t>{ calculate_required_finalization_offsets(inner_size, data->get_shape()[inner_dim]), 0, 0 };
const auto forse_finalization_offsets_max = std::vector<bool>{true, false, false};
To continue the force_finalization_offsets discussion, here is a good example: why can't we just set the last two finalization_offsets to zero instead of setting force_finalization_offsets to false?
As we discussed offline, I disabled one of the evaluation optimizations for subgraphs with buffers to remove force_finalization_offsets.
// we should always reset data ptr after this loop because in the next Loop this ptr is used
const auto finalization_offsets_max =
    std::vector<int64_t>{ calculate_required_finalization_offsets(inner_size, data->get_shape()[inner_dim]), 0, 0 };
const auto forse_finalization_offsets_max = std::vector<bool>{true, false, false};
To continue the force_finalization_offsets discussion: why don't we just set the last two finalization_offsets to zero here?
As we discussed offline, I disabled one of the evaluation optimizations for subgraphs with buffers to remove force_finalization_offsets.
 * @brief Defines if Softmax can be tokenized in Snippets
 * @ingroup ie_dev_api_plugin_api
 */
DECLARE_CONFIG_KEY(SNIPPETS_SOFTMAX_ENABLE);
Do we need this key exclusively to run Softmax tests? If so, it's worth adding a TODO like: remove when Softmax tokenization is fully supported in Snippets, plus the Softmax support ticket ID.
src/plugins/intel_cpu/src/config.cpp (outdated)
@@ -159,6 +159,14 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
            IE_THROW() << "Wrong value for property key " << CPUConfigParams::KEY_CPU_DENORMALS_OPTIMIZATION
                       << ". Expected only YES/NO";
        }
    } else if (key == PluginConfigInternalParams::KEY_SNIPPETS_SOFTMAX_ENABLE) {
The same as for src/inference/dev_api/cpp_interfaces/interface/ie_internal_plugin_config.hpp: maybe leave a comment with a ticket reference?
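For illustration, a sketch of the kind of TODO note suggested in these two comments; the actual ticket ID is not given in this thread, so it stays a placeholder:

// TODO: remove KEY_SNIPPETS_SOFTMAX_ENABLE when Softmax tokenization is fully supported
//       in Snippets (Softmax support ticket: <ticket ID>)
DECLARE_CONFIG_KEY(SNIPPETS_SOFTMAX_ENABLE);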
src/plugins/intel_cpu/src/plugin.cpp (outdated)
// CPU Plugin support Swish in Subgraph via conversion to SwichCPU which assumes second input to be constant
if (ov::is_type<const ov::op::v4::Swish>(n)) {
    if (n->inputs().size() > 1 && !ov::is_type<const ov::op::v0::Constant>(n->get_input_node_shared_ptr(1)))
        return true;
} else if (ov::is_type<const ov::op::v1::Softmax>(n) || ov::is_type<const ov::op::v8::Softmax>(n)) {
    return !_tokenizeSoftmaxSnippets;
This will allow tokenization even if has_only_const_inputs || bad_input_rank || bad_output_rank is true. It looks like undesired behavior (the same problem exists with Swish). Could you fix it, please?
Good catch! Thanks a lot
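For illustration, a sketch of the kind of fix agreed on here, assuming the has_only_const_inputs / bad_input_rank / bad_output_rank checks named in the comment above are available in the same scope; this is illustrative, not the exact code that landed:

} else if (ov::is_type<const ov::op::v1::Softmax>(n) || ov::is_type<const ov::op::v8::Softmax>(n)) {
    // keep the common restrictions even when Softmax tokenization is enabled
    return !_tokenizeSoftmaxSnippets || has_only_const_inputs || bad_input_rank || bad_output_rank;
}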
Force-pushed 7fe52a9 to c32e7d6
The second part
auto params = m->get_parameters();
auto results = m->get_results();
auto in = params.size();
auto out = results.size();
auto buffer = static_cast<size_t>(std::any_of(ops.begin(), ops.end(),
num_buffers or num_buffer_ops maybe?
Please also note that in the transpose PR I suggest passing a pointer to the body directly to op::Kernel, so it will be accessible from KernelEmitter. We could count the buffers inside the emitter in that case. Let's discuss it offline.
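For illustration, a sketch of the suggested renaming, assuming the Buffer op lives in ngraph::snippets::op as in the snippets code of that time (requires <algorithm>):

// count buffer ops explicitly and give the result a descriptive name
const auto num_buffer_ops = static_cast<size_t>(
    std::count_if(ops.begin(), ops.end(), [](const std::shared_ptr<ov::Node>& op) {
        return ov::is_type<ngraph::snippets::op::Buffer>(op);
    }));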
const auto& forse_finalization_offsets = loop->get_forse_finalization_offsets();
std::vector<int64_t> new_finalization_offsets(loop->get_finalization_offsets());
for (auto i = 0; i < new_finalization_offsets.size(); i++) {
    new_finalization_offsets[i] += increment * apply_increments[i] * (forse_finalization_offsets[i] || force_ptr_increment);
Let's consider the default scenario: all forse_finalization_offsets[] = true, so new_finalization_offsets will be incremented even if force_ptr_increment = false. This contradicts the previous default behavior, where new_finalization_offsets were incremented only if force_ptr_increment || loop->has_outer_loop().
Or is forse_finalization_offsets[] some kind of per-port analog of loop->has_outer_loop()? If so, is forse_finalization_offsets[] set to false when there is no outer loop?
As we discussed offline, I disabled one of the evaluation optimizations for subgraphs with buffers to remove force_finalization_offsets.
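For reference, a sketch of the previous default behavior described in the comment above (only the condition matters; the loop body is the one from the quoted code):

// previous default: finalization offsets are adjusted only when forced or when there is an outer loop
if (force_ptr_increment || loop->has_outer_loop()) {
    for (size_t i = 0; i < new_finalization_offsets.size(); i++)
        new_finalization_offsets[i] += increment * apply_increments[i];
}
loop->set_finalization_offsets(new_finalization_offsets);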
    new_finalization_offsets[i] += increment * apply_increments[i];
}
loop->set_finalization_offsets(new_finalization_offsets);
const auto increment = loop->get_increment();
A general comment on op-specific optimizations: sometimes, to enable an op, we need to extend existing functionality, which is perfectly fine. However, at first it can be useful to keep the usage of this extended functionality localized. That way it is easier for us to understand the limitations of the existing functionality and to develop a more general and extensible solution in the future.
For example, I generally like your implementation of tail_transformations: we check for a flag set during softmax tokenization (or decomposition) and trigger a specific pipeline. I think we should try to align with this paradigm for now.
prepare_table();

const auto shape = n->get_input_partial_shape(0);
is_scalar = shape.is_static() && shape.get_shape().back() == 1;
Not 100% sure that it's safe to rely on node shapes, since we won't do reshape every time (see the transpose PR), e.g. when dimensions are collapsed. Could you check, please?
const size_t vec_size = vlen / sizeof(float);
h->sub(h->rsp, vlen);
h->uni_vmovups(h->ptr[h->rsp], src_vmm);
h->uni_vmovups(dst_xmm, table_val("float_min"));
Since we are using aux_gpr here anyway and the table has only one value, maybe it's better to mov the immediate into aux_gpr and then broadcast? Then we can omit the table. What do you think?
Of course, it only makes sense if we won't extend this emitter with more table values.
Sounds logical, but if I understand correctly we would need an additional aux_vmm to store the broadcasted value. As I understand it, the table is a convenient tool for constants. Please correct me if I'm wrong. Thanks!
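For illustration, a heavily hedged sketch of the mov-immediate-then-broadcast idea, assuming AVX2, a free xmm/vmm pair (exactly the aux_vmm concern raised above), and that "float_min" is the lowest finite float, -FLT_MAX:

// sketch only: aux_gpr, aux_xmm and aux_vmm are placeholders for free registers
h->mov(aux_gpr.cvt32(), 0xff7fffffu);   // bit pattern of -FLT_MAX
h->vmovd(aux_xmm, aux_gpr.cvt32());     // move the pattern into the low xmm lane
h->vbroadcastss(aux_vmm, aux_xmm);      // broadcast to the full vector register

As noted in the reply, this does tie up an extra vector register, so the table-based constant may well remain the simpler option.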
    new_shapes.emplace_back(std::move(ns));
}
// Before body reshaping we should scale axis of Softmax
auto ops = snippet->get_body()->get_ops();
We agreed not to perform reshaping and to pass plugin shapes in rt_info. Please refer to the transpose PR.
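For illustration, a hypothetical sketch of the rt_info approach mentioned above; the key name and the container stored are illustrative, not the ones used in this PR:

// attach the plugin-provided shapes to the body's rt_info instead of reshaping it;
// rt_info values are ov::Any, so an arbitrary copyable container can be stored
snippet->get_body()->get_rt_info()["plugin_shapes"] = new_shapes;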
updateSrcDstPtrs(call_args);

std::vector<jit_snippets_call_args> per_thread_call_args(parallel_get_max_threads(), call_args);
if (buffer_scratchpad_size > 0) {
Not entirely sure why we need to copy the jit_snippets call args nthreads times even when we don't need a buffer. Suggestion: let's keep the original signature of schedule_6d and create a schedule_6d_per_thread_buffer (also with the same signature) that creates the vector of call args inside. We did pretty much the same in the dynamism PR.
Added. Thanks!
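For illustration, a hypothetical sketch of the agreed wrapper; the function name and the scratchpad pointer field are illustrative, while jit_snippets_call_args and parallel_get_max_threads() come from the quoted code:

// schedule_6d keeps its original signature; the buffer-aware variant builds per-thread call args internally
std::vector<jit_snippets_call_args> per_thread_call_args(parallel_get_max_threads(), call_args);
for (size_t ithr = 0; ithr < per_thread_call_args.size(); ++ithr) {
    // give every thread its own slice of the shared scratchpad (field name is hypothetical)
    per_thread_call_args[ithr].buffer_scratchpad_ptr = buffer_scratchpad + ithr * buffer_scratchpad_size;
}
// ...then run the same 6D parallel loop as schedule_6d, passing per_thread_call_args[parallel_get_thread_num()] to the kernel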
Force-pushed 5c6ced9 to 437242d
Force-pushed a664c06 to 0aaea67
Force-pushed feeaf07 to 2caa808
Commits: [Snippets] Dynamic loop snapshot; [Snippets] Explicit Loop implementation
Force-pushed 2caa808 to 26b5889
Force-pushed a025b91 to 026360c
Force-pushed f916fe2 to 70a7e56
Force-pushed 70a7e56 to 53dc219
Commits: [Snippets] Added support for Reshape around Softmax; applied comment part; Added config parameter to disable MHA ops tokenization; Buffer 2D Loops
Force-pushed 53dc219 to 9d2d721
Force-pushed 339a1b8 to 1cac5d5
Closed because of merging into a local branch.
Details:
- Added Softmax support to Snippets partially: added the corresponding config parameter to disable Softmax in the Snippets pipeline to avoid performance regressions, and enabled it in tests for validation
- Support for Reshape around Softmax via the SoftmaxReshapeElimination pass that removes the Reshape ops

Tickets: