diff --git a/src/common/snippets/docs/snippets_design_guide.md b/src/common/snippets/docs/snippets_design_guide.md index d495b35a3fc437..fdeee2ceb6fe79 100644 --- a/src/common/snippets/docs/snippets_design_guide.md +++ b/src/common/snippets/docs/snippets_design_guide.md @@ -5,7 +5,7 @@ This document describes the design and rationale for a snippets code generator. Core **CNN operators (convolution, gemm, fully connected) are limited by compute, the rest is memory bound**. Math approximations (like transcendental functions) are rare in emerging workloads and could be treated with the same machinery. **Snippets are designed to optimize topology for memory**, while leaving compute intensive kernels for backend developers. -The **potential speedup is proportional to shrink in memory-walked bytes**. Therefore, you can transform the problem to a task to optimize for memory walks, whatever pattern snippet has and operations it contains. The number of memory walks should be less or equal to handcrafted optimizations. This guarantees performance improvements over the previous approach (excluding corner cases caused by cache effects). *Shrinkage factor might be encoded to some cost function in future evolution of code generator*. Snippets generator provides diagnostics to estimate this shrinkage factor with `ngraph::snippets::op::Subgraph::print_statistics(bool verbose)` member. +The **potential speedup is proportional to shrink in memory-walked bytes**. Therefore, you can transform the problem to a task to optimize for memory walks, whatever pattern snippet has and operations it contains. The number of memory walks should be less or equal to handcrafted optimizations. This guarantees performance improvements over the previous approach (excluding corner cases caused by cache effects). *Shrinkage factor might be encoded to some cost function in future evolution of code generator*. Snippets generator provides diagnostics to estimate this shrinkage factor with `ov::snippets::op::Subgraph::print_statistics(bool verbose)` member. The SnippetS generator is designed for back-end developers. The main purpose of inventing the snippets code generator is an **operator fusion**, **register allocation** and **target kernel generation** decomposition. This allows modifications (like new fusion support) and feature extensions (like new operation support) to be done in a single point of modification and avoid combinatorial explosion for fusions/types/architectures etc. @@ -28,7 +28,7 @@ Code generation is split into 2 phases, **tokenization** and **lowering**. ### Tokenization -Tokenization runs on full topology nGraph function inside a specific plugin in a stage of common transformations. Input of tokenization is a topology graph. Output is a modified topology graph with `ngraph::snippets::op::Subgraph` operations installed. Each subgraph contains nGraph function (called **body**) which holds a part of original topology legal for snippet generation (can be scheduled with a single schedule). +Tokenization runs on full topology nGraph function inside a specific plugin in a stage of common transformations. Input of tokenization is a topology graph. Output is a modified topology graph with `ov::snippets::op::Subgraph` operations installed. Each subgraph contains nGraph function (called **body**) which holds a part of original topology legal for snippet generation (can be scheduled with a single schedule). A procedure of finding subgraphs suitable for code generation is called **tokenization**. 
During tokenization the topology tree is split into subgraphs using the same greedy approach that is used for parsing an input stream of characters into tokens. It may also be viewed as, and reformulated into, a basic block construction problem, since there is a leader and potentially terminators. See an example implementation [here](https://github.com/openvinotoolkit/openvino/blob/master/src/common/snippets/src/pass/collapse_subgraph.cpp).

@@ -94,7 +94,7 @@ The goal of this step is to apply target-independent and schedule-related optimi

All input and output shapes are normalized to 6D for future schedule generation. If shape propagation fails or leads to inconsistent output shapes, an exception is raised.

-The layout assigned by user code and passed to a `generate` function is propagated through a subgraph on this step as well. The layout is passed to a `generate` function as a `BlockedShapeVector` which is a `std::vector<BlockedShape>`, while `BlockedShape` is a `std::tuple<ngraph::Shape, ngraph::AxisVector, ngraph::element::Type>`. For example, if a backend supports the `NCHW16c` layout and a tensor has a size of `<1, 42, 17, 31>` and holds single-precision floating-point data, this structure should be `std::make_tuple(ngraph::Shape {1, 3, 17, 31, 16}, ngraph::AxisVector {0, 1, 2, 3, 1}, ngraph::element::f32);`. This allows generic layout representation.
+The layout assigned by user code and passed to a `generate` function is propagated through a subgraph on this step as well. The layout is passed to a `generate` function as a `BlockedShapeVector` which is a `std::vector<BlockedShape>`, while `BlockedShape` is a `std::tuple<ov::Shape, ov::AxisVector, ov::element::Type>`. For example, if a backend supports the `NCHW16c` layout and a tensor has a size of `<1, 42, 17, 31>` and holds single-precision floating-point data, this structure should be `std::make_tuple(ov::Shape {1, 3, 17, 31, 16}, ov::AxisVector {0, 1, 2, 3, 1}, ov::element::f32);`. This allows generic layout representation.

##### Dialect conversion

@@ -191,17 +191,17 @@ Broadcast and regular streaming vector load is possible from the same pointer. B

#### Target-specific optimizations

-Target developers can plug specific optimizations into the code generation pipeline by passing an `ngraph::pass::Manager` into the `generate` function of `subgraph`. **Passes are executed on a subgraph in canonical form converted to the snippet dialect**.
+Target developers can plug specific optimizations into the code generation pipeline by passing an `ov::pass::Manager` into the `generate` function of `subgraph`. **Passes are executed on a subgraph in canonical form converted to the snippet dialect**.

*It might also be extended to provide an interface for target-independent optimizations in the future*

#### Register allocation

-Canonicalized subgraph in a snippets dialect forms a basic block or region inside a snippet (kernel). Registers are allocated globally for the whole subgraph. Since all operations for a subgraph are assumed to be vector, only vector registers are allocated for the first generation of SnippetS. A linear scan register allocation algorithm is used. The register allocator is implemented as the `ngraph::snippets::pass::AssignRegisters` function pass and stores allocated registers for each node into `rt_info`. `rt_info` for a node holds a register for the Node's output. *However, this part should be refactored, either to become target independent or to use a target-specific abstraction to acquire a new register*
+Canonicalized subgraph in a snippets dialect forms a basic block or region inside a snippet (kernel). Registers are allocated globally for the whole subgraph.
Since all operations for a subgraph are assumed to be vector, only vector registers are allocated for the first generation of SnippetS. A linear scan register allocation algorithm is used. The register allocator is implemented as the `ov::snippets::pass::AssignRegisters` function pass and stores allocated registers for each node into `rt_info`. `rt_info` for a node holds a register for the Node's output. *However, this part should be refactored, either to become target independent or to use a target-specific abstraction to acquire a new register*

#### Schedule generation

-The goal of this step is to transform subgraphs in a scalar notation into kernel functions callable from user code. The `Kernel` and `Tile` operations are introduced for this purpose. Each of these operations has a constructor from a code region described as a collection of operation and operand pairs: `Kernel(const std::vector<std::pair<std::shared_ptr<Emitter>, ngraph::snippets::RegInfo>>& region);`.
+The goal of this step is to transform subgraphs in a scalar notation into kernel functions callable from user code. The `Kernel` and `Tile` operations are introduced for this purpose. Each of these operations has a constructor from a code region described as a collection of operation and operand pairs: `Kernel(const std::vector<std::pair<std::shared_ptr<Emitter>, ov::snippets::RegInfo>>& region);`.

The example above can be used for the following hierarchical IR. If the scope is limited to layout-oblivious operations with broadcasting support, `Tile` could be generated as a single loop over the most varying dimension. The second `Tile` is generated to handle tails and can be omitted if not needed. A special pass replaces vector memory operations with scalar versions for the tail subgraph.

@@ -253,7 +253,7 @@ Where

Target code emission is table based. A target is responsible for filling the `jitters` table field in the `Generator` class.

```
-std::map<const ngraph::DiscreteTypeInfo, std::function<std::shared_ptr<Emitter>(std::shared_ptr<ngraph::Node>)>> jitters;
+std::map<const ov::DiscreteTypeInfo, std::function<std::shared_ptr<Emitter>(std::shared_ptr<ov::Node>)>> jitters;
```

##### Interface with a target

@@ -279,7 +279,7 @@ Once a schedule is generated, a target code is emitted from a kernel in `Generat

A target can potentially extend the snippets dialect with a target-specific operation for code emission.
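A hedged sketch of such an extension is shown below; the `FMA` / `FmaEmitter` names match the examples in the list that follows, while the emitter hook signature, the jitters map type and the registration callbacks are assumptions inferred from the emitter and generator headers in this patch, not a verbatim copy of any backend.

```cpp
#include <memory>
#include <set>
#include <vector>

#include "openvino/op/op.hpp"
#include "snippets/emitter.hpp"

// Hypothetical target-specific operation (illustrative only, not part of the actual opset).
class FMA : public ov::op::Op {
public:
    OPENVINO_OP("FMA", "TargetSpecificOpset");
    // ... constructor, validate_and_infer_types(), clone_with_new_inputs() ...
};

// Emitter that produces target code for the operation (illustrative only).
class FmaEmitter : public ov::snippets::Emitter {
public:
    explicit FmaEmitter(const std::shared_ptr<ov::Node>& n) : Emitter(n) {}
    // Assumed signature of the code-emission hook; take the exact one from snippets/emitter.hpp.
    void emit_code(const std::vector<size_t>& in, const std::vector<size_t>& out,
                   const std::vector<size_t>& pool = {}, const std::vector<size_t>& gpr = {}) const override {
        // ... emit target instructions for the registers assigned to this expression ...
    }
};

// Registration of the pair, typically done in a target machine constructor.
// JittersMap stands for the map held by the target (assumed: emitter factory + supported-precisions callback).
template <typename JittersMap>
void register_fma(JittersMap& jitters) {
    jitters[FMA::get_type_info_static()] = {
        [](const std::shared_ptr<ov::Node>& n) { return std::make_shared<FmaEmitter>(n); },
        [](const std::shared_ptr<ov::Node>&) { return std::set<std::vector<ov::element::Type>>{}; }};
}
```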
It should implement: -* nGraph operation (for example, `class FMA : public ngraph::op::Op`) +* nGraph operation (for example, `class FMA : public ov::op::Op`) * Emitter for the operation (for example, `class FmaEmitter : public Emitter` ) * register the pair in `jitters` map diff --git a/src/common/snippets/include/snippets/emitter.hpp b/src/common/snippets/include/snippets/emitter.hpp index 4298d54890c5aa..7c88d07d2660b2 100644 --- a/src/common/snippets/include/snippets/emitter.hpp +++ b/src/common/snippets/include/snippets/emitter.hpp @@ -6,9 +6,10 @@ #include #include -#include "ngraph/node.hpp" -namespace ngraph { +#include "openvino/core/node.hpp" + +namespace ov { namespace snippets { using code = const uint8_t *; @@ -24,11 +25,9 @@ class Emitter { /** * @brief Default constructor */ - Emitter(const std::shared_ptr& n) { - } + Emitter(const std::shared_ptr& n) {} - Emitter(std::vector, RegInfo>>& region) { - } + Emitter(std::vector, RegInfo>>& region) {} /** * @brief called by generator to generate code to produce target code for a specific operation @@ -47,12 +46,12 @@ class Emitter { * @brief called by generator to generate data section, if needed for a specific operation * @return void */ - virtual void emit_data() const { - } + virtual void emit_data() const {} + virtual ~Emitter() = default; }; -using AllocatedEmitter = std::pair, ngraph::snippets::RegInfo>; +using AllocatedEmitter = std::pair, ov::snippets::RegInfo>; } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/generator.hpp b/src/common/snippets/include/snippets/generator.hpp index 25f605ba7ede4b..1651679df15a22 100644 --- a/src/common/snippets/include/snippets/generator.hpp +++ b/src/common/snippets/include/snippets/generator.hpp @@ -9,73 +9,12 @@ #pragma once #include "snippets_isa.hpp" -#include "emitter.hpp" -namespace ngraph { -namespace snippets { - -auto getRegisters(std::shared_ptr& n) -> ngraph::snippets::RegInfo; - -typedef std::pair(const std::shared_ptr&)>, - std::function>(const std::shared_ptr&)>> jitters_value; -/** - * @interface TargetMachine - * @brief Base class Target machine representation. 
Target derives from this class to provide generator information about supported emitters - * @ingroup snippets - */ -class TargetMachine { -public: - /** - * @brief checks if target is natively supported - * @return true, if supported - */ - virtual bool is_supported() const = 0; - - /** - * @brief finalizes code generation - * @return generated kernel binary - */ - virtual code get_snippet() const = 0; - - /** - * @brief gets number of lanes supported by target's vector ISA - * @return number of lanes - */ - virtual size_t get_lanes() const = 0; - - /** - * @brief called by generator to all the emitter for a target machine - * @return a map by node's type info with callbacks to create an instance of emitter for corresponding operation type - */ - std::function(std::shared_ptr)> get(const ngraph::DiscreteTypeInfo type) const { - auto jitter = jitters.find(type); - if (jitter == jitters.end()) { - OPENVINO_THROW(std::string("Target code emitter is not available for ") + type.name + " operation."); - } - return jitter->second.first; - } - - std::function>(const std::shared_ptr&)> - get_supported_precisions(const ngraph::DiscreteTypeInfo type) const { - auto jitter = jitters.find(type); - if (jitter == jitters.end()) { - OPENVINO_THROW(std::string("Target code emitter is not available for ") + type.name + " operation."); - } - return jitter->second.second; - } +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/pass/pass.hpp" - /** - * @brief checks if emitter for a specific operation is supported - * @return true, if supported - */ - bool has(const ngraph::DiscreteTypeInfo type) const { - return jitters.find(type) != jitters.end(); - } - virtual ~TargetMachine() = default; - -protected: - std::map jitters; -}; +namespace ov { +namespace snippets { /** * @interface Schedule @@ -117,7 +56,7 @@ class Generator { /** * @brief Default constructor */ - Generator(const std::shared_ptr& t) : target(t) {} + Generator(const std::shared_ptr& t) : target(t), lowered_saved{} {} /** * @brief Default destructor */ @@ -126,19 +65,6 @@ class Generator { * @interface GeneratorConfig * @brief Allows to tweak the lowering process. */ - class GeneratorConfig { - public: - // True if the lowered Emitters need to be accessed during runtime. Normally they're destroyed after code emission. 
- bool m_save_lowered_code = false; - // True if we can optimize tails for single evaluation during code generation - // More details with optimization examples you can see in generate() method - // For example, tails with Buffer ops doesn't support single evaluation optimizations - // because of that we should always reset memory pointer using finalization offsets - // after data storing to Buffer - bool m_optimize_single_evaluation = true; - // True if we should check runtime info for nodes to call specific needed transformations - bool m_need_fill_tail_register = false; - }; /** * @brief virtual method any specific implementation should implement * @param m model in canonical for for table-based code generation @@ -146,7 +72,11 @@ class Generator { * @param compile_params parameters for generated code * @return pointer to generated code */ - code generate(std::shared_ptr& m, const GeneratorConfig& config, const void* compile_params = nullptr); + struct LoweringResult { + LoweringResult(code c) : binary_code(c) {} + code binary_code = nullptr; + }; + LoweringResult generate(lowered::LinearIR& linear_ir, const lowered::Config& config, const void* compile_params = nullptr); /** * @brief gets target machine @@ -180,8 +110,8 @@ class Generator { std::shared_ptr target; // todo: we need to save lowered code to access compiled brgemm kernels on execution time (normally lowered is destructed by then). // This is temporary solution, remove this when kernel caching is implemented. Don't forget to make generate const method. - std::vector lowered_saved; + lowered::LinearIR lowered_saved; }; } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/itt.hpp b/src/common/snippets/include/snippets/itt.hpp index f13b1de6db7dbd..0c594165ab5776 100644 --- a/src/common/snippets/include/snippets/itt.hpp +++ b/src/common/snippets/include/snippets/itt.hpp @@ -11,15 +11,15 @@ #include -namespace ngraph { +namespace ov { namespace pass { namespace itt { namespace domains { - OV_ITT_DOMAIN(SnippetsTransform); +OV_ITT_DOMAIN(SnippetsTransform); } // namespace domains } // namespace itt } // namespace pass -} // namespace ngraph +} // namespace ov OV_CC_DOMAINS(internal_op); diff --git a/src/common/snippets/include/snippets/lowered/expression.hpp b/src/common/snippets/include/snippets/lowered/expression.hpp new file mode 100644 index 00000000000000..b5ed04b6f4ed15 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/expression.hpp @@ -0,0 +1,99 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include +#include + +#include "snippets/emitter.hpp" +#include "snippets/target_machine.hpp" +#include "snippets/lowered/port_connector.hpp" +#include "snippets/lowered/expression_port.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { + +class LinearIR; + +class Expression : public std::enable_shared_from_this { + friend class LinearIR; + friend class ExpressionPort; + +public: + static size_t LOOP_NULL_ID; + + Expression() = default; + virtual ~Expression() = default; + + std::shared_ptr get_node() const; + std::shared_ptr get_emitter() const; + + RegInfo get_reg_info() const; + void set_reg_info(RegInfo rinfo); + + const PortConnectorPtr& get_input_port_connector(size_t i) const; + const PortConnectorPtr& get_output_port_connector(size_t i) const; + std::vector get_input_port_connectors() const { return m_input_port_connectors; } + std::vector get_output_port_connectors() const 
{ return m_output_port_connectors; } + + const PortDescriptorPtr& get_input_port_descriptor(size_t i) const; + const PortDescriptorPtr& get_output_port_descriptor(size_t i) const; + std::vector get_input_port_descriptors() const { return m_input_port_descriptors; } + std::vector get_output_port_descriptors() const { return m_output_port_descriptors; } + + size_t get_input_count() const { return m_input_port_connectors.size(); } + size_t get_output_count() const { return m_output_port_connectors.size(); } + + std::vector get_loop_ids() const { return m_loop_ids; } + void set_loop_ids(const std::vector& loops) { m_loop_ids = loops; } + void set_loop_id(size_t id, size_t idx); + void remove_loop_id(size_t id); + + void validate() const; + void init_emitter(const std::shared_ptr& target); + + ExpressionPort get_input_port(size_t i); + ExpressionPort get_output_port(size_t i); + +protected: + // Note: The constructor initialization is private since an expression can be created only by Linear IR. + // The method must be used only by Linear IR builder of expressions! + explicit Expression(const std::shared_ptr& n); + + void replace_input(size_t port, PortConnectorPtr to); + + std::shared_ptr m_source_node{nullptr}; + std::shared_ptr m_emitter{nullptr}; + std::vector m_input_port_connectors{}; + std::vector m_output_port_connectors{}; + std::vector m_input_port_descriptors{}; + std::vector m_output_port_descriptors{}; + // The order Loops identifies: Outer ---> Inner + std::vector m_loop_ids; +}; +using ExpressionPtr = std::shared_ptr; + +class IOExpression : public Expression { + friend class LinearIR; + +public: + enum class io_type {INPUT, OUTPUT, UNDEFINED}; + + int64_t get_index() const { return m_index; } + io_type get_type() const { return m_type; } + +private: + explicit IOExpression(const std::shared_ptr& n, int64_t index); + explicit IOExpression(const std::shared_ptr& n, int64_t index); + + int64_t m_index = -1; + io_type m_type = io_type::UNDEFINED; +}; + +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/expression_factory.hpp b/src/common/snippets/include/snippets/lowered/expression_factory.hpp new file mode 100644 index 00000000000000..947bbd3c823c20 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/expression_factory.hpp @@ -0,0 +1,55 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "linear_ir.hpp" + +#include "snippets/snippets_isa.hpp" + +namespace ov { +namespace snippets { +namespace lowered { + +class LinearIR::ExpressionFactory { +public: + template + static ExpressionPtr build(const std::shared_ptr& n, Args&&... 
params) { + if (const auto par = ov::as_type_ptr(n)) { + return create(par, params...); + } else if (const auto res = ov::as_type_ptr(n)) { + return create(res, params...); + } else if (const auto loop_begin = ov::as_type_ptr(n)) { + return create(loop_begin, params...); + } else if (const auto loop_end = ov::as_type_ptr(n)) { + return create(loop_end, params...); + } + return create(n, params...); + } + +private: + /* -- Default Builders - initialize input port connectors from parents and create new output port connectors themselves */ + static ExpressionPtr create(const std::shared_ptr& par, const LinearIR& linear_ir, + const std::shared_ptr& model); + static ExpressionPtr create(const std::shared_ptr& res, const LinearIR& linear_ir, + const std::shared_ptr& model); + static ExpressionPtr create(const std::shared_ptr& n, const LinearIR& linear_ir, + const std::shared_ptr& model); + + /* -- Input Builders - get input port connectors from method parameters and create new output port connectors themselves */ + static ExpressionPtr create(const std::shared_ptr& n, const std::vector& inputs); + static ExpressionPtr create(const std::shared_ptr& n, const std::vector& inputs); + static ExpressionPtr create(const std::shared_ptr& n, const std::vector& inputs); + + // Creates inputs for expression using parent output port connectors + static void create_expression_inputs(const LinearIR& linear_ir, const ExpressionPtr& expr); + // Creates new output port connectors + static void create_expression_outputs(const ExpressionPtr& expr); + // The method verifies of input port connectors to availability of the expression as consumer and add it if missed + static void init_expression_inputs(const ExpressionPtr& expr, const std::vector& inputs); +}; + +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/expression_port.hpp b/src/common/snippets/include/snippets/lowered/expression_port.hpp new file mode 100644 index 00000000000000..4db3cb2cec719c --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/expression_port.hpp @@ -0,0 +1,51 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include +#include + +#include "port_descriptor.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { + +class PortConnector; +class Expression; +class ExpressionPort { +public: + enum Type { + Input, + Output + }; + + ExpressionPort() = default; + explicit ExpressionPort(const std::shared_ptr& expr, Type type, size_t port); + + const std::shared_ptr& get_expr() const { return m_expr; } + Type get_type() const { return m_type; } + size_t get_index() const { return m_port_index; } + + const PortDescriptorPtr& get_descriptor_ptr() const; + const std::shared_ptr& get_port_connector_ptr() const; + // Returns connected ports to the current: + // - Input port returns one source (parent) port + // - Output port returns all consumer ports (children) + std::set get_connected_ports() const; + + friend bool operator==(const ExpressionPort& lhs, const ExpressionPort& rhs); + friend bool operator!=(const ExpressionPort& lhs, const ExpressionPort& rhs); + friend bool operator<(const ExpressionPort& lhs, const ExpressionPort& rhs); + +private: + std::shared_ptr m_expr; + Type m_type = Type::Output; + size_t m_port_index = 0; +}; +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/linear_ir.hpp 
b/src/common/snippets/include/snippets/lowered/linear_ir.hpp new file mode 100644 index 00000000000000..3ea2464829fa0c --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/linear_ir.hpp @@ -0,0 +1,112 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include + +#include "expression.hpp" + +namespace ov { +namespace snippets { +namespace lowered { + +class Config { +public: + // True if the lowered Emitters need to be accessed during runtime. Normally they're destroyed after code emission. + bool m_save_expressions = false; + // True if we should check runtime info for nodes to call specific needed transformations + bool m_need_fill_tail_register = false; + size_t m_loop_depth = 1; +}; + +/* The control flow of Snippets is built on Linear Intermediate Representation (Linear IR). + * The class diagram is described in the documentation `snippets/docs/snippets_design_guide.md`. + */ +class LinearIR { + class ExpressionFactory; +public: + using container = std::list; + using io_container = std::list>; + using exprIt = container::iterator; + using constExprIt = container::const_iterator; + + LinearIR() = default; + explicit LinearIR(const std::shared_ptr& m, Config config = {}); + + ExpressionPtr create_expression(const std::shared_ptr& n, const std::vector& inputs); + + static LinearIR::container deep_copy_range(LinearIR::container::const_iterator begin, LinearIR::container::const_iterator end); + + const container& get_ops() const {return m_expressions; } + const io_container& get_IO_ops() const {return m_io_expressions; } + Config get_config() {return m_config; } + + const ExpressionPtr& get_expr_by_node(const std::shared_ptr& n) const; + + void replace_input(const std::set& consumers, const PortConnectorPtr& to); + void replace_input(const ExpressionPort& expr_port, const PortConnectorPtr& to); + + /** + * @brief Move an expression from the position "from" to the position immediately before "to". + * Note: this method does NOT take care about data dependencies and no relevant checks are performed. + * and doesn't touch internal maps. 
+ */ + void move(constExprIt from, constExprIt to); + + bool empty() const noexcept {return m_expressions.empty(); } + void debug_print(bool tds_as_pointers = false) const; + + container::reference back() noexcept {return m_expressions.back();} + container::const_reference back() const noexcept {return m_expressions.back();} + container::reference front() noexcept {return m_expressions.front();} + container::const_reference front() const noexcept {return m_expressions.front();} + + exprIt begin() noexcept {return m_expressions.begin();} + exprIt end() noexcept {return m_expressions.end();} + constExprIt begin() const noexcept {return cbegin();} + constExprIt end() const noexcept {return cend();} + constExprIt cbegin() const noexcept {return m_expressions.cbegin();} + constExprIt cend() const noexcept {return m_expressions.cend();} + container::reverse_iterator rbegin() noexcept {return m_expressions.rbegin();} + container::reverse_iterator rend() noexcept {return m_expressions.rend();} + container::const_reverse_iterator crbegin() const noexcept {return m_expressions.crbegin();} + container::const_reverse_iterator crend() const noexcept {return m_expressions.crend();} + + exprIt insert(constExprIt pos, const ov::NodeVector& nodes); + exprIt insert(constExprIt pos, const std::shared_ptr& n); + exprIt insert(constExprIt pos, container::value_type&& value); + exprIt insert(constExprIt pos, const container::value_type& value); + exprIt insert(constExprIt pos, exprIt begin, exprIt end); + exprIt insert(constExprIt pos, constExprIt begin, constExprIt end); + + exprIt erase(exprIt pos); + exprIt erase(constExprIt pos); + + void init_emitters(const std::shared_ptr& target); + void serialize(const std::string& xml, const std::string& bin); + + class LoopManager; + using LoopManagerPtr = std::shared_ptr; + + const LoopManagerPtr& get_loop_manager() const { return m_loop_manager; } + +private: + static ov::NodeVector get_ordered_ops(const std::shared_ptr& model); + // Default ctor - can be called only from Linear IR initialization as default way + ExpressionPtr create_expression(const std::shared_ptr& n, const std::shared_ptr& model = nullptr); + + void register_expression(const ExpressionPtr& expr, bool io_allowed = false); + void unregister_expression(const ExpressionPtr& expr); + + container m_expressions{}; + std::unordered_map, std::shared_ptr> m_node2expression_map; + io_container m_io_expressions; + Config m_config{}; + LoopManagerPtr m_loop_manager = nullptr; +}; + +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/loop_manager.hpp b/src/common/snippets/include/snippets/lowered/loop_manager.hpp new file mode 100644 index 00000000000000..01b91a187e216f --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/loop_manager.hpp @@ -0,0 +1,83 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "linear_ir.hpp" + +#include +#include + +#include "port_descriptor.hpp" + +namespace ov { +namespace snippets { +namespace lowered { + +class LinearIR::LoopManager { +public: + LoopManager() = default; + + class LoopInfo { + public: + LoopInfo() = default; + LoopInfo(size_t work_amount, size_t increment, + const std::vector& entries, + const std::vector& exits) + : work_amount(work_amount), increment(increment), entry_exprs(entries), exit_exprs(exits) {} + size_t work_amount = 0; + size_t increment = 0; + // The order of entry and exit expressions is important: + 
// - The position before first entry expr is Loop Begin position + // - The position after last exit expr is Loop End position + // Note: Scalars aren't entry expressions but can be before first entry expr in Linear IR + std::vector entry_exprs = {}; + std::vector exit_exprs = {}; + }; + using LoopInfoPtr = std::shared_ptr; + + size_t add_loop_info(const LoopInfoPtr& loop); + void remove_loop_info(size_t index); + LoopInfoPtr get_loop_info(size_t index) const; + size_t get_loop_count() const { return m_map.size(); } + const std::map& get_map() const; + + void mark_loop(LinearIR::constExprIt loop_begin_pos, + LinearIR::constExprIt loop_end_pos, + size_t loop_depth, size_t vector_size); + void mark_loop(LinearIR::constExprIt loop_begin_pos, + LinearIR::constExprIt loop_end_pos, + size_t idx, + size_t work_amount, + size_t work_amount_increment, + const std::vector& entries, + const std::vector& exits); + + void get_loop_bounds(const LinearIR& linear_ir, + size_t loop_id, + LinearIR::constExprIt& loop_begin_pos, + LinearIR::constExprIt& loop_end_pos) const; + static void get_loop_bounds(const LinearIR& linear_ir, + const std::vector& entries, + const std::vector& exits, + LinearIR::constExprIt& loop_begin_pos, + LinearIR::constExprIt& loop_end_pos, + size_t loop_id = Expression::LOOP_NULL_ID); + +private: + static void exprs_marking(LinearIR::constExprIt loop_begin_pos, + LinearIR::constExprIt loop_end_pos, + size_t loop_id, size_t idx); + static void get_io_loop_ports(LinearIR::constExprIt loop_begin_pos, + LinearIR::constExprIt loop_end_pos, + std::vector& entries, + std::vector& exits); + + std::map m_map = {}; + size_t next_id = 0; +}; + +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/allocate_buffers.hpp b/src/common/snippets/include/snippets/lowered/pass/allocate_buffers.hpp new file mode 100644 index 00000000000000..dd25b5872f5379 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/allocate_buffers.hpp @@ -0,0 +1,42 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" +#include "snippets/snippets_isa.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface AllocateBuffers + * @brief The pass calculates common size of buffer scratchpad and propagates Buffer offsets to connected MemoryAccess operations. + * Notes: + * - The pass implicitly regulates InPlace processing for some Buffers when it's possible. + * The pass don't allocate new memory for InPlace Buffers, we propagate the same offsets for them. + * - The pass should be splitted into two passes: ProcessInplace (markup of Buffers which can use the same memory) + * and AllocateBuffer (allocate memory for Buffers using MemorySolver which can optimally reuse memory). 
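+ * Example (illustrative sizes): two independent Buffers of 1024 bytes each receive offsets 0 and 1024,
+ * so the common scratchpad size becomes 2048 bytes, while an InPlace Buffer simply propagates the offset
+ * of the Buffer whose memory it reuses instead of enlarging the scratchpad.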
+ * @ingroup snippets + */ + +class AllocateBuffers : public Pass { +public: + OPENVINO_RTTI("AllocateBuffers", "Pass") + bool run(lowered::LinearIR& linear_ir) override; + + size_t get_scratchpad_size() const { return m_buffer_scratchpad_size; } + +private: + static void propagate_offset(const LinearIR& linear_ir, const ExpressionPtr& buffer_expr, size_t offset); + + size_t m_buffer_scratchpad_size = 0; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/assign_registers.hpp b/src/common/snippets/include/snippets/lowered/pass/assign_registers.hpp new file mode 100644 index 00000000000000..4425b4e59d8f77 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/assign_registers.hpp @@ -0,0 +1,35 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" +#include "snippets/generator.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface AssignRegisters + * @brief Assigns in/out abstract registers indexes to every operation. + * Note that changing of the IR is likely to invalidate register assignment. + * @ingroup snippets + */ +class AssignRegisters : public Pass { +public: + OPENVINO_RTTI("AssignRegisters", "Pass") + explicit AssignRegisters(const std::function& op)>& mapper) : m_reg_type_mapper(mapper) {} + bool run(LinearIR& linear_ir) override; + +private: + std::function& op)> m_reg_type_mapper; + static constexpr size_t reg_count = 16lu; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/clean_repeated_ptr_shifts.hpp b/src/common/snippets/include/snippets/lowered/pass/clean_repeated_ptr_shifts.hpp new file mode 100644 index 00000000000000..892137747a2776 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/clean_repeated_ptr_shifts.hpp @@ -0,0 +1,38 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface CleanRepeatedDataPointerShifts + * @brief The pass `fuses` (reset) ptr increments and finalization offsets for ports of Loop + * with the same data expression (Buffer with the same ID, the same parent of Loads) to avoid double ptr shifts + * Note: Buffer always employ inplace logics by default. It means that if a loop has both + * an input and an output connected to Buffers, the corresponding register should nevertheless be + * incremented only once (because when the input reg is incremented, output incremented automatically). + * This condition should be removed when Buffers stop being inplace by default. 
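+ * Example: if a Loop both loads from and stores to Buffers with the same ID, only one of the two
+ * pointer increments (and finalization offsets) is kept, because both ports shift the same register.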
+ * @ingroup snippets + */ +class CleanRepeatedDataPointerShifts: public Pass { +public: + OPENVINO_RTTI("CleanRepeatedDataPointerShifts", "Pass") + CleanRepeatedDataPointerShifts() = default; + + bool run(LinearIR& linear_ir) override; + +private: + bool reuse_increments(const LinearIR& linear_ir, const ExpressionPtr& loop_end_expr); +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/cleanup_loop_offsets.hpp b/src/common/snippets/include/snippets/lowered/pass/cleanup_loop_offsets.hpp new file mode 100644 index 00000000000000..5af01ad137e09b --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/cleanup_loop_offsets.hpp @@ -0,0 +1,29 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface CleanupLoopOffsets + * @brief Loops are inserted with finalization offsets that reset all managed pointers to their initial values. + * This transformation "fuses" the offsets with an outer loop's ptr_increments, and zeroes the offsets before Results. + * @ingroup snippets + */ +class CleanupLoopOffsets : public Pass { +public: + OPENVINO_RTTI("CleanupLoopOffsets", "Pass") + bool run(LinearIR& linear_ir) override; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/fuse_loops.hpp b/src/common/snippets/include/snippets/lowered/pass/fuse_loops.hpp new file mode 100644 index 00000000000000..e67673dcd2a2ad --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/fuse_loops.hpp @@ -0,0 +1,61 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +#include "snippets/lowered/loop_manager.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface FuseLoops + * @brief The pass fuses marking Loops. The transformations support the following fusions of loops: + * + * - Upper Loop is fused into the Current Loop + * Loop_0 (Upper) | + * | => | + * Loop_1 (Current) Loop_0 + Loop_1 => new `Loop_1` + * * It's possible only if other consumers of Loop_0 are after Loop_1 in Linear IR. + * Because Upper Loop_0 will be explicitly moved before Current Loop_1 in linear IR, + * and we must save control dependency (to avoid cases when after fusion some consumers of Loop_0 are before this Loop) + * + * - Lower Loop is fused into the Current Loop + * Loop_0 (Current) Loop_0 + Loop_1 => new `Loop_0` + * | => | + * Loop_1 (Lower) | + * * It's possible only if other parents of Loop_1 are before Loop_0 in Linear IR. + * Because Lower Loop_1 will be explicitly moved after Current Loop_0 in linear IR, + * and we must save control dependency (to avoid cases when after fusion some parents of Loop_1 are after this Loop) + * + * The main conditions of possible fusion is the equal increments and the equal/broadcastable work amounts. 
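+ * Example: two neighbouring loops with increment = 16 and work amounts 256 and 256 (or 256 and 1,
+ * which is broadcastable) can be fused; loops with increments 16 and 1 cannot.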
+ * @ingroup snippets + */ +class FuseLoops : public Pass { +public: + OPENVINO_RTTI("FuseLoops", "Pass") + FuseLoops(); + bool run(LinearIR& linear_ir) override; + +private: + static bool can_be_fused(const LinearIR::LoopManager::LoopInfoPtr& loop_current, + const LinearIR::LoopManager::LoopInfoPtr& loop_target); + static bool fuse_upper_into_current(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, const ExpressionPort& current_entry_point, + size_t current_loop_id, size_t target_loop_id, size_t dim_idx, + LinearIR::constExprIt& current_loop_begin_pos, LinearIR::constExprIt& current_loop_end_pos); + static bool fuse_lower_into_current(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, const ExpressionPort& current_entry_point, + size_t current_loop_id, size_t target_loop_id, size_t dim_idx, + LinearIR::constExprIt& current_loop_begin_pos, LinearIR::constExprIt& current_loop_end_pos); + static void fuse_points(std::vector& exit_points, std::vector& entry_points, + LinearIR::constExprIt loop_begin_pos, LinearIR::constExprIt loop_end_pos); +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/identify_buffers.hpp b/src/common/snippets/include/snippets/lowered/pass/identify_buffers.hpp new file mode 100644 index 00000000000000..e71758a4b97bcd --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/identify_buffers.hpp @@ -0,0 +1,48 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +#include "snippets/op/buffer.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface IdentifyBuffers + * @brief The pass set identifiers for Buffers in common Buffer system. + * The buffers with the same identifier will be assigned the same data register. + * The pass uses greedy graph coloring algorithm using adjacency matrix: + * - Buffers - are vertices of graph; + * - Loops, Brgemm (the same other ops) - are "edges" between Buffers (hub of edges). + * The buffers are connected to the same Loop - are adjacent in graph sense bounds. + * - The vertices (buffers) are adjacent if they are connected to the same Loop and + * their data pointers cannot be proportionally incremented in Loops: different ptr increments or data sizes; + * - Firstly, create adjacency matrix using the definition above; + * - Secondly, assign the same color to non-adjacent vertices of graph (buffers), and use different colors otherwise. 
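+ * Example: two Buffers that do not share a Loop (or whose pointers can be incremented proportionally
+ * in the shared Loop) are non-adjacent, so they get the same color, i.e. the same ID and data register;
+ * Buffers around the same Loop with different data sizes are adjacent and get different IDs.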
+ * Note: should be called before ResetBuffer() pass to have correct offsets + * @ingroup snippets + */ +class IdentifyBuffers: public Pass { +public: + OPENVINO_RTTI("IdentifyBuffers", "Pass") + IdentifyBuffers() = default; + + bool run(LinearIR& linear_ir) override; + +private: + using BufferSet = std::vector>; + + std::vector create_adjacency_matrix(const LinearIR& linear_ir, const BufferSet& buffers) const; + std::map coloring(BufferSet& buffers, std::vector& adj); +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/init_loops.hpp b/src/common/snippets/include/snippets/lowered/pass/init_loops.hpp new file mode 100644 index 00000000000000..32266dca1d0f56 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/init_loops.hpp @@ -0,0 +1,41 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +#include "snippets/lowered/loop_manager.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface InitLoops + * @brief The pass explicitly insert LoadBegin and LoadEnd in Linear IR using LoopManager::LoopInfo from Loop markup algorithm + * @ingroup snippets + */ +class InitLoops : public Pass { +public: + OPENVINO_RTTI("InsertLoops", "Pass") + InitLoops(); + bool run(LinearIR& linear_ir) override; + +private: + static void insertion(LinearIR& linear_ir, const LinearIR::LoopManager::LoopInfoPtr& loop_info, + size_t loop_id, size_t dim_idx, bool has_outer_loop); + static std::vector init_ptr_increments(const std::vector& loop_inputs, + const std::vector& loop_outputs, + size_t dim_idx); + static std::vector init_finalization_offsets(const std::vector& finalization_offsets, size_t work_amount); + static std::vector init_element_type_sizes(const std::vector& loop_inputs, + const std::vector& loop_outputs); +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/insert_buffers.hpp b/src/common/snippets/include/snippets/lowered/pass/insert_buffers.hpp new file mode 100644 index 00000000000000..6215bc43f1ba2f --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/insert_buffers.hpp @@ -0,0 +1,43 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface InsertBuffers + * @brief The pass inserts Buffer between exit points of one loop (or Brgemm) and + * entry points of another loop (or Brgemm) to store intermediate data. + * The pass should be called after FuseLoops. 
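+ * Example: for a chain Brgemm -> Exp where Exp is covered by its own Loop, a Buffer expression is
+ * inserted between the Brgemm exit point and the Loop entry point to hold the intermediate result.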
+ * @param m_buffer_allocation_rank - rank of shape for memory allocation: shape[shape_rank - normalize(m_allocation_rank) : shape_rank] + * @ingroup snippets + */ +class InsertBuffers : public Pass { +public: + OPENVINO_RTTI("InsertBuffers", "Pass") + InsertBuffers(int32_t buffer_allocation_rank); + bool run(LinearIR& linear_ir) override; + +private: + void insertion(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, size_t loop_id, + const std::vector& loop_entries, const std::vector& loop_exits); + + LinearIR::constExprIt insertion_position(const LinearIR& linear_ir, + const LinearIR::LoopManagerPtr& loop_manager, + const ExpressionPtr& up_expr, + const ExpressionPtr& down_expr); + + int32_t m_buffer_allocation_rank; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/include/snippets/lowered/pass/insert_load_store.hpp b/src/common/snippets/include/snippets/lowered/pass/insert_load_store.hpp new file mode 100644 index 00000000000000..6b87b8dfa6b5fe --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/insert_load_store.hpp @@ -0,0 +1,44 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +#include "snippets/lowered/loop_manager.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface InsertLoadStore + * @brief The pass inserts Load and Store expressions in Linear IR after Parameters, Buffers and before Results, Buffers accordingly. + * Note: The pass should be called after FuseLoops and InsertBuffers passes to have all possible data expressions. + * @param m_vector_size - the count of elements for loading/storing + * @ingroup snippets + */ +class InsertLoadStore : public Pass { +public: + explicit InsertLoadStore(size_t vector_size); + OPENVINO_RTTI("InsertLoadStore", "Pass") + bool run(LinearIR& linear_ir) override; + +private: + bool insert_load(LinearIR& linear_ir, const LinearIR::constExprIt& data_expr_it); + bool insert_store(LinearIR& linear_ir, const LinearIR::constExprIt& data_expr_it); + void update_loops(const LinearIR::LoopManagerPtr& loop_manager, const std::vector& loop_ids, + const ExpressionPort& actual_port, const std::vector& target_ports, bool is_entry = true); + void update_loop(const LinearIR::LoopManager::LoopInfoPtr& loop_info, + const ExpressionPort& actual_port, const std::vector& target_ports, bool is_entry = true); + size_t get_count(const PortDescriptorPtr& port_desc) const; + + size_t m_vector_size; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/insert_tail_loop.hpp b/src/common/snippets/include/snippets/lowered/pass/insert_tail_loop.hpp new file mode 100644 index 00000000000000..ae6eca60af52c3 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/insert_tail_loop.hpp @@ -0,0 +1,33 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface InsertTailLoop + * @brief Injects tail-processing loop after a vector loop if required. + * Additional optimizations are performed if a loop body is executed only once. 
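+ * Example: with work_amount = 20 and increment = 16 (vector size), the vector loop handles 16 elements
+ * and a tail loop of 4 elements is injected after it; memory operations inside the tail body are
+ * switched to their scalar versions.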
+ * @ingroup snippets + */ +class InsertTailLoop : public Pass { + static void tail_transformations(LinearIR& linear_ir, + LinearIR::container::const_iterator tail_begin, + LinearIR::container::const_iterator tail_end, + size_t tail_size); +public: + OPENVINO_RTTI("InsertTailLoop", "Pass") + bool run(LinearIR& linear_ir) override; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/load_movebroadcast_to_broadcastload.hpp b/src/common/snippets/include/snippets/lowered/pass/load_movebroadcast_to_broadcastload.hpp similarity index 53% rename from src/common/snippets/include/snippets/pass/load_movebroadcast_to_broadcastload.hpp rename to src/common/snippets/include/snippets/lowered/pass/load_movebroadcast_to_broadcastload.hpp index 95c50fa0230963..769208842e9338 100644 --- a/src/common/snippets/include/snippets/pass/load_movebroadcast_to_broadcastload.hpp +++ b/src/common/snippets/include/snippets/lowered/pass/load_movebroadcast_to_broadcastload.hpp @@ -4,24 +4,26 @@ #pragma once -#include -#include +#include "pass.hpp" -namespace ngraph { +namespace ov { namespace snippets { +namespace lowered { namespace pass { /** * @interface LoadMoveBroadcastToBroadcastLoad * @brief Fuses consecutive Load and MoveBroadcast into a single load insctruction. - * The pass is used to convert model to a canonical form for code generation * @ingroup snippets */ -class LoadMoveBroadcastToBroadcastLoad: public ngraph::pass::MatcherPass { +class LoadMoveBroadcastToBroadcastLoad: public Pass { public: - LoadMoveBroadcastToBroadcastLoad(); + LoadMoveBroadcastToBroadcastLoad() = default; + OPENVINO_RTTI("LoadMoveBroadcastToBroadcastLoad", "Pass") + bool run(LinearIR& linear_ir) override; }; } // namespace pass +} // namespace lowered } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/mark_loops.hpp b/src/common/snippets/include/snippets/lowered/pass/mark_loops.hpp new file mode 100644 index 00000000000000..048f9457ddb455 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/mark_loops.hpp @@ -0,0 +1,36 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface MarkLoops + * @brief The pass marks expressions with Loop IDs. 
+ * The pass iterates expression by expression till the following conditions: + * - the layouts and subtensors them are the same + * - the consumer of the expression is explicitly after this expression - the pass marks the branches + * @ingroup snippets + */ +class MarkLoops : public Pass { +public: + OPENVINO_RTTI("MarkLoops", "Pass") + MarkLoops(size_t vector_size); + bool run(LinearIR& linear_ir) override; + +private: + size_t m_vector_size; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/move_result_out_of_loop.hpp b/src/common/snippets/include/snippets/lowered/pass/move_result_out_of_loop.hpp new file mode 100644 index 00000000000000..c0428a60fe8fea --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/move_result_out_of_loop.hpp @@ -0,0 +1,32 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface MoveResultOutOfLoop + * @brief After passes with Loop work Result expressions might be inside Loop. + * It means that Result can be before his Parent and LoopEnd, this situation breaks control dependency and + * create cycle dependency in AssignRegister algorithm. + * The pass extracts Result expressions from Loop and insert after. + * @ingroup snippets + */ +class MoveResultOutOfLoop : public Pass { +public: + OPENVINO_RTTI("MoveResultOutOfLoop", "Pass") + MoveResultOutOfLoop() = default; + bool run(LinearIR& linear_ir) override; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/move_scalar_to_consumer.hpp b/src/common/snippets/include/snippets/lowered/pass/move_scalar_to_consumer.hpp new file mode 100644 index 00000000000000..ba2cfcbb755e9b --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/move_scalar_to_consumer.hpp @@ -0,0 +1,35 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface MoveScalarToConsumer + * @brief As a result of loop insertion or fusion, Scalar operations might end up outside of the loop where their + * consumer is located. This transformation moves every scalar right before its consumer. This is needed to guarantee + * computation validity and also to optimize register allocation. + * Details: + * If ScalarEmitters are called outside the Loop, and only the first Loop iteration would yield correct data + * (assuming the vector reg assigned to scalar will get corrupted inside the loop body). + * To avoid such cases, we move Constants to the places in Linear IR before right Consumer to execute Scalar on each Loop iteration. 
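+ * Example: [Scalar, LoopBegin, Load, Add, LoopEnd] becomes [LoopBegin, Load, Scalar, Add, LoopEnd],
+ * so the Scalar is re-executed on every iteration right before its consumer (Add).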
+ * @ingroup snippets + */ +class MoveScalarToConsumer : public Pass { +public: + OPENVINO_RTTI("MoveScalarsToConsumer", "Pass") + MoveScalarToConsumer() = default; + bool run(LinearIR& linear_ir) override; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/pass.hpp b/src/common/snippets/include/snippets/lowered/pass/pass.hpp new file mode 100644 index 00000000000000..3410f8f8fa89a7 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/pass.hpp @@ -0,0 +1,67 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "snippets/lowered/linear_ir.hpp" + +#include "openvino/core/rtti.hpp" +#include "openvino/core/type.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface Pass + * @brief Base class for transformations on linear IR + * @ingroup snippets + */ +class Pass { +public: + Pass() = default; + virtual ~Pass() = default; + // Note that get_type_info_static and get_type_info are needed to mimic OPENVINO_RTTI interface, + // so the standard OPENVINO_RTTI(...) macros could be used in derived classes. + _OPENVINO_HIDDEN_METHOD static const ::ov::DiscreteTypeInfo& get_type_info_static() { + static ::ov::DiscreteTypeInfo type_info_static {"Pass"}; + type_info_static.hash(); + return type_info_static; + } + + virtual const DiscreteTypeInfo& get_type_info() const { + return get_type_info_static(); + } + + const char* get_type_name() const { + return get_type_info().name; + } + + virtual bool run(lowered::LinearIR& linear_ir) = 0; +}; + +class PassPipeline { +public: + PassPipeline() = default; + + void register_pass(const std::shared_ptr& pass); + + template + void register_pass(Args&&... args) { + static_assert(std::is_base_of::value, "Pass not derived from lowered::Pass"); + auto pass = std::make_shared(std::forward(args)...); + register_pass(pass); + } + + void run(lowered::LinearIR& linear_ir); + +private: + std::vector> m_passes; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/propagate_layout.hpp b/src/common/snippets/include/snippets/lowered/pass/propagate_layout.hpp new file mode 100644 index 00000000000000..6ba062b0525556 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/propagate_layout.hpp @@ -0,0 +1,29 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface PropagateLayout + * @brief Propagate layout from Parameter child to parameter and from Result Parent to Result. 
This is needed to calculate + * proper data pointer offsets in the Kernel; + * @ingroup snippets + */ +class PropagateLayout : public Pass { +public: + OPENVINO_RTTI("PropagateLayout", "Pass") + bool run(LinearIR& linear_ir) override; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/pass/softmax_decomposition.hpp b/src/common/snippets/include/snippets/lowered/pass/softmax_decomposition.hpp new file mode 100644 index 00000000000000..8b5634ebb29fa4 --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/pass/softmax_decomposition.hpp @@ -0,0 +1,32 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "pass.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +/** + * @interface SoftmaxDecomposition + * @brief Decomposes Softmax to a range of low-level operations on linear IR + * @ingroup snippets + */ +class SoftmaxDecomposition : public Pass { +public: + explicit SoftmaxDecomposition(size_t vector_size); + OPENVINO_RTTI("SoftmaxDecomposition", "Pass") + bool run(LinearIR& linear_ir) override; + +private: + size_t m_vector_size; +}; + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/port_connector.hpp b/src/common/snippets/include/snippets/lowered/port_connector.hpp new file mode 100644 index 00000000000000..9336c032916b3a --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/port_connector.hpp @@ -0,0 +1,43 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include +#include + +#include "port_descriptor.hpp" +#include "expression_port.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { + +class Expression; + +class PortConnector { +public: + PortConnector() = default; + explicit PortConnector(ExpressionPort source_descriptor, const std::set& consumer_descriptors = {}); + + const ExpressionPort& get_source() const { return m_source_port; } + std::set get_consumers() const { return m_consumer_ports; } + + void add_consumer(const ExpressionPort& consumer); + void remove_consumer(const ExpressionPort& consumer); + bool found_consumer(const ExpressionPort& consumer) const; + std::set::const_iterator find_consumer(const ExpressionPort& consumer) const; + std::set::iterator find_consumer(const ExpressionPort& consumer); + +private: + ExpressionPort m_source_port; + std::set m_consumer_ports; +}; +using PortConnectorPtr = std::shared_ptr; + + +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/lowered/port_descriptor.hpp b/src/common/snippets/include/snippets/lowered/port_descriptor.hpp new file mode 100644 index 00000000000000..4d3100dd56182c --- /dev/null +++ b/src/common/snippets/include/snippets/lowered/port_descriptor.hpp @@ -0,0 +1,99 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "openvino/core/node.hpp" +#include "openvino/core/attribute_visitor.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { + +class PortDescriptor; +using PortDescriptorPtr = std::shared_ptr; +class PortDescriptor { +public: + // The structure with service values for scheduling parameters + struct ServiceDimensions { + // The value for the subtensor that means that scheduling should be by 
full dimension + static size_t FULL_DIM; + }; + + explicit PortDescriptor(const ov::Input& node, + std::vector subtensor_shape = {}, + std::vector layout = {}); + explicit PortDescriptor(const ov::Input& node, + std::vector subtensor_shape = {}, + std::vector layout = {}); + explicit PortDescriptor(const ov::Output& node, + std::vector subtensor_shape = {}, + std::vector layout = {}); + explicit PortDescriptor(const ov::Output& node, + std::vector subtensor_shape = {}, + std::vector layout = {}); + PortDescriptor(std::vector shape, std::vector subtensor_shape, std::vector layout = {}); + PortDescriptor() = default; + + std::vector get_shape() const {return m_tensor_shape;} + std::vector get_subtensor() const {return m_subtensor_shape;} + std::vector get_layout() const {return m_layout;} + size_t get_reg() const { return m_reg; } + + void set_shape(const std::vector& tensor) { m_tensor_shape = tensor; } + void set_layout(const std::vector& layout) { m_layout = layout; } + void set_subtensor(const std::vector& subtensor) { m_subtensor_shape = subtensor; } + void set_reg(size_t reg) { m_reg = reg; } + + std::string serialize() const; + bool empty() const { return m_layout.empty() && m_subtensor_shape.empty();} + PortDescriptorPtr clone() const; + + friend bool operator==(const PortDescriptor& lhs, const PortDescriptor& rhs); + friend bool operator!=(const PortDescriptor& lhs, const PortDescriptor& rhs) {return !(lhs == rhs);} + +private: + void validate_arguments(); + /// \brief Original tensor shape + std::vector m_tensor_shape{}; + /// \brief Order of dimensions: NCHW == {0, 1, 2, 3}, NHWC == {0, 2, 3, 1}, NCHW16c == {0, 1, 2, 3, 1} + std::vector m_layout{}; + /// \brief Minimal tensor size that could be processed in one call + std::vector m_subtensor_shape{}; + /// \brief The corresponding abstract/physical register + size_t m_reg = 0; +}; + +class PortDescriptorUtils { +public: + static void set_port_descriptor_ptr(const ov::Input& n, const PortDescriptorPtr& desc); + static void set_port_descriptor_ptr(const ov::Output& n, const PortDescriptorPtr& desc); + + static PortDescriptorPtr get_port_descriptor_ptr(const ov::Input& in); + static PortDescriptorPtr get_port_descriptor_ptr(const ov::Input& out); + static PortDescriptorPtr get_port_descriptor_ptr(const ov::Output& in); + static PortDescriptorPtr get_port_descriptor_ptr(const ov::Output& out); + + static void clean(const std::shared_ptr& node); + +private: + static void init_default(std::vector& in_descs, std::vector& out_descs, const std::shared_ptr& node); +}; + +class PortDescriptorVectorAttribute : public ov::RuntimeAttribute { +public: + OPENVINO_RTTI("PortDescriptorVectorAttribute", "", ov::RuntimeAttribute); + + PortDescriptorVectorAttribute() = default; + explicit PortDescriptorVectorAttribute(std::vector in_descs = {}, std::vector out_descs = {}) + : inputs(std::move(in_descs)), outputs(std::move(out_descs)) {} + + std::vector inputs{}; + std::vector outputs{}; +}; + +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/brgemm.hpp b/src/common/snippets/include/snippets/op/brgemm.hpp index 58c70f164799a6..d83f18c69c98eb 100644 --- a/src/common/snippets/include/snippets/op/brgemm.hpp +++ b/src/common/snippets/include/snippets/op/brgemm.hpp @@ -1,13 +1,13 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" 
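// Illustrative usage sketch (not code from this patch): attaching a PortDescriptor to a node
// port through PortDescriptorUtils and reading it back. The template arguments (ov::Node,
// size_t) are assumptions, since they are stripped in the excerpt above; mark_blocked_port()
// is a hypothetical helper.
#include "openvino/core/node.hpp"
#include "snippets/lowered/port_descriptor.hpp"

namespace {
void mark_blocked_port(const std::shared_ptr<ov::Node>& node) {
    using ov::snippets::lowered::PortDescriptor;
    using ov::snippets::lowered::PortDescriptorUtils;
    // Subtensor: the last two dimensions are processed in full on every kernel call.
    const auto FULL = PortDescriptor::ServiceDimensions::FULL_DIM;
    const std::vector<size_t> subtensor{FULL, FULL};
    // Layout {0, 2, 1, 3} describes a transposed view of a 4D tensor.
    const std::vector<size_t> layout{0, 2, 1, 3};
    const auto desc = std::make_shared<PortDescriptor>(node->output(0), subtensor, layout);
    PortDescriptorUtils::set_port_descriptor_ptr(node->output(0), desc);
    // Later passes recover the descriptor from the port's runtime info.
    const auto restored = PortDescriptorUtils::get_port_descriptor_ptr(node->output(0));
    OPENVINO_ASSERT(restored && restored->get_layout() == layout);
}
} // namespace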
#include "memory_access.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -20,7 +20,8 @@ class Brgemm : public MemoryAccess { public: OPENVINO_OP("Brgemm", "SnippetsOpset", MemoryAccess); Brgemm(const Output& A, const Output& B, - const size_t offset_a = 0lu, const size_t offset_b = 0lu, const size_t offset_c = 0lu); + const size_t offset_a = 0lu, const size_t offset_b = 0lu, const size_t offset_c = 0lu, + std::vector layout_a = {}, std::vector layout_b = {}, std::vector layout_c = {}); Brgemm() = default; size_t get_offset_a() const { return get_input_offset(0); } @@ -34,9 +35,15 @@ class Brgemm : public MemoryAccess { protected: ov::element::Type get_output_type() const; + std::vector get_planar_input_shapes(const std::vector>& inputs) const; ov::PartialShape get_output_partial_shape(const std::vector& input_shapes) const; + ov::PartialShape get_planar_output_shape(const ov::PartialShape& output_shape) const; + +private: + void custom_constructor_validate_and_infer_types(std::vector layout_a, std::vector layout_b, std::vector layout_c); + void validate_inputs() const; }; } // namespace op } // namespace snippets -} // namespace ngraph \ No newline at end of file +} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/include/snippets/op/broadcastload.hpp b/src/common/snippets/include/snippets/op/broadcastload.hpp index edcbe170a371f6..337c698dbd6f26 100644 --- a/src/common/snippets/include/snippets/op/broadcastload.hpp +++ b/src/common/snippets/include/snippets/op/broadcastload.hpp @@ -6,9 +6,9 @@ #include -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -19,7 +19,7 @@ namespace op { */ class BroadcastLoad : public MemoryAccess { public: - OPENVINO_OP("BroadcastLoad", "SnippetsOpset", ngraph::snippets::op::MemoryAccess); + OPENVINO_OP("BroadcastLoad", "SnippetsOpset", ov::snippets::op::MemoryAccess); BroadcastLoad(const Output& x, ov::PartialShape output_shape, size_t offset = 0lu); BroadcastLoad() = default; @@ -36,4 +36,4 @@ class BroadcastLoad : public MemoryAccess { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/broadcastmove.hpp b/src/common/snippets/include/snippets/op/broadcastmove.hpp index 0d6368970b8a10..ac369d2419bcac 100644 --- a/src/common/snippets/include/snippets/op/broadcastmove.hpp +++ b/src/common/snippets/include/snippets/op/broadcastmove.hpp @@ -4,9 +4,9 @@ #pragma once -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -15,7 +15,7 @@ namespace op { * @brief Added to a subgraph if explicit broadcast instruction should be generated * @ingroup snippets */ -class BroadcastMove : public ngraph::op::Op { +class BroadcastMove : public ov::op::Op { public: OPENVINO_OP("BroadcastMove", "SnippetsOpset"); @@ -35,4 +35,4 @@ class BroadcastMove : public ngraph::op::Op { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/buffer.hpp b/src/common/snippets/include/snippets/op/buffer.hpp index 8c6f98ac894e93..6b8ec7b5fd31b7 100644 --- a/src/common/snippets/include/snippets/op/buffer.hpp +++ b/src/common/snippets/include/snippets/op/buffer.hpp @@ -1,12 +1,12 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // 
#pragma once -#include +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -16,18 +16,22 @@ namespace op { * If Buffer has a parent, the operation is for intermediate data storage - IntermediateMemory type. * Otherwise, the operation is for allocation of new empty memory with shape `m_shape` - NewMemory type * Notes: - * - All buffers in a graph have the same memory pointer. So if we have a few buffers, + * - All buffers with the same ID in a graph have the same memory pointer. So if we have a few buffers, * each the corresponding MemoryAccess op for Buffer should have offset for common memory pointer of this Buffer * - Buffer should be a single consumer for operation output port + * @param m_type - type of Buffer: IntermediateMemory/NewMemory + * @param m_shape - output allocation shape for Buffer with type NewMemory + * @param m_offset - offset in common Buffer scratchpad + * @param m_id - Buffer ID in common Buffer system * @ingroup snippets */ -class Buffer : public ngraph::op::Op { +class Buffer : public ov::op::Op { public: OPENVINO_OP("Buffer", "SnippetsOpset"); Buffer() = default; - Buffer(const ov::Shape& shape); - Buffer(const ov::Output& arg, const ov::Shape& shape); - Buffer(const ov::Output& arg, int32_t allocation_rank = -1); + Buffer(const ov::Shape& shape, size_t id = 0); + Buffer(const ov::Output& arg, const ov::Shape& shape, size_t id = 0); + Buffer(const ov::Output& arg, int32_t allocation_rank = -1, size_t id = 0); bool visit_attributes(AttributeVisitor& visitor) override; void validate_and_infer_types() override; @@ -38,8 +42,13 @@ class Buffer : public ngraph::op::Op { IntermediateMemory }; + size_t get_id() const { return m_id; } Type get_type() const { return m_type; } ov::Shape get_allocation_shape() const { return m_shape; } + int64_t get_offset() const { return m_offset; } + void set_id(size_t id) { m_id = id; } + void set_offset(int64_t offset) { m_offset = offset; } + size_t get_byte_size() const; bool is_intermediate_memory() const { return m_type == Type::IntermediateMemory; } @@ -48,8 +57,10 @@ class Buffer : public ngraph::op::Op { private: Type m_type = Type::IntermediateMemory; ov::Shape m_shape = {}; + int64_t m_offset = 0; + size_t m_id = 0; // Default ID - 0. 
All Buffers are from the same set }; } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/convert_saturation.hpp b/src/common/snippets/include/snippets/op/convert_saturation.hpp index 3a0aa4b2aa2201..3911c1585c4132 100644 --- a/src/common/snippets/include/snippets/op/convert_saturation.hpp +++ b/src/common/snippets/include/snippets/op/convert_saturation.hpp @@ -5,9 +5,9 @@ #pragma once #include -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -35,4 +35,4 @@ class ConvertSaturation : public ov::op::v0::Convert { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/convert_truncation.hpp b/src/common/snippets/include/snippets/op/convert_truncation.hpp index e88b7cd9de4ffa..1064ca403a3bf1 100644 --- a/src/common/snippets/include/snippets/op/convert_truncation.hpp +++ b/src/common/snippets/include/snippets/op/convert_truncation.hpp @@ -5,9 +5,9 @@ #pragma once #include -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -34,4 +34,4 @@ class ConvertTruncation : public ov::op::v0::Convert { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/fill.hpp b/src/common/snippets/include/snippets/op/fill.hpp index 85b95ec3799d31..d14c28cf5a1f39 100644 --- a/src/common/snippets/include/snippets/op/fill.hpp +++ b/src/common/snippets/include/snippets/op/fill.hpp @@ -1,12 +1,12 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -20,7 +20,7 @@ namespace op { * - fill_value - hexadecimal filling value * @ingroup snippets */ -class Fill : public ngraph::op::Op { +class Fill : public ov::op::Op { public: OPENVINO_OP("Fill", "SnippetsOpset"); @@ -44,4 +44,4 @@ class Fill : public ngraph::op::Op { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/horizon_max.hpp b/src/common/snippets/include/snippets/op/horizon_max.hpp index d26c4a8c9e58c6..80bdf36ed4beea 100644 --- a/src/common/snippets/include/snippets/op/horizon_max.hpp +++ b/src/common/snippets/include/snippets/op/horizon_max.hpp @@ -1,12 +1,12 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -15,7 +15,7 @@ namespace op { * @brief The operation calculates a horizon maximum of a vector register * @ingroup snippets */ -class HorizonMax : public ngraph::op::Op { +class HorizonMax : public ov::op::Op { public: OPENVINO_OP("HorizonMax", "SnippetsOpset"); @@ -29,4 +29,4 @@ class HorizonMax : public ngraph::op::Op { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/horizon_sum.hpp b/src/common/snippets/include/snippets/op/horizon_sum.hpp index 2dc25374bc0f70..47f3b8dccf8dfa 100644 --- 
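// Illustrative usage sketch (not code from this patch): two intermediate Buffers that share
// id == 0 and therefore one scratchpad pointer; the second is placed right after the first by
// using the first buffer's byte size as its offset. The producer outputs are hypothetical and
// the stripped template argument of ov::Output is assumed to be ov::Node.
#include "snippets/op/buffer.hpp"

void place_shared_buffers(const ov::Output<ov::Node>& producer_a, const ov::Output<ov::Node>& producer_b) {
    using ov::snippets::op::Buffer;
    // allocation_rank = -1 allocates the full producer shape (IntermediateMemory type).
    const auto buf_a = std::make_shared<Buffer>(producer_a, -1, /*id*/ 0);
    const auto buf_b = std::make_shared<Buffer>(producer_b, -1, /*id*/ 0);
    buf_a->set_offset(0);
    buf_b->set_offset(static_cast<int64_t>(buf_a->get_byte_size()));
}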
a/src/common/snippets/include/snippets/op/horizon_sum.hpp +++ b/src/common/snippets/include/snippets/op/horizon_sum.hpp @@ -1,12 +1,12 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -15,7 +15,7 @@ namespace op { * @brief The operation calculates a horizon sum of a vector register * @ingroup snippets */ -class HorizonSum : public ngraph::op::Op { +class HorizonSum : public ov::op::Op { public: OPENVINO_OP("HorizonSum", "SnippetsOpset"); @@ -29,4 +29,4 @@ class HorizonSum : public ngraph::op::Op { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/kernel.hpp b/src/common/snippets/include/snippets/op/kernel.hpp index a8d17745fdeaf4..66fa9286bb7b4e 100644 --- a/src/common/snippets/include/snippets/op/kernel.hpp +++ b/src/common/snippets/include/snippets/op/kernel.hpp @@ -4,10 +4,10 @@ #pragma once -#include "ngraph/op/op.hpp" -#include "snippets/emitter.hpp" +#include "openvino/op/op.hpp" +#include "snippets/lowered/linear_ir.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -16,22 +16,21 @@ namespace op { * @brief Generated by Canonicalization and represents compute kernel legal for scheduling * @ingroup snippets */ -class Kernel : public ngraph::op::Op { +class Kernel : public ov::op::Op { public: OPENVINO_OP("Kernel", "SnippetsOpset"); - Kernel(std::vector region, std::shared_ptr m); + Kernel(lowered::LinearIR region); Kernel() = default; - std::vector region; - const std::shared_ptr model; + lowered::LinearIR region; std::shared_ptr clone_with_new_inputs(const OutputVector& inputs) const override { - return std::make_shared(region, model); + return std::make_shared(region); } const void *compile_params = nullptr; }; } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/load.hpp b/src/common/snippets/include/snippets/op/load.hpp index 38acd0e8a10255..a10d7c5ca16071 100644 --- a/src/common/snippets/include/snippets/op/load.hpp +++ b/src/common/snippets/include/snippets/op/load.hpp @@ -4,10 +4,10 @@ #pragma once -#include +#include "openvino/op/op.hpp" #include "snippets/op/memory_access.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -33,6 +33,9 @@ class Load : public MemoryAccess { void validate_and_infer_types() override; std::shared_ptr clone_with_new_inputs(const OutputVector& new_args) const override; + +protected: + void validate_memory_access_params() const; }; /** @@ -60,4 +63,4 @@ class LoadReshape : public Load { }; } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/loop.hpp b/src/common/snippets/include/snippets/op/loop.hpp index 89cf0abd5173ff..3a939f65fcb99c 100644 --- a/src/common/snippets/include/snippets/op/loop.hpp +++ b/src/common/snippets/include/snippets/op/loop.hpp @@ -1,14 +1,14 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" #include "snippets/emitter.hpp" #include "ngraph/op/parameter.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ 
-17,20 +17,12 @@ namespace op { * @brief Base class for LoopBegin and LoopEnd * @ingroup snippets */ -class LoopBase : public ngraph::op::Op { +class LoopBase : public ov::op::Op { public: OPENVINO_OP("LoopBase", "SnippetsOpset"); - LoopBase(const std::vector>& args, size_t work_amount, size_t increment); + LoopBase(const std::vector>& args); LoopBase() = default; - bool visit_attributes(AttributeVisitor& visitor) override; - size_t get_work_amount() const; - size_t get_increment() const; - bool get_evaluate_once() const; - protected: - size_t work_amount; - size_t work_amount_increment; - bool evaluate_once; // true if the Loop is executed only once, used to skip setting and testing the loop counter }; class LoopEnd; /** @@ -45,18 +37,16 @@ class LoopBegin : public LoopBase { public: OPENVINO_OP("LoopBegin", "SnippetsOpset", LoopBase); - explicit LoopBegin(const OutputVector& args); - LoopBegin() = default; + LoopBegin(); void validate_and_infer_types() override; std::shared_ptr clone_with_new_inputs(const OutputVector& inputs) const override; - std::shared_ptr get_loop_end(); - // begin_address and input_regs are needed to communicate information between LoopBegin and LoopEnd emitters + std::shared_ptr get_loop_end() const; + bool visit_attributes(AttributeVisitor& visitor) override; + // begin_address are needed to communicate information between LoopBegin and LoopEnd emitters const uint8_t* begin_address; - std::vector input_regs; private: void validate_and_infer_types_except_LoopEnd(); - LoopBegin(const std::vector>& args, size_t work_amount, size_t work_amount_increment); }; /** @@ -77,16 +67,21 @@ class LoopBegin : public LoopBase { class LoopEnd : public LoopBase { public: OPENVINO_OP("LoopEnd", "SnippetsOpset", LoopBase); - LoopEnd(const std::vector>& args, size_t work_amount, size_t work_amount_increment, - std::vector apply_increment, std::vector finalization_offsets); - LoopEnd(const std::vector>& args, size_t work_amount, size_t work_amount_increment, - std::vector ptr_increments, std::vector finalization_offsets); + LoopEnd(const Output& loop_begin, size_t work_amount, size_t work_amount_increment, + std::vector apply_increment, std::vector finalization_offsets, + std::vector element_type_sizes, size_t input_num, size_t output_num); + LoopEnd(const Output& loop_begin, size_t work_amount, size_t work_amount_increment, + std::vector ptr_increments, std::vector finalization_offsets, + std::vector element_type_sizes, size_t input_num, size_t output_num); LoopEnd() = default; std::shared_ptr get_loop_begin(); void validate_and_infer_types() override; std::shared_ptr clone_with_new_inputs(const OutputVector& inputs) const override; const std::vector& get_finalization_offsets() const; const std::vector& get_ptr_increments() const; + const std::vector& get_element_type_sizes() const; + size_t get_input_num() const; + size_t get_output_num() const; void set_finalization_offsets(std::vector offsets); void set_ptr_increments(std::vector new_ptr_increments); // update_ptr_increments resets non-zero increments to the new_increments. It's used when work_amount_increment is @@ -98,14 +93,23 @@ class LoopEnd : public LoopBase { // Used to propagate information about Loop structure, needed to simplify some optimizations. 
For example, // to skip pointer increments when outer Loop is empty, and work_amount == vector_size (one inner vector Loop) // true by default, the optimizations enabled if it's false; - bool has_outer_loop; + bool has_outer_loop = true; + size_t get_work_amount() const; + size_t get_increment() const; + bool get_evaluate_once() const; + bool visit_attributes(AttributeVisitor& visitor) override; private: - std::vector ptr_increments; - std::vector finalization_offsets; - size_t loop_io_size; + std::vector ptr_increments = {}; + std::vector finalization_offsets = {}; + std::vector element_type_sizes = {}; + size_t work_amount = 0; + size_t work_amount_increment = 0; + size_t input_num = 0; + size_t output_num = 0; + bool evaluate_once = false; // true if the Loop is executed only once, used to skip setting and testing the loop counter }; } // namespace op } // namespace snippets -} // namespace ngraph \ No newline at end of file +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/memory_access.hpp b/src/common/snippets/include/snippets/op/memory_access.hpp index 7b090c8f65d528..905a39f850b3d8 100644 --- a/src/common/snippets/include/snippets/op/memory_access.hpp +++ b/src/common/snippets/include/snippets/op/memory_access.hpp @@ -4,9 +4,9 @@ #pragma once -#include +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -14,12 +14,12 @@ namespace op { * @interface MemoryAccess * @brief This is a base class for memory access operations (like Load and Store). * It provides universal interface to manipulate with memory: load/store. - * @param m_input_ports - vector of input descriptors: variables of PortDescriptor class - * @param m_output_ports - vector of output descriptors: variables of PortDescriptor class + * @param m_input_ports - map of input descriptors: variables of PortDescriptor class + * @param m_output_ports - map of output descriptors: variables of PortDescriptor class * @ingroup snippets */ -class MemoryAccess : public ngraph::op::Op { +class MemoryAccess : public ov::op::Op { public: OPENVINO_OP("MemoryAccess", "SnippetsOpset"); @@ -55,24 +55,35 @@ class MemoryAccess : public ngraph::op::Op { size_t get_input_offset(size_t idx = 0) const; size_t get_output_offset(size_t idx = 0) const; - size_t get_input_port_count() const { return m_input_ports.size(); } - size_t get_output_port_count() const { return m_output_ports.size(); } + std::map get_memory_access_input_ports() const { return m_input_ports; } + std::map get_memory_access_output_ports() const { return m_output_ports; } + + bool is_memory_access_input_port(size_t idx) const; + bool is_memory_access_output_port(size_t idx) const; + + // All input and output ports are MemoryAccess + bool is_full_memory_access_op() const; bool visit_attributes(AttributeVisitor& visitor) override; protected: explicit MemoryAccess(const OutputVector& arguments, size_t input_count = 0, size_t output_count = 0); + explicit MemoryAccess(const OutputVector& arguments, const std::set& input_ports, const std::set& output_ports); MemoryAccess() = default; + // This method can be called only in ctors + void ctor_initialize(const std::set& input_ports, const std::set& output_ports); + void set_input_port_descriptor(const PortDescriptor& desc, const size_t i); void set_output_port_descriptor(const PortDescriptor& desc, const size_t i); const PortDescriptor& get_input_port_descriptor(const size_t i) const; const PortDescriptor& get_output_port_descriptor(const size_t i) const; - 
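// Illustrative usage sketch (not code from this patch): an explicit vector loop over 256
// scalar elements with one input and one output. The element types behind the stripped
// template arguments (bool, int64_t) are assumptions; vector_size = 8 is just an example.
#include "snippets/op/loop.hpp"

void build_explicit_loop() {
    using namespace ov::snippets::op;
    const size_t work_amount = 256;
    const size_t vector_size = 8;
    const auto loop_begin = std::make_shared<LoopBegin>();
    // Increment both data pointers every iteration, no finalization offsets,
    // 4-byte (f32) elements, one loop input and one loop output.
    const auto loop_end = std::make_shared<LoopEnd>(loop_begin->output(0), work_amount, vector_size,
                                                    std::vector<bool>{true, true},
                                                    std::vector<int64_t>{0, 0},
                                                    std::vector<int64_t>{4, 4},
                                                    /*input_num*/ 1, /*output_num*/ 1);
}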
std::vector m_input_ports; - std::vector m_output_ports; + // [port_num, port_desc] + std::map m_input_ports; + std::map m_output_ports; }; } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/nop.hpp b/src/common/snippets/include/snippets/op/nop.hpp index ad1936ac1954a7..3a9d5ba41f4bf4 100644 --- a/src/common/snippets/include/snippets/op/nop.hpp +++ b/src/common/snippets/include/snippets/op/nop.hpp @@ -4,9 +4,9 @@ #pragma once -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -15,7 +15,7 @@ namespace op { * @brief Generated by Canonicalization and represents not-an-operation * @ingroup snippets */ -class Nop : public ngraph::op::Op { +class Nop : public ov::op::Op { public: OPENVINO_OP("Nop", "SnippetsOpset"); @@ -29,4 +29,4 @@ class Nop : public ngraph::op::Op { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/powerstatic.hpp b/src/common/snippets/include/snippets/op/powerstatic.hpp index 2f4e3fbcfa2873..5a1d0abb23ffb4 100644 --- a/src/common/snippets/include/snippets/op/powerstatic.hpp +++ b/src/common/snippets/include/snippets/op/powerstatic.hpp @@ -4,11 +4,11 @@ #pragma once -#include +#include "openvino/op/op.hpp" #include #include -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -41,4 +41,4 @@ class PowerStatic : public ov::op::util::UnaryElementwiseArithmetic { }; } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/scalar.hpp b/src/common/snippets/include/snippets/op/scalar.hpp index 108a34d600528f..43ecb1aad671cc 100644 --- a/src/common/snippets/include/snippets/op/scalar.hpp +++ b/src/common/snippets/include/snippets/op/scalar.hpp @@ -4,10 +4,10 @@ #pragma once -#include "ngraph/op/op.hpp" +#include "openvino/op/op.hpp" #include "ngraph/op/constant.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -41,4 +41,4 @@ class Scalar : public ov::op::v0::Constant { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/serialization_node.hpp b/src/common/snippets/include/snippets/op/serialization_node.hpp new file mode 100644 index 00000000000000..fc447f15d6a8ea --- /dev/null +++ b/src/common/snippets/include/snippets/op/serialization_node.hpp @@ -0,0 +1,37 @@ +// Copyright (C) 2018-2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "openvino/op/op.hpp" +#include +#include + +namespace ov { +namespace snippets { +namespace op { + +/** + * @interface SerializationNode + * @brief Fake node needed to serialize lowered::Expression sessionIR + * @ingroup snippets + */ +class SerializationNode : public ov::op::Op { +public: + OPENVINO_OP("SerializationNode", "SnippetsOpset"); + + SerializationNode() = default; + SerializationNode(const Output &arg, const std::shared_ptr& expr); + + void validate_and_infer_types() override; + std::shared_ptr clone_with_new_inputs(const OutputVector &new_args) const override; + bool visit_attributes(AttributeVisitor &visitor) override; + +private: + std::shared_ptr m_expr; +}; + +} // namespace op +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/store.hpp 
b/src/common/snippets/include/snippets/op/store.hpp index b62a4c6ccb18b7..909c86a689d233 100644 --- a/src/common/snippets/include/snippets/op/store.hpp +++ b/src/common/snippets/include/snippets/op/store.hpp @@ -4,10 +4,10 @@ #pragma once -#include +#include "openvino/op/op.hpp" #include "snippets/op/memory_access.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -37,4 +37,4 @@ class Store : public MemoryAccess { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/op/subgraph.hpp b/src/common/snippets/include/snippets/op/subgraph.hpp index ca552e7fb8fa41..9d63bcba1367a6 100644 --- a/src/common/snippets/include/snippets/op/subgraph.hpp +++ b/src/common/snippets/include/snippets/op/subgraph.hpp @@ -8,13 +8,13 @@ #include #include -#include -#include +#include "openvino/op/op.hpp" +#include "openvino/core/rt_info.hpp" #include #include "snippets/generator.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -69,7 +69,7 @@ class Subgraph : public ov::op::util::SubGraphOp { // // D = < 1, 3, 17, 15, 32> < 0, 1, 2, 3, 4> // E = < 1, 3, 17, 1, 32> < 0, 1, 2, 3, 4> - using BlockedShape = std::tuple; + using BlockedShape = std::tuple; using BlockedShapeVector = std::vector; Subgraph() = default; @@ -92,24 +92,25 @@ class Subgraph : public ov::op::util::SubGraphOp { const ov::Model& body() const { return *m_bodies[0]; } ov::Model& body() { return *m_bodies[0]; } - const std::shared_ptr& get_generator() const { return m_generator; } - std::shared_ptr & get_generator() { return m_generator; } + const std::shared_ptr& get_generator() const { return m_generator; } + std::shared_ptr& get_generator() { return m_generator; } size_t get_buffer_scratchpad_size() const { return m_buffer_scratchpad; } size_t get_virtual_port_count() const { return m_virtual_port_count; } - bool is_buffer_needed() const { return m_buffer_needed; } bool is_quantized() const { return config.m_is_quantized; } bool has_domain_sensitive_ops() const { return config.m_has_domain_sensitive_ops; } snippets::Schedule generate(const BlockedShapeVector& output_shapes, const BlockedShapeVector& input_shapes, - ngraph::pass::Manager& pre_dialect, - ngraph::pass::Manager& post_dialect, - ngraph::pass::Manager& post_precision, + ov::pass::Manager& pre_common, + ov::pass::Manager& post_common, + ov::pass::Manager& post_precision, + lowered::pass::PassPipeline& target_lowered_pipeline, const void* compile_params = nullptr); snippets::Schedule generate(const BlockedShapeVector& output_shapes, const BlockedShapeVector& input_shapes, const void* compile_params = nullptr); - snippets::Schedule generate(ngraph::pass::Manager& pre_dialect, - ngraph::pass::Manager& post_dialect, - ngraph::pass::Manager& post_precision, + snippets::Schedule generate(ov::pass::Manager& pre_common, + ov::pass::Manager& post_common, + ov::pass::Manager& post_precision, + lowered::pass::PassPipeline& target_lowered_pipeline, const void* compile_params = nullptr); snippets::Schedule generate(const void* compile_params = nullptr); ov::PartialShape canonicalize(const BlockedShapeVector& output_shapes, const BlockedShapeVector& input_shapes); @@ -118,10 +119,9 @@ class Subgraph : public ov::op::util::SubGraphOp { // plugin sets generator for a snippet to some specific generator. 
// it's going to be replaced with Jitters table later - void set_generator(std::shared_ptr generator); + void set_generator(std::shared_ptr generator); void set_tile_rank(size_t newRank) {tileRank = newRank;} void set_virtual_port_count(const size_t count); - void set_buffer_needed(const bool need); void print() const; void print_statistics(bool verbose); @@ -129,7 +129,7 @@ class Subgraph : public ov::op::util::SubGraphOp { void serialize() const; void set_master_shape(ov::PartialShape new_shape) {master_shape = std::move(new_shape);} - static auto wrap_node_as_subgraph(const std::shared_ptr& node) -> std::shared_ptr; + static auto wrap_node_as_subgraph(const std::shared_ptr& node) -> std::shared_ptr; static void fill_empty_output_names(const Output& target_output_node, const Output& replacement_output_node); // Non-scalar Constants are tokenized as Parameters inside Subgraph body but some operations with constant inputs @@ -138,23 +138,23 @@ class Subgraph : public ov::op::util::SubGraphOp { static auto constant_input_should_be_inside_body(const std::shared_ptr& node) -> bool; static bool check_broadcast(const std::shared_ptr& node) noexcept; + // Return estimated unique buffer count (upper bound). It's needed for tokenization + static auto get_estimated_buffer_count(const ov::NodeVector& ops) -> size_t; + static auto is_domain_sensitive_op(const std::shared_ptr& op) -> bool; private: void align_element_types(const BlockedShapeVector& outputShapes, const BlockedShapeVector& inputShapes); - void convert_to_snippet_dialect(); + void data_flow_transformations(ov::pass::Manager& pre_common, ov::pass::Manager& post_common, ov::pass::Manager& post_precision); + void control_flow_transformations(lowered::LinearIR& linear_ir, lowered::pass::PassPipeline& target_pipeline); void init_config(); - void initialize_buffer_scratchpad_size(); // Count of Subgraph virtual ports: // - Potential non-scalar Constants that will be created after some transformations (At the moment it's relevant only for FakeQuantize decomposition) - // Need Buffer op or not - // - Buffers. All Buffers are considered as one common additional virtual port. So we cannot summarize them as potential non-scalar Constants // NOTE: To avoid overheads in each calculation of this count (for example, in validate_and_type_infer()), // we should MANUALLY calculate it where it needed. size_t m_virtual_port_count = 0; - bool m_buffer_needed = false; size_t m_buffer_scratchpad = 0lu; Shape exec_domain = {}; - std::shared_ptr m_generator = nullptr; + std::shared_ptr m_generator = nullptr; ov::PartialShape master_shape; size_t tileRank = 0; // set by plugin to specify the number of dimensions processed in a single kernel call @@ -171,9 +171,6 @@ class Subgraph : public ov::op::util::SubGraphOp { // True if body has operations that don't support plugin-side domain optimizations // (e.g. Transpose, Softmax, MatMul in general doesn't support dimensions collapsing) bool m_has_domain_sensitive_ops = false; - // True if we should go through whole body to check for where loops should be explicitly inserted. 
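// Illustrative usage sketch (not code from this patch): driving code generation for a tokenized
// Subgraph through the new generate() signature. The caller's subgraph, generator and blocked
// shapes are hypothetical, and lowered::pass::PassPipeline is assumed to be default-constructible
// and visible via the headers included below.
#include "openvino/pass/manager.hpp"
#include "snippets/op/subgraph.hpp"

ov::snippets::Schedule compile_subgraph(const std::shared_ptr<ov::snippets::op::Subgraph>& subgraph,
                                        const std::shared_ptr<ov::snippets::Generator>& target_generator,
                                        const ov::snippets::op::Subgraph::BlockedShapeVector& out_shapes,
                                        const ov::snippets::op::Subgraph::BlockedShapeVector& in_shapes) {
    subgraph->set_generator(target_generator);
    // Plugins may register extra data-flow passes here; empty managers are also valid.
    ov::pass::Manager pre_common, post_common, post_precision;
    // Target-specific control-flow passes that run on the linear IR after common lowering.
    ov::snippets::lowered::pass::PassPipeline target_pipeline;
    return subgraph->generate(out_shapes, in_shapes,
                              pre_common, post_common, post_precision,
                              target_pipeline, /*compile_params*/ nullptr);
}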
- // Otherwise, we insert Loops on Parameters and Results - for example, it's optimized out for subgraph with only Eltwise ops - bool m_explicit_loop_insertion = false; } config; }; @@ -182,13 +179,13 @@ static inline std::ostream& operator<<(std::ostream& os, const op::Subgraph::Blo return os; } -static inline auto create_body(std::string name, const ngraph::ResultVector& results, const ngraph::ParameterVector& parameters) -> +static inline auto create_body(std::string name, const ov::ResultVector& results, const ov::ParameterVector& parameters) -> std::shared_ptr { auto body = std::make_shared(results, parameters, name); return body; }; -static inline auto build_subgraph(const std::shared_ptr& node, const ngraph::OutputVector& inputs, +static inline auto build_subgraph(const std::shared_ptr& node, const ov::OutputVector& inputs, const std::shared_ptr& body, const std::string name = "") -> std::shared_ptr{ auto subgraph = std::make_shared(inputs, body); @@ -197,16 +194,16 @@ static inline auto build_subgraph(const std::shared_ptr& node, con return subgraph; }; -// Need to update tensor name manually, since intel_cpu::Graph::Replicate() looks at input.get_tensor().get_name(); +// Need to update tensor name manually, since intel_cpu::Graph::Replicate() looks at input.get_shape().get_name(); // If subgraph->get_output_size() == 1, then the name will be restored correctly from the node name -auto inline update_out_tensor_name(const std::shared_ptr& subgraph) -> void { +auto inline update_out_tensor_name(const std::shared_ptr& subgraph) -> void { bool not_set = true; for (unsigned int i = 0; i < subgraph->get_output_size() && not_set; i++) { - for (const auto &in : subgraph->get_output_target_inputs(i)) { + for (const auto& in : subgraph->get_output_target_inputs(i)) { if (ov::is_type(in.get_node())) { const auto& body_result = subgraph->body_ptr()->get_output_op(i); const auto& body_result_input = body_result->get_input_source_output(0); - ngraph::snippets::op::Subgraph::fill_empty_output_names( + ov::snippets::op::Subgraph::fill_empty_output_names( subgraph->output(i), body_result_input); not_set = false; break; @@ -217,4 +214,4 @@ auto inline update_out_tensor_name(const std::shared_ptr +#include "openvino/op/op.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { @@ -15,7 +15,7 @@ namespace op { * @brief The operation is for intermediate data storage in vector register * @ingroup snippets */ -class VectorBuffer : public ngraph::op::Op { +class VectorBuffer : public ov::op::Op { public: OPENVINO_OP("VectorBuffer", "SnippetsOpset"); @@ -31,4 +31,4 @@ class VectorBuffer : public ngraph::op::Op { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/assign_registers.hpp b/src/common/snippets/include/snippets/pass/assign_registers.hpp deleted file mode 100644 index 81a5e3b2b29d62..00000000000000 --- a/src/common/snippets/include/snippets/pass/assign_registers.hpp +++ /dev/null @@ -1,34 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include - -#include "snippets/generator.hpp" - -namespace ngraph { -namespace snippets { -namespace pass { - -/** - * @interface AssignRegisters - * @brief Assigns internal `vector` register indexes to operations. - * Changing order of variables or datafrow lead to invalidation of register assignment. 
- * @ingroup snippets - */ -class AssignRegisters : public ngraph::pass::FunctionPass { -public: - explicit AssignRegisters(const std::function& op)>& mapper) : m_reg_type_mapper(mapper) { - set_property(ngraph::pass::PassProperty::REQUIRE_STATIC_SHAPE, true); - } - bool run_on_model(const std::shared_ptr& m) override; - -private: - std::function& op)> m_reg_type_mapper; -}; - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/include/snippets/pass/broadcast_to_movebroadcast.hpp b/src/common/snippets/include/snippets/pass/broadcast_to_movebroadcast.hpp index 0c90c1193ea3d2..625684417ccdf6 100644 --- a/src/common/snippets/include/snippets/pass/broadcast_to_movebroadcast.hpp +++ b/src/common/snippets/include/snippets/pass/broadcast_to_movebroadcast.hpp @@ -1,13 +1,13 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/pattern/matcher.hpp" +#include "openvino/pass/graph_rewrite.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -17,7 +17,7 @@ namespace pass { * Otherwise the pass removes Broadcast operation. * @ingroup snippets */ -class BroadcastToMoveBroadcast: public ngraph::pass::MatcherPass { +class BroadcastToMoveBroadcast: public ov::pass::MatcherPass { public: BroadcastToMoveBroadcast(); }; @@ -25,4 +25,4 @@ class BroadcastToMoveBroadcast: public ngraph::pass::MatcherPass { } // namespace pass } // namespace snippets -} // namespace ngraph \ No newline at end of file +} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/include/snippets/pass/collapse_subgraph.hpp b/src/common/snippets/include/snippets/pass/collapse_subgraph.hpp index 96c272a28d5f5d..df2522c83a879c 100644 --- a/src/common/snippets/include/snippets/pass/collapse_subgraph.hpp +++ b/src/common/snippets/include/snippets/pass/collapse_subgraph.hpp @@ -4,12 +4,11 @@ #pragma once -#include -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -35,16 +34,16 @@ namespace pass { * Scalar constants are placed as is into subgraph due to optimization purpose * @ingroup snippets */ -class TokenizeSnippets: public ngraph::pass::MatcherPass { +class TokenizeSnippets: public ov::pass::MatcherPass { public: OPENVINO_RTTI("TokenizeSnippets", "0"); explicit TokenizeSnippets(); static bool AppropriateForSubgraph(const std::shared_ptr&); - static const std::set supported_element_types; + static const std::set supported_element_types; }; } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/common_optimizations.hpp b/src/common/snippets/include/snippets/pass/common_optimizations.hpp index 950a0427e8ecc0..08b07339f5fe40 100644 --- a/src/common/snippets/include/snippets/pass/common_optimizations.hpp +++ b/src/common/snippets/include/snippets/pass/common_optimizations.hpp @@ -1,22 +1,22 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { -class CommonOptimizations : public ngraph::pass::MatcherPass { +class 
CommonOptimizations : public ov::pass::MatcherPass { public: - NGRAPH_RTTI_DECLARATION; + OPENVINO_RTTI("CommonOptimizations", "0"); CommonOptimizations(); }; } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/convert_constants.hpp b/src/common/snippets/include/snippets/pass/convert_constants.hpp index 511f0ab02230b7..4b62d80ba44748 100644 --- a/src/common/snippets/include/snippets/pass/convert_constants.hpp +++ b/src/common/snippets/include/snippets/pass/convert_constants.hpp @@ -1,13 +1,13 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -17,11 +17,11 @@ namespace pass { * Only single-value (0D) constants are currently supported. * @ingroup snippets */ -class ConvertConstantsToScalars: public ngraph::pass::MatcherPass { +class ConvertConstantsToScalars: public ov::pass::MatcherPass { public: ConvertConstantsToScalars(); }; } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/convert_power_to_powerstatic.hpp b/src/common/snippets/include/snippets/pass/convert_power_to_powerstatic.hpp index f1afcfe5e50ff2..85ead3b5785a04 100644 --- a/src/common/snippets/include/snippets/pass/convert_power_to_powerstatic.hpp +++ b/src/common/snippets/include/snippets/pass/convert_power_to_powerstatic.hpp @@ -1,13 +1,13 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -16,11 +16,11 @@ namespace pass { * @brief Replace Power with a scalar input with snippets::op::PowerStatic for generation of a more optimal code. 
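// Illustrative usage sketch (not code from this patch): after the migration these tokenization
// and data-flow passes are plain ov::pass::MatcherPass subclasses, so a plugin can register them
// in a standard ov::pass::Manager. The selection and order below are only an example.
#include "openvino/core/model.hpp"
#include "openvino/pass/manager.hpp"
#include "snippets/pass/collapse_subgraph.hpp"
#include "snippets/pass/common_optimizations.hpp"
#include "snippets/pass/convert_constants.hpp"
#include "snippets/pass/convert_power_to_powerstatic.hpp"

void run_snippets_common_passes(const std::shared_ptr<ov::Model>& model) {
    ov::pass::Manager manager;
    manager.register_pass<ov::snippets::pass::TokenizeSnippets>();
    manager.register_pass<ov::snippets::pass::CommonOptimizations>();
    manager.register_pass<ov::snippets::pass::ConvertConstantsToScalars>();
    manager.register_pass<ov::snippets::pass::ConvertPowerToPowerStatic>();
    manager.run_passes(model);
}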
* @ingroup snippets */ -class ConvertPowerToPowerStatic: public ngraph::pass::MatcherPass { +class ConvertPowerToPowerStatic: public ov::pass::MatcherPass { public: ConvertPowerToPowerStatic(); }; } // namespace pass } // namespace snippets -} // namespace ngraph \ No newline at end of file +} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/include/snippets/pass/explicit_transpose_matmul_inputs.hpp b/src/common/snippets/include/snippets/pass/explicit_transpose_matmul_inputs.hpp index fc90067f4af398..dbad1a714b8271 100644 --- a/src/common/snippets/include/snippets/pass/explicit_transpose_matmul_inputs.hpp +++ b/src/common/snippets/include/snippets/pass/explicit_transpose_matmul_inputs.hpp @@ -1,13 +1,13 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -22,11 +22,11 @@ namespace pass { * change Transpose order to {0, 2, 3, 1} which is supported by Snippets * @ingroup snippets */ -class ExplicitTransposeMatMulInputs: public ngraph::pass::MatcherPass { +class ExplicitTransposeMatMulInputs: public ov::pass::MatcherPass { public: ExplicitTransposeMatMulInputs(); }; } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/fq_decomposition.hpp b/src/common/snippets/include/snippets/pass/fq_decomposition.hpp index cfb9ff41955867..cc62bd11354281 100644 --- a/src/common/snippets/include/snippets/pass/fq_decomposition.hpp +++ b/src/common/snippets/include/snippets/pass/fq_decomposition.hpp @@ -4,13 +4,12 @@ #pragma once -#include "ngraph/op/fake_quantize.hpp" -#include "ngraph/pass/graph_rewrite.hpp" -#include "ngraph/pass/constant_folding.hpp" +#include "openvino/op/fake_quantize.hpp" +#include "openvino/pass/graph_rewrite.hpp" #include "snippets/pass/transform_convert.hpp" #include "transformations_visibility.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -50,18 +49,18 @@ namespace pass { * */ -class FakeQuantizeDecomposition : public ngraph::pass::MatcherPass { +class FakeQuantizeDecomposition : public ov::pass::MatcherPass { public: FakeQuantizeDecomposition(); - static bool getScalesAndShifts(const std::shared_ptr& fq_node, + static bool getScalesAndShifts(const std::shared_ptr& fq_node, std::vector& cl, std::vector& ch, std::vector& isc, std::vector& ish, std::vector& osc, std::vector& osh); - static std::vector calculateScales(const ngraph::element::Type& out_type, + static std::vector calculateScales(const ov::element::Type& out_type, const std::vector& cl, const std::vector& ch, const std::vector& isc, @@ -80,11 +79,11 @@ class FakeQuantizeDecomposition : public ngraph::pass::MatcherPass { * 2. ConstantFolding * 3. 
Validate */ -class CommonFakeQuantizeDecomposition: public ngraph::pass::FunctionPass { +class CommonFakeQuantizeDecomposition: public ov::pass::ModelPass { public: - bool run_on_model(const std::shared_ptr& m) override; + bool run_on_model(const std::shared_ptr& m) override; }; } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/fuse_transpose_brgemm.hpp b/src/common/snippets/include/snippets/pass/fuse_transpose_brgemm.hpp index 1c2eaa11ea039f..69266fc90ffc62 100644 --- a/src/common/snippets/include/snippets/pass/fuse_transpose_brgemm.hpp +++ b/src/common/snippets/include/snippets/pass/fuse_transpose_brgemm.hpp @@ -1,13 +1,17 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include "ngraph/pass/graph_rewrite.hpp" -#include "ngraph/pattern/matcher.hpp" +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +#include "openvino/op/transpose.hpp" + +#include "snippets/lowered/port_descriptor.hpp" + +namespace ov { namespace snippets { namespace pass { @@ -18,13 +22,16 @@ namespace pass { * but only 0213 Transpose is currently supported. * @ingroup snippets */ -class FuseTransposeBrgemm: public ngraph::pass::MatcherPass { +class FuseTransposeBrgemm: public ov::pass::MatcherPass { public: OPENVINO_RTTI("FuseTransposeBrgemm", "0"); FuseTransposeBrgemm(); static const std::set> supported_cases; + +private: + static bool is_supported_transpose(const Output& transpose_port); }; } // namespace pass } // namespace snippets -} // namespace ngraph \ No newline at end of file +} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/include/snippets/pass/insert_buffer.hpp b/src/common/snippets/include/snippets/pass/insert_buffer.hpp deleted file mode 100644 index a7fe4f00208fef..00000000000000 --- a/src/common/snippets/include/snippets/pass/insert_buffer.hpp +++ /dev/null @@ -1,30 +0,0 @@ -// Copyright (C) 2018-2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include -#include - -namespace ngraph { -namespace snippets { -namespace pass { - -/** - * @interface InsertBuffer - * @brief The pass inserts Buffers on Inputs and Outputs of special operations [Softmax, Transpose] is it's needed - * @param allocation_rank - rank of shape for Buffer memory allocation: shape[shape_rank - normalize(m_allocation_rank) : shape_rank]. - * It's needed to allocate needed memory size that depends on Tile rank, for example. - * Default value is -1 (full shape) - * @ingroup snippets - */ -class InsertBuffer: public ngraph::pass::MatcherPass { -public: - InsertBuffer(const int32_t allocation_rank = -1); -}; - - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/include/snippets/pass/insert_load_store.hpp b/src/common/snippets/include/snippets/pass/insert_load_store.hpp deleted file mode 100644 index aab892312bf6c9..00000000000000 --- a/src/common/snippets/include/snippets/pass/insert_load_store.hpp +++ /dev/null @@ -1,39 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include -#include - -namespace ngraph { -namespace snippets { -namespace pass { - -/** - * @interface InsertLoad - * @brief Inserts explicit load instruction after each parameter and buffer. 
- * The pass is used to convert model to a canonical form for code generation - * @ingroup snippets - */ -class InsertLoad: public ngraph::pass::MatcherPass { -public: - InsertLoad(const size_t count = 1lu); -}; - -/** - * @interface InsertStore - * @brief Inserts explicit store instruction before each result and buffer. - * The pass is used to convert model to a canonical form for code generation - * @ingroup snippets - */ -class InsertStore: public ngraph::pass::MatcherPass { -public: - InsertStore(const size_t count = 1lu); -}; - - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/include/snippets/pass/insert_loops.hpp b/src/common/snippets/include/snippets/pass/insert_loops.hpp deleted file mode 100644 index 57046789167ad5..00000000000000 --- a/src/common/snippets/include/snippets/pass/insert_loops.hpp +++ /dev/null @@ -1,43 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include -#include - -namespace ngraph { -namespace snippets { -namespace pass { - -/** - * @interface InsertLoops - * @brief Insert explicit Loop operations into the body to process multiple data entities during one kernel execution - * @param master_shape - shape used to determine loop work amounts - * @param loop_depth - the number of last master_shape dimensions processed by loops (aka tileRank - obsolete), could be 1 or 2 - * @param vector_size - the number of entities processed on one iteration of vector loop - * @param single_loop_body - true, if we can just insert LoopBegin on inputs and LoopEnd on outputs, othwerwise - * the pass goes all over the body analyzing where LoopBegin and LoopEnd should be inserted: - * synchronization nodes are MatMul, Buffer and other already existing Loops. 
- * @ingroup snippets - */ -class InsertLoops: public ngraph::pass::FunctionPass { -public: - OPENVINO_RTTI("InsertLoops", "0"); - InsertLoops(ov::PartialShape master_shape, size_t loop_depth, size_t vector_size, bool is_optimized = true); - bool run_on_model(const std::shared_ptr& m) override; - - static std::vector calculate_inner_apply_increments(const ov::PartialShape& master, const std::vector& shapes); - static std::vector calculate_outer_apply_increments(const std::vector& shapes); - static std::vector calculate_finalization_offsets(const ov::PartialShape& master, const std::vector& shapes); -private: - ov::PartialShape m_master_shape; - size_t m_loop_depth; - size_t m_vector_size; - bool m_single_loop_body; -}; - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/include/snippets/pass/insert_movebroadcast.hpp b/src/common/snippets/include/snippets/pass/insert_movebroadcast.hpp index e0458e0b263bb5..fd5ab9e46a1fd7 100644 --- a/src/common/snippets/include/snippets/pass/insert_movebroadcast.hpp +++ b/src/common/snippets/include/snippets/pass/insert_movebroadcast.hpp @@ -4,10 +4,10 @@ #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -17,15 +17,15 @@ namespace pass { * The pass is used to convert model to a canonical form for code generation * @ingroup snippets */ -class InsertMoveBroadcast: public ngraph::pass::MatcherPass { +class InsertMoveBroadcast: public ov::pass::MatcherPass { public: InsertMoveBroadcast(); - static Output BroadcastNodeLastDim(const ngraph::Output& value, + static Output BroadcastNodeLastDim(const ov::Output& value, const ov::PartialShape& target_shape, const ov::PartialShape& normalized_shape); }; } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/loop_fusion.hpp b/src/common/snippets/include/snippets/pass/loop_fusion.hpp deleted file mode 100644 index 14676a15a6ee58..00000000000000 --- a/src/common/snippets/include/snippets/pass/loop_fusion.hpp +++ /dev/null @@ -1,29 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include -#include - -namespace ngraph { -namespace snippets { -namespace pass { - -/** - * @interface LoopFusion - * @brief Fuse Loops into one Loop if their semantics allow it - * @ingroup snippets - */ -class LoopFusion: public ngraph::pass::MatcherPass { -public: - LoopFusion(); - -private: - bool Merge(const std::shared_ptr& buffer); -}; - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/include/snippets/pass/loop_helpers.hpp b/src/common/snippets/include/snippets/pass/loop_helpers.hpp deleted file mode 100644 index 12e0e9746bc8f0..00000000000000 --- a/src/common/snippets/include/snippets/pass/loop_helpers.hpp +++ /dev/null @@ -1,99 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include "ngraph/op/op.hpp" -#include "ngraph/op/parameter.hpp" -#include "snippets/op/loop.hpp" - -namespace ngraph { -namespace snippets { -namespace op { - -/* ==== LoopBegin === */ -/** - * @interface insertLoopBeginAfterOutputs - * @brief Inserts LoopBegin operation after the group of operations described - * by the input argument (OutputVector). 
Use insertLoopBegin instead - it has a more universal interface. - * @ingroup snippets - */ -std::shared_ptr insertLoopBeginAfterOutputs(const OutputVector& originalOutputs); - -/** - * @interface insertLoopBegin - * @brief Inserts LoopBegin operation after the group of operations described - * by the input argument (ParameterVector, NodeVector or OutputVector). - * @ingroup snippets - */ -template -std::shared_ptr insertLoopBegin(const T& afterTheseNodes) { - static_assert(std::is_same() || std::is_same(), - "Unsupported template parameter for insertLoopBegin. Only ParameterVector or NodeVector is allowed"); - OutputVector originalOutputs; - std::vector>> childInputs; - for (const auto &n : afterTheseNodes) { - const auto& nodeOutputs = n->outputs(); - // Ignore the LoopBegin->LoopEnd edge to make it easier to construct enclosed Loops - std::move(nodeOutputs.begin(), nodeOutputs.end() - 1 * ov::is_type(n), std::back_inserter(originalOutputs)); - } - - return insertLoopBeginAfterOutputs(originalOutputs); -} - -template<> -inline std::shared_ptr insertLoopBegin(const OutputVector& afterTheseNodes) { - return insertLoopBeginAfterOutputs(afterTheseNodes); -} -/* ============== */ - -/* ==== LoopEnd === */ -/** - * @interface insertLoopBeginAfterOutputs - * @brief Inserts LoopBegin operation after the group of operations described - * by the input argument (vector of inputs). Use insertLoopEnd instead - it has a more universal interface. - * @param originalInputs LoopEnd will be inserted before these inputs - * @param loopBegin pointer to the beginning of the Loop region - * @param work_amount total number of evaluations to be processed by the loop - * @param increment number of evaluations processed in one iteration of the loop - * @param apply_increment describes which data pointers attributed to the loop should be incremented on every iteration. - * should be used when Loop is connected to Parameters and/or Results - * @param finalization_offsets pointer shifts that should be applied to data pointers before exiting the loop - * @ingroup snippets - */ - -std::shared_ptr insertLoopEndBeforeInputs(const std::vector>& originalInputs, - const std::shared_ptr& loopBegin, - size_t work_amount, size_t increment, - std::vector apply_increment = {}, - std::vector finalization_offsets = {}); - -/** - * @interface insertLoopEnd - * @brief Inserts LoopEnd operation before the group of operations described - * by the input argument (ResultVector, NodeVector or OutputVector). - * @ingroup snippets - */ -template -std::shared_ptr insertLoopEnd(const T& beforeTheseNodes, Args ...args) { - static_assert(std::is_same() || std::is_same(), - "Unsupported template parameter for insertLoopBegin. 
Only ParameterVector or NodeVector is allowed"); - std::vector> originalInputs; - for (const auto &n : beforeTheseNodes) { - const auto& nodeInputs = n->inputs(); - // Ignore the LoopBegin->LoopEnd edge to facilitate enclosed Loops construction - std::move(nodeInputs.begin(), nodeInputs.end() - 1 * ov::is_type(n), std::back_inserter(originalInputs)); - } - return insertLoopEndBeforeInputs(originalInputs, args...); -} - -template -std::shared_ptr insertLoopEnd(const std::vector>& beforeTheseNodes, Args ...args) { - return insertLoopEndBeforeInputs(beforeTheseNodes, args...); -} -/* ============== */ - -} // namespace op -} // namespace snippets -} // namespace ngraph \ No newline at end of file diff --git a/src/common/snippets/include/snippets/pass/matmul_to_brgemm.hpp b/src/common/snippets/include/snippets/pass/matmul_to_brgemm.hpp index 1f00b944b56808..86f9569cc09328 100644 --- a/src/common/snippets/include/snippets/pass/matmul_to_brgemm.hpp +++ b/src/common/snippets/include/snippets/pass/matmul_to_brgemm.hpp @@ -1,28 +1,33 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include "ngraph/pass/graph_rewrite.hpp" -#include "ngraph/pattern/matcher.hpp" +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +#include "snippets/op/brgemm.hpp" + +namespace ov { namespace snippets { namespace pass { /** * @interface MatMulToBrgemm - * @brief Replaces ngraph::MatMul with snippets::op::Brgemm operation (only non-trasposing MatMuls are currently supported) + * @brief Replaces ov::MatMul with snippets::op::Brgemm operation (only non-trasposing MatMuls are currently supported) * @ingroup snippets */ -class MatMulToBrgemm: public ngraph::pass::MatcherPass { +class MatMulToBrgemm: public ov::pass::MatcherPass { public: OPENVINO_RTTI("MatMulToBrgemm", "0"); MatMulToBrgemm(); + +private: + void init_ports(const std::shared_ptr& brgemm) const; }; } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/mha_tokenization.hpp b/src/common/snippets/include/snippets/pass/mha_tokenization.hpp index 7c161e8447e9b8..c1a1700b1da7eb 100644 --- a/src/common/snippets/include/snippets/pass/mha_tokenization.hpp +++ b/src/common/snippets/include/snippets/pass/mha_tokenization.hpp @@ -1,13 +1,13 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -17,7 +17,7 @@ namespace pass { * TODO: Write pattern * @ingroup snippets */ -class TokenizeMHASnippets: public ngraph::pass::MatcherPass { +class TokenizeMHASnippets: public ov::pass::MatcherPass { public: OPENVINO_RTTI("TokenizeMHASnippets", "0"); TokenizeMHASnippets(); @@ -25,4 +25,4 @@ class TokenizeMHASnippets: public ngraph::pass::MatcherPass { } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/propagate_precision.hpp b/src/common/snippets/include/snippets/pass/propagate_precision.hpp index f3e7485bcaf106..1f5bd0cf9542bf 100644 --- a/src/common/snippets/include/snippets/pass/propagate_precision.hpp +++ 
b/src/common/snippets/include/snippets/pass/propagate_precision.hpp @@ -8,7 +8,7 @@ #include #include "snippets/generator.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -17,7 +17,7 @@ namespace pass { * @ingroup snippets * @brief PropagatePrecision transformation propagate precision from parameters to results. */ -class PropagatePrecision: public ngraph::pass::FunctionPass { +class PropagatePrecision: public ov::pass::ModelPass { public: OPENVINO_RTTI("PropagatePrecision", "0"); PropagatePrecision(const std::shared_ptr& target_machine); @@ -39,7 +39,7 @@ class PropagatePrecision: public ngraph::pass::FunctionPass { const element::Type& actual, const element::Type& required) noexcept; - static bool validate_and_infer_types_and_restore_outputs(const std::shared_ptr& op); + static bool validate_and_infer_types_and_restore_outputs(const std::shared_ptr& op); private: const std::shared_ptr target_machine; @@ -47,4 +47,4 @@ class PropagatePrecision: public ngraph::pass::FunctionPass { } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/reset_buffer.hpp b/src/common/snippets/include/snippets/pass/reset_buffer.hpp deleted file mode 100644 index 599b533e3ebf1e..00000000000000 --- a/src/common/snippets/include/snippets/pass/reset_buffer.hpp +++ /dev/null @@ -1,29 +0,0 @@ -// Copyright (C) 2018-2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include -#include - -namespace ngraph { -namespace snippets { -namespace pass { - -/** - * @interface ResetBufferState - * @brief If there is Buffer between loops we should reset Buffer pointer after first loop execution (data storing) using finalization offsets - * to have correct buffer data pointer for data loading in the next loop where data was stored in previous loop - * @ingroup snippets - */ -class ResetBufferState: public ngraph::pass::MatcherPass { -public: - ResetBufferState(); - - static int64_t calculate_required_finalization_offsets(const size_t inner_master_work_amount, const size_t inner_target_work_amount); -}; - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/include/snippets/pass/set_softmax_ports.hpp b/src/common/snippets/include/snippets/pass/set_softmax_ports.hpp new file mode 100644 index 00000000000000..92a75c2345a408 --- /dev/null +++ b/src/common/snippets/include/snippets/pass/set_softmax_ports.hpp @@ -0,0 +1,26 @@ +// Copyright (C) 2018-2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" + +namespace ov { +namespace snippets { +namespace pass { + +/** + * @interface SetSoftmaxPorts + * @brief The pass updates port descriptors in accordance with the Softmax reduction axis + * @ingroup snippets + */ +class SetSoftmaxPorts: public ov::pass::MatcherPass { +public: + SetSoftmaxPorts(); +}; + +} // namespace pass +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/softmax_decomposition.hpp b/src/common/snippets/include/snippets/pass/softmax_decomposition.hpp deleted file mode 100644 index b640ab35b0bbbc..00000000000000 --- a/src/common/snippets/include/snippets/pass/softmax_decomposition.hpp +++ /dev/null @@ -1,30 +0,0 @@ -// Copyright (C) 2018-2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include -#include - -namespace 
ngraph { -namespace snippets { -namespace pass { - -/** - * @interface SoftmaxDecomposition - * @brief The pass decomposise Softmax into explicit Snippets dialects - * Note: - * - At the moment Snippets supports Softmax only in MHA pattern where there are Buffer ops before and after Softmax. - * Also Snippets support Loops with Buffer ops on inputs and outputs if Buffer have the same buffer byte size - * because of work with ptr increment. So we have to set Tile rank as buffer allocation rank even if rank 1 is enough - * @ingroup snippets - */ -class SoftmaxDecomposition: public ngraph::pass::MatcherPass { -public: - SoftmaxDecomposition(const size_t vector_size, const int32_t buffer_allocation_rank = -1); -}; - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/include/snippets/pass/softmax_reshape_elimination.hpp b/src/common/snippets/include/snippets/pass/softmax_reshape_elimination.hpp index 7522f411669dc3..8c240a08eebc29 100644 --- a/src/common/snippets/include/snippets/pass/softmax_reshape_elimination.hpp +++ b/src/common/snippets/include/snippets/pass/softmax_reshape_elimination.hpp @@ -1,13 +1,13 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -16,7 +16,7 @@ namespace pass { * @brief The pass removes Reshape operations around Softmax if possible * @ingroup snippets */ -class SoftmaxReshapeElimination: public ngraph::pass::MatcherPass { +class SoftmaxReshapeElimination: public ov::pass::MatcherPass { public: SoftmaxReshapeElimination(); }; @@ -24,4 +24,4 @@ class SoftmaxReshapeElimination: public ngraph::pass::MatcherPass { } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/tokenization.hpp b/src/common/snippets/include/snippets/pass/tokenization.hpp index 19b776ec25751d..a1f4bb4f2f8d6e 100644 --- a/src/common/snippets/include/snippets/pass/tokenization.hpp +++ b/src/common/snippets/include/snippets/pass/tokenization.hpp @@ -1,16 +1,16 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" #include "snippets/pass/mha_tokenization.hpp" #include "snippets/pass/collapse_subgraph.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -46,7 +46,7 @@ class EnumerateNodes : public ov::pass::ModelPass { * 4. Some common transformations for Subgraphs. 
For example, FakeQuantize decomposition * @ingroup snippets */ -class SnippetsTokenization : public ngraph::pass::FunctionPass { +class SnippetsTokenization : public ov::pass::ModelPass { public: OPENVINO_RTTI("SnippetsTokenization", "0"); bool run_on_model(const std::shared_ptr& m) override; @@ -55,4 +55,4 @@ class SnippetsTokenization : public ngraph::pass::FunctionPass { } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/transform_convert.hpp b/src/common/snippets/include/snippets/pass/transform_convert.hpp index c9499e7482ced9..73dde67850a00e 100644 --- a/src/common/snippets/include/snippets/pass/transform_convert.hpp +++ b/src/common/snippets/include/snippets/pass/transform_convert.hpp @@ -4,10 +4,10 @@ #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -18,11 +18,11 @@ namespace pass { * This op is used for real Convert ops inside subgraph body in CPU Plugin * @ingroup snippets */ -class TransformConvertToConvertTruncation: public ngraph::pass::MatcherPass { +class TransformConvertToConvertTruncation: public ov::pass::MatcherPass { public: TransformConvertToConvertTruncation(); }; } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/transpose_decomposition.hpp b/src/common/snippets/include/snippets/pass/transpose_decomposition.hpp index 9f939eea4b78a8..013a538172ac7e 100644 --- a/src/common/snippets/include/snippets/pass/transpose_decomposition.hpp +++ b/src/common/snippets/include/snippets/pass/transpose_decomposition.hpp @@ -1,13 +1,13 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once -#include -#include +#include "openvino/pass/graph_rewrite.hpp" +#include "openvino/pass/pattern/matcher.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -16,7 +16,7 @@ namespace pass { * @brief Decompose Transpose to Load + Store wrapped in several loops. 
* @ingroup snippets */ -class TransposeDecomposition: public ngraph::pass::MatcherPass { +class TransposeDecomposition: public ov::pass::MatcherPass { public: OPENVINO_RTTI("TransposeDecomposition", "0"); TransposeDecomposition(); @@ -25,4 +25,4 @@ class TransposeDecomposition: public ngraph::pass::MatcherPass { } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/pass/vector_to_scalar.hpp b/src/common/snippets/include/snippets/pass/vector_to_scalar.hpp deleted file mode 100644 index da65a64e4cd828..00000000000000 --- a/src/common/snippets/include/snippets/pass/vector_to_scalar.hpp +++ /dev/null @@ -1,40 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include -#include - -namespace ngraph { -namespace snippets { -namespace pass { - -/** - * @interface SetScalarCountForLoad - * @brief Set count `1` for Load to represent as ScalarLoad - * The pass is used to change element count to loading to "1" to load scalar value - * Used for tail generation - * @ingroup snippets - */ -class SetScalarCountForLoad: public ngraph::pass::MatcherPass { -public: - SetScalarCountForLoad(); -}; - -/** - * @interface SetScalarCountForStore - * @brief Set count `1` for Store to represent as ScalarStore - * The pass is used to change element count to stroring to "1" to store scalar valuw - * Used for tail generation - * @ingroup snippets - */ -class SetScalarCountForStore: public ngraph::pass::MatcherPass { -public: - SetScalarCountForStore(); -}; - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/include/snippets/snippets_isa.hpp b/src/common/snippets/include/snippets/snippets_isa.hpp index af489925c51998..87579feebb1796 100644 --- a/src/common/snippets/include/snippets/snippets_isa.hpp +++ b/src/common/snippets/include/snippets/snippets_isa.hpp @@ -4,8 +4,8 @@ #pragma once -#include "ngraph/ops.hpp" -#include +#include "openvino/core/node.hpp" +#include "openvino/opsets/opset1.hpp" #include "op/broadcastload.hpp" #include "op/broadcastmove.hpp" @@ -25,12 +25,12 @@ #include "op/brgemm.hpp" #include "op/vector_buffer.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace isa { -#define NGRAPH_OP(a, b) using b::a; +#define OV_OP(a, b) using b::a; #include "snippets_isa_tbl.hpp" -#undef NGRAPH_OP +#undef OV_OP } // namespace isa } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/include/snippets/snippets_isa_tbl.hpp b/src/common/snippets/include/snippets/snippets_isa_tbl.hpp index 1816322bb36f4d..b0a87a8a82a1f9 100644 --- a/src/common/snippets/include/snippets/snippets_isa_tbl.hpp +++ b/src/common/snippets/include/snippets/snippets_isa_tbl.hpp @@ -4,82 +4,82 @@ #pragma once -#ifndef NGRAPH_OP -#warning "NGRAPH_OP not defined" -#define NGRAPH_OP(x, y) +#ifndef OV_OP +#warning "OV_OP not defined" +#define OV_OP(x, y) #endif // SnippetS dialect -NGRAPH_OP(Load, ngraph::snippets::op) -NGRAPH_OP(LoadReshape, ngraph::snippets::op) -NGRAPH_OP(LoopBegin, ngraph::snippets::op) -NGRAPH_OP(LoopEnd, ngraph::snippets::op) -NGRAPH_OP(Brgemm, ngraph::snippets::op) -NGRAPH_OP(BroadcastLoad, ngraph::snippets::op) +OV_OP(Load, ov::snippets::op) +OV_OP(LoadReshape, ov::snippets::op) +OV_OP(LoopBegin, ov::snippets::op) +OV_OP(LoopEnd, ov::snippets::op) +OV_OP(Brgemm, ov::snippets::op) +OV_OP(BroadcastLoad, ov::snippets::op) -NGRAPH_OP(Store, ngraph::snippets::op) 
+OV_OP(Store, ov::snippets::op) -NGRAPH_OP(BroadcastMove, ngraph::snippets::op) -NGRAPH_OP(Scalar, ngraph::snippets::op) -NGRAPH_OP(Nop, ngraph::snippets::op) +OV_OP(BroadcastMove, ov::snippets::op) +OV_OP(Scalar, ov::snippets::op) +OV_OP(Nop, ov::snippets::op) // Layout-oblivious from opset1 // opset completeness -NGRAPH_OP(Constant, ngraph::op) -NGRAPH_OP(Parameter, ngraph::op::v0) -NGRAPH_OP(Result, ngraph::op::v0) -NGRAPH_OP(Broadcast, ngraph::op::v1) -NGRAPH_OP(ConvertTruncation, ngraph::snippets::op) -NGRAPH_OP(ConvertSaturation, ngraph::snippets::op) +OV_OP(Constant, ov::op::v0) +OV_OP(Parameter, ov::op::v0) +OV_OP(Result, ov::op::v0) +OV_OP(Broadcast, ov::op::v1) +OV_OP(ConvertTruncation, ov::snippets::op) +OV_OP(ConvertSaturation, ov::snippets::op) // unary -NGRAPH_OP(Abs, ngraph::op::v0) -NGRAPH_OP(Acos, ngraph::op::v0) -NGRAPH_OP(Asin, ngraph::op::v0) -NGRAPH_OP(Atan, ngraph::op::v0) -NGRAPH_OP(Ceiling, ngraph::op::v0) -NGRAPH_OP(Clamp, ngraph::op::v0) -NGRAPH_OP(Cos, ngraph::op::v0) -NGRAPH_OP(Cosh, ngraph::op::v0) -NGRAPH_OP(Elu, ngraph::op::v0) -NGRAPH_OP(Erf, ngraph::op::v0) -NGRAPH_OP(Exp, ngraph::op::v0) -NGRAPH_OP(Floor, ngraph::op::v0) -NGRAPH_OP(HardSigmoid, ngraph::op::v0) -NGRAPH_OP(Log, ngraph::op::v0) -NGRAPH_OP(LogicalNot, ngraph::op::v1) -NGRAPH_OP(Negative, ngraph::op::v0) -NGRAPH_OP(Relu, ngraph::op::v0) -NGRAPH_OP(Round, ngraph::op::v5) -NGRAPH_OP(Selu, ngraph::op::v0) -NGRAPH_OP(Sign, ngraph::op::v0) -NGRAPH_OP(Sigmoid, ngraph::op::v0) -NGRAPH_OP(Sin, ngraph::op::v0) -NGRAPH_OP(Sinh, ngraph::op::v0) -NGRAPH_OP(Sqrt, ngraph::op::v0) -NGRAPH_OP(Tan, ngraph::op::v0) -NGRAPH_OP(Tanh, ngraph::op::v0) +OV_OP(Abs, ov::op::v0) +OV_OP(Acos, ov::op::v0) +OV_OP(Asin, ov::op::v0) +OV_OP(Atan, ov::op::v0) +OV_OP(Ceiling, ov::op::v0) +OV_OP(Clamp, ov::op::v0) +OV_OP(Cos, ov::op::v0) +OV_OP(Cosh, ov::op::v0) +OV_OP(Elu, ov::op::v0) +OV_OP(Erf, ov::op::v0) +OV_OP(Exp, ov::op::v0) +OV_OP(Floor, ov::op::v0) +OV_OP(HardSigmoid, ov::op::v0) +OV_OP(Log, ov::op::v0) +OV_OP(LogicalNot, ov::op::v1) +OV_OP(Negative, ov::op::v0) +OV_OP(Relu, ov::op::v0) +OV_OP(Round, ov::op::v5) +OV_OP(Selu, ov::op::v0) +OV_OP(Sign, ov::op::v0) +OV_OP(Sigmoid, ov::op::v0) +OV_OP(Sin, ov::op::v0) +OV_OP(Sinh, ov::op::v0) +OV_OP(Sqrt, ov::op::v0) +OV_OP(Tan, ov::op::v0) +OV_OP(Tanh, ov::op::v0) // binary -NGRAPH_OP(Add, ngraph::op::v1) -NGRAPH_OP(Divide, ngraph::op::v1) -NGRAPH_OP(Equal, ngraph::op::v1) -NGRAPH_OP(FloorMod, ngraph::op::v1) -NGRAPH_OP(Greater, ngraph::op::v1) -NGRAPH_OP(GreaterEqual, ngraph::op::v1) -NGRAPH_OP(Less, ngraph::op::v1) -NGRAPH_OP(LessEqual, ngraph::op::v1) -NGRAPH_OP(LogicalAnd, ngraph::op::v1) -NGRAPH_OP(LogicalOr, ngraph::op::v1) -NGRAPH_OP(LogicalXor, ngraph::op::v1) -NGRAPH_OP(Maximum, ngraph::op::v1) -NGRAPH_OP(Minimum, ngraph::op::v1) -NGRAPH_OP(Mod, ngraph::op::v1) -NGRAPH_OP(Multiply, ngraph::op::v1) -NGRAPH_OP(NotEqual, ngraph::op::v1) -NGRAPH_OP(Power, ngraph::op::v1) -NGRAPH_OP(PRelu, ngraph::op::v0) -NGRAPH_OP(SquaredDifference, ngraph::op::v0) -NGRAPH_OP(Subtract, ngraph::op::v1) -NGRAPH_OP(Xor, ngraph::op::v0) +OV_OP(Add, ov::op::v1) +OV_OP(Divide, ov::op::v1) +OV_OP(Equal, ov::op::v1) +OV_OP(FloorMod, ov::op::v1) +OV_OP(Greater, ov::op::v1) +OV_OP(GreaterEqual, ov::op::v1) +OV_OP(Less, ov::op::v1) +OV_OP(LessEqual, ov::op::v1) +OV_OP(LogicalAnd, ov::op::v1) +OV_OP(LogicalOr, ov::op::v1) +OV_OP(LogicalXor, ov::op::v1) +OV_OP(Maximum, ov::op::v1) +OV_OP(Minimum, ov::op::v1) +OV_OP(Mod, ov::op::v1) +OV_OP(Multiply, ov::op::v1) +OV_OP(NotEqual, ov::op::v1) 
+OV_OP(Power, ov::op::v1) +OV_OP(PRelu, ov::op::v0) +OV_OP(SquaredDifference, ov::op::v0) +OV_OP(Subtract, ov::op::v1) +OV_OP(Xor, ov::op::v0) diff --git a/src/common/snippets/include/snippets/target_machine.hpp b/src/common/snippets/include/snippets/target_machine.hpp new file mode 100644 index 00000000000000..7bffc3af62351e --- /dev/null +++ b/src/common/snippets/include/snippets/target_machine.hpp @@ -0,0 +1,80 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +/** + * @brief A file contains public interface for target independent code generator. + * @file generator.hpp + */ +#pragma once + +#include "emitter.hpp" + +namespace ov { +namespace snippets { + +typedef std::pair(const std::shared_ptr&)>, + std::function>(const std::shared_ptr&)>> jitters_value; + +/** + * @interface TargetMachine + * @brief Base class Target machine representation. Target derives from this class to provide generator information about supported emitters + * @ingroup snippets + */ +class TargetMachine { +public: + /** + * @brief checks if target is natively supported + * @return true, if supported + */ + virtual bool is_supported() const = 0; + + /** + * @brief finalizes code generation + * @return generated kernel binary + */ + virtual code get_snippet() const = 0; + + /** + * @brief gets number of lanes supported by target's vector ISA + * @return number of lanes + */ + virtual size_t get_lanes() const = 0; + + + /** + * @brief called by generator to all the emitter for a target machine + * @return a map by node's type info with callbacks to create an instance of emitter for corresponding operation type + */ + std::function(const std::shared_ptr)> get(const ov::DiscreteTypeInfo& type) const { + auto jitter = jitters.find(type); + if (jitter == jitters.end()) { + OPENVINO_THROW(std::string("Target code emitter is not available for ") + type.name + " operation."); + } + return jitter->second.first; + } + + std::function>(const std::shared_ptr&)> + get_supported_precisions(const ov::DiscreteTypeInfo type) const { + auto jitter = jitters.find(type); + if (jitter == jitters.end()) { + OPENVINO_THROW(std::string("Target code emitter is not available for ") + type.name + " operation."); + } + return jitter->second.second; + } + + /** + * @brief checks if emitter for a specific operation is supported + * @return true, if supported + */ + bool has(const ov::DiscreteTypeInfo type) const { + return jitters.find(type) != jitters.end(); + } + virtual ~TargetMachine() = default; + +protected: + std::map jitters; +}; + +} // namespace snippets +} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/include/snippets/utils.hpp b/src/common/snippets/include/snippets/utils.hpp index 3325ff42446594..8a038bfa02d9ed 100644 --- a/src/common/snippets/include/snippets/utils.hpp +++ b/src/common/snippets/include/snippets/utils.hpp @@ -12,28 +12,24 @@ #include "emitter.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace utils { // Get non-scalar Constant count that will be created after FakeQuantize decomposition. // This count is needed to know exact count of non-scalar Constants during tokenization. 
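// Illustrative sketch of how generator-side code can query the TargetMachine interface
// declared in target_machine.hpp above: has() checks that an emitter factory is registered
// for an operation's type info, get() returns that factory. The helper name query_emitter
// is assumed for illustration only (it is not part of this patch), and Emitter is assumed
// to live in the ov::snippets namespace via snippets/emitter.hpp, as elsewhere in this patch.
// The same pattern appears later in Expression::init_emitter.
#include <memory>

#include "openvino/core/node.hpp"
#include "snippets/target_machine.hpp"

std::shared_ptr<ov::snippets::Emitter> query_emitter(
        const std::shared_ptr<ov::snippets::TargetMachine>& target,
        const std::shared_ptr<ov::Node>& node) {
    // Unsupported operation: no emitter factory registered for this type info
    if (!target->has(node->get_type_info()))
        return nullptr;
    // The registered factory builds a target-specific emitter for the given node
    return target->get(node->get_type_info())(node);
}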
-auto get_non_scalar_constant_count_for_fq(const std::shared_ptr& fq) -> size_t; +auto get_non_scalar_constant_count_for_fq(const std::shared_ptr& fq) -> size_t; -inline auto is_scalar_constant(const std::shared_ptr& source_output_node) -> bool { - return ngraph::is_type(source_output_node) && ngraph::shape_size(source_output_node->get_shape()) == 1; +inline auto is_scalar_constant(const std::shared_ptr& source_output_node) -> bool { + return ov::is_type(source_output_node) && ov::shape_size(source_output_node->get_shape()) == 1; } - +ov::PartialShape get_port_planar_shape(const Input& out); ov::PartialShape get_port_planar_shape(const Output& out); ov::PartialShape get_reordered_planar_shape(const ov::PartialShape& shape, const std::vector& layout); -std::vector get_node_output_layout(const std::shared_ptr& node); -std::vector get_node_output_layout(const Node* node); -void set_transpose_output_layout(const ov::Output& port, const std::shared_ptr& node); -void set_output_layout(const ov::Output& port, const std::vector& layout); -inline ov::Dimension get_inner_dim(const ov::PartialShape &shape) { return *(shape.rbegin()); } -inline ov::Dimension get_outer_dim(const ov::PartialShape &shape) { return *(shape.rbegin() + 1); } +// Copy runtime info using default ngraph method but delete PortDescriptors which may be transferred after copying +void safe_copy_runtime_info(const std::shared_ptr&, const std::shared_ptr& to); inline auto normalize_rank(int32_t allocation_rank, const size_t shape_rank) -> int32_t { return allocation_rank < 0 ? allocation_rank + static_cast(shape_rank) + 1 : allocation_rank; @@ -56,4 +52,4 @@ constexpr bool everyone_is(T val, P item, Args... item_others) { } } // namespace utils } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/src/generator.cpp b/src/common/snippets/src/generator.cpp index d59c6772fff9d9..56747783303869 100644 --- a/src/common/snippets/src/generator.cpp +++ b/src/common/snippets/src/generator.cpp @@ -3,224 +3,53 @@ // #include "snippets/generator.hpp" -#include "snippets/pass/assign_registers.hpp" -#include "snippets/pass/vector_to_scalar.hpp" -#include "snippets/pass/insert_load_store.hpp" -#include "snippets/op/loop.hpp" -#include "snippets/op/subgraph.hpp" -#include "snippets/op/kernel.hpp" -#include - -#include -#include - -namespace ngraph { -namespace snippets { - -auto getRegisters(const std::shared_ptr &n) -> RegInfo { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::getRegisters") - - // ToDo: change to reg_t - std::vector rin, rout; - for (const auto& output : n->outputs()) { - const auto& rt = output.get_tensor_ptr()->get_rt_info(); - auto it_rt = rt.find("reginfo"); - if (it_rt != rt.end()) - rout.push_back(it_rt->second.as()); - } - - for (const auto& input : n->inputs()) { - auto rt = input.get_source_output().get_tensor_ptr()->get_rt_info(); - auto it_rt = rt.find("reginfo"); - if (it_rt != rt.end()) - rin.push_back(it_rt->second.as()); - } +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/pass/assign_registers.hpp" +#include "snippets/lowered/pass/insert_tail_loop.hpp" - return std::make_pair(rin, rout); -} - -auto tail_transformations(NodeVector& tail, const size_t tail_size, const ngraph::snippets::Generator::GeneratorConfig& config) -> void { - NodeVector updated_tile; - auto insertFill = [tail_size](const ov::Input& input) -> std::shared_ptr { - auto copyRegInfo = [](const ov::descriptor::Tensor& from, ov::descriptor::Tensor& to) 
-> void { - auto rt = from.get_rt_info(); - auto reginfo = rt.find("reginfo"); - if (reginfo != rt.end()) { - to.get_rt_info()["reginfo"] = reginfo->second; - } - }; - std::shared_ptr fill = nullptr; - auto& rt = input.get_rt_info(); - auto fill_rt = rt.find("set_fill"); - if (fill_rt != rt.end()) { - const auto fill_value = fill_rt->second.as(); - fill = std::make_shared(input.get_source_output(), tail_size, fill_value); - input.get_node()->set_argument(input.get_index(), fill); - // we should explicitly copy reg info because we insert Fill after assign register - copyRegInfo(fill->get_input_tensor(0), fill->get_output_tensor(0)); - } - return fill; - }; +#include "snippets/op/kernel.hpp" - for (auto& op : tail) { - // We should fill vector regs by float_min and zero to have - // correct math calculations for ReduceMax and ReduceSum in scalar case. - // Note: We find Maximum and Add ops because HorizonMax and HorizonSum are outside Loop, - // so they are missed in - if (config.m_need_fill_tail_register && - (ov::is_type(op) || - ov::is_type(op))) { - for (size_t i = 0; i < op->inputs().size(); ++i) { - if (auto fill = insertFill(op->input(i))) { - updated_tile.push_back(fill); - } - } - } else if (const auto memory_access = std::dynamic_pointer_cast(op)) { - for (size_t i = 0; i < memory_access->get_input_port_count(); ++i) { - if (memory_access->get_input_count(i) > 1) { - memory_access->set_input_count(tail_size, i); - } - } - for (size_t i = 0; i < memory_access->get_output_port_count(); ++i) { - if (memory_access->get_output_count(i) > 1) { - memory_access->set_output_count(tail_size, i); - } - } - } - updated_tile.push_back(op); - } +#include "snippets/itt.hpp" - tail = std::move(updated_tile); -} +namespace ov { +namespace snippets { -ngraph::snippets::code ngraph::snippets::Generator::generate(std::shared_ptr& m, - const GeneratorConfig& config, - const void* compile_params) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::Generator::generate") +Generator::LoweringResult Generator::generate(lowered::LinearIR& linear_ir, const lowered::Config& config, const void* compile_params) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::Generator::generate") + OV_ITT_TASK_CHAIN(GENERATE, ov::pass::itt::domains::SnippetsTransform, "Snippets::Generator", "::Transformations") if (!target->is_supported()) OPENVINO_THROW("unsupported architecture for code generation"); - OV_ITT_TASK_CHAIN(GENERATE, ngraph::pass::itt::domains::SnippetsTransform, "Snippets::Generator", "::VectorTile") - // vector loop - std::vector lowered; - auto lower_ops = [&lowered, this](const NodeVector& ops){ - std::transform(ops.begin(), ops.end(), std::back_inserter(lowered), - [this](const std::shared_ptr& n){ - return std::make_pair(target->get(n->get_type_info())(n), ngraph::snippets::getRegisters(n)); - }); - }; - // *1* solo vector/tail loop + empty outer loop - // => skip increments (both counter & ptr) : set evaluate_once flag - // *2* solo vector/tail loop + non-empty outer loop - // => skip counter increments but perform ptr increments : set evaluate_once, - // and perform pointer increments through finalization offsets - // *3* vector loop(s) + one tail loop - // => vector as usual, tail depends on outer loop, see *1* and *2* - auto optimize_single_evaluation = [](const std::shared_ptr& loop, bool force_ptr_increment = false) { - if (loop->get_work_amount() < 2 * loop->get_increment()) { - loop->set_evaluate_once(true); - if (force_ptr_increment || 
loop->has_outer_loop) { - std::vector new_finalization_offsets(loop->get_finalization_offsets()); - const auto& ptr_increments = loop->get_ptr_increments(); - for (size_t i = 0; i < new_finalization_offsets.size(); i++) { - new_finalization_offsets[i] += ptr_increments[i]; - } - loop->set_finalization_offsets(new_finalization_offsets); - } - return true; - } else { - return false; - } + std::function& op)> reg_type_mapper = [&](const std::shared_ptr& op) -> opRegType { + return get_op_reg_type(op); }; - const auto& ops = m->get_ordered_ops(); - for (auto op = ops.begin(); op < ops.end(); op++) { - const auto& loop_begin = ov::as_type_ptr(*op); - - // ignore outer loops and possible manual scalar loops - if (loop_begin && loop_begin->get_increment() != 1) { - OV_ITT_TASK_NEXT(GENERATE, "::VectorLoop") - NodeVector vector_loop, tail_loop; - std::shared_ptr vector_loop_end, tail_loop_end; - vector_loop_end = loop_begin->get_loop_end(); - tail_loop_end = nullptr; - while (*op != vector_loop_end) - vector_loop.push_back(*op++); - vector_loop.push_back(*op); - const auto work_amount = vector_loop_end->get_work_amount(); - const auto increment = vector_loop_end->get_increment(); - const auto tail_size = work_amount % increment; - const auto need_tail = tail_size != 0; - const auto need_vector_loop = work_amount >= increment; - // Note, that finalization_offsets could be modified inside optimize_single_evaluation, - // so need to save them here to cover (evaluate_once vector with non-zero finalization_offsets + tail) - std::vector tail_finalization_offsets = need_tail ? vector_loop_end->get_finalization_offsets() : std::vector {}; - // vector loops are required => Just copy the body, original loop is already a vector one - if (need_vector_loop) { - // Note that finalization offsets should be applied after the last iteration. - // So if there is a tail, then we should apply offsets after it, but not now. - if (need_tail) - vector_loop_end->set_finalization_offsets(std::vector(tail_finalization_offsets.size(), 0)); + lowered::pass::PassPipeline lowered_pipeline; + lowered_pipeline.register_pass(reg_type_mapper); + lowered_pipeline.register_pass(); + lowered_pipeline.run(linear_ir); - if (config.m_optimize_single_evaluation) { - // force ptr increments if there is tail - optimize_single_evaluation(vector_loop_end, need_tail); - } - - lower_ops(vector_loop); - } - OV_ITT_TASK_NEXT(GENERATE, "::TailLoop") - // tail is required => transform the body into a tail representation - // tail loop is fake loop because for tail we should calculate only - // finalization offsets which are supported by LoopEnd. 
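// Illustrative sketch: the arithmetic behind the vector/tail split that the removed
// loop-lowering code around here performs. Variable names follow the original code;
// the concrete values (work_amount = 17, increment = 8) are assumed for illustration,
// and this is a minimal standalone example, not the actual generator implementation.
#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t work_amount = 17;  // total number of scalar evaluations (assumed)
    const std::size_t increment   = 8;   // vector length, lanes processed per iteration (assumed)

    const std::size_t tail_size        = work_amount % increment;   // 17 % 8 == 1 leftover element
    const bool        need_tail        = tail_size != 0;            // a scalar tail loop is required
    const bool        need_vector_loop = work_amount >= increment;  // two full vector iterations fit

    std::printf("vector iterations: %zu, tail size: %zu\n",
                need_vector_loop ? work_amount / increment : static_cast<std::size_t>(0),
                need_tail ? tail_size : static_cast<std::size_t>(0));
    return 0;
}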
- if (need_tail) { - NodeMap vector_to_tail_node_map; - tail_loop = ngraph::clone_nodes(vector_loop, vector_to_tail_node_map); - tail_transformations(tail_loop, tail_size, config); - tail_loop_end = ov::as_type_ptr(*tail_loop.rbegin()); - tail_loop_end->set_finalization_offsets(tail_finalization_offsets); - tail_loop_end->set_increment(tail_size); - // ptr increments were set to the old increment, need to update them in accordance with the new one - tail_loop_end->update_ptr_increments(static_cast(tail_size)); - tail_loop_end->set_work_amount(tail_size); - tail_loop_end->has_outer_loop = vector_loop_end->has_outer_loop; - - if (config.m_optimize_single_evaluation) { - // tail loop is always executed once - optimize_single_evaluation(tail_loop_end); - } - - lower_ops(tail_loop); - } - } else { - lower_ops({*op}); - } - } + linear_ir.init_emitters(target); OV_ITT_TASK_NEXT(GENERATE, "::EmitCode") - //todo: Kernel need info on i/o data access pattern and data shapes to calculate data offsets - // pass Params and Results - // todo: it's probably better to move AllocaledEmitter creation inside Kernel constructor - // So Kernel accepts only model ptr and target, and creates AllocatedEmitter inside - //emission - auto loops2DKernel = std::make_shared(lowered, m); + auto loops2DKernel = std::make_shared(linear_ir); loops2DKernel->compile_params = compile_params; std::shared_ptr kernel = target->get(op::Kernel::get_type_info_static())(loops2DKernel); kernel->emit_code({}, {}); OV_ITT_TASK_NEXT(GENERATE, "::EmitData") - for (auto& op : lowered) { - op.first->emit_data(); + for (auto& l : linear_ir.get_ops()) { + l->get_emitter()->emit_data(); } OV_ITT_TASK_NEXT(GENERATE, "::GetSnippet") // todo: we save lowered to access compiled brgemm kernels on execution time (normally lowered is destructed by then) // remove this when kernel caching is implemented. Don't forget to make generate const method. 
- if (config.m_save_lowered_code) - lowered_saved = lowered; + if (config.m_save_expressions) + lowered_saved = linear_ir; - return target->get_snippet(); + return { target->get_snippet() }; } std::shared_ptr Generator::get_target_machine() const { @@ -228,8 +57,8 @@ std::shared_ptr Generator::get_target_machine() const { } Generator::opRegType Generator::get_op_reg_type(const std::shared_ptr& op) const { - if (std::dynamic_pointer_cast(op) || - std::dynamic_pointer_cast(op) || + if (std::dynamic_pointer_cast(op) || + std::dynamic_pointer_cast(op) || std::dynamic_pointer_cast(op) || std::dynamic_pointer_cast(op) || std::dynamic_pointer_cast(op) || @@ -244,10 +73,10 @@ Generator::opRegType Generator::get_op_reg_type(const std::shared_ptr& op) ov::op::util::is_binary_elementwise_arithmetic(op) || ov::op::util::is_binary_elementwise_comparison(op) || ov::op::util::is_binary_elementwise_logical(op) || - std::dynamic_pointer_cast(op) || - std::dynamic_pointer_cast(op) || - std::dynamic_pointer_cast(op) || - std::dynamic_pointer_cast(op) || + std::dynamic_pointer_cast(op) || + std::dynamic_pointer_cast(op) || + std::dynamic_pointer_cast(op) || + std::dynamic_pointer_cast(op) || std::dynamic_pointer_cast(op) || std::dynamic_pointer_cast(op) || std::dynamic_pointer_cast(op) || @@ -262,6 +91,5 @@ Generator::opRegType Generator::get_specific_op_reg_type(const std::shared_ptrget_type_name()) + " isn't determined!"); } - }// namespace snippets -}// namespace ngraph +}// namespace ov diff --git a/src/common/snippets/src/lowered/expression.cpp b/src/common/snippets/src/lowered/expression.cpp new file mode 100644 index 00000000000000..d69a90f7636e5a --- /dev/null +++ b/src/common/snippets/src/lowered/expression.cpp @@ -0,0 +1,133 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/expression.hpp" + +#include "snippets/itt.hpp" +#include "snippets/utils.hpp" + +#include "openvino/core/graph_util.hpp" +#include "openvino/core/type.hpp" + +namespace ov { +namespace snippets { +namespace lowered { + +size_t Expression::LOOP_NULL_ID = SIZE_MAX; + +Expression::Expression(const std::shared_ptr& n) + : m_source_node{n}, m_emitter{nullptr}, m_input_port_connectors{}, m_output_port_connectors{} { + m_input_port_descriptors.reserve(n->get_input_size()); + m_output_port_descriptors.reserve(n->get_output_size()); + for (const auto& input : n->inputs()) { + m_input_port_descriptors.push_back(PortDescriptorUtils::get_port_descriptor_ptr(input)); + } + for (const auto& output : n->outputs()) { + m_output_port_descriptors.push_back(PortDescriptorUtils::get_port_descriptor_ptr(output)); + } +} + +const PortConnectorPtr& Expression::get_input_port_connector(size_t i) const { + OPENVINO_ASSERT(i < m_input_port_connectors.size(), "Failed to get input port connector: target input port must be less than input count!"); + return m_input_port_connectors[i]; +} +const PortConnectorPtr& Expression::get_output_port_connector(size_t i) const { + OPENVINO_ASSERT(i < m_output_port_connectors.size(), "Failed to get output port connector: target output port must be less than output count!"); + return m_output_port_connectors[i]; +} + +const PortDescriptorPtr& Expression::get_input_port_descriptor(size_t i) const { + OPENVINO_ASSERT(i < m_input_port_descriptors.size(), "Failed to get input port descriptor: target input port must be less than input count!"); + return m_input_port_descriptors[i]; +} +const PortDescriptorPtr& Expression::get_output_port_descriptor(size_t i) 
const { + OPENVINO_ASSERT(i < m_output_port_descriptors.size(), "Failed to get output port descriptor: target output port must be less than output count!"); + return m_output_port_descriptors[i]; +} + +std::shared_ptr Expression::get_node() const { + if (!m_source_node) + OPENVINO_THROW("An attempt to get uninitialized node from lowered expression"); + return m_source_node; +} + +std::shared_ptr Expression::get_emitter() const { + return m_emitter; +} + +RegInfo Expression::get_reg_info() const { + RegInfo reg_info; + reg_info.first.reserve(m_input_port_descriptors.size()); + reg_info.second.reserve(m_output_port_descriptors.size()); + for (const auto& port : m_input_port_descriptors) + reg_info.first.push_back(port->get_reg()); + for (const auto& port : m_output_port_descriptors) + reg_info.second.push_back(port->get_reg()); + return reg_info; +} + +void Expression::set_reg_info(RegInfo rinfo) { + const auto& in = rinfo.first; + const auto& out = rinfo.second; + OPENVINO_ASSERT(m_input_port_descriptors.size() == in.size(), "Incorrect count of input physical registers"); + OPENVINO_ASSERT(m_output_port_descriptors.size() == out.size(), "Incorrect count of output physical registers"); + for (size_t i = 0; i < m_input_port_descriptors.size(); ++i) { + m_input_port_descriptors[i]->set_reg(in[i]); + } + for (size_t i = 0; i < m_output_port_descriptors.size(); ++i) { + m_output_port_descriptors[i]->set_reg(out[i]); + } +} + +void Expression::init_emitter(const std::shared_ptr& target) { + m_emitter = target->get(m_source_node->get_type_info())(m_source_node); +} + +void Expression::validate() const { + OPENVINO_ASSERT(m_input_port_descriptors.size() == m_input_port_connectors.size(), + "The count of input ports and input port connectors must be equal"); + OPENVINO_ASSERT(m_output_port_descriptors.size() == m_output_port_connectors.size(), + "The count of output ports and output port connectors must be equal"); + OPENVINO_ASSERT(m_source_node != nullptr, + "The expression has null source node"); +} + +void Expression::replace_input(size_t port, PortConnectorPtr to) { + OPENVINO_ASSERT(port < m_input_port_connectors.size(), "Failed to replace: target input port must be less than input count!"); + m_input_port_connectors[port] = std::move(to); +} + +void Expression::set_loop_id(size_t id, size_t idx) { + if (id != LOOP_NULL_ID) { + OPENVINO_ASSERT((std::find(m_loop_ids.begin(), m_loop_ids.end(), id) == m_loop_ids.end()), + "Expression cannot have several the same Loops"); + } + if (m_loop_ids.size() <= idx) { + m_loop_ids.resize(idx + 1, LOOP_NULL_ID); + } + m_loop_ids[idx] = id; +} + +void Expression::remove_loop_id(size_t id) { + auto it = std::find(m_loop_ids.begin(), m_loop_ids.end(), id); + OPENVINO_ASSERT(it == m_loop_ids.end(), "Expression doesn't have the Loop with ID " + std::to_string(id)); + *it = Expression::LOOP_NULL_ID; +} + +ExpressionPort Expression::get_input_port(size_t i) { + return ExpressionPort(this->shared_from_this(), ExpressionPort::Type::Input, i); +} + +ExpressionPort Expression::get_output_port(size_t i) { + return ExpressionPort(this->shared_from_this(), ExpressionPort::Type::Output, i); +} + +IOExpression::IOExpression(const std::shared_ptr& par, int64_t index) + : Expression(par), m_index(index), m_type{io_type::INPUT} {} +IOExpression::IOExpression(const std::shared_ptr& res, int64_t index) + : Expression(res), m_index(index), m_type{io_type::OUTPUT} {} + +}// namespace lowered +}// namespace snippets +}// namespace ov diff --git 
a/src/common/snippets/src/lowered/expression_factory.cpp b/src/common/snippets/src/lowered/expression_factory.cpp new file mode 100644 index 00000000000000..70303f1a879877 --- /dev/null +++ b/src/common/snippets/src/lowered/expression_factory.cpp @@ -0,0 +1,128 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/expression_factory.hpp" + +#include "snippets/snippets_isa.hpp" + +namespace ov { +namespace snippets { +namespace lowered { + +void LinearIR::ExpressionFactory::create_expression_inputs(const LinearIR& linear_ir, const ExpressionPtr& expr) { + OPENVINO_ASSERT(expr != nullptr, "Failed expression inputs creation: expression is null"); + const auto& node = expr->get_node(); + + expr->m_input_port_connectors.resize(node->get_input_size(), nullptr); + for (const auto& input : node->inputs()) { + const auto input_source = input.get_source_output(); + const auto in_index = input.get_index(); + const auto& parent_expr = linear_ir.get_expr_by_node(input_source.get_node_shared_ptr()); + const auto& port_connector = parent_expr->get_output_port_connector(input_source.get_index()); + port_connector->add_consumer(expr->get_input_port(in_index)); + expr->m_input_port_connectors[in_index] = port_connector; + } +} + +void LinearIR::ExpressionFactory::create_expression_outputs(const ExpressionPtr& expr) { + OPENVINO_ASSERT(expr != nullptr, "Failed expression outputs creation: expression is null"); + const auto& node = expr->get_node(); + + expr->m_output_port_connectors.resize(node->get_output_size(), nullptr); + for (const auto& output : node->outputs()) { + const auto out_index = output.get_index(); + const auto source = expr->get_output_port(out_index); + expr->m_output_port_connectors[out_index] = std::make_shared(source); + } +} + +// The method verifies of input port connectors to availability of the expression as consumer and add it if missed +void LinearIR::ExpressionFactory::init_expression_inputs(const ExpressionPtr& expr, const std::vector& inputs) { + for (size_t i = 0; i < inputs.size(); ++i) { + const auto& input = inputs[i]; + const auto consumers = input->get_consumers(); + const auto found = std::find_if(consumers.begin(), consumers.end(), + [&](const ExpressionPort& desc) { + return desc.get_index() == i && desc.get_expr() == expr; + }); + if (found == consumers.end()) { + input->add_consumer(expr->get_input_port(i)); + } + } + expr->m_input_port_connectors = inputs; +} + +ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr& par, + const LinearIR& linear_ir, const std::shared_ptr& model) { + // Note: ctor of shared_ptr isn't friend class for Expression -> we cannot use directly make_shared(args) + OPENVINO_ASSERT(model != nullptr, "To create IOExpression from Parameter there must be inited model!"); + auto expr = std::make_shared(IOExpression(par, model->get_parameter_index(par))); + create_expression_outputs(expr); + expr->validate(); + return expr; +} + +ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr& res, + const LinearIR& linear_ir, const std::shared_ptr& model) { + // Note: ctor of shared_ptr isn't friend class for Expression -> we cannot use directly make_shared(args) + OPENVINO_ASSERT(model != nullptr, "To create IOExpression from Result there must be inited model!"); + auto expr = std::make_shared(IOExpression(res, model->get_result_index(res))); + create_expression_inputs(linear_ir, expr); + // The Result node don't need output port (because of sense of the 
node). But each node in ngraph must have one output at least. + // The port descriptors are automatically created in constructor. We manually clean output ports. + expr->m_output_port_descriptors.clear(); + expr->validate(); + return expr; +} + +ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr& n, const LinearIR& linear_ir, + const std::shared_ptr& model) { + OPENVINO_ASSERT(!ov::is_type(n), "Default expression builder doesn't support LoopBegin and LoopEnd"); + // Note: ctor of shared_ptr isn't friend class for Expression + auto expr = std::make_shared(Expression(n)); + create_expression_inputs(linear_ir, expr); + create_expression_outputs(expr); + expr->validate(); + return expr; +} + +ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr& n, const std::vector& inputs) { + OPENVINO_ASSERT(inputs.empty(), "LoopBegin cannot have inputs"); + auto expr = std::make_shared(Expression(n)); + init_expression_inputs(expr, inputs); + create_expression_outputs(expr); + expr->validate(); + return expr; +} + +ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr& n, const std::vector& inputs) { + auto expr = std::make_shared(Expression(n)); + expr->m_input_port_descriptors.resize(inputs.size(), nullptr); + for (size_t i = 0; i < inputs.size() - 1; ++i) { + expr->m_input_port_descriptors[i] = std::make_shared(); + } + const auto& last_input = inputs.back()->get_source(); + OPENVINO_ASSERT(ov::is_type(last_input.get_expr()->get_node()), "LoopEnd expression expects LoopBegin on last input"); + expr->m_input_port_descriptors[inputs.size() - 1] = last_input.get_descriptor_ptr()->clone(); + init_expression_inputs(expr, inputs); + // The LoopEnd node don't need output port (because of sense of the node). But each node in ngraph must have one output at least. + // The port descriptors are automatically created in constructor. We manually clean output ports. + expr->m_output_port_descriptors.clear(); + expr->validate(); + return expr; +} + +ExpressionPtr LinearIR::ExpressionFactory::create(const std::shared_ptr& n, const std::vector& inputs) { + OPENVINO_ASSERT(!ov::is_type(n) && + !ov::is_type(n), + "Expression builder with inputs doesn't support Result and Parameter"); + auto expr = std::make_shared(Expression(n)); + init_expression_inputs(expr, inputs); + create_expression_outputs(expr); + expr->validate(); + return expr; +} +}// namespace lowered +}// namespace snippets +}// namespace ov diff --git a/src/common/snippets/src/lowered/expression_port.cpp b/src/common/snippets/src/lowered/expression_port.cpp new file mode 100644 index 00000000000000..abbab639a8c6e0 --- /dev/null +++ b/src/common/snippets/src/lowered/expression_port.cpp @@ -0,0 +1,57 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/expression_port.hpp" + +#include "snippets/utils.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { + +ExpressionPort::ExpressionPort(const std::shared_ptr& expr, Type type, size_t port) + : m_expr(expr), m_type(type), m_port_index(port) {} + +const PortDescriptorPtr& ExpressionPort::get_descriptor_ptr() const { + const auto& descs = m_type == Type::Input ? m_expr->m_input_port_descriptors + : m_expr->m_output_port_descriptors; + OPENVINO_ASSERT(m_port_index < descs.size(), "Incorrect index of port"); + return descs[m_port_index]; +} + +const std::shared_ptr& ExpressionPort::get_port_connector_ptr() const { + const auto& connectors = m_type == Type::Input ? 
m_expr->m_input_port_connectors + : m_expr->m_output_port_connectors; + OPENVINO_ASSERT(m_port_index < connectors.size(), "Incorrect index of port"); + return connectors[m_port_index]; +} + +std::set ExpressionPort::get_connected_ports() const { + if (ExpressionPort::m_type == Type::Input) { + return { m_expr->m_input_port_connectors[m_port_index]->get_source() }; + } + if (ExpressionPort::m_type == Type::Output) { + return m_expr->m_output_port_connectors[m_port_index]->get_consumers(); + } + OPENVINO_THROW("ExpressionPort supports only Input and Output types"); +} + +bool operator==(const ExpressionPort& lhs, const ExpressionPort& rhs) { + if (&lhs == &rhs) + return true; + OPENVINO_ASSERT(lhs.get_type() == rhs.get_type(), "Incorrect ExpressionPort comparison"); + return lhs.get_index() == rhs.get_index() && lhs.get_expr() == rhs.get_expr(); +} +bool operator!=(const ExpressionPort& lhs, const ExpressionPort& rhs) { + return !(lhs == rhs); +} +bool operator<(const ExpressionPort& lhs, const ExpressionPort& rhs) { + OPENVINO_ASSERT(lhs.get_type() == rhs.get_type(), "Incorrect ExpressionPort comparison"); + return (lhs.get_index() < rhs.get_index()) || (lhs.get_index() == rhs.get_index() && lhs.get_expr() < rhs.get_expr()); +} + +}// namespace lowered +}// namespace snippets +}// namespace ov diff --git a/src/common/snippets/src/lowered/linear_ir.cpp b/src/common/snippets/src/lowered/linear_ir.cpp new file mode 100644 index 00000000000000..a2b126c6f958fa --- /dev/null +++ b/src/common/snippets/src/lowered/linear_ir.cpp @@ -0,0 +1,263 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/linear_ir.hpp" + +#include "snippets/itt.hpp" + +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/lowered/expression_factory.hpp" +#include "snippets/op/serialization_node.hpp" +#include "snippets/utils.hpp" + +#include "openvino/core/graph_util.hpp" +#include "openvino/core/type.hpp" + +namespace ov { +namespace snippets { +namespace lowered { + +LinearIR::LinearIR(const std::shared_ptr& model, Config config) + : m_io_expressions{}, m_config{std::move(config)}, m_loop_manager(std::make_shared()) { + constExprIt last_param = m_expressions.end(); + for (const auto& n : get_ordered_ops(model)) { + constExprIt insertion_pos = m_expressions.end(); + const auto expr = create_expression(n, model); + + // Scalar should be on the Linear IR beginning after Parameters to have valid expression order after Loop passes. + // After these passes we must call pass MoveScalarToConsumer() to have a correct accuracy. + // For more details, please see the pass description + if (const auto& scalar = as_type_ptr(n)) { + insertion_pos = std::next(last_param); + } + + register_expression(expr, true); + const auto& it = m_expressions.insert(insertion_pos, expr); + + if (const auto io_expr = std::dynamic_pointer_cast(expr)) { + m_io_expressions.push_back(io_expr); + if (ov::is_type(n)) + last_param = it; + } + } +} + +ExpressionPtr LinearIR::create_expression(const std::shared_ptr& n, const std::shared_ptr& model) { + return ExpressionFactory::build(n, *this, model); +} + +ExpressionPtr LinearIR::create_expression(const std::shared_ptr& n, const std::vector& inputs) { + return ExpressionFactory::build(n, inputs); +} + +ov::NodeVector LinearIR::get_ordered_ops(const std::shared_ptr& m) { + if (!m->get_sinks().empty()) + OPENVINO_THROW("Linear IR is not supposed to work for model with sinks. 
Check your transformation pipeline."); + + // Note that an important difference between this impl and Model::get_ordered_ops is that Results and Parameters + // are added in REVERSE order, so they will be visited in DIRECT order compared to get_parameters() and get_results() + NodeVector nodes; + const auto& results = m->get_results(); + std::copy(results.rbegin(), results.rend(), std::back_inserter(nodes)); + const auto& params = m->get_parameters(); + std::copy(params.rbegin(), params.rend(), std::back_inserter(nodes)); + + return ov::topological_sort(nodes); +} + +void LinearIR::serialize(const std::string& xml, const std::string& bin) { + auto first_node = std::make_shared(element::f32, Shape{}); + first_node->set_friendly_name("Start"); + first_node->get_rt_info()["execTimeMcs"] = 0; + std::shared_ptr body_node = first_node; + for (const auto& expr : m_expressions) { + body_node = std::make_shared(body_node, expr); + } + auto last_node = std::make_shared(body_node); + last_node->set_friendly_name("End"); + const auto tmp_model = std::make_shared(ResultVector {last_node}, + ParameterVector {first_node}, + "Lowered_IR_Serialization"); + ov::pass::Serialize(xml, bin).run_on_model(tmp_model); +} + +LinearIR::container LinearIR::deep_copy_range(LinearIR::container::const_iterator begin, LinearIR::container::const_iterator end) { + auto deep_clone_ports = [](std::vector& ports) { + for (auto& port : ports) { port = port->clone(); } + }; + LinearIR::container result; + NodeVector original_nodes; + for (auto it = begin; it != end; it++) + original_nodes.push_back((*it)->get_node()); + ngraph::NodeMap node_map; + ngraph::clone_nodes(original_nodes, node_map); + for (auto it = begin; it != end; it++) { + // copy by value, so result shared_pointer point to new objects + Expression new_expr = **it; + new_expr.m_source_node = node_map[(*it)->get_node().get()]; + deep_clone_ports(new_expr.m_input_port_descriptors); + deep_clone_ports(new_expr.m_output_port_descriptors); + result.emplace_back(std::make_shared(new_expr)); + } + return result; +} + +void LinearIR::debug_print(bool tds_as_pointers) const { + auto print_rinfo = [](const RegInfo& rinfo) { + std::cerr << " : {"; + for (auto i : rinfo.first) + std::cerr << i << " "; + std::cerr << " => "; + for (auto i : rinfo.second) + std::cerr << i << " "; + std::cerr << "}"; + }; + std::map td2int; + int td_counter = 0; + int counter = 0; + for (const auto& expr : m_expressions) { + const auto& node = expr->get_node(); + std::cerr << counter++ << " : " << + node->get_friendly_name() << " : "; + if (tds_as_pointers) { + for (const auto& in : expr->m_input_port_connectors) { + if (td2int.count(in) == 0) + OPENVINO_THROW("Undefined input descriptor for op"); + std::cerr << td2int.at(in) << ", "; + } + std::cerr << "\b\b => "; + for (const auto& out : expr->m_output_port_connectors) { + if (td2int.count(out) == 0) + td2int.insert({out, td_counter++}); + std::cerr << td2int.at(out) << ", "; + } + } else { + for (const auto& port_desc : expr->m_input_port_descriptors) + std::cerr << port_desc << ", "; + std::cerr << "\b\b => "; + for (const auto& port_desc : expr->m_output_port_descriptors) + std::cerr << port_desc << ", "; + } + std::cerr << "\b\b"; + const auto& rinfo = expr->get_reg_info(); + if (!rinfo.first.empty() || !rinfo.second.empty()) + print_rinfo(expr->get_reg_info()); + std::cerr << "\n"; + } +} + +void LinearIR::init_emitters(const std::shared_ptr& target) { + for (auto& expr : m_expressions) { + if (!expr->get_emitter()) + 
expr->init_emitter(target); + } +} + +const ExpressionPtr& LinearIR::get_expr_by_node(const std::shared_ptr& n) const { + auto found = m_node2expression_map.find(n); + OPENVINO_ASSERT(found != m_node2expression_map.end(), "The node " + n->get_friendly_name() + " hasn't been found in Linear IR"); + return found->second; +} + +void LinearIR::replace_input(const std::set& consumers, const PortConnectorPtr& to) { + for (const auto& consumer_input : consumers) { + replace_input(consumer_input, to); + } +} + +void LinearIR::replace_input(const ExpressionPort& expr_port, const PortConnectorPtr& to) { + const auto port = expr_port.get_index(); + const auto& expr = expr_port.get_expr(); + + OPENVINO_ASSERT(expr_port.get_type() == ExpressionPort::Type::Input, "Failed to replace: target input port must have Input type"); + OPENVINO_ASSERT(expr_port.get_index() < expr->get_input_count(), "Failed to replace: target input port must be less than input count!"); + + const auto& from = expr->get_input_port_connector(port); + if (from == to) + return; + + if (!to->found_consumer(expr_port)) { + to->add_consumer(expr_port); + } + from->remove_consumer(expr_port); + expr->replace_input(port, to); +} + +void LinearIR::register_expression(const ExpressionPtr& expr, bool io_allowed) { + const auto& node = expr->get_node(); + if (!io_allowed && (is_type(node) || is_type(node))) + OPENVINO_THROW("LinearIR::insert can't be used to add Parameters or Results to IR"); + { + const auto& res = m_node2expression_map.insert({node, expr}); + if (!res.second) + OPENVINO_THROW("Duplicate node is detected in linear IR: " + std::string(node->get_friendly_name())); + } +} + +void LinearIR::unregister_expression(const ExpressionPtr& expr) { + for (size_t i = 0; i < expr->get_input_count(); ++i) { + const auto& input = expr->get_input_port_connector(i); + input->remove_consumer(expr->get_input_port(i)); + } + + m_node2expression_map.erase(expr->get_node()); +} + +LinearIR::exprIt LinearIR::insert(constExprIt pos, container::value_type&& value) { + register_expression(value); + return m_expressions.insert(pos, value); +} + +LinearIR::exprIt LinearIR::insert(constExprIt pos, const container::value_type& value) { + register_expression(value); + return m_expressions.insert(pos, value); +} + +LinearIR::exprIt LinearIR::insert(constExprIt pos, exprIt begin, exprIt end) { + constExprIt cbegin = begin; + constExprIt cend = end; + return insert(pos, cbegin, cend); +} + +LinearIR::exprIt LinearIR::insert(constExprIt pos, constExprIt begin, constExprIt end) { + for (auto b = begin; b != end; b++) + register_expression(*b); + return m_expressions.insert(pos, begin, end); +} + +LinearIR::exprIt LinearIR::insert(LinearIR::constExprIt pos, const NodeVector& nodes) { + auto ret = m_expressions.end(); + for (const auto& n : nodes) { + const auto& expr = create_expression(n); + register_expression(expr); + ret = m_expressions.insert(pos, expr); + } + // Need to return iterator to the first of the inserted values + return std::prev(ret, static_cast(nodes.size())); +} + +LinearIR::exprIt LinearIR::insert(LinearIR::constExprIt pos, const std::shared_ptr& n) { + const auto& expr = create_expression(n); + register_expression(expr); + return m_expressions.insert(pos, expr); +} + +LinearIR::exprIt LinearIR::erase(LinearIR::exprIt pos) { + unregister_expression(*pos); + return m_expressions.erase(pos); +} + +LinearIR::exprIt LinearIR::erase(LinearIR::constExprIt pos) { + unregister_expression(*pos); + return m_expressions.erase(pos); +} + +void 
LinearIR::move(LinearIR::constExprIt from, LinearIR::constExprIt to) {
+    // Instead of `insert()` + `erase()`, we use `splice()` for the same list
+    m_expressions.splice(to, m_expressions, from);
+}
+
+} // namespace lowered
+} // namespace snippets
+} // namespace ov
diff --git a/src/common/snippets/src/lowered/loop_manager.cpp b/src/common/snippets/src/lowered/loop_manager.cpp
new file mode 100644
index 00000000000000..c2273a11982a24
--- /dev/null
+++ b/src/common/snippets/src/lowered/loop_manager.cpp
@@ -0,0 +1,198 @@
+// Copyright (C) 2023 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include "snippets/lowered/loop_manager.hpp"
+
+#include "snippets/lowered/expression.hpp"
+#include "snippets/utils.hpp"
+
+#include "openvino/core/graph_util.hpp"
+#include "openvino/core/type.hpp"
+
+#include "snippets/itt.hpp"
+
+namespace ov {
+namespace snippets {
+namespace lowered {
+
+size_t LinearIR::LoopManager::add_loop_info(const LoopInfoPtr &loop) {
+    const auto index = next_id;
+    m_map[index] = loop;
+    next_id++;
+    return index;
+}
+
+void LinearIR::LoopManager::remove_loop_info(size_t index) {
+    m_map.erase(index);
+}
+
+using LoopInfoPtr = LinearIR::LoopManager::LoopInfoPtr;
+
+const std::map<size_t, LoopInfoPtr> &LinearIR::LoopManager::get_map() const {
+    return m_map;
+}
+
+LoopInfoPtr LinearIR::LoopManager::get_loop_info(size_t index) const {
+    const auto it = m_map.find(index);
+    OPENVINO_ASSERT(it != m_map.end(), "LoopInformation hasn't been found!");
+    return it->second;
+}
+
+void LinearIR::LoopManager::get_loop_bounds(const LinearIR &linear_ir,
+                                            size_t loop_id,
+                                            LinearIR::constExprIt &loop_begin_pos,
+                                            LinearIR::constExprIt &loop_end_pos) const {
+    const auto loop_info = get_loop_info(loop_id);
+    get_loop_bounds(linear_ir, loop_info->entry_exprs, loop_info->exit_exprs, loop_begin_pos, loop_end_pos, loop_id);
+}
+
+void LinearIR::LoopManager::get_loop_bounds(const LinearIR &linear_ir,
+                                            const std::vector<ExpressionPort> &entries,
+                                            const std::vector<ExpressionPort> &exits,
+                                            LinearIR::constExprIt &loop_begin_pos,
+                                            LinearIR::constExprIt &loop_end_pos,
+                                            size_t loop_id) {
+    OPENVINO_ASSERT(!entries.empty(), "Loop must have entry points");
+    OPENVINO_ASSERT(!exits.empty(), "Loop must have exit points");
+    const auto& entry_expr = entries.front().get_expr();
+    loop_begin_pos = std::find(linear_ir.begin(), linear_ir.end(), entry_expr);
+    OPENVINO_ASSERT(loop_begin_pos != linear_ir.end(), "Loop begin hasn't been found!");
+
+    // Some operations in Loop can be before first entry points: Scalars, VectorBuffer.
+ // We should iterate by them till the expr is in the corresponding Loop + auto prev_loop_ids = (*std::prev(loop_begin_pos))->get_loop_ids(); + while (std::find(prev_loop_ids.begin(), prev_loop_ids.end(), loop_id) != prev_loop_ids.end()) { + loop_begin_pos = std::prev(loop_begin_pos); + prev_loop_ids = (*std::prev(loop_begin_pos))->get_loop_ids(); + } + + // At the moment all Loops must have exit points + const auto& exit_expr = exits.back().get_expr(); + loop_end_pos = std::next(std::find(loop_begin_pos, linear_ir.end(), exit_expr)); + OPENVINO_ASSERT(loop_end_pos != linear_ir.end(), "Loop end hasn't been found!"); +} + +void LinearIR::LoopManager::get_io_loop_ports(LinearIR::constExprIt loop_begin_pos, + LinearIR::constExprIt loop_end_pos, + std::vector &entries, + std::vector &exits) { + entries.clear(); + exits.clear(); + for (auto expr_it = loop_begin_pos; expr_it != loop_end_pos; ++expr_it) { + const auto& expr = *expr_it; + for (size_t i = 0; i < expr->get_input_count(); ++i) { + const auto in_port = expr->get_input_port(i); + const auto& parent_expr = in_port.get_connected_ports().begin()->get_expr(); + if (!ov::is_type(parent_expr->get_node()) && + std::find(loop_begin_pos, expr_it, parent_expr) == expr_it) { + entries.push_back(in_port); + } + } + for (size_t i = 0; i < expr->get_output_count(); ++i) { + const auto out_port = expr->get_output_port(i); + const auto consumer_ports = out_port.get_connected_ports(); + for (const auto& consumer : consumer_ports) { + const auto& consumer_expr = consumer.get_expr(); + if (std::find(expr_it, loop_end_pos, consumer_expr) == loop_end_pos) { + exits.push_back(out_port); + break; + } + } + } + } +} + +void LinearIR::LoopManager::mark_loop(LinearIR::constExprIt loop_begin_pos, + LinearIR::constExprIt loop_end_pos, + size_t loop_depth, size_t vector_size) { + std::vector loop_entry_points, loop_exit_points; + LoopManager::get_io_loop_ports(loop_begin_pos, loop_end_pos, loop_entry_points, loop_exit_points); + + auto broadcast = [](std::vector& lhs, const std::vector& rhs, size_t index) -> void { + if (rhs == lhs) + return; + const auto lhs_size = lhs.size(); + const auto rhs_size = rhs.size(); + const auto size = std::max(lhs_size, rhs_size); + lhs.resize(size, 1); + OPENVINO_ASSERT(index < size, "Incorrect index for broadcasting"); + const auto lhs_value = index < lhs_size ? *(lhs.crbegin() + index) : 1; + const auto rhs_value = index < rhs_size ? *(rhs.crbegin() + index) : 1; + OPENVINO_ASSERT(lhs_value == rhs_value || lhs_value == 1 || rhs_value == 1, + "Output shapes of Loop must be broadcastable!"); + *(lhs.rbegin() + index) = std::max(lhs_value, rhs_value); + }; + + auto is_outside_loop = [](const std::vector& subtensor) { + return std::all_of(subtensor.begin(), subtensor.end(), [](size_t lhs) { return lhs == PortDescriptor::ServiceDimensions::FULL_DIM; }); + }; + + std::vector loop_subtensor; + std::vector loop_tensor(loop_depth, 1); + for (const auto& exit_point : loop_exit_points) { + const auto& desc = exit_point.get_descriptor_ptr(); + const auto shape = utils::get_reordered_planar_shape(ov::PartialShape(desc->get_shape()), desc->get_layout()).get_shape(); + auto subtensor = desc->get_subtensor(); + if (subtensor.empty()) { + subtensor.resize(loop_depth, 1); + subtensor[subtensor.size() - 1] = vector_size; + } + + const size_t resizing_value = is_outside_loop(subtensor) ? 
PortDescriptor::ServiceDimensions::FULL_DIM : 1; + while (subtensor.size() < loop_depth) + subtensor.insert(subtensor.begin(), resizing_value); + if (loop_subtensor.empty()) + loop_subtensor = subtensor; + + OPENVINO_ASSERT(std::equal(loop_subtensor.crbegin(), loop_subtensor.crbegin() + loop_depth, subtensor.crbegin()), + "Incorrect scheduling parameters for loop"); + + for (size_t dim_idx = 0; dim_idx < loop_depth; ++dim_idx) { + if (*(subtensor.rbegin() + dim_idx) != PortDescriptor::ServiceDimensions::FULL_DIM) { + broadcast(loop_tensor, shape, dim_idx); + } + } + } + + for (size_t dim_idx = 0; dim_idx < loop_depth; ++dim_idx) { + if (*(loop_subtensor.rbegin() + dim_idx) == PortDescriptor::ServiceDimensions::FULL_DIM) { + exprs_marking(loop_begin_pos, loop_end_pos, Expression::LOOP_NULL_ID, loop_depth - dim_idx - 1); + continue; + } + + OPENVINO_ASSERT(dim_idx < loop_tensor.size(), "Incorrect indexes of Loop for markup"); + const auto work_amount = + loop_tensor.size() > dim_idx ? *(loop_tensor.rbegin() + dim_idx) + : 0; + const auto work_amount_increment = + loop_subtensor.size() > dim_idx ? *(loop_subtensor.rbegin() + dim_idx) + : (dim_idx == 0 ? vector_size : 1); + mark_loop(loop_begin_pos, loop_end_pos, loop_depth - dim_idx - 1, work_amount, + work_amount_increment, loop_entry_points, loop_exit_points); + } +} + +void LinearIR::LoopManager::mark_loop(LinearIR::constExprIt loop_begin_pos, + LinearIR::constExprIt loop_end_pos, + size_t idx, + size_t work_amount, + size_t work_amount_increment, + const std::vector &entries, + const std::vector &exits) { + const auto loop_info = std::make_shared(work_amount, work_amount_increment, entries, exits); + const auto loop_id = this->add_loop_info(loop_info); + exprs_marking(loop_begin_pos, loop_end_pos, loop_id, idx); +} + +void LinearIR::LoopManager::exprs_marking(LinearIR::constExprIt loop_begin_pos, + LinearIR::constExprIt loop_end_pos, + size_t loop_id, size_t idx) { + for (auto expr_it = loop_begin_pos; expr_it != loop_end_pos; ++expr_it) { + expr_it->get()->set_loop_id(loop_id, idx); + } +} + +}// namespace lowered +}// namespace snippets +}// namespace ov diff --git a/src/common/snippets/src/lowered/pass/allocate_buffers.cpp b/src/common/snippets/src/lowered/pass/allocate_buffers.cpp new file mode 100644 index 00000000000000..0c5dd32ae70281 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/allocate_buffers.cpp @@ -0,0 +1,108 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/allocate_buffers.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +void AllocateBuffers::propagate_offset(const LinearIR& linear_ir, const ExpressionPtr& buffer_expr, const size_t offset) { + // If Buffer has offset We set this offset in the connected MemoryAccess ops + // to correctly read and write data because all Buffers has the common data pointer on buffer scratchpad + + const auto buffer = ov::as_type_ptr(buffer_expr->get_node()); + + // Propagate to up: in Store. 
Buffer can have only one Store + { + if (buffer->is_intermediate_memory()) { + OPENVINO_ASSERT(buffer_expr->get_input_port_connectors().size() == 1, "Buffer with intermediate memory must have one parent"); + const auto& parent_output = buffer_expr->get_input_port_connector(0)->get_source(); + const auto& parent_expr = parent_output.get_expr(); + const auto port = parent_output.get_index(); + const auto& parent_node = parent_expr->get_node(); + auto memory_access = ov::as_type_ptr(parent_node); + if (memory_access && memory_access->is_memory_access_output_port(port)) { + memory_access->set_output_offset(offset, port); + } else { + OPENVINO_THROW( + "Buffer::set_offset() was called when Buffer didn't have the corresponding MemoryAccess op for offset propagation"); + } + } + } + // Propagate to down: in Load. Buffer can have several Load + const auto& buffer_out = buffer_expr->get_output_port_connector(0); + for (const auto& child_expr_input : buffer_out->get_consumers()) { + const auto& child_expr = child_expr_input.get_expr(); + const auto port = child_expr_input.get_index(); + const auto& child_node = child_expr->get_node(); + auto memory_access = ov::as_type_ptr(child_node); + if (memory_access && memory_access->is_memory_access_input_port(port)) { + memory_access->set_input_offset(offset, port); + } else if (ov::is_type(child_node)) { + // After Loop initialization, Buffer can be connected to LoopEnd - it's ok + continue; + } else { + OPENVINO_THROW( + "Buffer::set_offset() was called when Buffer didn't have the corresponding MemoryAccess op for offset propagation"); + } + } +} + + +bool AllocateBuffers::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::AllocateBuffers"); + + bool modified = false; + size_t offset = 0; + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) { + const auto& expr = *expr_it; + if (auto buffer = as_type_ptr(expr->get_node())) { + const auto buffer_size = buffer->get_byte_size(); + // If it's the first buffer, offsets are zero => nothing to propagate, can continue + if (m_buffer_scratchpad_size == 0) { + m_buffer_scratchpad_size += buffer_size; + continue; + } + + if (buffer->is_intermediate_memory()) { + const auto& parent_expr = expr->get_input_port_connector(0)->get_source().get_expr(); + const auto& parent_node = parent_expr->get_node(); + // Full MemoryAccess ops need new memory. 
Previous logic is to check for parent isn't Loop + // TODO: It should be unified in MemoryManager with memory reuse in the near future + const auto ma = ov::as_type_ptr(parent_node); + if (ma && ma->is_full_memory_access_op()) { + offset = m_buffer_scratchpad_size; + buffer->set_offset(static_cast(offset)); + propagate_offset(linear_ir, *expr_it, offset); + m_buffer_scratchpad_size += buffer_size; + continue; + } + const auto current_allocated_memory_size = m_buffer_scratchpad_size - offset; + if (buffer_size > current_allocated_memory_size) { + m_buffer_scratchpad_size += (buffer_size - current_allocated_memory_size); + // Note: we don't update offset because we just add memory to needed size + } + propagate_offset(linear_ir, *expr_it, offset); + } else { + // Single Buffer without input should allocate new memory + offset = m_buffer_scratchpad_size; + buffer->set_offset(static_cast(offset)); + propagate_offset(linear_ir, *expr_it, offset); + m_buffer_scratchpad_size += buffer_size; + } + modified = true; + } + } + return modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/pass/assign_registers.cpp b/src/common/snippets/src/lowered/pass/assign_registers.cpp similarity index 57% rename from src/common/snippets/src/pass/assign_registers.cpp rename to src/common/snippets/src/lowered/pass/assign_registers.cpp index d9aa93a378c45c..293d80437ce1d1 100644 --- a/src/common/snippets/src/pass/assign_registers.cpp +++ b/src/common/snippets/src/lowered/pass/assign_registers.cpp @@ -1,120 +1,135 @@ -// Copyright (C) 2018-2023 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include -#include "snippets/pass/assign_registers.hpp" +#include "snippets/lowered/pass/assign_registers.hpp" + +#include "snippets/lowered/linear_ir.hpp" #include "snippets/snippets_isa.hpp" -#include +#include "snippets/itt.hpp" -#if defined(__clang__) -# pragma clang diagnostic push -# pragma clang diagnostic ignored "-Wunused-lambda-capture" -#endif +// This header is needed to avoid MSVC warning "C2039: 'inserter': is not a member of 'std'" +#include -namespace { -constexpr size_t reg_count = 16lu; -using opRegType = ngraph::snippets::Generator::opRegType; -} // namespace +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { -bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr& f) { - RUN_ON_MODEL_SCOPE(AssignRegisters); - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::AssignRegisters") +bool AssignRegisters::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::AssignRegisters") using Reg = size_t; - using tensor = std::shared_ptr; - auto ops = f->get_ordered_ops(); + using tensor = PortConnectorPtr; + const auto& expressions = linear_ir.get_ops(); - std::vector>> typed_ops; - for (const auto& op : ops) { - typed_ops.emplace_back(std::make_pair(m_reg_type_mapper(op), op)); + std::vector> typed_ops; + NodeVector ops; + Reg num_parameters = 0; + Reg num_results = 0; + Reg num_expressions = 0; + for (auto& expr : expressions) { + auto op = expr->get_node(); + auto reg_type = m_reg_type_mapper(op); + typed_ops.emplace_back(reg_type, expr); + num_parameters += is_type(op); + num_results += is_type(op); + ops.push_back(op); + num_expressions++; } - size_t counter_vec = 0; size_t counter_gpr = 0; std::map regs_vec, regs_gpr; // Define a set of immune tensors 
that will be ignored by auto reg allocation => their reg allocation is done manually std::map manually_assigned_gprs, manually_assigned_vecs; const auto IS_MANUALLY_ALLOCATED_REG = SIZE_MAX; - const auto num_parameters = f->get_parameters().size(); - const auto num_results = f->get_results().size(); auto accumulator_reg = 0lu; - for (const auto& op : ops) { - if (const auto& param = ov::as_type_ptr(op)) { - manually_assigned_gprs[op->output(0).get_tensor_ptr()] = - static_cast(f->get_parameter_index(param)); - } else if (const auto& result = ov::as_type_ptr(op)) { - // here we use the fact that Result input & output tensors are identical by construction - manually_assigned_gprs[op->output(0).get_tensor_ptr()] = - static_cast(f->get_result_index(result) + num_parameters); - } else if (const auto buffer = ov::as_type_ptr(op)) { + for (const auto& expr : expressions) { + auto op = expr->get_node(); + if (const auto io_expr = std::dynamic_pointer_cast(expr)) { + if (io_expr->get_type() == IOExpression::io_type::INPUT) + manually_assigned_gprs[expr->get_output_port_connector(0)] = io_expr->get_index(); + else if (io_expr->get_type() == IOExpression::io_type::OUTPUT) + manually_assigned_gprs[expr->get_input_port_connector(0)] = num_parameters + io_expr->get_index(); + else + OPENVINO_THROW("Unsupported io_type detected"); + } else if (const auto& buffer = ov::as_type_ptr(op)) { + const auto buffer_id = buffer->get_id(); // All buffers have one common data pointer if (buffer->is_intermediate_memory()) { - manually_assigned_gprs[op->input(0).get_tensor_ptr()] = - static_cast(num_results + num_parameters); + manually_assigned_gprs[expr->get_input_port_connector(0)] = + static_cast(num_results + num_parameters + buffer_id); } - manually_assigned_gprs[op->output(0).get_tensor_ptr()] = - static_cast(num_results + num_parameters); + manually_assigned_gprs[expr->get_output_port_connector(0)] = + static_cast(num_results + num_parameters + buffer_id); } else if (ov::is_type(op) || ov::is_type(op)) { // Only in SoftmaxDecomposition ReduceMax and ReduceSum use HorizonMax/HorizonSum and VectorBuffer. 
// We should manually set the one vector register for VectorBuffer and Max/Sum output to simulate a accumulator // TODO [96351]: We should rewrite accumulator pattern using another way - const auto input = op->get_input_node_shared_ptr(0); // input - it's accumulator math op: Add or Max - for (size_t i = 0; i < input->get_input_size(); ++i) { - if (ov::is_type(input->get_input_node_shared_ptr(i))) { - manually_assigned_vecs[input->input(i).get_tensor_ptr()] = - static_cast(accumulator_reg); + const auto& input_tensor = expr->get_input_port_connector(0); + const auto& input_expr = input_tensor->get_source().get_expr(); + const auto& input_expr_input_tensors = input_expr->get_input_port_connectors(); + for (const auto& tensor : input_expr_input_tensors) { + if (ov::is_type(tensor->get_source().get_expr()->get_node())) { + manually_assigned_vecs[tensor] = static_cast(accumulator_reg); + } + } + const auto& output_tensor = expr->get_output_port_connector(0); + manually_assigned_vecs[input_tensor] = static_cast(accumulator_reg); + manually_assigned_vecs[output_tensor] = static_cast(accumulator_reg); + for (const auto& child_expr_input : output_tensor->get_consumers()) { + if (ov::is_type(child_expr_input.get_expr()->get_node())) { + manually_assigned_vecs[child_expr_input.get_expr()->get_output_port_connector(0)] = + static_cast(accumulator_reg); } } - manually_assigned_vecs[input->output(0).get_tensor_ptr()] = - static_cast(accumulator_reg); - manually_assigned_vecs[op->output(0).get_tensor_ptr()] = - static_cast(accumulator_reg); - - // If there is Broadcast, it should have the same register as Horizon op - // because it's a result of the accumulator as well - for (auto& out : op->output(0).get_target_inputs()) { - const auto child = out.get_node()->shared_from_this(); - if (ov::is_type(child)) { - manually_assigned_vecs[child->output(0).get_tensor_ptr()] = + // TODO: Fix via common pipeline using LoopEnd: + // All operations `outside loop` after Horizon ops should have the same register to avoid using it in the next Loop + const auto current_loops_ids = expr->get_loop_ids(); + auto next_expr = output_tensor->get_consumers().begin()->get_expr(); + while (next_expr->get_loop_ids() == current_loops_ids) { + manually_assigned_vecs[next_expr->get_output_port_connector(0)] = static_cast(accumulator_reg); - } + next_expr = next_expr->get_output_port_connector(0)->get_consumers().begin()->get_expr(); } + accumulator_reg++; } } - auto enumerate_out_tensors = [IS_MANUALLY_ALLOCATED_REG] (const std::shared_ptr& op, - decltype(regs_vec)& reg_map, - const std::map& manually_assigned_regs, - size_t& counter) { - for (const auto& output : op->outputs()) { - const auto& t = output.get_tensor_ptr(); + // Note: have to specify default capture "=" due to MSVC bug (it doesn't capture const expressions implicitly) + // Otherwise WIN build fails with "IS_MANUALLY_ALLOCATED_REG cannot be implicitly captured because no default capture mode has been specified" + // the same problem with all the other lambdas in this file + auto enumerate_out_tensors = [=] (const ExpressionPtr& expr, + decltype(regs_vec)& reg_map, + const std::map& manually_assigned_regs, + size_t& counter) { + for (const auto& out_tensor : expr->get_output_port_connectors()) { // Note that some ops might have identical input&output tensors (Result and Tile* for ex.) // so we have to check that the tensor has not been enumerated already - if (reg_map.count(t) == 0) { - reg_map[t] = manually_assigned_regs.count(t) == 0 ? 
counter++ : IS_MANUALLY_ALLOCATED_REG; + if (reg_map.count(out_tensor) == 0) { + reg_map[out_tensor] = manually_assigned_regs.count(out_tensor) == 0 ? counter++ : IS_MANUALLY_ALLOCATED_REG; } } }; for (const auto& t_op : typed_ops) { switch (t_op.first) { - case opRegType::vec2vec: - case opRegType::gpr2vec: + case Generator::opRegType::vec2vec: + case Generator::opRegType::gpr2vec: enumerate_out_tensors(t_op.second, regs_vec, manually_assigned_vecs, counter_vec); break; - case opRegType::gpr2gpr: - case opRegType::vec2gpr: + case Generator::opRegType::gpr2gpr: + case Generator::opRegType::vec2gpr: enumerate_out_tensors(t_op.second, regs_gpr, manually_assigned_gprs, counter_gpr); break; } } // todo: make one for gpr and one for vector - std::vector> used_gpr(ops.size(), std::set()); // used = used as an input - std::vector> defined_gpr(ops.size(), std::set()); // defined = used as output - std::vector> used_vec(ops.size(), std::set()); - std::vector> defined_vec(ops.size(), std::set()); + std::vector> used_gpr(num_expressions, std::set()); // used = used as an input + std::vector> defined_gpr(num_expressions, std::set()); // defined = used as output + std::vector> used_vec(num_expressions, std::set()); + std::vector> defined_vec(num_expressions, std::set()); - auto tensor2reg = [IS_MANUALLY_ALLOCATED_REG] (const std::vector& tensors, const std::map& reg_map) { + auto tensor2reg = [=] (const std::vector& tensors, const std::map& reg_map) { std::set result; for (const auto& t : tensors) { if (reg_map.count(t) == 0) @@ -128,25 +143,24 @@ bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr for (size_t i = 0; i < typed_ops.size(); i++) { const auto& t_op = typed_ops[i]; std::vector used_tensors, defined_tensors; - for (const auto& in : t_op.second->inputs()) { - used_tensors.push_back(in.get_tensor_ptr()); - } - for (const auto& out : t_op.second->outputs()) - defined_tensors.push_back(out.get_tensor_ptr()); + for (const auto& in : t_op.second->get_input_port_connectors()) + used_tensors.push_back(in); + for (const auto& out : t_op.second->get_output_port_connectors()) + defined_tensors.push_back(out); switch (t_op.first) { - case opRegType::vec2vec: + case Generator::opRegType::vec2vec: used_vec[i] = tensor2reg(used_tensors, regs_vec); defined_vec[i] = tensor2reg(defined_tensors, regs_vec); break; - case opRegType::gpr2gpr: + case Generator::opRegType::gpr2gpr: used_gpr[i] = tensor2reg(used_tensors, regs_gpr); defined_gpr[i] = tensor2reg(defined_tensors, regs_gpr); break; - case opRegType::gpr2vec: + case Generator::opRegType::gpr2vec: used_gpr[i] = tensor2reg(used_tensors, regs_gpr); defined_vec[i] = tensor2reg(defined_tensors, regs_vec); break; - case opRegType::vec2gpr: + case Generator::opRegType::vec2gpr: used_vec[i] = tensor2reg(used_tensors, regs_vec); defined_gpr[i] = tensor2reg(defined_tensors, regs_gpr); break; @@ -174,19 +188,28 @@ bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr std::inserter(life_in_vec[n], life_in_vec[n].begin())); } for (size_t n = 0; n < typed_ops.size(); n++) { - auto op = typed_ops[n].second; - for (const auto& out : op->outputs()) { - for (const auto& port : out.get_target_inputs()) { - size_t k = std::find(ops.begin(), ops.end(), port.get_node()->shared_from_this()) - ops.begin(); - if (k == ops.size()) + const auto& expr = typed_ops[n].second; + if (is_type(expr->get_node()) || is_type(expr->get_node())) + continue; + for (const auto& out : expr->get_output_port_connectors()) { + for (const auto& 
child_expr_input : out->get_consumers()) { + const auto& child_expr = child_expr_input.get_expr(); + auto child_it = linear_ir.begin(); + std::advance(child_it, n); + size_t k = n; + while (child_it != linear_ir.end() && *child_it != child_expr) { + child_it++; + k++; + } + if (k == typed_ops.size()) OPENVINO_THROW("assign registers can't find target op in the body"); switch (typed_ops[k].first) { - case opRegType::vec2vec: - case opRegType::vec2gpr: + case Generator::opRegType::vec2vec: + case Generator::opRegType::vec2gpr: life_out_vec[n].insert(life_in_vec[k].begin(), life_in_vec[k].end()); break; - case opRegType::gpr2gpr: - case opRegType::gpr2vec: + case Generator::opRegType::gpr2gpr: + case Generator::opRegType::gpr2vec: life_out_gpr[n].insert(life_in_gpr[k].begin(), life_in_gpr[k].end()); break; } @@ -281,8 +304,7 @@ bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr std::map assigned_regs(std::move(manually_assigned_gprs)); assigned_regs.insert(manually_assigned_vecs.begin(), manually_assigned_vecs.end()); - auto register_assigned_regs = [IS_MANUALLY_ALLOCATED_REG, &assigned_regs](const std::map& unique_regs, - const std::map& unique2reused) { + auto register_assigned_regs = [=, &assigned_regs](const std::map& unique_regs, const std::map& unique2reused) { for (const auto& reg : unique_regs) { if (reg.second == IS_MANUALLY_ALLOCATED_REG) continue; @@ -294,16 +316,22 @@ bool ngraph::snippets::pass::AssignRegisters::run_on_model(const std::shared_ptr register_assigned_regs(regs_vec, unique2reused_map_vec); register_assigned_regs(regs_gpr, unique2reused_map_gpr); - for (const auto& t_op : typed_ops) { - for (const auto& out : t_op.second->outputs()) { - const auto& t = out.get_tensor_ptr(); - auto& rt = t->get_rt_info(); - rt["reginfo"] = static_cast(assigned_regs[t]); + for (auto& t_op : typed_ops) { + RegInfo rinfo; + const auto& expr = t_op.second; + for (const auto& in : expr->get_input_port_connectors()) { + rinfo.first.push_back(assigned_regs[in]); } + for (const auto& out : expr->get_output_port_connectors()) { + rinfo.second.push_back(assigned_regs[out]); + } + t_op.second->set_reg_info(rinfo); } return false; } -#if defined(__clang__) -# pragma clang diagnostic pop -#endif +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov + diff --git a/src/common/snippets/src/lowered/pass/clean_repeated_ptr_shifts.cpp b/src/common/snippets/src/lowered/pass/clean_repeated_ptr_shifts.cpp new file mode 100644 index 00000000000000..644a5dd1509f7f --- /dev/null +++ b/src/common/snippets/src/lowered/pass/clean_repeated_ptr_shifts.cpp @@ -0,0 +1,109 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/clean_repeated_ptr_shifts.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +bool CleanRepeatedDataPointerShifts::reuse_increments(const LinearIR& linear_ir, const ExpressionPtr& loop_end_expr) { + const auto loop_end = ov::as_type_ptr(loop_end_expr->get_node()); + if (!loop_end) + return false; + + const auto loop_connectors = loop_end_expr->get_input_port_connectors(); + const auto input_count = loop_end->get_input_num(); + const auto output_count = loop_end->get_output_num(); + + std::set resetting_data_indexes; + std::set buffers_ids; + // We count expressions only on inputs of Loop because we can only read from the 
same data but not write to the same data. + // Parameter + // | | + // Load_0 Load_1 + std::set read_data_exprs; + for (size_t i = 0; i < input_count; ++i) { + const auto& parent_output = loop_connectors[i]->get_source().get_expr(); + if (const auto buffer = ov::as_type_ptr(parent_output->get_node())) { + // If Buffer is missed in set, Just save - it's first meeting + if (buffers_ids.count(buffer->get_id()) == 0) { + buffers_ids.insert(buffer->get_id()); + } else { + // The Buffer with the same ID is in set - need to add this Buffer idx to set of Buffers for resetting + resetting_data_indexes.insert(i); + } + } else { + // Remember the current expression if missed + if (read_data_exprs.count(parent_output) == 0) { + read_data_exprs.insert(parent_output); + } else { + // Otherwise we have several Load-semantic expressions which read from the same data. + // Have to zero ptr increments and finalization offsets for all expression except one. + resetting_data_indexes.insert(i); + } + } + } + for (size_t i = 0; i < output_count; ++i) { + const auto consumer_inputs = loop_connectors[input_count + i]->get_consumers(); + size_t buffer_count = 0; + size_t loop_count = 0; + for (const auto& consumer_input : consumer_inputs) { + const auto& child_node = consumer_input.get_expr()->get_node(); + if (const auto buffer = ov::as_type_ptr(child_node)) { + buffer_count++; + // If Buffer is missed in set, Just save - it's first meeting + if (buffers_ids.count(buffer->get_id()) == 0) { + buffers_ids.insert(buffer->get_id()); + } else { + // The Buffer with the same ID is in set - need to add this Buffer idx to set of Buffers for resetting + resetting_data_indexes.insert(input_count + i); + } + } else if (ov::is_type(child_node)) { + loop_count++; + } + } + if (buffer_count > 0) { + OPENVINO_ASSERT((buffer_count == 1) && (buffer_count + loop_count == consumer_inputs.size()), + "Loop output must have not more than 1 Buffer"); + } + } + + if (resetting_data_indexes.empty()) + return false; + + auto new_ptr_increments = loop_end->get_ptr_increments(); + auto new_finalization_offsets = loop_end->get_finalization_offsets(); + for (auto idx_to_drop : resetting_data_indexes) { + new_ptr_increments[idx_to_drop] = 0; + new_finalization_offsets[idx_to_drop] = 0; + } + loop_end->set_ptr_increments(new_ptr_increments); + loop_end->set_finalization_offsets(new_finalization_offsets); + return true; +} + +bool CleanRepeatedDataPointerShifts::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::CleanRepeatedDataPointerShifts") + bool modified = false; + + for (const auto& expr : linear_ir) { + const auto& node = expr->get_node(); + if (ov::is_type(node)) { + modified |= reuse_increments(linear_ir, expr); + } + } + + return modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/cleanup_loop_offsets.cpp b/src/common/snippets/src/lowered/pass/cleanup_loop_offsets.cpp new file mode 100644 index 00000000000000..17822dc67ad868 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/cleanup_loop_offsets.cpp @@ -0,0 +1,66 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/cleanup_loop_offsets.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +bool 
CleanupLoopOffsets::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::CleanupLoopOffsets") + if (linear_ir.empty()) + return false; + bool is_modified = false; + // Note: it doesn't make sense to check the last expression - it must always be Result + const auto before_last = std::prev(linear_ir.end()); + for (auto expr_it = linear_ir.begin(); expr_it != before_last; expr_it++) { + const auto& node = expr_it->get()->get_node(); + if (auto loop_end = as_type_ptr(node)) { + auto next_expr_it = std::next(expr_it); + const auto& next_node = next_expr_it->get()->get_node(); + // Note: Finalization offsets before the Result can be safely disregarded + // TODO: Need verify that Buffers on the inputs doesn't have other consumers (other Loops) + // and this Loop doesn't have Buffer on other outputs. + if (is_type(next_node)) { + const auto& fin_offsets = loop_end->get_finalization_offsets(); + loop_end->set_finalization_offsets(std::vector(fin_offsets.size(), 0)); + is_modified = true; + } + if (auto outer_loop_end = as_type_ptr(next_node)) { + auto fin_offsets = loop_end->get_finalization_offsets(); + std::unordered_map per_port_connector_offset; + const auto& loop_inputs = expr_it->get()->get_input_port_connectors(); + for (size_t i = 0; i < fin_offsets.size(); i++) + per_port_connector_offset[loop_inputs[i]] = i; + + auto outer_ptr_increments = outer_loop_end->get_ptr_increments(); + const auto& outer_loop_inputs = next_expr_it->get()->get_input_port_connectors(); + for (size_t i = 0; i < outer_ptr_increments.size(); i++) { + const auto& managed_connector = outer_loop_inputs[i]; + const auto& found = per_port_connector_offset.find(managed_connector); + if (found != per_port_connector_offset.end()) { + outer_ptr_increments[i] += fin_offsets[found->second]; + fin_offsets[found->second] = 0; + is_modified = true; + } + } + outer_loop_end->set_ptr_increments(outer_ptr_increments); + loop_end->set_finalization_offsets(fin_offsets); + } + } + } + return is_modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov + diff --git a/src/common/snippets/src/lowered/pass/fuse_loops.cpp b/src/common/snippets/src/lowered/pass/fuse_loops.cpp new file mode 100644 index 00000000000000..a7d3f6260bec2d --- /dev/null +++ b/src/common/snippets/src/lowered/pass/fuse_loops.cpp @@ -0,0 +1,343 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/fuse_loops.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +using LoopManager = LinearIR::LoopManager; +using LoopInfoPtr = LoopManager::LoopInfoPtr; + +FuseLoops::FuseLoops() : Pass() {} + +bool FuseLoops::can_be_fused(const LoopInfoPtr& loop_current, const LoopInfoPtr& loop_target) { + auto current_work_amount = loop_current->work_amount; + auto current_increment = loop_current->increment; + auto target_work_amount = loop_target->work_amount; + auto target_increment = loop_target->increment; + // Loop fusion is supported only if Loops have equal increments and the equal/broadcastable work amounts. 
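+    // (work amounts are treated as broadcastable when they are equal or when one of them is 1)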
+ // Note: For example, Broadcastable work amounts are possible in the following case: + // Relu_0 [16x1] Relu_1 [16x128] + // \ / + // Add [16x128] + // Because of expression order in linear IR and work of MarkLoop algorithm, there are 2 Inner Loops: + // - Relu_0 with work amount `1` and increment `vector size` + // - Relu_1 and Add with work amount `128` and increment `vector size` + // We can fuse them into one Loop with work amount `128` and increment `vector size` + const auto supported_work_amount = current_work_amount == target_work_amount || current_work_amount == 1 || target_work_amount == 1; + const auto supported_increment = current_increment == target_increment; + return supported_work_amount && supported_increment; +} + +void FuseLoops::fuse_points(std::vector& exit_points, std::vector& entry_points, + LinearIR::constExprIt loop_begin_pos, LinearIR::constExprIt loop_end_pos) { + std::vector new_exit_points; + for (const auto& exit_point : exit_points) { + const auto consumers_inputs = exit_point.get_connected_ports(); + + std::set mapped_entry_points; + std::set outside_consumers; + for (const auto& consumer_input : consumers_inputs) { + const auto entry_point_it = std::find(entry_points.begin(), entry_points.end(), consumer_input); + if (entry_point_it != entry_points.end()) { + mapped_entry_points.insert(*entry_point_it); + continue; + } + + const auto& consumer = consumer_input.get_expr(); + const auto inside_it = std::find(loop_begin_pos, loop_end_pos, consumer); + if (inside_it == loop_end_pos) { + outside_consumers.insert(consumer); + } + } + + // Remove entry points which are mapped + auto last_point = entry_points.end(); + for (const auto& mapped_entry_point : mapped_entry_points) { + last_point = std::remove(entry_points.begin(), last_point, mapped_entry_point); + } + entry_points.resize(entry_points.size() - mapped_entry_points.size()); + + // Leave exit point if there are consumers outside after fusion + if (!outside_consumers.empty()) { + new_exit_points.push_back(exit_point); + } + } + + exit_points = new_exit_points; +} + +bool FuseLoops::fuse_upper_into_current(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, const ExpressionPort& current_entry_point, + size_t current_loop_id, size_t target_loop_id, size_t dim_idx, + LinearIR::constExprIt& current_loop_begin_pos, LinearIR::constExprIt& current_loop_end_pos) { + const auto& loop_current = loop_manager->get_loop_info(current_loop_id); + const auto& loop_target = loop_manager->get_loop_info(target_loop_id); + if (!can_be_fused(loop_current, loop_target)) + return false; + + LinearIR::constExprIt target_loop_begin_pos, target_loop_end_pos; + loop_manager->get_loop_bounds(linear_ir, target_loop_id, target_loop_begin_pos, target_loop_end_pos); + + // We can fuse Loop_up to Loop_down only in cases when other consumers of Loop_up are after Loop_down + // Because Loop_up should be explicitly moved before Loop_down in linear IR, and we must save control dependency + bool is_fusion_allowed = true; + for (size_t i = 0; i < loop_target->exit_exprs.size() && is_fusion_allowed; ++i) { + const auto target_exit_point = loop_target->exit_exprs[i]; + const auto consumer_inputs = target_exit_point.get_connected_ports(); + for (const auto& consumer_input : consumer_inputs) { + const auto& consumer = consumer_input.get_expr(); + if (ov::is_type(consumer->get_node()) || consumer == current_entry_point.get_expr()) + continue; + // The fusing is only valid if target Loop consumer (the Consumer is outside of target 
Loop) + // is after current Loop (after Loop_down). + is_fusion_allowed = consumer->get_loop_ids()[dim_idx] == target_loop_id || // is inside target Loop + consumer->get_loop_ids()[dim_idx] == current_loop_id || // is inside current Loop + std::find(current_loop_end_pos, linear_ir.cend(), consumer) != linear_ir.end(); // is after current Loop + } + } + + if (!is_fusion_allowed) + return false; + + // Update entry and exit points in current Loop information before moving till Loop iterators are valid + auto current_entry_points = loop_current->entry_exprs; + auto current_exit_points = loop_current->exit_exprs; + auto target_entry_points = loop_target->entry_exprs; + auto target_exit_points = loop_target->exit_exprs; + fuse_points(target_exit_points, current_entry_points, target_loop_begin_pos, target_loop_end_pos); + + const auto insertion_place = current_loop_begin_pos; + const auto is_move_needed = target_loop_end_pos != current_loop_begin_pos; + for (auto it = target_loop_begin_pos; it != target_loop_end_pos;) { + auto expr_it = it; + const auto& expr = *expr_it; + // After moving we will have `it` in new place in the current Loop, + // but for markup we need have the expression from the target Loop. + // Because of that we manually increment iterator before moving + it = std::next(it); + expr->set_loop_id(current_loop_id, dim_idx); + if (is_move_needed) + linear_ir.move(expr_it, insertion_place); + } + + // Update current Loop bounds: + current_loop_begin_pos = target_loop_begin_pos; + + // Update work_amount for Loop (increment is constant because increments must be the identical for fusion): + loop_current->work_amount = std::max(loop_current->work_amount, loop_target->work_amount); + + std::vector new_entries = target_entry_points; + new_entries.insert(new_entries.end(), current_entry_points.begin(), current_entry_points.end()); + std::vector new_exits = target_exit_points; + new_exits.insert(new_exits.end(), current_exit_points.begin(), current_exit_points.end()); + + loop_current->entry_exprs = new_entries; + loop_current->exit_exprs = new_exits; + + return true; +} + +bool FuseLoops::fuse_lower_into_current(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, const ExpressionPort& current_exit_point, + size_t current_loop_id, size_t target_loop_id, size_t dim_idx, + LinearIR::constExprIt& current_loop_begin_pos, LinearIR::constExprIt& current_loop_end_pos) { + const auto& loop_current = loop_manager->get_loop_info(current_loop_id); + const auto& loop_target = loop_manager->get_loop_info(target_loop_id); + if (!can_be_fused(loop_current, loop_target)) + return false; + + // We can fuse Loop_down to Loop_up only in cases when other parents of Loop_down are before Loop_up + // Because Loop_down should be explicitly moved after Loop_up in linear IR, and we must save control dependency + bool is_fusion_allowed = true; + for (size_t i = 0; i < loop_target->entry_exprs.size() && is_fusion_allowed; ++i) { + const auto target_entry_point = loop_target->entry_exprs[i]; + const auto parent_expr_output = *target_entry_point.get_connected_ports().begin(); + const auto& parent_expr = parent_expr_output.get_expr(); + if (ov::is_type(parent_expr->get_node()) || parent_expr == current_exit_point.get_expr()) + continue; + is_fusion_allowed = parent_expr->get_loop_ids()[dim_idx] == current_loop_id || // The parent expr is from the same current Loop + std::find(linear_ir.cbegin(), current_loop_begin_pos, parent_expr) != current_loop_begin_pos; // The parent is before current Loop + } + 
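+    // If any parent of the lower Loop would have to stay after it, fusion is rejected below;
+    // otherwise the lower Loop body is moved to the end of the current Loop and the loop ports are merged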
+ if (!is_fusion_allowed) + return false; + + LinearIR::constExprIt target_loop_begin_pos, target_loop_end_pos; + loop_manager->get_loop_bounds(linear_ir, target_loop_id, target_loop_begin_pos, target_loop_end_pos); + + // Update entry and exit points in current Loop information before moving till Loop iterators are valid + auto current_entry_points = loop_current->entry_exprs; + auto current_exit_points = loop_current->exit_exprs; + auto target_entry_points = loop_target->entry_exprs; + auto target_exit_points = loop_target->exit_exprs; + fuse_points(current_exit_points, target_entry_points, current_loop_begin_pos, current_loop_end_pos); + + const auto insertion_place = current_loop_end_pos; + const auto is_move_needed = insertion_place != target_loop_begin_pos; + for (auto it = target_loop_begin_pos; it != target_loop_end_pos;) { + auto expr_it = it; + const auto& expr = *expr_it; + // After moving we will have `it` in new place in the current Loop, + // but for markup we need have the expression from the target Loop. + // Because of that we manually increment iterator before moving + it = std::next(it); + expr->set_loop_id(current_loop_id, dim_idx); + if (is_move_needed) + linear_ir.move(expr_it, insertion_place); + } + + // Update current Loop bounds: + if (!is_move_needed) + current_loop_end_pos = target_loop_end_pos; + + // Update work_amount for Loop (increment is constant because increments must be the identical for fusion): + loop_current->work_amount = std::max(loop_current->work_amount, loop_target->work_amount); + + std::vector& new_entries = current_entry_points; + new_entries.insert(new_entries.end(), target_entry_points.begin(), target_entry_points.end()); + std::vector& new_exits = current_exit_points; + new_exits.insert(new_exits.end(), target_exit_points.begin(), target_exit_points.end()); + + loop_current->entry_exprs = new_entries; + loop_current->exit_exprs = new_exits; + + return true; +} + +bool FuseLoops::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::FuseLoops") + if (linear_ir.empty()) + return false; + + const auto& loop_manager = linear_ir.get_loop_manager(); + std::vector prev_expr_loops; + + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) { + const auto expr = *expr_it; + const auto& node = expr->get_node(); + if (ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node)) + continue; + + // Outer Loop ----> Inner Loop + const auto expr_loops = expr->get_loop_ids(); + const auto loop_depth = expr_loops.size(); + size_t diff_idx = 0; + if (prev_expr_loops.empty()) { + prev_expr_loops = expr_loops; + } else { + OPENVINO_ASSERT(loop_depth == prev_expr_loops.size(), + "Expressions in Linear IR must have the same count of Loop identifiers"); + for (; diff_idx < loop_depth; ++diff_idx) { + if (expr_loops[diff_idx] != prev_expr_loops[diff_idx]) + break; + } + } + + for (size_t dim_idx = diff_idx; dim_idx < loop_depth; ++dim_idx) { + const auto loop_id = expr_loops[dim_idx]; + if (loop_id == Expression::LOOP_NULL_ID) + continue; + + const auto loop_info = loop_manager->get_loop_info(loop_id); + LinearIR::constExprIt loop_begin_pos, loop_end_pos; + loop_manager->get_loop_bounds(linear_ir, loop_id, loop_begin_pos, loop_end_pos); + + // We fuse upper Loops into the current till we can do it. + // After that we fuse lower Loops into the current till we can do it. 
+ // If we have fused on outputs we should verify possible fusions on inputs again because of new entry points + bool need_fusion_checks = true; + while (need_fusion_checks) { + // Loop_0 (Upper) | + // | => | + // Loop_1 (Current) Loop_0 + Loop_1 => new `Loop_1` + auto entry_points = loop_info->entry_exprs; + bool was_fusion_up = false; + for (size_t in_port = 0; in_port < entry_points.size() && !was_fusion_up; ++in_port) { + const auto entry_point = entry_points[in_port]; + const auto parent_expr_output = *entry_point.get_connected_ports().begin(); + const auto& parent_expr = parent_expr_output.get_expr(); + const auto parent = parent_expr->get_node(); + if (ov::is_type(parent) || + ov::is_type(parent) || + ov::is_type(parent)) { + continue; + } + const auto loop_ids_target = parent_expr->get_loop_ids(); + OPENVINO_ASSERT(loop_depth == loop_ids_target.size(), + "Expressions in Linear IR must have the same count of Loop identifiers"); + const auto loop_id_target = loop_ids_target[dim_idx]; + OPENVINO_ASSERT(loop_id != loop_id_target, + "Loops cannot have parents of entry points with the same identifier"); + if (loop_id_target == Expression::LOOP_NULL_ID) + continue; + + if (fuse_upper_into_current(linear_ir, loop_manager, entry_point, loop_id, loop_id_target, + dim_idx, loop_begin_pos, loop_end_pos)) { + was_fusion_up = true; + loop_manager->remove_loop_info(loop_id_target); + } + } + + // If Loops were fused and there are new entry_exprs, we should check for possible fusion again + if (was_fusion_up && entry_points != loop_info->entry_exprs) + continue; + + // Loop_0 (Current) Loop_0 + Loop_1 => new `Loop_0` + // | => | + // Loop_1 (Lower) | + auto exit_points = loop_info->exit_exprs; + bool was_fusion_down = false; + for (size_t out_port = 0; out_port < exit_points.size() && !was_fusion_down; ++out_port) { + const auto exit_point = exit_points[out_port]; + const auto consumer_exprs_inputs = exit_point.get_connected_ports(); + for (const auto& consumer_expr_input : consumer_exprs_inputs) { + const auto& consumer_expr = consumer_expr_input.get_expr(); + const auto consumer = consumer_expr->get_node(); + if (ov::is_type(consumer) || + ov::is_type(consumer)) { + continue; + } + + const auto loop_ids_target = consumer_expr->get_loop_ids(); + OPENVINO_ASSERT(loop_depth == loop_ids_target.size(), + "Expressions in Linear IR must have the same count of Loop identifiers"); + // The exit point of Loop can have several consumers where some of them can be in this Loop as well + // So we skip this consumer. + const auto loop_id_target = loop_ids_target[dim_idx]; + if (loop_id == loop_id_target || loop_id_target == Expression::LOOP_NULL_ID) + continue; + + if (fuse_lower_into_current(linear_ir, loop_manager, exit_point, loop_id, loop_id_target, + dim_idx, loop_begin_pos, loop_end_pos)) { + was_fusion_down = true; + loop_manager->remove_loop_info(loop_id_target); + // Need to check for possible fusion again because of new input expressions for Loop + break; + } + } + } + + // We iterated by each exit point and didn't fuse new Loops -> we can finish check for possible fusions on outputs. 
+ if (!was_fusion_down) + need_fusion_checks = false; + } + } + } + + return true; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/identify_buffers.cpp b/src/common/snippets/src/lowered/pass/identify_buffers.cpp new file mode 100644 index 00000000000000..b2939a3cd7a31b --- /dev/null +++ b/src/common/snippets/src/lowered/pass/identify_buffers.cpp @@ -0,0 +1,181 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/identify_buffers.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +namespace { +inline size_t index(size_t col_num, size_t row, size_t col) { + return row * col_num + col; +} +} // namespace + +std::vector IdentifyBuffers::create_adjacency_matrix(const LinearIR& linear_ir, const BufferSet& buffers) const { + // The sync point to check for adjacency is Loop because only in Loop we increment pointers. + // So if some Buffers in the one Loop have conflict (cannot be inplace: the different ptr increment and data sizes) + // they are called as adjacent + const auto size = buffers.size(); + // TODO: Can we use triangular matrix? Need verify using tests + std::vector adj(size * size, false); + for (size_t i = 0; i < size; ++i) + adj[index(size, i, i)] = true; + + // < ptr_increment, finalization_offset > + using ShiftPtrParams = std::pair; + + auto get_buffer_idx = [&](const std::shared_ptr& buffer) { + const auto iter = std::find(buffers.cbegin(), buffers.cend(), buffer); + NGRAPH_CHECK(iter != buffers.cend(), "Buffer wasn't find in Buffer system of Subgraph"); + return std::distance(buffers.cbegin(), iter); + }; + + auto update_adj_matrix = [&](const std::pair, ShiftPtrParams>& buffer, + const std::pair, ShiftPtrParams>& neighbour_buffer) { + const bool equal_ptr_params_shifting = buffer.second == neighbour_buffer.second; + const bool equal_element_type_sizes = buffer.first->get_element_type().size() == neighbour_buffer.first->get_element_type().size(); + if (!equal_ptr_params_shifting || ((buffer.second.first != 0 || buffer.second.second != 0) && !equal_element_type_sizes)) { + const auto buffer_idx = get_buffer_idx(buffer.first); + const auto neighbour_idx = get_buffer_idx(neighbour_buffer.first); + adj[index(size, neighbour_idx, buffer_idx)] = adj[index(size, buffer_idx, neighbour_idx)] = true; + } + }; + + for (auto expr_it = linear_ir.cbegin(); expr_it != linear_ir.cend(); expr_it++) { + const auto &expr = *expr_it; + const auto& loop_end = ov::as_type_ptr(expr->get_node()); + if (!loop_end) + continue; + + const auto input_count = loop_end->get_input_num(); + const auto output_count = loop_end->get_output_num(); + + const auto ptr_increments = loop_end->get_ptr_increments(); + const auto finalization_offsets = loop_end->get_finalization_offsets(); + + // Buffer -> + std::map, ShiftPtrParams> buffer_neighbours; + + for (size_t i = 0; i < input_count; ++i) { + const auto& parent_output = expr->get_input_port_connector(i)->get_source().get_expr(); + if (const auto buffer = ov::as_type_ptr(parent_output->get_node())) { + buffer_neighbours[buffer] = { ptr_increments[i], finalization_offsets[i] }; + } + } + for (size_t i = 0; i < output_count; ++i) { + // The consumers of the corresponding Store ops + const auto index = input_count + i; + const auto consumer_inputs = 
expr->get_input_port_connector(index)->get_consumers(); + size_t buffer_count = 0; + size_t loop_count = 0; + for (const auto& consumer_input : consumer_inputs) { + const auto& child_node = consumer_input.get_expr()->get_node(); + if (const auto buffer = ov::as_type_ptr(child_node)) { + buffer_neighbours[buffer] = { ptr_increments[index], finalization_offsets[index] }; + } else if (ov::is_type(child_node)) { + loop_count++; + } + } + if (buffer_count > 0) { + OPENVINO_ASSERT((buffer_count == 1) && (buffer_count + loop_count == consumer_inputs.size()), + "Loop output must have not more than 1 Buffer"); + } + } + + for (auto buffer_it = buffer_neighbours.begin(); buffer_it != buffer_neighbours.end(); ++buffer_it) { + for (auto neighbour_it = std::next(buffer_it); neighbour_it != buffer_neighbours.end(); ++neighbour_it) { + update_adj_matrix(*buffer_it, *neighbour_it); + } + } + } + + return adj; +} + +auto IdentifyBuffers::coloring(BufferSet& buffers, std::vector& adj) -> std::map { + size_t color = 0; + std::map color_groups; + const auto size = buffers.size(); + for (size_t i = 0; i < size; i++) { + // The Buffer is already colored (visited) - skip + if (!buffers[i]) + continue; + + const auto& buffer = buffers[i]; + color_groups[color].push_back(buffer); // Add to Color Group + buffers[i] = nullptr; // Remove from graph vertices + + // While Buffer `i` has non-coloured non-neighbours (while row `i` contains 0) + while (!std::accumulate(adj.begin() + i * size, adj.begin() + (i + 1) * size, true, std::logical_and())) { + size_t j = i + 1; + // Find first non-adjacent and non-visited (non-colored) Buffer to color him to the same color + for (; j < size; ++j) { + if (!adj[index(size, i, j)] && buffers[j]) + break; + } + + // If we don't have the corresponding non-adjacent and non-colored Buffers, + // we should make break - all potential Buffers for the current color are already colored + if (j == size) + break; + + const auto& neighbour_buffer = buffers[j]; + color_groups[color].push_back(neighbour_buffer); // Add to Color Group + buffers[j] = nullptr; // Remove from graph vertices + // Unite adjacency links: + // All the neighbors of Buffer `j` are added to the neighbors of Buffer `i` (the `vertices` are pulled together). + // The result is an updated i-th row of the adjacency matrix, + // in which 0 are only in columns with `vertex` numbers that are not adjacent to either the i-th or j-th `vertices`. + // Mathematically, this can be replaced by the operation of OR of Boolean vectors representing strings i and j. + std::transform(adj.begin() + i * size, adj.begin() + (i + 1) * size, adj.begin() + j * size, + adj.begin() + i * size, std::logical_or()); + } + + color++; + } + + return color_groups; +} + +bool IdentifyBuffers::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::IdentifyBuffers") + // Unite Buffers using Graph coloring algorithm. 
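+    // Buffers that are not adjacent in this conflict graph (no differing pointer shifts or data sizes inside a common Loop)
+    // may receive the same color (Buffer ID) and therefore reuse the same memory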
+ // Notes: We identify only Buffer with Intermediate memory because Buffers with new memory are used only in Brgemm case + // so these Buffers are always IntermediateBuffer nonadjacent + BufferSet buffer_exprs; + + for (const auto& expr : linear_ir) { + if (const auto buffer = ov::as_type_ptr(expr->get_node())) { + buffer_exprs.push_back(buffer); + } + } + + // Creation of Adj matrix + auto adj = create_adjacency_matrix(linear_ir, buffer_exprs); + + // Graph coloring algorithm + const auto color_groups = coloring(buffer_exprs, adj); + + for (const auto& pair : color_groups) { + const auto color = pair.first; + const auto& united_buffers = pair.second; + for (const auto& buffer : united_buffers) { + buffer->set_id(color); + } + } + + return true; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/init_loops.cpp b/src/common/snippets/src/lowered/pass/init_loops.cpp new file mode 100644 index 00000000000000..22721bb48395c7 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/init_loops.cpp @@ -0,0 +1,202 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/init_loops.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +namespace { +void filter_ports(LinearIR& linear_ir, + std::vector& loop_entries, std::vector& loop_exits) { + std::vector new_loop_entries; + std::vector new_loop_exits; + new_loop_entries.reserve(loop_entries.size()); + new_loop_exits.reserve(loop_exits.size()); + + for (const auto& loop_entry_point : loop_entries) { + const auto& expr = loop_entry_point.get_expr(); + const auto ma = ov::as_type_ptr(expr->get_node()); + if (ma && ma->is_memory_access_input_port(loop_entry_point.get_index())) { + new_loop_entries.push_back(loop_entry_point); + } + } + + for (const auto& loop_exit_point : loop_exits) { + const auto& expr = loop_exit_point.get_expr(); + const auto ma = ov::as_type_ptr(expr->get_node()); + if (ma && ma->is_memory_access_output_port(loop_exit_point.get_index())) { + new_loop_exits.push_back(loop_exit_point); + } + } + + loop_entries = new_loop_entries; + loop_exits = new_loop_exits; +} + +int64_t get_dim_stride(const size_t dim, const std::vector& layout, const std::vector& shape) { + int64_t stride = 1; + for (int i = static_cast(layout.size()) - 1; i >= 0; i--) { + if (layout[i] == dim) + break; + stride *= static_cast(shape[layout[i]]); + } + return stride; +} +} // namespace + +InitLoops::InitLoops() : Pass() {} + +std::vector InitLoops::init_ptr_increments(const std::vector& loop_inputs, + const std::vector& loop_outputs, + size_t dim_idx) { + std::vector ptr_increments; + // Note: Need to find max relevant dim expr to account for broadcasting, collect relevant_dims as well + size_t max_relevant_dim_size = 1; + for (const auto& loop_input : loop_inputs) { + const auto& layout = loop_input.get_descriptor_ptr()->get_layout(); + const auto& shape = loop_input.get_descriptor_ptr()->get_shape(); + const auto& dim = *(layout.rbegin() + dim_idx); + max_relevant_dim_size = std::max(shape[dim], max_relevant_dim_size); + } + for (const auto& loop_output : loop_outputs) { + const auto& layout = loop_output.get_descriptor_ptr()->get_layout(); + const auto& shape = loop_output.get_descriptor_ptr()->get_shape(); + const 
auto& dim = *(layout.rbegin() + dim_idx); + max_relevant_dim_size = std::max(shape[dim], max_relevant_dim_size); + } + + for (const auto& loop_input : loop_inputs) { + // For strides we have to use layout from source since source writes data by special rules + const auto source = *loop_input.get_connected_ports().begin(); + const auto& layout = loop_input.get_descriptor_ptr()->get_layout(); + const auto& shape = loop_input.get_descriptor_ptr()->get_shape(); + const auto& dim = *(layout.rbegin() + dim_idx); + int64_t ptr_increment = 0; + // If relevant dim is not broadcasted, then ptr_increment is the dim stride in the new layout + if (!(shape[dim] == 1 && max_relevant_dim_size != 1)) + ptr_increment = get_dim_stride(dim, source.get_descriptor_ptr()->get_layout(), shape); + ptr_increments.push_back(ptr_increment); + } + + for (const auto& loop_output : loop_outputs) { + const auto& layout = loop_output.get_descriptor_ptr()->get_layout(); + const auto& shape = loop_output.get_descriptor_ptr()->get_shape(); + const auto& dim = *(layout.rbegin() + dim_idx); + int64_t ptr_increment = 0; + // If relevant dim is not broadcasted, then ptr_increment is the dim stride in the new layout + if (!(shape[dim] == 1 && max_relevant_dim_size != 1)) + ptr_increment = get_dim_stride(dim, layout, shape); + ptr_increments.push_back(ptr_increment); + } + + return ptr_increments; +} + +std::vector InitLoops::init_finalization_offsets(const std::vector& ptr_increments, size_t work_amount) { + std::vector finalization_offsets; + for (const auto& ptr_incr : ptr_increments) { + int64_t offset = -1 * ptr_incr * work_amount; + finalization_offsets.push_back(offset); + } + return finalization_offsets; +} + +std::vector InitLoops::init_element_type_sizes(const std::vector& loop_inputs, + const std::vector& loop_outputs) { + std::vector element_types; + element_types.reserve(loop_inputs.size() + loop_outputs.size()); + for (const auto& in : loop_inputs) { + element_types.push_back(in.get_expr()->get_node()->get_input_element_type(in.get_index()).size()); + } + for (const auto& out : loop_outputs) { + element_types.push_back(out.get_expr()->get_node()->get_output_element_type(out.get_index()).size()); + } + return element_types; +} + +void InitLoops::insertion(LinearIR& linear_ir, const LinearIR::LoopManager::LoopInfoPtr& loop_info, + size_t loop_id, size_t dim_idx, bool has_outer_loop) { + auto loop_entries = loop_info->entry_exprs; + auto loop_exits = loop_info->exit_exprs; + const auto work_amount = loop_info->work_amount; + const auto work_amount_increment = loop_info->increment; + + LinearIR::constExprIt loop_begin_pos, loop_end_pos; + LinearIR::LoopManager::get_loop_bounds(linear_ir, loop_entries, loop_exits, loop_begin_pos, loop_end_pos, loop_id); + + filter_ports(linear_ir, loop_entries, loop_exits); + + auto ptr_increments = init_ptr_increments(loop_entries, loop_exits, dim_idx); + auto finalization_offsets = init_finalization_offsets(ptr_increments, work_amount); + const auto io_data_sizes = init_element_type_sizes(loop_entries, loop_exits); + + const auto& loop_begin = std::make_shared(); + const auto& loop_begin_expr = linear_ir.create_expression(loop_begin, std::vector{}); + linear_ir.insert(loop_begin_pos, loop_begin_expr); + + const auto& loop_end = std::make_shared( + loop_begin->output(0), work_amount, work_amount_increment, ptr_increments, finalization_offsets, + io_data_sizes, loop_entries.size(), loop_exits.size()); + loop_end->has_outer_loop = has_outer_loop; + + std::vector loop_end_inputs; + for 
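// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only):
// init_finalization_offsets() above rewinds every data pointer once the loop has finished:
// a pointer advanced by `ptr_increment` elements on each of `work_amount` iterations gets the
// exact negative of that total advance as its finalization offset. A broadcasted input keeps
// ptr_increment == 0 (see init_ptr_increments above), so its pointer is never moved back either.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<int64_t> finalization_offsets(const std::vector<int64_t>& ptr_increments, size_t work_amount) {
    std::vector<int64_t> offsets;
    offsets.reserve(ptr_increments.size());
    for (const int64_t incr : ptr_increments)
        offsets.push_back(-incr * static_cast<int64_t>(work_amount));  // return to the base pointer
    return offsets;
}
// ---------------------------------------------------------------------------------------------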
(const auto& expr_port : loop_entries) + loop_end_inputs.push_back(expr_port.get_expr()->get_input_port_connector(expr_port.get_index())); + for (const auto& expr_port : loop_exits) + loop_end_inputs.push_back(expr_port.get_expr()->get_output_port_connector(expr_port.get_index())); + loop_end_inputs.push_back(loop_begin_expr->get_output_port_connector(0)); + + const auto& loop_end_expr = linear_ir.create_expression(loop_end, loop_end_inputs); + linear_ir.insert(loop_end_pos, loop_end_expr); +} + +bool InitLoops::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::InitLoops") + if (linear_ir.empty()) + return false; + + const auto& loop_manager = linear_ir.get_loop_manager(); + + std::set inserted_loops; + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) { + const auto expr = *expr_it; + const auto& node = expr->get_node(); + if (ov::is_type(node) || + ov::is_type(node) || // Need to cover Buffer + ov::is_type(node) || + ov::is_type(node)) + continue; + + // Outer Loop ----> Inner Loop + const auto expr_loops = expr->get_loop_ids(); + const auto loop_depth = expr_loops.size(); + for (size_t i = 0; i < loop_depth; ++i) { + const auto loop_id = expr_loops[i]; + if (loop_id == Expression::LOOP_NULL_ID) + continue; + bool need_to_insert = inserted_loops.find(loop_id) == inserted_loops.end(); + if (need_to_insert) { + const auto loop_info = loop_manager->get_loop_info(loop_id); + const bool has_outer_loop = i > 0 && inserted_loops.find(expr_loops[i - 1]) != inserted_loops.end(); + insertion(linear_ir, loop_info, loop_id, loop_depth - i - 1, has_outer_loop); + inserted_loops.insert(loop_id); // save Loop ID + } + } + } + + return true; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/insert_buffers.cpp b/src/common/snippets/src/lowered/pass/insert_buffers.cpp new file mode 100644 index 00000000000000..ae5097223d3acb --- /dev/null +++ b/src/common/snippets/src/lowered/pass/insert_buffers.cpp @@ -0,0 +1,238 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/insert_buffers.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +InsertBuffers::InsertBuffers(int32_t buffer_allocation_rank) + : Pass(), m_buffer_allocation_rank(buffer_allocation_rank) {} + +LinearIR::constExprIt InsertBuffers::insertion_position(const LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, + const ExpressionPtr& up_expr, const ExpressionPtr& down_expr) { + const auto up_loops = up_expr->get_loop_ids(); + const auto down_loops = down_expr->get_loop_ids(); + OPENVINO_ASSERT(up_loops.size() == down_loops.size(), "The Loop IDs must be normalized!"); + size_t loop_idx = 0; + for (; loop_idx < up_loops.size(); ++loop_idx) { + if (up_loops[loop_idx] != down_loops[loop_idx]) + break; + } + + // If loop_ids of expressions are equal and don't contain LOOP_NULL_ID, it's attempt to insert Buffer between expressions from the same Loop! 
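// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// InitLoops::run() above walks each expression's loop IDs from the outermost level (index 0)
// towards the innermost one and materializes every loop exactly once; a loop is flagged with
// has_outer_loop when the loop one level above it has already been inserted.
#include <cstddef>
#include <set>
#include <vector>

void materialize_loops(const std::vector<size_t>& expr_loops, size_t null_id, std::set<size_t>& inserted) {
    for (size_t i = 0; i < expr_loops.size(); ++i) {
        const size_t loop_id = expr_loops[i];
        if (loop_id == null_id || inserted.count(loop_id))
            continue;                                       // not a real loop, or already materialized
        const bool has_outer_loop = i > 0 && inserted.count(expr_loops[i - 1]) != 0;
        (void)has_outer_loop;                               // the pass would emit LoopBegin/LoopEnd here
        inserted.insert(loop_id);
    }
}
// ---------------------------------------------------------------------------------------------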
+ if (loop_idx == up_loops.size() && std::none_of(up_loops.begin(), up_loops.end(), [](const size_t id) { return id == Expression::LOOP_NULL_ID; })) + OPENVINO_THROW("Buffer isn't supported in Inner Loop at the moment!"); + + // If the both expressions are outside Loops, insert Buffer explicitly after first Expression + if (loop_idx == up_loops.size()) { + return std::next(std::find(linear_ir.begin(), linear_ir.end(), up_expr)); + } + + const auto up_loop_id = up_loops[loop_idx]; + const auto down_loop_id = down_loops[loop_idx]; + if (up_loop_id != Expression::LOOP_NULL_ID) { + // If upper expression is inside Loop, we should insert Buffer after this Loop + const auto loop_info = loop_manager->get_loop_info(up_loop_id); + LinearIR::constExprIt loop_begin_pos, loop_end_pos; + loop_manager->get_loop_bounds(linear_ir, up_loop_id, loop_begin_pos, loop_end_pos); + return loop_end_pos; + } else if (down_loop_id != Expression::LOOP_NULL_ID) { + // If lower expression is inside Loop, we should insert Buffer before this Loop + const auto loop_info = loop_manager->get_loop_info(down_loop_id); + LinearIR::constExprIt loop_begin_pos, loop_end_pos; + loop_manager->get_loop_bounds(linear_ir, down_loop_id, loop_begin_pos, loop_end_pos); + return loop_begin_pos; + } else { + OPENVINO_THROW("Incorrect configuration for Buffer insertion!"); + } +} + +void InsertBuffers::insertion(LinearIR& linear_ir, const LinearIR::LoopManagerPtr& loop_manager, size_t loop_id, + const std::vector& loop_entries, const std::vector& loop_exits) { + for (const auto& entry_point : loop_entries) { + const auto& expr = entry_point.get_expr(); + const auto port = entry_point.get_index(); + const auto node = expr->get_node(); + const auto& input_connector = expr->get_input_port_connector(port); + const auto& parent_expr_output = input_connector->get_source(); + const auto& parent_expr = parent_expr_output.get_expr(); + const auto parent_port = parent_expr_output.get_index(); + const auto parent = parent_expr->get_node(); + if (ov::is_type(parent) || + ov::is_type(parent) || + ov::is_type(parent) || + ov::is_type(parent)) + continue; + + // Each MemoryAccess op needs Buffer + const auto parent_ma = ov::as_type_ptr(parent); + const auto node_ma = ov::as_type_ptr(node); + bool is_buffer_needed = (parent_ma && parent_ma->is_memory_access_output_port(parent_port)) || + (node_ma && node_ma->is_memory_access_input_port(port)); + if (!is_buffer_needed) { + const auto current_loops = expr->get_loop_ids(); + const auto parent_loops = parent_expr->get_loop_ids(); + const auto current_loop_count = current_loops.size(); + const auto parent_loop_count = parent_loops.size(); + OPENVINO_ASSERT(current_loop_count == parent_loop_count); + const auto current_loop_lvl = std::distance(current_loops.begin(), std::find(current_loops.begin(), current_loops.end(), loop_id)); + for (size_t i = current_loop_lvl; i < current_loop_count; i++) { + if (current_loops[i] != parent_loops[i] && + current_loops[i] != Expression::LOOP_NULL_ID && + parent_loops[i] != Expression::LOOP_NULL_ID) { + is_buffer_needed = true; + break; + } + } + } + + if (is_buffer_needed) { + // We should insert Buffer between first different Loops. 
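// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// insertion_position() above compares the normalized (equally sized) loop-ID vectors of the
// producer and the consumer and stops at the first level where they diverge; that level decides
// whether the Buffer is placed after the producer's loop or before the consumer's loop
// (the "Example" in the following comment shows the same idea with concrete IDs).
#include <cstddef>
#include <vector>

size_t first_divergent_loop_level(const std::vector<size_t>& up_loops, const std::vector<size_t>& down_loops) {
    size_t level = 0;
    while (level < up_loops.size() && up_loops[level] == down_loops[level])
        ++level;
    return level;   // == up_loops.size() when both expressions share all of their loops
}
// ---------------------------------------------------------------------------------------------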
+ // Example: Target Parent Loop identifies: 3, 2, 1 + // Current expr Loop identifies: 3, 4, 6 + // Need to insert between 2nd and 4th Loops - after 2nd Loop + const auto pos = insertion_position(linear_ir, loop_manager, parent_expr, expr); + const auto buffer = std::make_shared(parent->output(parent_port), m_buffer_allocation_rank); + PortDescriptorUtils::set_port_descriptor_ptr(buffer->output(0), parent_expr_output.get_descriptor_ptr()->clone()); + // Output connector is automatically filled from PortDescriptor + const auto buffer_expr = linear_ir.create_expression(buffer, {input_connector}); + linear_ir.insert(pos, buffer_expr); + linear_ir.replace_input(entry_point, buffer_expr->get_output_port_connector(0)); + } + } + + for (const auto& exit_point : loop_exits) { + const auto& expr = exit_point.get_expr(); + const auto port = exit_point.get_index(); + const auto node = expr->get_node(); + const auto output_connector = exit_point.get_port_connector_ptr(); + const auto child_exprs_inputs = output_connector->get_consumers(); + const auto current_loops = expr->get_loop_ids(); + const auto current_loop_count = current_loops.size(); + const std::vector node_outs = {output_connector}; + + std::set potential_consumers; + std::set buffers; + const auto current_loop_lvl = std::distance(current_loops.begin(), std::find(current_loops.begin(), current_loops.end(), loop_id)); + for (const auto& child_expr_input : child_exprs_inputs) { + const auto& child_expr = child_expr_input.get_expr(); + const auto child_port = child_expr_input.get_index(); + const auto& child = child_expr->get_node(); + if (ov::is_type(child)) + continue; + if (ov::is_type(child)) { + buffers.insert(child_expr); + continue; + } + // Each MemoryAccess op needs Buffer + const auto child_ma = ov::as_type_ptr(child); + const auto node_ma = ov::as_type_ptr(node); + if ((child_ma && child_ma->is_memory_access_input_port(child_port)) || + (node_ma && node_ma->is_memory_access_output_port(port))) { + potential_consumers.insert(child_expr_input); + continue; + } + + const auto child_loops = child_expr->get_loop_ids(); + const auto child_loop_count = child_loops.size(); + OPENVINO_ASSERT(current_loop_count == child_loop_count, "The Loop IDs must be normalized!"); + for (size_t i = current_loop_lvl; i < child_loop_count; i++) { + if (current_loops[i] != child_loops[i] && + current_loops[i] != Expression::LOOP_NULL_ID && + child_loops[i] != Expression::LOOP_NULL_ID) { + potential_consumers.insert(child_expr_input); + break; + } + } + } + + if (!potential_consumers.empty() || buffers.size() > 1) { + // If some of children from one common port are different Buffers, + // we should remove them to insert one common Buffer on one common port + if (!buffers.empty()) { + for (const auto& buffer : buffers) { + const auto& buffer_out = buffer->get_output_port_connector(0); + const auto buffer_consumers_inputs = buffer_out->get_consumers(); + linear_ir.replace_input(buffer_consumers_inputs, output_connector); + potential_consumers.insert(buffer_consumers_inputs.begin(), buffer_consumers_inputs.end()); + linear_ir.erase(std::find(linear_ir.begin(), linear_ir.end(), buffer)); + } + } + + // We should insert Buffer between first different Loops. 
+ // Example: Current expr Loop identifies: 3, 2, 1 + // Target consumers Loop identifies: 3, 4, 6 + // Need to insert after 2nd Loops + // Note: All potential consumers must have the same count of first equal Loop identifies and the same count of different last identifies + // TODO: Need to verify that + const auto pos = insertion_position(linear_ir, loop_manager, expr, (*potential_consumers.begin()).get_expr()); + + auto buffer = std::make_shared(node->output(port), m_buffer_allocation_rank); + PortDescriptorUtils::set_port_descriptor_ptr(buffer->output(0), exit_point.get_descriptor_ptr()->clone()); + // We cannot insert Node output connector on Buffer output because not all consumers of Node needs Buffer + // Example: + // Add + // / \ <- It should be the same TD + // Result Buffer + // | <- It should be new TD + // Relu + // Output port connector is automatically filled from PortDescriptor + const auto buffer_expr = linear_ir.create_expression(buffer, node_outs); + linear_ir.insert(pos, buffer_expr); + linear_ir.replace_input(potential_consumers, buffer_expr->get_output_port_connector(0)); + } + } +} + +bool InsertBuffers::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::InsertBuffers") + if (linear_ir.empty()) + return false; + + const auto& loop_manager = linear_ir.get_loop_manager(); + const auto loop_data_map = loop_manager->get_map(); + for (const auto& loop_data : loop_data_map) { + const auto loop_id = loop_data.first; + const auto loop_info = loop_data.second; + const auto loop_entries = loop_info->entry_exprs; + const auto loop_exits = loop_info->exit_exprs; + insertion(linear_ir, loop_manager, loop_id, loop_entries, loop_exits); + } + + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) { + const auto expr = *expr_it; + const auto node = (*expr_it)->get_node(); + const auto ma = ov::as_type_ptr(node); + if (!ma) + continue; + + const auto input_ports = ma->get_memory_access_input_ports(); + const auto output_ports = ma->get_memory_access_output_ports(); + std::vector loop_entries(input_ports.size()), loop_exits(output_ports.size()); + for (const auto& p : input_ports) { + loop_entries[p.first] = expr->get_input_port(p.first); + } + for (const auto& p : output_ports) { + loop_exits[p.first] = expr->get_output_port(p.first); + } + + insertion(linear_ir, loop_manager, Expression::LOOP_NULL_ID, loop_entries, loop_exits); + } + + return true; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/insert_load_store.cpp b/src/common/snippets/src/lowered/pass/insert_load_store.cpp new file mode 100644 index 00000000000000..962b4eed9381cb --- /dev/null +++ b/src/common/snippets/src/lowered/pass/insert_load_store.cpp @@ -0,0 +1,175 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/insert_load_store.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +namespace { +auto get_inner_loop_id(const std::vector& loop_ids) -> size_t { + size_t inner_loop = Expression::LOOP_NULL_ID; + for (int i = static_cast(loop_ids.size()) - 1; i >= 0; --i) { + if (loop_ids[i] != Expression::LOOP_NULL_ID) { + inner_loop = loop_ids[i]; + break; + } + } + return inner_loop; +} +} // 
namespace + +using LoopManager = LinearIR::LoopManager; +using LoopInfoPtr = LoopManager::LoopInfoPtr; + +InsertLoadStore::InsertLoadStore(size_t vector_size) : m_vector_size(vector_size) {} + +void InsertLoadStore::update_loops(const LinearIR::LoopManagerPtr& loop_manager, const std::vector& loop_ids, + const ExpressionPort& actual_port, const std::vector& target_ports, bool is_entry) { + for (auto loop_id : loop_ids) { + if (loop_id != Expression::LOOP_NULL_ID) + update_loop(loop_manager->get_loop_info(loop_id), actual_port, target_ports, is_entry); + } +} + +void InsertLoadStore::update_loop(const LinearIR::LoopManager::LoopInfoPtr& loop_info, + const ExpressionPort& actual_port, const std::vector& target_ports, bool is_entry) { + auto& ports = is_entry ? loop_info->entry_exprs : loop_info->exit_exprs; + auto port_it = std::find(ports.begin(), ports.end(), actual_port); + if (port_it == ports.end()) + return; + port_it = ports.erase(port_it); + ports.insert(port_it, target_ports.cbegin(), target_ports.cend()); +} + +size_t InsertLoadStore::get_count(const PortDescriptorPtr& port_desc) const { + const auto layout = port_desc->get_layout(); + const auto shape = port_desc->get_shape(); + // Find last dimension by layout + const auto last_dim_idx = std::find(layout.begin(), layout.end(), layout.size() - 1); + OPENVINO_ASSERT(last_dim_idx != layout.end(), "Load/Store expression have incorrect layout"); + const auto dim = shape[*last_dim_idx]; + return dim == 1 ? 1 : m_vector_size; +} + +bool InsertLoadStore::insert_load(LinearIR& linear_ir, const LinearIR::constExprIt& data_expr_it) { + const auto& loop_manager = linear_ir.get_loop_manager(); + const auto& data_expr = *data_expr_it; + const auto& data_node = data_expr->get_node(); + const auto& output_connector = data_expr->get_output_port_connector(0); + const auto consumer_inputs = output_connector->get_consumers(); + + bool was_inserted = false; + for (const auto& consumer_input : consumer_inputs) { + const auto& consumer_expr = consumer_input.get_expr(); + const auto port = consumer_input.get_index(); + const auto& consumer = consumer_expr->get_node(); + const auto ma = ov::as_type_ptr(consumer); + if (ma && ma->is_memory_access_input_port(port)) + return false; + + // Find Inner Loop + const auto& loop_ids = consumer_expr->get_loop_ids(); + const auto inner_loop = get_inner_loop_id(loop_ids); + OPENVINO_ASSERT(inner_loop != Expression::LOOP_NULL_ID, "Loop hasn't been found!"); + + const auto load = std::make_shared(data_node->output(0), get_count(data_expr->get_output_port_descriptor(0))); + PortDescriptorUtils::set_port_descriptor_ptr(load->output(0), consumer_input.get_descriptor_ptr()->clone()); + const auto load_expr = linear_ir.create_expression(load, {output_connector}); + linear_ir.insert(std::find(data_expr_it, linear_ir.cend(), consumer_expr), load_expr); + linear_ir.replace_input(consumer_input, load_expr->get_output_port_connector(0)); + // Copy Loop identifies + load_expr->set_loop_ids(loop_ids); + + // Need to update all the corresponding Loops with the same Entry Point + const auto prev_entry_point = consumer_input; + const auto new_entry_point = load_expr->get_input_port(0); + update_loops(loop_manager, loop_ids, prev_entry_point, {new_entry_point}, true); + was_inserted = true; + } + + return was_inserted; +} + +bool InsertLoadStore::insert_store(LinearIR& linear_ir, const LinearIR::constExprIt& data_expr_it) { + const auto& loop_manager = linear_ir.get_loop_manager(); + const auto& data_expr = *data_expr_it; + const 
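// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// get_count() above decides how many elements a Load/Store should process: a dimension of size 1
// cannot feed a full vector register, so a scalar access (count 1) is emitted, otherwise the
// access covers a full vector of m_vector_size elements. Assuming the innermost dimension size
// has already been resolved through the layout:
#include <cstddef>

size_t memory_access_count(size_t innermost_dim_size, size_t vector_size) {
    return innermost_dim_size == 1 ? 1 : vector_size;
}
// ---------------------------------------------------------------------------------------------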
auto& input_connector = data_expr->get_input_port_connector(0); + const auto& parent_output = input_connector->get_source(); + const auto& parent_expr = parent_output.get_expr(); + const auto port = parent_output.get_index(); + const auto& parent = parent_expr->get_node(); + const auto ma = ov::as_type_ptr(parent); + if (ma && ma->is_memory_access_output_port(port)) + return false; + + // Find Inner Loop + const auto& loop_ids = parent_expr->get_loop_ids(); + const auto inner_loop = get_inner_loop_id(loop_ids); + OPENVINO_ASSERT(inner_loop != Expression::LOOP_NULL_ID, "Loop hasn't been found!"); + + const auto store = std::make_shared(parent->output(port), get_count(data_expr->get_input_port_descriptor(0))); + PortDescriptorUtils::set_port_descriptor_ptr(store->output(0), parent_output.get_descriptor_ptr()->clone()); + const auto store_expr = linear_ir.create_expression(store, {input_connector}); + const auto& reverse_insertion_pos = std::find(std::reverse_iterator(data_expr_it), linear_ir.crend(), parent_expr); + const auto& insertion_pos = reverse_insertion_pos.base(); + linear_ir.insert(insertion_pos, store_expr); + linear_ir.replace_input(data_expr->get_input_port(0), store_expr->get_output_port_connector(0)); + // Copy Loop identifies + store_expr->set_loop_ids(loop_ids); + + // Need to update all the corresponding Loops with the same Exit Point + const auto prev_exit_point = parent_output; + // The previous exit point byt one output port can have several consumers that can be potential exit points + // So we should verify on the possible future exit points + const auto consumer_inputs = input_connector->get_consumers(); + const auto should_be_saved = std::any_of(consumer_inputs.begin(), consumer_inputs.end(), + [](const ExpressionPort& input_port) { + const auto& node = input_port.get_expr()->get_node(); + return ov::is_type(node) || ov::is_type(node); + }); + const auto new_exit_point = store_expr->get_output_port(0); + const auto new_exit_points = should_be_saved ? 
std::vector{prev_exit_point, new_exit_point} + : std::vector{new_exit_point}; + update_loops(loop_manager, loop_ids, prev_exit_point, new_exit_points, false); + return true; +} + +bool InsertLoadStore::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::InsertLoadStore") + + bool modified = false; + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) { + const auto expr = *expr_it; + const auto& node = expr->get_node(); + if (ov::is_type(node)) { + modified |= insert_load(linear_ir, expr_it); + continue; + } + if (ov::is_type(node)) { + modified |= insert_store(linear_ir, expr_it); + continue; + } + if (auto buffer = ov::as_type_ptr(node)) { + modified |= insert_load(linear_ir, expr_it); + if (buffer->is_intermediate_memory()) + modified |= insert_store(linear_ir, expr_it); + continue; + } + } + + return modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/insert_tail_loop.cpp b/src/common/snippets/src/lowered/pass/insert_tail_loop.cpp new file mode 100644 index 00000000000000..fd64294cd2d327 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/insert_tail_loop.cpp @@ -0,0 +1,210 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/insert_tail_loop.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +void InsertTailLoop::tail_transformations(LinearIR& linear_ir, + LinearIR::container::const_iterator tail_begin, + LinearIR::container::const_iterator tail_end, + const size_t tail_size) { + const auto& config = linear_ir.get_config(); + auto insertFill = [tail_size](const ov::Input& input) -> std::shared_ptr { + std::shared_ptr fill = nullptr; + auto& rt = input.get_rt_info(); + auto fill_rt = rt.find("set_fill"); + if (fill_rt != rt.end()) { + const auto fill_value = fill_rt->second.as(); + fill = std::make_shared(input.get_source_output(), tail_size, fill_value); + input.get_node()->set_argument(input.get_index(), fill); + } + return fill; + }; + + for (auto expr_it = tail_begin; expr_it != tail_end; expr_it++) { + // We should fill vector regs by float_min and zero to have + // correct math calculations for ReduceMax and ReduceSum in scalar case. 
+ // Note: We find Maximum and Add ops because HorizonMax and HorizonSum are outside Loop, + // so they are missed in + auto op = (*expr_it)->get_node(); + if (config.m_need_fill_tail_register && + (ov::is_type(op) || + ov::is_type(op))) { + for (size_t i = 0; i < op->inputs().size(); ++i) { + if (auto fill = insertFill(op->input(i))) { + const auto& input = expr_it->get()->get_input_port_connector(i); + const auto consumers = input->get_consumers(); + auto fill_expr = linear_ir.create_expression(fill, {input}); + linear_ir.insert(expr_it, fill_expr); + linear_ir.replace_input(consumers, fill_expr->get_output_port_connector(0)); + // in_reg == out_reg since we want to modify vector reg inplace + const auto reg = expr_it->get()->get_input_port_descriptor(0)->get_reg(); + fill_expr->get_input_port_descriptor(0)->set_reg(reg); + fill_expr->get_output_port_descriptor(0)->set_reg(reg); + } + } + } else if (const auto memory_access = std::dynamic_pointer_cast(op)) { + for (const auto p : memory_access->get_memory_access_input_ports()) { + const auto port = p.first; + if (memory_access->get_input_count(port) > 1) { + memory_access->set_input_count(tail_size, port); + } + } + for (const auto p : memory_access->get_memory_access_output_ports()) { + const auto port = p.first; + if (memory_access->get_output_count(port) > 1) { + memory_access->set_output_count(tail_size, port); + } + } + } + } +} + +bool InsertTailLoop::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::insertTailLoop") + bool modified = false; + // *1* solo vector/tail loop + empty outer loop + // => skip increments (both counter & ptr) : set evaluate_once flag + // *2* solo vector/tail loop + non-empty outer loop + // => skip counter increments but perform ptr increments : set evaluate_once, + // and perform pointer increments through finalization offsets + // *3* vector loop(s) + one tail loop + // => vector as usual, tail depends on outer loop, see *1* and *2* + auto optimize_single_evaluation = [](const std::shared_ptr& loop, bool force_ptr_increment = false) { + if (loop->get_work_amount() < 2 * loop->get_increment()) { + loop->set_evaluate_once(true); + if (force_ptr_increment || loop->has_outer_loop) { + std::vector new_finalization_offsets(loop->get_finalization_offsets()); + const auto& ptr_increments = loop->get_ptr_increments(); + const auto work_amount_incr = static_cast(loop->get_increment()); + for (size_t i = 0; i < new_finalization_offsets.size(); i++) { + new_finalization_offsets[i] += ptr_increments[i] * work_amount_incr; + } + loop->set_finalization_offsets(new_finalization_offsets); + } + return true; + } else { + return false; + } + }; + auto is_loop_with_buffers = [&linear_ir](const std::shared_ptr& loop_end) { + auto is_buffer_input = [](const PortConnectorPtr& input) { + const auto& parent_expr = input->get_source().get_expr(); + return ov::is_type(parent_expr->get_node()); + }; + auto is_buffer_output = [](const PortConnectorPtr& output) { + const auto child_exprs_inputs = output->get_consumers(); + return std::any_of(child_exprs_inputs.begin(), child_exprs_inputs.end(), + [](const ExpressionPort& lp) {return ov::is_type(lp.get_expr()->get_node());}); + }; + + const auto& loop_end_expr = linear_ir.get_expr_by_node(loop_end); + const auto inputs = loop_end_expr->get_input_port_connectors(); + const auto in_num = loop_end->get_input_num(); + const auto out_num = loop_end->get_output_num(); + OPENVINO_ASSERT(inputs.size() == (in_num + out_num + 1), + 
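// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// optimize_single_evaluation() above marks a loop as evaluate-once when it cannot run more than
// one iteration (work_amount < 2 * increment). The skipped per-iteration pointer increments are
// then folded into the finalization offsets, so the data pointers still end up where a normally
// executed loop would have left them.
#include <cstddef>
#include <cstdint>
#include <vector>

void fold_increments_into_offsets(std::vector<int64_t>& finalization_offsets,
                                  const std::vector<int64_t>& ptr_increments,
                                  int64_t increment) {
    for (size_t i = 0; i < finalization_offsets.size(); ++i)
        finalization_offsets[i] += ptr_increments[i] * increment;   // one skipped iteration's advance
}
// ---------------------------------------------------------------------------------------------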
std::string("The LoopEnd expression must have the count of inputs is") + + std::string("equal to count of input and outputs of Loop plus one for work amount")); + const std::vector loop_ins(inputs.begin(), inputs.begin() + in_num); + const std::vector loop_outs(inputs.begin() + in_num, inputs.begin() + in_num + out_num); + return std::any_of(loop_ins.begin(), loop_ins.end(), is_buffer_input) || + std::any_of(loop_outs.begin(), loop_outs.end(), is_buffer_output); + }; + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end();) { + const auto& loop_begin = ov::as_type_ptr((*expr_it)->get_node()); + if (!loop_begin) { + expr_it++; + continue; + } + + // ignore outer loops and possible manual scalar loops + const auto& loop_end = loop_begin->get_loop_end(); + if (loop_end->get_increment() != 1) { + auto loop_begin_expr_it = expr_it; + const auto vector_loop_end = loop_end; + while ((*expr_it)->get_node() != vector_loop_end) + expr_it++; + // Note that exp_it points to the element AFTER loop_end + expr_it++; + const bool is_there_buffer = is_loop_with_buffers(vector_loop_end); + const auto work_amount = vector_loop_end->get_work_amount(); + const auto increment = vector_loop_end->get_increment(); + const auto tail_size = work_amount % increment; + const auto need_tail = tail_size != 0; + const auto need_vector_loop = work_amount >= increment; + // Note, that finalization_offsets could be modified inside optimize_single_evaluation, + // so need to save them here to cover (evaluate_once vector with non-zero finalization_offsets + tail) + std::vector tail_finalization_offsets = need_tail ? vector_loop_end->get_finalization_offsets() + : std::vector {}; + // vector loops are required => Just copy the body, original loop is already a vector one + if (need_vector_loop) { + // Note that finalization offsets should be applied after the last iteration. + // So if there is a tail, then we should apply offsets after it, but not now. + if (need_tail) + vector_loop_end->set_finalization_offsets( + std::vector(tail_finalization_offsets.size(), 0)); + + // force ptr increments if there is tail + optimize_single_evaluation(vector_loop_end, need_tail || is_there_buffer); + } + + // tail is required => transform the body into a tail representation + // tail loop is fake loop because for tail we should calculate only + // finalization offsets which are supported by LoopEnd. 
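// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// The pass below splits the work amount into full-vector iterations plus an optional tail:
// tail_size = work_amount % increment, and a vector loop is kept only when work_amount >= increment.
#include <cstddef>

struct LoopSplit {
    size_t vector_iterations;   // iterations of the main loop, each processing `increment` elements
    size_t tail_size;           // leftover elements handled by the tail loop (0 means no tail)
};

LoopSplit split_work(size_t work_amount, size_t increment) {
    return {work_amount / increment, work_amount % increment};
}
// Example: work_amount = 17 with a vector increment of 8 gives 2 vector iterations and a tail of 1.
// ---------------------------------------------------------------------------------------------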
+ if (need_tail) { + LinearIR::constExprIt tail_begin; + LinearIR::constExprIt tail_end; + if (need_vector_loop) { + auto vector_loop_deep_copy = LinearIR::deep_copy_range(loop_begin_expr_it, expr_it); + auto is_par_or_res = [](const ExpressionPtr& expr) { + return is_type(expr->get_node()) || + is_type(expr->get_node()); + }; + // Note: It's illegal to insert Parameter or Result to the IR, but they can appear inside vector loop + // So we have to remo them before injecting tail loop into linear_ir + auto to_erase = std::remove_if(vector_loop_deep_copy.begin(), vector_loop_deep_copy.end(), is_par_or_res); + vector_loop_deep_copy.erase(to_erase, vector_loop_deep_copy.end()); + tail_begin = linear_ir.insert(expr_it, vector_loop_deep_copy.begin(), vector_loop_deep_copy.end()); + tail_end = expr_it; + } else { + tail_begin = loop_begin_expr_it; + tail_end = expr_it; + } + + tail_transformations(linear_ir, tail_begin, tail_end, tail_size); + std::shared_ptr tail_loop_end = + ov::as_type_ptr((*tail_begin)->get_node())->get_loop_end(); + tail_loop_end->set_finalization_offsets(tail_finalization_offsets); + tail_loop_end->set_increment(tail_size); + // ptr increments were set to the old increment, need to update them in accordance with the new one + tail_loop_end->set_work_amount(tail_size); + tail_loop_end->has_outer_loop = vector_loop_end->has_outer_loop; + + // Note: despite the fact that the tail loop is always executed once, we still need + // to keep finalization_offsets to reset Buffer + optimize_single_evaluation(tail_loop_end, is_there_buffer); + } + modified = true; + } else { + // if there is a loop, then exprt_it already points to the next statement (after loop end) + // so we need to increment iterator only if there was no loop + expr_it++; + } + } + return modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov + diff --git a/src/common/snippets/src/lowered/pass/load_movebroadcast_to_broadcastload.cpp b/src/common/snippets/src/lowered/pass/load_movebroadcast_to_broadcastload.cpp new file mode 100644 index 00000000000000..c7bf1aab25c052 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/load_movebroadcast_to_broadcastload.cpp @@ -0,0 +1,65 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/load_movebroadcast_to_broadcastload.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + + +bool LoadMoveBroadcastToBroadcastLoad::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::LoadMoveBroadcastToBroadcastLoad") + bool modified = false; + + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) { + const auto& expr = *expr_it; + const auto& op = expr->get_node(); + // Match on MoveBroadcast because MoveBroadcast is rare node in bodies + if (const auto move_broadcast = ov::as_type_ptr(op)) { + const auto& interm_connector = expr->get_input_port_connector(0); + const auto parent_expr = interm_connector->get_source().get_expr(); + const auto load = ov::as_type_ptr(parent_expr->get_node()); + if (!load) + continue; + + // Cannot rewrite Broadcast + Load if load has more than 1 user + // or more than one input, or if Broadcast has several inputs + const auto load_consumers_inputs = interm_connector->get_consumers(); + size_t count = 0; + for (const auto& 
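// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// The deep-copied vector body above is cleaned with the classic erase-remove idiom before it is
// re-inserted as the tail loop, because Parameter/Result expressions must not appear twice in the IR.
#include <algorithm>
#include <string>
#include <vector>

void drop_io_expressions(std::vector<std::string>& body) {   // expressions modelled as plain strings
    auto is_io = [](const std::string& op) { return op == "Parameter" || op == "Result"; };
    body.erase(std::remove_if(body.begin(), body.end(), is_io), body.end());
}
// ---------------------------------------------------------------------------------------------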
consumer_expr_input : load_consumers_inputs) { + const auto consumer = consumer_expr_input.get_expr()->get_node(); + if (!ov::is_type(consumer)) + count++; + } + + if (count > 1) + continue; + + const auto& outshape = move_broadcast->get_output_partial_shape(0); + const auto broadcastload = std::make_shared(load->input_value(0), outshape, load->get_offset()); + const auto move_consumers = expr->get_output_port_connector(0)->get_consumers(); + PortDescriptorUtils::set_port_descriptor_ptr(broadcastload->output(0), expr->get_output_port(0).get_descriptor_ptr()->clone()); + const auto broadcastload_expr = linear_ir.create_expression(broadcastload, { parent_expr->get_input_port_connector(0) }); + const auto mv_expr_it = expr_it; + const auto insertion_pos = std::next(expr_it); + expr_it = linear_ir.insert(insertion_pos, broadcastload_expr); + linear_ir.erase(std::find(linear_ir.begin(), mv_expr_it, parent_expr)); + linear_ir.erase(mv_expr_it); + linear_ir.replace_input(move_consumers, broadcastload_expr->get_output_port_connector(0)); + modified |= true; + } + } + return modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/mark_loops.cpp b/src/common/snippets/src/lowered/pass/mark_loops.cpp new file mode 100644 index 00000000000000..86246ce61f1be6 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/mark_loops.cpp @@ -0,0 +1,99 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/mark_loops.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +MarkLoops::MarkLoops(size_t vector_size) : Pass(), m_vector_size(vector_size) {} + +bool MarkLoops::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::MarkLoops") + if (linear_ir.empty()) + return false; + + const auto& lowering_config = linear_ir.get_config(); + const auto& loop_manager = linear_ir.get_loop_manager(); + auto loop_depth = lowering_config.m_loop_depth; + + // Parameters Results or Constants are ignored. 
They can't be used as a loop starting point + auto is_not_start_point = [](const std::shared_ptr& node) { + return ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node); + }; + + auto are_conflicted = [](const ExpressionPort& lhs, const ExpressionPort& rhs) { + const auto& lhs_desc = lhs.get_descriptor_ptr(); + const auto& rhs_desc = rhs.get_descriptor_ptr(); + return lhs_desc->get_subtensor() != rhs_desc->get_subtensor() || + lhs_desc->get_layout() != rhs_desc->get_layout() || + lhs_desc->get_shape() != rhs_desc->get_shape(); + }; + + for (auto expr_it = linear_ir.cbegin(); expr_it != linear_ir.cend(); expr_it++) { + const auto expr = *expr_it; + const auto& node = expr->get_node(); + if (is_not_start_point(node)) + continue; + + auto loop_begin_pos = expr_it; + auto loop_end_pos = loop_begin_pos; + + bool collapse = true; + do { + const auto& prev_expr = *loop_end_pos; + loop_end_pos++; + // If iterator is the last, we should finish Loop + if (loop_end_pos == linear_ir.end()) + break; + + // If iterator is the last, we should finish Loop + const auto& current_expr = *loop_end_pos; + const auto& current_node = current_expr->get_node(); + if (ov::is_type(current_node) || + ov::is_type(current_node)) + break; + + // We finish Loop if + // - the next expr isn't real consumer + // - the is conflict between the corresponding ports + bool is_connected = false; + bool is_conflicted = false; + for (size_t i = 0; i < prev_expr->get_output_count(); ++i) { + const auto& connector = prev_expr->get_output_port_connector(i); + const auto consumers = connector->get_consumers(); + const auto found = std::find_if(consumers.begin(), consumers.end(), [&loop_end_pos](const ExpressionPort& consumer) { + return consumer.get_expr() == *loop_end_pos; + }); + if (found != consumers.end()) { + if (are_conflicted(*found, connector->get_source())) { + is_conflicted = true; + break; + } + is_connected = true; + } + } + collapse = is_connected && !is_conflicted; + } while (collapse); + + loop_manager->mark_loop(loop_begin_pos, loop_end_pos, loop_depth, m_vector_size); + expr_it = std::prev(loop_end_pos); + } + + return true; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/move_result_out_of_loop.cpp b/src/common/snippets/src/lowered/pass/move_result_out_of_loop.cpp new file mode 100644 index 00000000000000..6aaa6493dd45e9 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/move_result_out_of_loop.cpp @@ -0,0 +1,74 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/move_result_out_of_loop.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +bool MoveResultOutOfLoop::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::MoveResultOutOfLoop") + if (linear_ir.empty()) + return false; + + bool modified = false; + const auto loop_manager = linear_ir.get_loop_manager(); + // Visit expressions in reverse order, so we'll move Result to an already visited area. 
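// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// MarkLoops above keeps extending the current loop while the next expression is a real consumer
// of the previous one and the connected ports do not "conflict"; two ports conflict when they
// disagree on subtensor, layout or shape, since such expressions cannot share a single schedule.
#include <cstddef>
#include <vector>

struct PortInfo {                       // minimal stand-in for a port descriptor
    std::vector<size_t> shape;
    std::vector<size_t> layout;
    std::vector<size_t> subtensor;
};

bool conflicted(const PortInfo& a, const PortInfo& b) {
    return a.subtensor != b.subtensor || a.layout != b.layout || a.shape != b.shape;
}
// ---------------------------------------------------------------------------------------------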
+ // This is needed to avoid extra hits, when we match to the same Result twice + for (auto expr_it = linear_ir.crbegin(); expr_it != linear_ir.crend(); expr_it++) { + const auto& forward_it = std::prev(expr_it.base()); + const auto& expr = *expr_it; + const auto& node = expr->get_node(); + if (!ov::is_type(node)) { + continue; + } + + const auto& input_connector = expr->get_input_port_connector(0); + const auto& parent_expr = input_connector->get_source().get_expr(); + const auto parent_loop_ids = parent_expr->get_loop_ids(); + int outer_loop_id = static_cast(parent_loop_ids.size()) - 1; + for (; outer_loop_id >= 0; --outer_loop_id) { + if (parent_loop_ids[outer_loop_id] != Expression::LOOP_NULL_ID) { + break; + } + } + + // Parent is out of Loop: just verify that Result is after Parent + if (outer_loop_id < 0) { + const auto parent_it = std::find(forward_it, linear_ir.cend(), parent_expr); + // If Parent is found after Result, we should move Result + if (parent_it != linear_ir.cend()) { + const auto insertion_pos = std::next(parent_it); + const auto result_it = forward_it; + expr_it = std::prev(expr_it); // save iterator before moving + linear_ir.move(result_it, insertion_pos); + modified = true; + } + continue; + } + + LinearIR::constExprIt loop_begin_pos, loop_end_pos; + loop_manager->get_loop_bounds(linear_ir, parent_loop_ids[outer_loop_id], loop_begin_pos, loop_end_pos); + // If the Result isn't found after Outer LoopEnd, need to move it to there + if (std::find(loop_end_pos, linear_ir.cend(), expr) == linear_ir.cend()) { + expr_it = std::prev(expr_it); // save iterator before moving + linear_ir.move(forward_it, loop_end_pos); + modified = true; + } + } + + return modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/move_scalar_to_consumer.cpp b/src/common/snippets/src/lowered/pass/move_scalar_to_consumer.cpp new file mode 100644 index 00000000000000..60a87d84eab0f2 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/move_scalar_to_consumer.cpp @@ -0,0 +1,50 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/move_scalar_to_consumer.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +bool MoveScalarToConsumer::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::MoveScalarToConsumer") + if (linear_ir.empty()) + return false; + + bool modified = false; + // Visit expressions in reverse order, so we'll move Scalar to an already visited area. 
+ // This is needed to avoid extra hits, when we match to the same Scalar twice + for (auto expr_it = linear_ir.rbegin(); expr_it != linear_ir.rend(); expr_it++) { + const auto expr = expr_it->get(); + if (ov::is_type(expr->get_node())) { + const auto consumers = expr->get_output_port_connector(0)->get_consumers(); + OPENVINO_ASSERT(consumers.size() == 1, "Scalar expression is expected to have a single consumer"); + + const auto& consumer_expr = consumers.begin()->get_expr(); + // Move something only if consumer is not already the next one (previous since the iterator is a reverse one) + auto forward_it = std::prev(expr_it.base()); + if (consumer_expr != *std::next(forward_it)) { + expr_it = std::prev(expr_it); // save iterator before moving + auto consumer_it = forward_it; + while (*consumer_it != consumer_expr) + consumer_it++; + linear_ir.move(forward_it, consumer_it); + modified = true; + } + } + } + return modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/pass.cpp b/src/common/snippets/src/lowered/pass/pass.cpp new file mode 100644 index 00000000000000..32bfd138bb9716 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/pass.cpp @@ -0,0 +1,26 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/pass.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +void PassPipeline::register_pass(const std::shared_ptr& pass) { + m_passes.push_back(pass); +} + +void PassPipeline::run(LinearIR& linear_ir) { + for (const auto& pass : m_passes) { + pass->run(linear_ir); + } +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/propagate_layout.cpp b/src/common/snippets/src/lowered/pass/propagate_layout.cpp new file mode 100644 index 00000000000000..7b69c82777d90e --- /dev/null +++ b/src/common/snippets/src/lowered/pass/propagate_layout.cpp @@ -0,0 +1,62 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/propagate_layout.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +bool PropagateLayout::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::PropagateLayout") + if (linear_ir.empty()) + return false; + + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) { + const auto& expr = *expr_it; + const auto io_expr = std::dynamic_pointer_cast(expr); + if (!io_expr) + continue; + + const bool is_input = io_expr->get_type() == IOExpression::io_type::INPUT; + const auto& connectors = is_input ? 
expr->get_output_port_connectors() : expr->get_input_port_connectors(); + if (connectors.size() != 1) + OPENVINO_THROW("Parameter/Results should have exactly one output/input"); + + // If input - we should be looking downstream, if output - upstream + const auto& target_connector = connectors.front(); + if (is_input) { + const auto consumer_inputs = target_connector->get_consumers(); + // Note that here we consider only the first child (which is usually load), + // but often there is another child - LoopEnd + std::set> child_layouts; + for (const auto& child_input : consumer_inputs) { + const auto& child = child_input.get_expr(); + const auto port = child_input.get_index(); + const auto& n = child->get_node(); + const auto ma = ov::as_type_ptr(n); + if (ma && ma->is_memory_access_input_port(port)) { + child_layouts.insert(child_input.get_descriptor_ptr()->get_layout()); + } + } + OPENVINO_ASSERT(child_layouts.size() == 1, "All children of an input expression must have the same layout"); + io_expr->get_output_port_descriptor(0)->set_layout(*child_layouts.begin()); + } else { + io_expr->get_input_port_descriptor(0)->set_layout(target_connector->get_source().get_descriptor_ptr()->get_layout()); + } + } + + return true; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/pass/softmax_decomposition.cpp b/src/common/snippets/src/lowered/pass/softmax_decomposition.cpp new file mode 100644 index 00000000000000..b5ad5864a2e614 --- /dev/null +++ b/src/common/snippets/src/lowered/pass/softmax_decomposition.cpp @@ -0,0 +1,158 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/pass/softmax_decomposition.hpp" + +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/loop_manager.hpp" +#include "snippets/lowered/pass/mark_loops.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/itt.hpp" + +#include "openvino/pass/pattern/op/wrap_type.hpp" +#include "openvino/pass/pattern/matcher.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { +namespace pass { + +SoftmaxDecomposition::SoftmaxDecomposition(size_t vector_size) : m_vector_size{vector_size} {} + +bool SoftmaxDecomposition::run(LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::SoftmaxDecompositionLowered") + bool modified = false; + const auto& loop_manager = linear_ir.get_loop_manager(); + + auto match_softmax = ov::pass::pattern::wrap_type(); + auto matcher = std::make_shared(match_softmax, "SoftmaxDecompositionLowered"); + + for (auto expr_it = linear_ir.begin(); expr_it != linear_ir.end(); expr_it++) { + const auto& op = (*expr_it)->get_node(); + if (matcher->match(op)) { + const auto& pm = matcher->get_pattern_map(); + const auto softmax = pm.at(match_softmax); + const auto softmax_expr = *expr_it; + const auto softmax_loop_ids = softmax_expr->get_loop_ids(); + const auto& input_connector = softmax_expr->get_input_port_connector(0); + const auto& output_connector = softmax_expr->get_output_port_connector(0); + const auto tensor_out = softmax_expr->get_output_port_descriptor(0)->get_shape(); + const auto inner_work_amount = *(tensor_out.rbegin()); + + expr_it = linear_ir.erase(expr_it); // Remove Softmax + + std::vector outer_exprs; + + // We need an iterator to the inserted element + auto push_node = [&linear_ir, &expr_it, &softmax_loop_ids](const std::shared_ptr& n) { + const auto expr = 
linear_ir.insert(expr_it, n); + (*expr)->set_loop_ids(softmax_loop_ids); + return std::make_pair(expr, n); + }; + + // Note: VectorBuffer is a special case, since it should go before the initial Load. So we handle it separately + const auto& vector_buffer_max = push_node(std::make_shared()); + outer_exprs.push_back(*vector_buffer_max.first); + // ReduceMax loop + const auto& max = push_node(std::make_shared(softmax->get_input_source_output(0), vector_buffer_max.second)); + + const auto horizon_max = push_node(std::make_shared(max.second)); + outer_exprs.push_back(*horizon_max.first); + + // Markup of ReduceMax Loop + loop_manager->mark_loop(max.first, horizon_max.first, 1, inner_work_amount, m_vector_size, + std::vector{(*max.first)->get_input_port(0), + (*max.first)->get_input_port(1)}, + std::vector{(*max.first)->get_output_port(0)}); + + const auto broadcast_horizon_max = push_node( + std::make_shared(horizon_max.second, horizon_max.second->get_input_partial_shape(0))); + const auto vector_buffer_sum = push_node(std::make_shared()); + outer_exprs.push_back(*broadcast_horizon_max.first); + outer_exprs.push_back(*vector_buffer_sum.first); + + // Sub + Exp + ReduceSum Loop + const auto sub = push_node(std::make_shared(softmax->get_input_source_output(0), broadcast_horizon_max.second)); + const auto exp = push_node(std::make_shared(sub.second)); + const auto sum = push_node(std::make_shared(exp.second, vector_buffer_sum.second)); + + const auto horizon_sum = push_node(std::make_shared(sum.second)); + outer_exprs.push_back(*horizon_sum.first); + + // Markup of ReduceMax Loop + loop_manager->mark_loop(sub.first, horizon_sum.first, 1, inner_work_amount, m_vector_size, + std::vector{(*sub.first)->get_input_port(0), + (*sub.first)->get_input_port(1), + (*sum.first)->get_input_port(1)}, + std::vector{(*exp.first)->get_output_port(0), + (*sum.first)->get_output_port(0)}); + + // Divide is expensive operation, so we decompose it into 1 / x * y, where 1 / x is executed outside loop + const auto pow = push_node(std::make_shared(horizon_sum.second, -1.f)); + const auto broadcast_pow = push_node(std::make_shared(pow.second, horizon_sum.second->get_input_partial_shape(0))); + outer_exprs.push_back(*pow.first); + outer_exprs.push_back(*broadcast_pow.first); + + // Mul (pseudo-Divide loop) + const auto mul = push_node(std::make_shared(exp.second, broadcast_pow.second)); + + // Transfer original ExpressionPorts + linear_ir.replace_input((*max.first)->get_input_port(0), input_connector); + linear_ir.replace_input((*sub.first)->get_input_port(0), input_connector); + linear_ir.replace_input(output_connector->get_consumers(), (*mul.first)->get_output_port_connector(0)); + + // Markup of Mul Loop + loop_manager->mark_loop(mul.first, expr_it, 1, inner_work_amount, m_vector_size, + std::vector{(*mul.first)->get_input_port(0), + (*mul.first)->get_input_port(1)}, + std::vector{(*mul.first)->get_output_port(0)}); + + // Markup inner loop for outside expression with null loop id + for (const auto& expr : outer_exprs) { + expr->set_loop_id(Expression::LOOP_NULL_ID, 1); + } + + auto update_loop_bounds = [&softmax_expr](std::vector& points, + const std::vector& new_points, + const LinearIR::LoopManager::LoopInfoPtr& loop_info) { + auto entry_found = std::find_if(points.begin(), points.end(), [&softmax_expr](const ExpressionPort& desc) { + return desc.get_expr() == softmax_expr; + }); + if (entry_found != points.end()) { + entry_found = points.erase(entry_found); + points.insert(entry_found, new_points.begin(), 
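// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only):
// The decomposition above implements softmax(x)_i = exp(x_i - max(x)) / sum_j exp(x_j - max(x));
// subtracting the maximum keeps exp() in a numerically safe range, and the per-element division
// is replaced by one power(-1) (i.e. 1/sum) followed by a multiplication, as in the Mul loop above.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<float> softmax_reference(const std::vector<float>& x) {   // assumes non-empty input
    const float max_val = *std::max_element(x.begin(), x.end());      // ReduceMax + HorizonMax
    std::vector<float> y(x.size());
    float sum = 0.f;
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - max_val);                               // Subtract + Exp
        sum += y[i];                                                   // ReduceSum + HorizonSum
    }
    const float inv_sum = std::pow(sum, -1.f);                         // computed once, outside the loop
    for (auto& v : y)
        v *= inv_sum;                                                  // Multiply (pseudo-Divide) loop
    return y;
}
// ---------------------------------------------------------------------------------------------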
new_points.end()); + } + }; + + // Update Loop info for outer loops + for (auto loop_id : softmax_loop_ids) { + if (loop_id == Expression::LOOP_NULL_ID) + continue; + const auto loop_info = loop_manager->get_loop_info(loop_id); + update_loop_bounds(loop_info->entry_exprs, std::vector{(*max.first)->get_input_port(0), + (*sub.first)->get_input_port(0)}, loop_info); + update_loop_bounds(loop_info->exit_exprs, std::vector{(*mul.first)->get_output_port(0)}, loop_info); + } + + /* =========================================== */ + + /* ============= Runtime Info ================ */ + + // For tail loop we should fill input of Max by float min and + // input of Sum by zero to avoid math incorrect calculations + // TODO [111383]: It should be covered via general pipeline (for example, via analyze in InsertTailLoop?) + max.second->input(0).get_rt_info()["set_fill"] = uint32_t(0xff7fffff); + sum.second->input(0).get_rt_info()["set_fill"] = uint32_t(0x00000000); + modified = true; + } + } + + return modified; +} + +} // namespace pass +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/lowered/port_connector.cpp b/src/common/snippets/src/lowered/port_connector.cpp new file mode 100644 index 00000000000000..105747a9eddacb --- /dev/null +++ b/src/common/snippets/src/lowered/port_connector.cpp @@ -0,0 +1,52 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/port_connector.hpp" + +#include "snippets/itt.hpp" +#include "snippets/utils.hpp" + + +namespace ov { +namespace snippets { +namespace lowered { + +PortConnector::PortConnector(ExpressionPort source_descriptor, const std::set& consumer_descriptors) + : m_source_port(std::move(source_descriptor)), m_consumer_ports(consumer_descriptors) {} + +std::set::const_iterator PortConnector::find_consumer(const ExpressionPort& consumer) const { + // Note: Find by shared ptr and index port is enough since these parameters must be unique + return std::find_if(m_consumer_ports.cbegin(), m_consumer_ports.cend(), + [&consumer](const ExpressionPort& td) { + return consumer.get_expr() == td.get_expr() && consumer.get_index() == td.get_index(); + }); +} + +std::set::iterator PortConnector::find_consumer(const ExpressionPort& consumer) { + // Note: Find by shared ptr and index port is enough since these parameters must be unique + return std::find_if(m_consumer_ports.begin(), m_consumer_ports.end(), + [&consumer](const ExpressionPort& td) { + return consumer.get_expr() == td.get_expr() && consumer.get_index() == td.get_index(); + }); +} + +bool PortConnector::found_consumer(const ExpressionPort& consumer) const { + return find_consumer(consumer) != m_consumer_ports.end(); +} + +void PortConnector::add_consumer(const ExpressionPort& consumer) { + OPENVINO_ASSERT(!found_consumer(consumer), "Consumer has been already added to PortConnector!"); + const auto res = m_consumer_ports.insert(consumer); + OPENVINO_ASSERT(res.second, "Consumer hasn't been added to the PortConnector"); +} + +void PortConnector::remove_consumer(const ExpressionPort& consumer) { + const auto& found = find_consumer(consumer); + OPENVINO_ASSERT(found != m_consumer_ports.end(), "Consumer is missed in PortConnector!"); + m_consumer_ports.erase(found); +} + +}// namespace lowered +}// namespace snippets +}// namespace ov diff --git a/src/common/snippets/src/lowered/port_descriptor.cpp b/src/common/snippets/src/lowered/port_descriptor.cpp new file mode 100644 index 
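// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// The "set_fill" values written above are raw float bit patterns: 0xff7fffff is -FLT_MAX (the
// lowest finite float), the neutral element for a max-reduction, and 0x00000000 is 0.0f, the
// neutral element for a sum-reduction; the tail loop fills unused vector lanes with them.
#include <cstdint>
#include <cstring>

float bits_to_float(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof(f));   // well-defined way to reinterpret the bit pattern
    return f;
}
// bits_to_float(0xff7fffffu) == -FLT_MAX and bits_to_float(0x00000000u) == 0.0f
// ---------------------------------------------------------------------------------------------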
00000000000000..719f77e7a56fb5 --- /dev/null +++ b/src/common/snippets/src/lowered/port_descriptor.cpp @@ -0,0 +1,152 @@ +// Copyright (C) 2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/lowered/port_descriptor.hpp" + +namespace ov { +namespace snippets { +namespace lowered { + +size_t PortDescriptor::ServiceDimensions::FULL_DIM = SIZE_MAX; + +PortDescriptor::PortDescriptor(const ov::Input& in, std::vector subtensor_shape, std::vector layout) + : PortDescriptor(ov::Input(in.get_node(), in.get_index()), std::move(subtensor_shape), std::move(layout)) {} +PortDescriptor::PortDescriptor(const ov::Input& in, std::vector subtensor_shape, std::vector layout) + : PortDescriptor(in.get_shape(), std::move(subtensor_shape), std::move(layout)) {} + +PortDescriptor::PortDescriptor(const ov::Output& out, std::vector subtensor_shape, std::vector layout) + : PortDescriptor(ov::Output(out.get_node(), out.get_index()), std::move(subtensor_shape), std::move(layout)) {} +PortDescriptor::PortDescriptor(const ov::Output& out, std::vector subtensor_shape, std::vector layout) + : PortDescriptor(out.get_shape(), std::move(subtensor_shape), std::move(layout)) {} + +PortDescriptor::PortDescriptor(std::vector shape, std::vector subtensor_shape, std::vector layout) + : m_tensor_shape(std::move(shape)), m_layout(std::move(layout)), m_subtensor_shape(std::move(subtensor_shape)) { + validate_arguments(); +} + +void PortDescriptor::validate_arguments() { + if (!m_tensor_shape.empty() && m_layout.empty()) { + m_layout.resize(m_tensor_shape.size()); + // NCHW layout by default + std::iota(m_layout.begin(), m_layout.end(), 0); + } else if (m_layout.size() != m_tensor_shape.size()) { + OPENVINO_THROW("Snippets tensor descriptor: Layout size must be equal to the shape size"); + } +} + +PortDescriptorPtr PortDescriptor::clone() const { + const auto desc = std::make_shared(m_tensor_shape, m_subtensor_shape, m_layout); + desc->set_reg(m_reg); + return desc; +} + +std::string PortDescriptor::serialize() const { + std::stringstream ss; + ss << m_tensor_shape.size() << " "; + for (auto val : m_tensor_shape) + ss << val << " "; + ss << m_subtensor_shape.size() << " "; + for (auto val : m_subtensor_shape) + ss << val << " "; + ss << m_layout.size() << " "; + for (auto val : m_layout) + ss << val << " "; + return ss.str(); +} +bool operator==(const PortDescriptor& lhs, const PortDescriptor& rhs) { + return lhs.m_tensor_shape == rhs.m_tensor_shape && + lhs.m_layout == rhs.m_layout && + lhs.m_subtensor_shape == rhs.m_subtensor_shape; +} + +void PortDescriptorUtils::init_default(std::vector& in_descs, + std::vector& out_descs, + const std::shared_ptr& node) { + in_descs.resize(node->get_input_size()); + out_descs.resize(node->get_output_size()); + for (size_t i = 0; i < node->get_input_size(); ++i) { + in_descs[i] = std::make_shared(node->input(i)); + } + for (size_t i = 0; i < node->get_output_size(); ++i) { + out_descs[i] = std::make_shared(node->output(i)); + } +} + +void PortDescriptorUtils::set_port_descriptor_ptr(const ov::Input& in, const PortDescriptorPtr& desc) { + const auto& node = in.get_node()->shared_from_this(); + auto& rt_info = node->get_rt_info(); + const auto& key = PortDescriptorVectorAttribute::get_type_info_static(); + const auto& found = rt_info.find(key); + if (found == rt_info.end()) { + std::vector in_descs, out_descs; + init_default(in_descs, out_descs, node); + in_descs[in.get_index()] = desc; + rt_info[key] = PortDescriptorVectorAttribute(in_descs, out_descs); + } else 
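// ---------------------------------------------------------------------------------------------
// Illustrative sketch (editor's addition, not part of this patch; std-only, names hypothetical):
// validate_arguments() above falls back to the identity permutation when no layout is supplied,
// i.e. plain planar (row-major) ordering of the tensor dimensions:
#include <cstddef>
#include <numeric>
#include <vector>

std::vector<size_t> default_layout(size_t rank) {
    std::vector<size_t> layout(rank);
    std::iota(layout.begin(), layout.end(), 0);   // {0, 1, ..., rank - 1}
    return layout;
}
// ---------------------------------------------------------------------------------------------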
{ + auto& in_descs = found->second.as().inputs; + if (in_descs.size() != node->get_input_size()) + OPENVINO_THROW("Set input port descriptor is failed: incorrect count"); + in_descs[in.get_index()] = desc; + } +} + +void PortDescriptorUtils::set_port_descriptor_ptr(const ov::Output& out, const PortDescriptorPtr& desc) { + const auto& node = out.get_node_shared_ptr(); + auto& rt_info = node->get_rt_info(); + const auto& key = PortDescriptorVectorAttribute::get_type_info_static(); + const auto& found = rt_info.find(key); + if (found == rt_info.end()) { + std::vector in_descs, out_descs; + init_default(in_descs, out_descs, node); + out_descs[out.get_index()] = desc; + rt_info[key] = PortDescriptorVectorAttribute(in_descs, out_descs); + } else { + auto& out_descs = found->second.as().outputs; + if (out_descs.size() != node->get_output_size()) + OPENVINO_THROW("Set output port descriptor is failed: incorrect count"); + out_descs[out.get_index()] = desc; + } +} + +PortDescriptorPtr PortDescriptorUtils::get_port_descriptor_ptr(const ov::Input& in) { + return get_port_descriptor_ptr(ov::Input(in.get_node(), in.get_index())); +} +PortDescriptorPtr PortDescriptorUtils::get_port_descriptor_ptr(const ov::Input& in) { + const auto& node = in.get_node(); + auto& rt_info = node->get_rt_info(); + const auto& key = PortDescriptorVectorAttribute::get_type_info_static(); + const auto& found = rt_info.find(key); + if (found == rt_info.end()) { + return std::make_shared(in); + } + const auto& in_descs = found->second.as().inputs; + if (in_descs.size() != node->get_input_size()) + OPENVINO_THROW("Get input port descriptor is failed: incorrect count"); + return in_descs[in.get_index()]; +} + +PortDescriptorPtr PortDescriptorUtils::get_port_descriptor_ptr(const Output& out) { + return get_port_descriptor_ptr(ov::Output(out.get_node(), out.get_index())); +} +PortDescriptorPtr PortDescriptorUtils::get_port_descriptor_ptr(const Output& out) { + const auto& node = out.get_node(); + const auto& rt_info = node->get_rt_info(); + const auto& key = PortDescriptorVectorAttribute::get_type_info_static(); + const auto& found = rt_info.find(key); + if (found == rt_info.end()) { + return std::make_shared(out); + } + const auto& out_descs = found->second.as().outputs; + if (out_descs.size() != node->get_output_size()) + OPENVINO_THROW("Get output port descriptor is failed: incorrect count"); + return out_descs[out.get_index()]; +} + +void PortDescriptorUtils::clean(const std::shared_ptr& node) { + auto& rt_info = node->get_rt_info(); + rt_info.erase(PortDescriptorVectorAttribute::get_type_info_static()); +} +} // namespace lowered +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/op/brgemm.cpp b/src/common/snippets/src/op/brgemm.cpp index d37d17aa604b25..e02e0699a80b53 100644 --- a/src/common/snippets/src/op/brgemm.cpp +++ b/src/common/snippets/src/op/brgemm.cpp @@ -1,48 +1,65 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include "snippets/itt.hpp" #include "snippets/op/brgemm.hpp" -#include "ngraph/runtime/host_tensor.hpp" -#include "openvino/core/rt_info.hpp" + +#include "snippets/itt.hpp" #include "snippets/utils.hpp" -namespace ngraph { +#include "openvino/core/rt_info.hpp" + +namespace ov { namespace snippets { namespace op { Brgemm::Brgemm(const Output& A, const Output& B, - const size_t offset_a, const size_t offset_b, const size_t offset_c) : MemoryAccess({A, B}, 2, 1) { + const size_t offset_a, 
const size_t offset_b, const size_t offset_c, + std::vector layout_a, std::vector layout_b, std::vector layout_c) + : MemoryAccess({A, B}, std::set{0, 1}, std::set{0}) { set_output_size(1); set_input_offset(offset_a, 0); set_input_offset(offset_b, 1); - set_output_offset(offset_a, 0); - constructor_validate_and_infer_types(); + set_output_offset(offset_c, 0); + custom_constructor_validate_and_infer_types(std::move(layout_a), std::move(layout_b), std::move(layout_c)); } -void Brgemm::validate_and_infer_types() { - INTERNAL_OP_SCOPE(Brgemm_validate_and_infer_types); +void Brgemm::custom_constructor_validate_and_infer_types(std::vector layout_a, std::vector layout_b, std::vector layout_c) { + INTERNAL_OP_SCOPE(BrgemmCPU_constructor_validate_and_infer_types); + validate_inputs(); + + // During ctor call, Brgemm doesn't know his port descriptors. + // So we use explicit layouts from parameters + const auto planar_input_shapes = + std::vector{ ov::snippets::utils::get_reordered_planar_shape(get_input_partial_shape(0), layout_a), + ov::snippets::utils::get_reordered_planar_shape(get_input_partial_shape(1), layout_b) }; + auto output_shape = get_output_partial_shape(planar_input_shapes); + set_output_type(0, get_output_type(), ov::snippets::utils::get_reordered_planar_shape(output_shape, layout_c)); +} + +void Brgemm::validate_inputs() const { // If no leading dimensions are provided, assume dense row-major inputs-outputs NODE_VALIDATION_CHECK(this, get_input_partial_shape(0).is_static() && get_input_partial_shape(1).is_static(), "Brgemm currently supports only static shapes."); +} - std::vector planar_input_shapes = { - utils::get_port_planar_shape(input_value(0)), - utils::get_port_planar_shape(input_value(1)) - }; +void Brgemm::validate_and_infer_types() { + INTERNAL_OP_SCOPE(Brgemm_validate_and_infer_types); + validate_inputs(); + const auto planar_input_shapes = get_planar_input_shapes(inputs()); auto output_shape = get_output_partial_shape(planar_input_shapes); - const auto& output_layout = utils::get_node_output_layout(this); - set_output_type(0, - get_output_type(), - utils::get_reordered_planar_shape(output_shape, output_layout)); + set_output_type(0, get_output_type(), get_planar_output_shape(output_shape)); } std::shared_ptr Brgemm::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(Brgemm_clone_with_new_inputs); check_new_args_count(this, new_args); - return std::make_shared(new_args.at(0), new_args.at(1), get_offset_a(), get_offset_b(), get_offset_c()); + return std::make_shared(new_args.at(0), new_args.at(1), + get_offset_a(), get_offset_b(), get_offset_c(), + lowered::PortDescriptorUtils::get_port_descriptor_ptr(input(0))->get_layout(), + lowered::PortDescriptorUtils::get_port_descriptor_ptr(input(1))->get_layout(), + lowered::PortDescriptorUtils::get_port_descriptor_ptr(output(0))->get_layout()); } ov::element::Type Brgemm::get_output_type() const { @@ -63,6 +80,26 @@ ov::element::Type Brgemm::get_output_type() const { } } +std::vector Brgemm::get_planar_input_shapes(const std::vector>& inputs) const { + OPENVINO_ASSERT(inputs.size() == 2, "Brgemm::get_planar_input_shapes() expects 2 inputs"); + return { utils::get_port_planar_shape(inputs[0]), utils::get_port_planar_shape(inputs[1]) }; +} + +ov::PartialShape Brgemm::get_planar_output_shape(const ov::PartialShape& output_shape) const { + // This method can be safely called from validate_and_infer_types() before output creation + const auto& key = 
lowered::PortDescriptorVectorAttribute::get_type_info_static(); + auto& rt_info = get_rt_info(); + const auto& found = rt_info.find(key); + if (found != rt_info.end()) { + const auto& out_descs = found->second.as().outputs; + if (out_descs.size() != get_output_size()) + OPENVINO_THROW("Get output port descriptor is failed: incorrect count"); + const auto& port_desc = out_descs[0]; + return utils::get_reordered_planar_shape(output_shape, port_desc->get_layout()); + } + return output_shape; +} + ov::PartialShape Brgemm::get_output_partial_shape(const std::vector& input_shapes) const { NGRAPH_CHECK(input_shapes.size() == 2, "BRGEMM expects 2 input shapes for shape inference"); @@ -128,4 +165,4 @@ ov::PartialShape Brgemm::get_output_partial_shape(const std::vector - #include "snippets/op/broadcastload.hpp" +#include "snippets/itt.hpp" #include -using namespace std; -using namespace ngraph; +namespace ov { +namespace snippets { +namespace op { -snippets::op::BroadcastLoad::BroadcastLoad(const Output& x, ov::PartialShape shape, size_t offset) - : MemoryAccess({x}, 1, 0), output_shape(std::move(shape)) { +BroadcastLoad::BroadcastLoad(const Output& x, ov::PartialShape shape, size_t offset) + : MemoryAccess({x}, std::set{0}, std::set{}), output_shape(std::move(shape)) { set_input_port_descriptor({1, offset}, 0); constructor_validate_and_infer_types(); } -bool snippets::op::BroadcastLoad::visit_attributes(AttributeVisitor& visitor) { +bool BroadcastLoad::visit_attributes(AttributeVisitor& visitor) { MemoryAccess::visit_attributes(visitor); return true; } -std::shared_ptr snippets::op::BroadcastLoad::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr BroadcastLoad::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(BroadcastLoad); check_new_args_count(this, new_args); return std::make_shared(new_args.at(0), output_shape, get_offset()); } -void snippets::op::BroadcastLoad::validate_and_infer_types() { +void BroadcastLoad::validate_and_infer_types() { + // BroadcastLoad has memory access port only on input + const auto input_ma_ports = get_memory_access_input_ports(); + const auto output_ma_ports = get_memory_access_output_ports(); + OPENVINO_ASSERT(input_ma_ports.size() == 1 && is_memory_access_input_port(0), "BroadcastLoad node must have memory access input port"); + OPENVINO_ASSERT(output_ma_ports.size() == 0, "BroadcastLoad node mustn't have memory access output port"); set_output_type(0, get_input_element_type(0), output_shape); } + +} // namespace op +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/op/broadcastmove.cpp b/src/common/snippets/src/op/broadcastmove.cpp index 17910d3c642bad..12242d4ba76c44 100644 --- a/src/common/snippets/src/op/broadcastmove.cpp +++ b/src/common/snippets/src/op/broadcastmove.cpp @@ -2,31 +2,34 @@ // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/op/broadcastmove.hpp" -#include -#include -using namespace std; -using namespace ngraph; +namespace ov { +namespace snippets { +namespace op { -snippets::op::BroadcastMove::BroadcastMove(const Output& x, ov::PartialShape shape) : Op({x}), output_shape(std::move(shape)) { +BroadcastMove::BroadcastMove(const Output& x, ov::PartialShape shape) : Op({x}), output_shape(std::move(shape)) { constructor_validate_and_infer_types(); } -bool snippets::op::BroadcastMove::visit_attributes(AttributeVisitor& visitor) { +bool BroadcastMove::visit_attributes(AttributeVisitor& visitor) { 
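    // Note: output_shape appears to be the only state BroadcastMove carries, so serializing it
    // below should be sufficient for graph dumps and node comparison.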
visitor.on_attribute("output_shape", output_shape); return true; } -std::shared_ptr snippets::op::BroadcastMove::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr BroadcastMove::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(BroadcastMove); check_new_args_count(this, new_args); return std::make_shared(new_args.at(0), output_shape); } -void snippets::op::BroadcastMove::validate_and_infer_types() { +void BroadcastMove::validate_and_infer_types() { set_output_type(0, get_input_element_type(0), this->output_shape); -} \ No newline at end of file +} + +} // namespace op +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/op/buffer.cpp b/src/common/snippets/src/op/buffer.cpp index 6684c66974175e..c1cecddd86228d 100644 --- a/src/common/snippets/src/op/buffer.cpp +++ b/src/common/snippets/src/op/buffer.cpp @@ -1,34 +1,32 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include #include "snippets/op/buffer.hpp" -#include "snippets/snippets_isa.hpp" + +#include "snippets/itt.hpp" #include "snippets/utils.hpp" -using namespace std; -using namespace ngraph; +namespace ov { +namespace snippets { +namespace op { -auto normalize_rank(int32_t allocation_rank, const size_t shape_rank) -> int32_t { - return allocation_rank < 0 ? allocation_rank + static_cast(shape_rank) : allocation_rank; -} -snippets::op::Buffer::Buffer(const ov::Shape& shape) - : Op(), m_type(Type::NewMemory), m_shape(shape) { +Buffer::Buffer(const ov::Shape& shape, size_t id) + : Op(), m_type(Type::NewMemory), m_shape(shape), m_offset(0), m_id(id) { constructor_validate_and_infer_types(); } -snippets::op::Buffer::Buffer(const ov::Output& arg, const ov::Shape& shape) - : Op({arg}), m_type(Type::IntermediateMemory), m_shape(shape) { +Buffer::Buffer(const ov::Output& arg, const ov::Shape& shape, size_t id) + : Op({arg}), m_type(Type::IntermediateMemory), m_shape(shape), m_offset(0), m_id(id) { constructor_validate_and_infer_types(); } -snippets::op::Buffer::Buffer(const ov::Output& arg, int32_t allocation_rank) - : Op({arg}), m_type(Type::IntermediateMemory) { - const auto pshape = arg.get_partial_shape(); +Buffer::Buffer(const ov::Output& arg, int32_t allocation_rank, size_t id) + : Op({arg}), m_type(Type::IntermediateMemory), m_offset(0), m_id(id) { + const auto& pshape = arg.get_partial_shape(); OPENVINO_ASSERT(pshape.is_static(), "Buffer supports only static input shape"); const auto shape = pshape.get_shape(); const auto normalize_rank = utils::normalize_rank(static_cast(allocation_rank), shape.size()); @@ -37,13 +35,15 @@ snippets::op::Buffer::Buffer(const ov::Output& arg, int32_t allocation constructor_validate_and_infer_types(); } -bool snippets::op::Buffer::visit_attributes(AttributeVisitor& visitor) { +bool Buffer::visit_attributes(AttributeVisitor& visitor) { INTERNAL_OP_SCOPE(Buffer_visit_attributes); visitor.on_attribute("allocation_shape", m_shape); + visitor.on_attribute("offset", m_offset); + visitor.on_attribute("id", m_id); return true; } -void snippets::op::Buffer::validate_and_infer_types() { +void Buffer::validate_and_infer_types() { INTERNAL_OP_SCOPE(Buffer_validate_and_infer_types); ov::element::Type output_type; ov::Shape output_shape; @@ -52,7 +52,7 @@ void snippets::op::Buffer::validate_and_infer_types() { output_shape = m_shape; output_type = ov::element::u8; // 1Byte } else if (m_type == Type::IntermediateMemory) { - const auto input_shape 
= get_input_partial_shape(0); + const auto& input_shape = get_input_partial_shape(0); OPENVINO_ASSERT(input_shape.is_static(), "Buffer supports only static input shape"); output_type = get_input_element_type(0); output_shape = input_shape.get_shape(); @@ -62,18 +62,26 @@ void snippets::op::Buffer::validate_and_infer_types() { set_output_type(0, output_type, output_shape); } -std::shared_ptr snippets::op::Buffer::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr Buffer::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(Buffer_clone_with_new_inputs); check_new_args_count(this, new_args); + std::shared_ptr new_buffer = nullptr; if (m_type == Type::NewMemory) { - return std::make_shared(m_shape); + new_buffer = std::make_shared(m_shape, m_id); } else if (m_type == Type::IntermediateMemory) { - return std::make_shared(new_args.at(0), m_shape); + new_buffer = std::make_shared(new_args.at(0), m_shape, m_id); + } else { + OPENVINO_THROW("Buffer supports only the following types: NewMemory and IntermediateMemory"); } - OPENVINO_THROW("Buffer supports only the following types: NewMemory and IntermediateMemory"); + new_buffer->m_offset = m_offset; + return new_buffer; } -size_t ngraph::snippets::op::Buffer::get_byte_size() const { +size_t Buffer::get_byte_size() const { const auto shape = get_allocation_shape(); - return ngraph::shape_size(shape) * get_element_type().size(); + return ov::shape_size(shape) * get_element_type().size(); } + +} // namespace op +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/op/convert_saturation.cpp b/src/common/snippets/src/op/convert_saturation.cpp index 4b2a24053535d7..eab5ba76e484f2 100644 --- a/src/common/snippets/src/op/convert_saturation.cpp +++ b/src/common/snippets/src/op/convert_saturation.cpp @@ -6,14 +6,11 @@ #include "snippets/op/convert_saturation.hpp" -#include "ngraph/runtime/host_tensor.hpp" - - -ngraph::snippets::op::ConvertSaturation::ConvertSaturation(const Output& x, const ov::element::Type& destination_type) +ov::snippets::op::ConvertSaturation::ConvertSaturation(const Output& x, const ov::element::Type& destination_type) : ov::op::v0::Convert({x}, destination_type) { } -std::shared_ptr ngraph::snippets::op::ConvertSaturation::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr ov::snippets::op::ConvertSaturation::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(ConvertSaturation_clone_with_new_inputs); check_new_args_count(this, new_args); return std::make_shared(new_args.at(0), m_destination_type); diff --git a/src/common/snippets/src/op/convert_truncation.cpp b/src/common/snippets/src/op/convert_truncation.cpp index 261010c0d418dc..a117ed295d80c6 100644 --- a/src/common/snippets/src/op/convert_truncation.cpp +++ b/src/common/snippets/src/op/convert_truncation.cpp @@ -6,13 +6,11 @@ #include "snippets/op/convert_truncation.hpp" -#include "ngraph/runtime/host_tensor.hpp" - -ngraph::snippets::op::ConvertTruncation::ConvertTruncation(const Output& x, const ov::element::Type& destination_type) +ov::snippets::op::ConvertTruncation::ConvertTruncation(const Output& x, const ov::element::Type& destination_type) : ov::op::v0::Convert({x}, destination_type) { } -std::shared_ptr ngraph::snippets::op::ConvertTruncation::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr ov::snippets::op::ConvertTruncation::clone_with_new_inputs(const OutputVector& new_args) const { 
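    // ConvertTruncation is assumed to differ from ConvertSaturation only in how out-of-range
    // values are treated by the target emitters (value truncation vs. saturation to the
    // destination type's range); cloning therefore only needs to preserve the destination type.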
INTERNAL_OP_SCOPE(ConvertTruncation_clone_with_new_inputs); check_new_args_count(this, new_args); return std::make_shared(new_args.at(0), m_destination_type); diff --git a/src/common/snippets/src/op/fill.cpp b/src/common/snippets/src/op/fill.cpp index ac93a501aad5ce..437f594cdfc519 100644 --- a/src/common/snippets/src/op/fill.cpp +++ b/src/common/snippets/src/op/fill.cpp @@ -1,38 +1,42 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/op/fill.hpp" -#include -using namespace std; -using namespace ngraph; +namespace ov { +namespace snippets { +namespace op { -snippets::op::Fill::Fill(const Output& x, const size_t offset, const uint32_t fill_value) +Fill::Fill(const Output& x, const size_t offset, const uint32_t fill_value) : Op({x}), m_offset(offset), m_fill_value(fill_value) { constructor_validate_and_infer_types(); } -bool snippets::op::Fill::visit_attributes(AttributeVisitor& visitor) { +bool Fill::visit_attributes(AttributeVisitor& visitor) { INTERNAL_OP_SCOPE(Fill_visit_attributes); visitor.on_attribute("offset", m_offset); visitor.on_attribute("fill_value", m_fill_value); return true; } -std::shared_ptr snippets::op::Fill::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr Fill::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(Fill_clone_with_new_inputs); check_new_args_count(this, new_args); return std::make_shared(new_args.at(0), m_offset, m_fill_value); } -void snippets::op::Fill::validate_and_infer_types() { +void Fill::validate_and_infer_types() { INTERNAL_OP_SCOPE(Fill_validate_and_infer_types); const auto in_type = get_input_element_type(0); NGRAPH_CHECK(in_type.size() == 4, "Fill operation supports only element types with 4 byte size but got:" + std::to_string(in_type.size())); set_output_type(0, get_input_element_type(0), get_input_partial_shape(0)); } +} // namespace op +} // namespace snippets +} // namespace ov + diff --git a/src/common/snippets/src/op/horizon_max.cpp b/src/common/snippets/src/op/horizon_max.cpp index 37e6e3f3c55daa..a5e92e49ac558e 100644 --- a/src/common/snippets/src/op/horizon_max.cpp +++ b/src/common/snippets/src/op/horizon_max.cpp @@ -1,24 +1,25 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/op/horizon_max.hpp" -using namespace std; -using namespace ngraph; +namespace ov { +namespace snippets { +namespace op { -snippets::op::HorizonMax::HorizonMax(const Output& x) : Op({x}) { +HorizonMax::HorizonMax(const Output& x) : Op({x}) { constructor_validate_and_infer_types(); } -std::shared_ptr snippets::op::HorizonMax::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr HorizonMax::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(HorizonMax_clone_with_new_inputs); check_new_args_count(this, new_args); return std::make_shared(new_args.at(0)); } -void snippets::op::HorizonMax::validate_and_infer_types() { +void HorizonMax::validate_and_infer_types() { INTERNAL_OP_SCOPE(HorizonMax_validate_and_infer_types); auto new_shape = get_input_partial_shape(0); if (!ov::is_scalar(new_shape)) { @@ -26,3 +27,7 @@ void snippets::op::HorizonMax::validate_and_infer_types() { } set_output_type(0, get_input_element_type(0), new_shape); } + +} // namespace op +} // namespace snippets +} 
// namespace ov diff --git a/src/common/snippets/src/op/horizon_sum.cpp b/src/common/snippets/src/op/horizon_sum.cpp index fa791dec2342f3..da0b863842b2af 100644 --- a/src/common/snippets/src/op/horizon_sum.cpp +++ b/src/common/snippets/src/op/horizon_sum.cpp @@ -1,24 +1,25 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/op/horizon_sum.hpp" -using namespace std; -using namespace ngraph; +namespace ov { +namespace snippets { +namespace op { -snippets::op::HorizonSum::HorizonSum(const Output& x) : Op({x}) { +HorizonSum::HorizonSum(const Output& x) : Op({x}) { constructor_validate_and_infer_types(); } -std::shared_ptr snippets::op::HorizonSum::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr HorizonSum::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(HorizonSum_clone_with_new_inputs); check_new_args_count(this, new_args); return std::make_shared(new_args.at(0)); } -void snippets::op::HorizonSum::validate_and_infer_types() { +void HorizonSum::validate_and_infer_types() { INTERNAL_OP_SCOPE(HorizonSum_validate_and_infer_types); auto new_shape = get_input_partial_shape(0); if (!ov::is_scalar(new_shape)) { @@ -26,3 +27,7 @@ void snippets::op::HorizonSum::validate_and_infer_types() { } set_output_type(0, get_input_element_type(0), new_shape); } + +} // namespace op +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/op/kernel.cpp b/src/common/snippets/src/op/kernel.cpp index 7003d3ba28c11e..9a5ae03bbad0db 100644 --- a/src/common/snippets/src/op/kernel.cpp +++ b/src/common/snippets/src/op/kernel.cpp @@ -3,16 +3,13 @@ // #include "snippets/op/kernel.hpp" -#include "snippets/generator.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { -Kernel::Kernel(std::vector nested, std::shared_ptr m) -: Op(), region(std::move(nested)), model(std::move(m)) { -} +Kernel::Kernel(lowered::LinearIR nested) : Op(), region(std::move(nested)) {} } // namespace op } // namespace snippets -} // namespace ngraph \ No newline at end of file +} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/src/op/load.cpp b/src/common/snippets/src/op/load.cpp index f1f5bc42c7a3da..84dfb000e1c5e2 100644 --- a/src/common/snippets/src/op/load.cpp +++ b/src/common/snippets/src/op/load.cpp @@ -2,25 +2,31 @@ // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/op/load.hpp" -#include -namespace ngraph { +namespace ov { namespace snippets { namespace op { -Load::Load(const Output& x, const size_t count, const size_t offset) : MemoryAccess({x}, 1, 0) { +Load::Load(const Output& x, const size_t count, const size_t offset) + : MemoryAccess({x}, std::set{0}, std::set{}) { set_input_port_descriptor({count, offset}, 0); constructor_validate_and_infer_types(); } -void snippets::op::Load::validate_and_infer_types() { +void Load::validate_memory_access_params() const { // Load has memory access port only on output - OPENVINO_ASSERT(get_input_port_count() == 1, "Load node must have memory access input port"); - OPENVINO_ASSERT(get_output_port_count() == 0, "Load node mustn't have memory access output port"); + const auto input_ma_ports = get_memory_access_input_ports(); + const auto output_ma_ports = get_memory_access_output_ports(); + OPENVINO_ASSERT(input_ma_ports.size() == 1 && is_memory_access_input_port(0), "Load node must 
have memory access input port"); + OPENVINO_ASSERT(output_ma_ports.size() == 0, "Load node mustn't have memory access output port"); +} + +void Load::validate_and_infer_types() { + validate_memory_access_params(); set_output_type(0, get_input_element_type(0), get_input_partial_shape(0)); } @@ -40,12 +46,11 @@ LoadReshape::LoadReshape(const Output& x, const size_t count, const si *std::min_element(m_order.begin(), m_order.end()) == 0, "LoadReshape detected invalid values in new_order"); const std::set unique_dims(order.begin(), order.end()); NGRAPH_CHECK(unique_dims.size() == order.size(), "LoadReshape order must not contain repeated elements"); - m_input_ports.resize(get_input_size()); - set_input_port_descriptor({count, offset}, 0); constructor_validate_and_infer_types(); } -void snippets::op::LoadReshape::validate_and_infer_types() { +void LoadReshape::validate_and_infer_types() { + validate_memory_access_params(); const auto& old_shape = get_input_partial_shape(0); ov::PartialShape new_shape; for (const auto idx : m_order) @@ -53,13 +58,13 @@ void snippets::op::LoadReshape::validate_and_infer_types() { set_output_type(0, get_input_element_type(0), new_shape); } -bool snippets::op::LoadReshape::visit_attributes(AttributeVisitor& visitor) { +bool LoadReshape::visit_attributes(AttributeVisitor& visitor) { Load::visit_attributes(visitor); visitor.on_attribute("order", m_order); return true; } -std::shared_ptr snippets::op::LoadReshape::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr LoadReshape::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(LoadReshape); check_new_args_count(this, new_args); return std::make_shared(new_args.at(0), get_count(), get_offset(), m_order); @@ -67,4 +72,4 @@ std::shared_ptr snippets::op::LoadReshape::clone_with_new_inputs(const Out }// namespace op }// namespace snippets -}// namespace ngraph +}// namespace ov diff --git a/src/common/snippets/src/op/loop.cpp b/src/common/snippets/src/op/loop.cpp index c8c704fd350913..2204efb05bf388 100644 --- a/src/common/snippets/src/op/loop.cpp +++ b/src/common/snippets/src/op/loop.cpp @@ -1,76 +1,41 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #include "snippets/op/loop.hpp" #include "snippets/generator.hpp" -using namespace std; -namespace ngraph { +namespace ov { namespace snippets { namespace op { -LoopBase::LoopBase(const std::vector> &args, size_t work_amount, size_t increment) - : Op(args), work_amount(work_amount), work_amount_increment(increment), evaluate_once(false) { +LoopBase::LoopBase(const std::vector> &args) : Op(args) { } -bool LoopBase::visit_attributes(AttributeVisitor &visitor) { - visitor.on_attribute("work_amount", work_amount); - visitor.on_attribute("increment", work_amount_increment); - return true; -} - -size_t LoopBase::get_work_amount() const { - return work_amount; -} - -bool LoopBase::get_evaluate_once() const { - return evaluate_once; -} - -size_t LoopBase::get_increment() const { - return work_amount_increment; -} - -LoopBegin::LoopBegin(const std::vector> &args, size_t work_amount, size_t work_amount_increment) - : LoopBase(args, work_amount, work_amount_increment), - begin_address(nullptr), input_regs({}) { - // We can only call a reduced validate_and_infer types from the constructor, since LoopEnd might not be attached - // to the LoopBegin at this point (which is usually the case: create LoopBegin first => then attach LoopEnd to it) - 
validate_and_infer_types_except_LoopEnd(); -} - -LoopBegin::LoopBegin(const std::vector> &args) - : LoopBase(args, 0, 0), begin_address(nullptr), input_regs({}) { +LoopBegin::LoopBegin() : LoopBase(), begin_address(nullptr) { validate_and_infer_types_except_LoopEnd(); } std::shared_ptr LoopBegin::clone_with_new_inputs(const OutputVector& inputs) const { - return std::shared_ptr(new LoopBegin(inputs, work_amount, work_amount_increment)); + return std::make_shared(); } - void LoopBegin::validate_and_infer_types_except_LoopEnd() { - const size_t num_inputs = get_input_size(); - set_output_size(num_inputs + 1); - // All outputs are by-passed from inputs, except for the last one - it connects LoopBegin and LoopEnd - for (size_t i = 0; i < num_inputs; i++) - get_output_descriptor(i).set_tensor_ptr(get_input_descriptor(i).get_output().get_tensor_ptr()); - set_output_type(num_inputs, element::f32, ov::PartialShape{ov::Shape{}}); + NODE_VALIDATION_CHECK(this, get_input_size() == 0, "LoopBegin doen't expect any inputs"); + set_output_type(0, element::f32, ov::PartialShape{ov::Shape{}}); } void LoopBegin::validate_and_infer_types() { validate_and_infer_types_except_LoopEnd(); - const auto& last_output_inputs = output(get_output_size() - 1).get_target_inputs(); - NODE_VALIDATION_CHECK(this, last_output_inputs.size() == 1, "LoopBegin must have exactly one input attached to the last output"); + OPENVINO_ASSERT(get_output_size() == 1, "LoopBegin must have only one output"); + const auto& last_output_inputs = get_output_target_inputs(0); + OPENVINO_ASSERT(last_output_inputs.size() == 1, "LoopBegin must have exactly one input attached to the last output"); const auto& loop_end = ov::as_type_ptr(last_output_inputs.begin()->get_node()->shared_from_this()); - NODE_VALIDATION_CHECK(this, loop_end != nullptr, "LoopBegin must have LoopEnd connected to its last output"); - work_amount = loop_end->get_work_amount(); - work_amount_increment = loop_end->get_increment(); + OPENVINO_ASSERT(loop_end != nullptr, "LoopBegin must have LoopEnd connected to its last output"); } -std::shared_ptr LoopBegin::get_loop_end() { - const auto& last_output_inputs = output(get_output_size() - 1).get_target_inputs(); +std::shared_ptr LoopBegin::get_loop_end() const { + const auto& last_output_inputs = get_output_target_inputs(0); if (last_output_inputs.size() != 1) throw std::invalid_argument("LoopBegin has more than one inputs attached to the last output"); const auto& loop_end = ov::as_type_ptr(last_output_inputs.begin()->get_node()->shared_from_this()); @@ -79,32 +44,50 @@ std::shared_ptr LoopBegin::get_loop_end() { return loop_end; } -LoopEnd::LoopEnd(const std::vector> &args, size_t work_amount, size_t work_amount_increment, - std::vector apply_increments, std::vector finalization_offsets) - : LoopBase(args, work_amount, work_amount_increment), +bool LoopBegin::visit_attributes(AttributeVisitor &visitor) { + return true; +} + +LoopEnd::LoopEnd(const Output& loop_begin, size_t work_amount, size_t work_amount_increment, + std::vector apply_increments, std::vector finalization_offsets, + std::vector element_type_sizes, size_t input_num, size_t output_num) + : LoopBase({loop_begin}), has_outer_loop(true), finalization_offsets(std::move(finalization_offsets)), - loop_io_size(0) { + element_type_sizes(std::move(element_type_sizes)), + work_amount(work_amount), + work_amount_increment(work_amount_increment), + input_num(input_num), + output_num(output_num), + evaluate_once(false) { ptr_increments.resize(apply_increments.size()); 
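    // Each 'true' in apply_increments is mapped to a unit pointer increment below; the actual
    // stride is presumably derived later (during loop initialization / code emission) from the
    // loop increment and element_type_sizes, so only the per-port on/off decision is stored here.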
std::transform(apply_increments.begin(), apply_increments.end(), ptr_increments.begin(), - [work_amount_increment](bool apply) { - return apply ? work_amount_increment : 0; + [](bool apply) { + return apply ? 1 : 0; }); constructor_validate_and_infer_types(); } -LoopEnd::LoopEnd(const std::vector> &args, size_t work_amount, size_t work_amount_increment, - std::vector ptr_increments, std::vector finalization_offsets) - : LoopBase(args, work_amount, work_amount_increment), - has_outer_loop(true), - ptr_increments(std::move(ptr_increments)), - finalization_offsets(std::move(finalization_offsets)), - loop_io_size(0) { +LoopEnd::LoopEnd(const Output& loop_begin, size_t work_amount, size_t work_amount_increment, + std::vector ptr_increments, std::vector finalization_offsets, + std::vector element_type_sizes, size_t input_num, size_t output_num) + : LoopBase({loop_begin}), + has_outer_loop(true), + ptr_increments(std::move(ptr_increments)), + finalization_offsets(std::move(finalization_offsets)), + element_type_sizes(std::move(element_type_sizes)), + work_amount(work_amount), + work_amount_increment(work_amount_increment), + input_num(input_num), + output_num(output_num), + evaluate_once(false) { constructor_validate_and_infer_types(); } std::shared_ptr LoopEnd::clone_with_new_inputs(const OutputVector& inputs) const { - return std::make_shared(inputs, work_amount, work_amount_increment, ptr_increments, finalization_offsets); + check_new_args_count(this, inputs); + return std::make_shared(inputs.at(0), work_amount, work_amount_increment, ptr_increments, + finalization_offsets, element_type_sizes, input_num, output_num); } std::shared_ptr LoopEnd::get_loop_begin() { @@ -122,15 +105,27 @@ const std::vector& LoopEnd::get_ptr_increments()const { return ptr_increments; } +const std::vector& LoopEnd::get_element_type_sizes() const { + return element_type_sizes; +} + +size_t LoopEnd::get_input_num() const { + return input_num; +} + +size_t LoopEnd::get_output_num() const { + return output_num; +} + void LoopEnd::set_finalization_offsets(std::vector offsets) { - if (offsets.size() != loop_io_size) - throw std::invalid_argument("LoopEnd set_finalization_offsets is called with inconsistent offsets.size()"); + OPENVINO_ASSERT(offsets.size() == input_num + output_num, + "LoopEnd set_finalization_offsets is called with inconsistent offsets.size()"); finalization_offsets = std::move(offsets); } void LoopEnd::set_ptr_increments(std::vector new_ptr_increments) { - if (new_ptr_increments.size() != loop_io_size) - throw std::invalid_argument("LoopEnd set_ptr_increments is called with inconsistent new_ptr_increments.size()"); + OPENVINO_ASSERT(new_ptr_increments.size() == input_num + output_num, + "LoopEnd set_finalization_offsets is called with inconsistent offsets.size()"); ptr_increments = std::move(new_ptr_increments); } @@ -143,44 +138,54 @@ void LoopEnd::update_ptr_increments(int64_t new_increment) { void LoopEnd::set_work_amount(size_t new_work_amount) { work_amount = new_work_amount; - // Update LoopBegin to maintain consistency between the Loops - get_loop_begin()->work_amount = new_work_amount; } void LoopEnd::set_increment(size_t new_increment) { work_amount_increment = new_increment; - // Update LoopBegin to maintain consistency between the Loops - get_loop_begin()->work_amount_increment = new_increment; } void LoopEnd::set_evaluate_once(bool once) { evaluate_once = once; - // Update LoopBegin to maintain consistency between the Loops - get_loop_begin()->evaluate_once = once; } void 
LoopEnd::validate_and_infer_types() { - const size_t num_inputs = get_input_size(); - const auto loop_begin = ov::as_type_ptr(input(get_input_size() - 1).get_source_output().get_node_shared_ptr()); + NODE_VALIDATION_CHECK(this, get_input_size() == 1, "LoopEnd must have one input"); + const auto loop_begin = ov::as_type_ptr(get_input_node_shared_ptr(0)); + const auto io_size = input_num + output_num; NODE_VALIDATION_CHECK(this, loop_begin != nullptr, "LoopEnd must have LoopBegin as the last argument"); - // Note: have to -2 because the LoopBegin->LoopEnd edge is counted twice - loop_io_size = get_input_size() + loop_begin->get_output_size() - 2; - NODE_VALIDATION_CHECK(this, ptr_increments.empty() || ptr_increments.size() == loop_io_size, + NODE_VALIDATION_CHECK(this, ptr_increments.empty() || ptr_increments.size() == io_size, "ptr_increments must be either empty or defined per every input & output of joined Loop. Expected size: ", - loop_io_size, " got ", ptr_increments.size()); - NODE_VALIDATION_CHECK(this, finalization_offsets.empty() || finalization_offsets.size() == loop_io_size, + io_size, " got ", ptr_increments.size()); + NODE_VALIDATION_CHECK(this, finalization_offsets.empty() || finalization_offsets.size() == io_size, "finalization_offsets must be either empty or defined per every input & output of joined Loop. Expected size: ", - loop_io_size, " got ", finalization_offsets.size()); + io_size, " got ", finalization_offsets.size()); if (ptr_increments.empty()) - ptr_increments.resize(loop_io_size, static_cast(work_amount_increment)); + ptr_increments.resize(io_size, 1); if (finalization_offsets.empty()) - finalization_offsets.resize(loop_io_size, 0); - set_output_size(num_inputs - 1); - // All outputs are by-passed from inputs, except for the last one - it connects LoopBegin and LoopEnd - for (size_t i = 0; i < num_inputs - 1; i++) - get_output_descriptor(i).set_tensor_ptr(get_input_descriptor(i).get_output().get_tensor_ptr()); + finalization_offsets.resize(io_size, 0); + set_output_type(0, element::f32, ov::PartialShape{ov::Shape{}}); +} + +bool LoopEnd::visit_attributes(AttributeVisitor &visitor) { + visitor.on_attribute("work_amount", work_amount); + visitor.on_attribute("increment", work_amount_increment); + visitor.on_attribute("ptr_incr", ptr_increments); + visitor.on_attribute("fin_offset", finalization_offsets); + return true; +} + +size_t LoopEnd::get_work_amount() const { + return work_amount; +} + +bool LoopEnd::get_evaluate_once() const { + return evaluate_once; +} + +size_t LoopEnd::get_increment() const { + return work_amount_increment; } } // namespace op } // namespace snippets -} // namespace ngraph \ No newline at end of file +} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/src/op/memory_access.cpp b/src/common/snippets/src/op/memory_access.cpp index ea0e4649f9e5de..af91db3bf00948 100644 --- a/src/common/snippets/src/op/memory_access.cpp +++ b/src/common/snippets/src/op/memory_access.cpp @@ -1,55 +1,95 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/op/memory_access.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace op { MemoryAccess::MemoryAccess(const OutputVector& arguments, size_t input_count, size_t output_count) : Op(arguments) { - while (m_input_ports.size() < input_count) { - m_input_ports.push_back({0, 0, m_input_ports.size()}); + auto init_iota_set = 
[](size_t num) { + if (num == 0) + return std::set{}; + std::vector vec(num); + std::iota(vec.begin(), vec.end(), 0); + return std::set(vec.begin(), vec.end()); + }; + ctor_initialize(init_iota_set(input_count), init_iota_set(output_count)); +} + +MemoryAccess::MemoryAccess(const OutputVector& arguments, const std::set& input_ports, const std::set& output_ports) : Op(arguments) { + ctor_initialize(input_ports, output_ports); +} + +void MemoryAccess::ctor_initialize(const std::set& input_ports, const std::set& output_ports) { + for (auto port : input_ports) { + m_input_ports[port] = {0, 0, port}; + } + for (auto port : output_ports) { + m_output_ports[port] = {0, 0, port}; + } +} + +bool MemoryAccess::is_full_memory_access_op() const { + for (size_t i = 0; i < get_input_size(); ++i) { + if (!is_memory_access_input_port(i)) + return false; } - while (m_output_ports.size() < output_count) { - m_output_ports.push_back({0, 0, m_output_ports.size()}); + for (size_t i = 0; i < get_output_size(); ++i) { + if (!is_memory_access_output_port(i)) + return false; } + return true; } bool MemoryAccess::visit_attributes(AttributeVisitor& visitor) { - for (size_t i = 0; i < m_input_ports.size(); ++i) { - auto port = m_input_ports[i]; - visitor.on_attribute("count_in_" + std::to_string(i), port.count); - visitor.on_attribute("offset_in_" + std::to_string(i), port.offset); + for (const auto& p : m_input_ports) { + auto idx = p.first; + auto port = p.second; + visitor.on_attribute("count_in_" + std::to_string(idx), port.count); + visitor.on_attribute("offset_in_" + std::to_string(idx), port.offset); } - for (size_t i = 0; i < m_output_ports.size(); ++i) { - auto port = m_output_ports[i]; - visitor.on_attribute("count_out_" + std::to_string(i), port.count); - visitor.on_attribute("offset_out_" + std::to_string(i), port.offset); + for (const auto& p : m_output_ports) { + auto idx = p.first; + auto port = p.second; + visitor.on_attribute("count_out_" + std::to_string(idx), port.count); + visitor.on_attribute("offset_out_" + std::to_string(idx), port.offset); } return true; } +bool MemoryAccess::is_memory_access_input_port(size_t idx) const { + return m_input_ports.find(idx) != m_input_ports.end(); +} +bool MemoryAccess::is_memory_access_output_port(size_t idx) const { + return m_output_ports.find(idx) != m_output_ports.end(); +} + void MemoryAccess::set_input_port_descriptor(const PortDescriptor& desc, const size_t i) { - NGRAPH_CHECK(i < m_input_ports.size(), "Index of input port descriptor should be less than count of input ports"); - m_input_ports[i] = { desc.count, desc.offset, i}; + const auto it = m_input_ports.find(i); + NGRAPH_CHECK(it != m_input_ports.end(), "Index of input port descriptor should be less than count of input ports"); + (*it).second = { desc.count, desc.offset, i}; } void MemoryAccess::set_output_port_descriptor(const PortDescriptor& desc, const size_t i) { - NGRAPH_CHECK(i < m_output_ports.size(), "Index of output port descriptor should be less than count of output ports"); - m_output_ports[i] = { desc.count, desc.offset, i}; + const auto it = m_output_ports.find(i); + NGRAPH_CHECK(it != m_output_ports.end(), "Index of output port descriptor should be less than count of output ports"); + (*it).second = { desc.count, desc.offset, i}; } const MemoryAccess::PortDescriptor& MemoryAccess::get_input_port_descriptor(const size_t i) const { - NGRAPH_CHECK(i < m_input_ports.size(), "Index of input port descriptor should be less than count of input ports"); - return m_input_ports[i]; + const auto 
it = m_input_ports.find(i); + NGRAPH_CHECK(it != m_input_ports.end(), "Index of input port descriptor should be less than count of input ports"); + return (*it).second; } const MemoryAccess::PortDescriptor& MemoryAccess::get_output_port_descriptor(const size_t i) const { - NGRAPH_CHECK(i < m_output_ports.size(), "Index of output port descriptor should be less than count of output ports"); - return m_output_ports[i]; + const auto it = m_output_ports.find(i); + NGRAPH_CHECK(it != m_output_ports.end(), "Index of output port descriptor should be less than count of output ports"); + return (*it).second; } void MemoryAccess::set_input_count(size_t count, size_t idx) { @@ -79,4 +119,4 @@ size_t MemoryAccess::get_output_offset(size_t idx) const { } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/src/op/nop.cpp b/src/common/snippets/src/op/nop.cpp index e9462cecdca594..50be9810d0e0f4 100644 --- a/src/common/snippets/src/op/nop.cpp +++ b/src/common/snippets/src/op/nop.cpp @@ -4,10 +4,9 @@ #include "snippets/op/nop.hpp" -using namespace std; -using namespace ngraph; -snippets::op::Nop::Nop(const OutputVector& arguments, const OutputVector& results) : Op([arguments, results]() -> OutputVector { +ov::snippets::op::Nop::Nop(const OutputVector& arguments, const OutputVector& results) + : Op([arguments, results]() -> OutputVector { OutputVector x; x.insert(x.end(), arguments.begin(), arguments.end()); x.insert(x.end(), results.begin(), results.end()); diff --git a/src/common/snippets/src/op/scalar.cpp b/src/common/snippets/src/op/scalar.cpp index d89ed94b235d3a..029a2e613f28d2 100644 --- a/src/common/snippets/src/op/scalar.cpp +++ b/src/common/snippets/src/op/scalar.cpp @@ -4,15 +4,14 @@ #include "snippets/op/scalar.hpp" -using namespace ngraph; -std::shared_ptr snippets::op::Scalar::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr ov::snippets::op::Scalar::clone_with_new_inputs(const OutputVector& new_args) const { check_new_args_count(this, new_args); return std::make_shared(*this); } // Scalar currently supports only one-element constants, this could be changed in the future -void snippets::op::Scalar::validate_and_infer_types() { +void ov::snippets::op::Scalar::validate_and_infer_types() { Constant::validate_and_infer_types(); auto out_pshape = get_output_partial_shape(0); NODE_VALIDATION_CHECK(this, out_pshape.is_static(), "Scalar supports only static input shapes"); @@ -21,7 +20,7 @@ void snippets::op::Scalar::validate_and_infer_types() { " shape"); } -bool snippets::op::Scalar::visit_attributes(AttributeVisitor& visitor) { +bool ov::snippets::op::Scalar::visit_attributes(AttributeVisitor& visitor) { auto shape = get_output_shape(0); auto type = get_output_element_type(0); auto value = cast_vector(); diff --git a/src/common/snippets/src/op/serialize_node.cpp b/src/common/snippets/src/op/serialize_node.cpp new file mode 100644 index 00000000000000..3672b2203a77a1 --- /dev/null +++ b/src/common/snippets/src/op/serialize_node.cpp @@ -0,0 +1,63 @@ +// Copyright (C) 2018-2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/op/serialization_node.hpp" + + +namespace ov { +namespace snippets { +namespace op { + +SerializationNode::SerializationNode(const Output &arg, const std::shared_ptr &expr) + : Op({arg}), m_expr(expr) { + if (!m_expr || !m_expr->get_node()) + OPENVINO_THROW("SerializationNode requires a valid expression with non-null node pointer"); + const auto &node = 
expr->get_node(); + std::string type = node->get_type_name(); + std::string name = node->get_friendly_name(); + // If node is a parameter, show another type name, so the node will be displayed correctly + get_rt_info()["layerType"] = type == "Parameter" ? "ParameterLowered" : type; + set_friendly_name(name); + constructor_validate_and_infer_types(); +} + +void SerializationNode::validate_and_infer_types() { + set_output_type(0, element::f32, {}); +} + +std::shared_ptr SerializationNode::clone_with_new_inputs(const OutputVector &new_args) const { + check_new_args_count(this, new_args); + return std::make_shared(new_args.at(0), m_expr); +} + +bool SerializationNode::visit_attributes(AttributeVisitor &visitor) { + std::vector> shapes; + const auto &node = m_expr->get_node(); + for (size_t i = 0; i < node->get_input_size(); i++) { + const auto &pshape = node->get_input_partial_shape(i); + if (pshape.begin() != pshape.end()) + shapes.emplace_back("in_shape_" + std::to_string(i), node->get_input_partial_shape(i)); + } + for (size_t i = 0; i < node->get_output_size(); i++) { + const auto &pshape = node->get_output_partial_shape(i); + if (pshape.begin() != pshape.end()) + shapes.emplace_back("out_shape_" + std::to_string(i), pshape); + } + auto loop_ids = m_expr->get_loop_ids(); + auto rinfo = m_expr->get_reg_info(); + if (!rinfo.first.empty()) + visitor.on_attribute("in_regs", rinfo.first); + if (!rinfo.second.empty()) + visitor.on_attribute("out_regs", rinfo.second); + for (auto& s : shapes) + visitor.on_attribute(s.first, s.second); + + visitor.on_attribute("loop_ids", loop_ids); + node->visit_attributes(visitor); + return true; +} + +} // namespace op +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/op/store.cpp b/src/common/snippets/src/op/store.cpp index 8ac2c4cdf1704e..f7aa73b857c684 100644 --- a/src/common/snippets/src/op/store.cpp +++ b/src/common/snippets/src/op/store.cpp @@ -2,25 +2,27 @@ // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/op/store.hpp" -#include -namespace ngraph { +namespace ov { namespace snippets { namespace op { -snippets::op::Store::Store(const Output& x, const size_t count, const size_t offset) : MemoryAccess({x}, 0, 1) { +snippets::op::Store::Store(const Output& x, const size_t count, const size_t offset) + : MemoryAccess({x}, std::set{}, std::set{0}) { set_output_port_descriptor({count, offset}, 0); constructor_validate_and_infer_types(); } void snippets::op::Store::validate_and_infer_types() { // Store has memory access port only on output - OPENVINO_ASSERT(get_input_port_count() == 0, "Store node mustn't have memory access input port"); - OPENVINO_ASSERT(get_output_port_count() == 1, "Store node must have memory access output port"); + const auto input_ma_ports = get_memory_access_input_ports(); + const auto output_ma_ports = get_memory_access_output_ports(); + OPENVINO_ASSERT(input_ma_ports.size() == 0, "Store node mustn't have memory access input port"); + OPENVINO_ASSERT(output_ma_ports.size() == 1 && is_memory_access_output_port(0), "Store node must have memory access output port"); set_output_type(0, get_input_element_type(0), get_input_partial_shape(0)); } @@ -32,4 +34,4 @@ std::shared_ptr snippets::op::Store::clone_with_new_inputs(const OutputVec } // namespace op } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/src/op/subgraph.cpp b/src/common/snippets/src/op/subgraph.cpp index 9338f4d73092f3..feb52579a9243c 100644 --- 
a/src/common/snippets/src/op/subgraph.cpp +++ b/src/common/snippets/src/op/subgraph.cpp @@ -2,32 +2,43 @@ // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/remarks.hpp" #include "snippets/op/subgraph.hpp" #include "snippets/op/convert_saturation.hpp" -#include "snippets/pass/insert_load_store.hpp" + #include "snippets/pass/insert_movebroadcast.hpp" #include "snippets/pass/broadcast_to_movebroadcast.hpp" -#include "snippets/pass/load_movebroadcast_to_broadcastload.hpp" #include "snippets/pass/propagate_precision.hpp" -#include "snippets/pass/assign_registers.hpp" #include "snippets/pass/convert_constants.hpp" #include "snippets/pass/convert_power_to_powerstatic.hpp" -#include "snippets/pass/vector_to_scalar.hpp" -#include "snippets/pass/insert_loops.hpp" #include "snippets/pass/transpose_decomposition.hpp" #include "snippets/pass/transform_convert.hpp" #include "snippets/pass/matmul_to_brgemm.hpp" #include "snippets/pass/fuse_transpose_brgemm.hpp" -#include "snippets/pass/softmax_decomposition.hpp" -#include "snippets/pass/reset_buffer.hpp" -#include "snippets/pass/insert_buffer.hpp" -#include "snippets/pass/loop_fusion.hpp" +#include "snippets/pass/set_softmax_ports.hpp" + #include "snippets/utils.hpp" -#include "transformations/common_optimizations/nop_elimination.hpp" +#include "snippets/lowered/port_descriptor.hpp" +#include "snippets/lowered/linear_ir.hpp" +#include "snippets/lowered/pass/assign_registers.hpp" +#include "snippets/lowered/pass/mark_loops.hpp" +#include "snippets/lowered/pass/fuse_loops.hpp" +#include "snippets/lowered/pass/init_loops.hpp" +#include "snippets/lowered/pass/insert_buffers.hpp" +#include "snippets/lowered/pass/insert_load_store.hpp" +#include "snippets/lowered/pass/load_movebroadcast_to_broadcastload.hpp" +#include "snippets/lowered/pass/allocate_buffers.hpp" +#include "snippets/lowered/pass/propagate_layout.hpp" +#include "snippets/lowered/pass/cleanup_loop_offsets.hpp" +#include "snippets/lowered/pass/softmax_decomposition.hpp" +#include "snippets/lowered/pass/move_scalar_to_consumer.hpp" +#include "snippets/lowered/pass/move_result_out_of_loop.hpp" +#include "snippets/lowered/pass/clean_repeated_ptr_shifts.hpp" +#include "snippets/lowered/pass/identify_buffers.hpp" + #include "transformations/utils/utils.hpp" #include @@ -40,10 +51,12 @@ #include using namespace std; -using namespace ngraph; using namespace ov::op::util; -void snippets::op::Subgraph::set_generator(std::shared_ptr generator) { +namespace ov { +namespace snippets { + +void snippets::op::Subgraph::set_generator(std::shared_ptr generator) { m_generator = generator; } @@ -51,8 +64,13 @@ void snippets::op::Subgraph::set_virtual_port_count(const size_t count) { m_virtual_port_count = count; } -void snippets::op::Subgraph::set_buffer_needed(const bool need) { - m_buffer_needed = need; +auto snippets::op::Subgraph::is_domain_sensitive_op(const std::shared_ptr& op) -> bool { + return ov::is_type(op) || + ov::is_type(op) || + ov::is_type(op) || + ov::is_type(op) || + ov::is_type(op) || // Broadcast is domain sensetive op because the output shape depends on + ov::is_type(op); // the both input and broadcast shapes (the both - are inputs of op). 
Note: is used only in MHA pattern } void snippets::op::Subgraph::init_config() { @@ -61,13 +79,65 @@ void snippets::op::Subgraph::init_config() { config.m_is_quantized = config.m_is_quantized || ov::is_type(op); config.m_has_domain_sensitive_ops = config.m_has_domain_sensitive_ops || - ov::is_type(op) || - ov::is_type(op) || - ov::is_type(op) || - ov::is_type(op); + is_domain_sensitive_op(op); + } +} + +auto snippets::op::Subgraph::get_estimated_buffer_count(const ov::NodeVector& ops) -> size_t { + // The count of potential unique Buffers - it's hidden virtual ports as well + // We should go through Subgraph and calculate potential non-inplace Buffers count. + // These Buffers can be only around Loops (for example, around MatMul they may be inplace because MatMul doesn't change registers). + // So we should check for element type size of nodes which are used Buffer to get rating from above for unique Buffer count. + // The count is estimated because when we calculate this number, we have only original graph representation + // and where will be Loops - we can just predict. + // Note: The ops that create Buffers: MatMul, Transpose and Softmax (always FP32) + std::vector used_precision_size; + for (const auto& op : ops) { + if (const auto transpose = ov::as_type_ptr(op)) { + // At the moment Transposes are supported only on Results and Parameters but + // then we should have the different Buffers for Transpose as well (Transpose isn't inplace) + const auto consumers = transpose->get_output_target_inputs(0); + // If after Transpose there is Result it means that there won't be Buffer after Transpose. + // The same case is for Parameter before Transpose + const auto are_prev_or_next_ops = std::none_of(consumers.begin(), consumers.end(), + [](const ov::Input& in) { + return ov::is_type(in.get_node()); + }) || + !ov::is_type(transpose->get_input_node_shared_ptr(0)); + if (are_prev_or_next_ops) { + const auto prc_size = transpose->get_element_type().size(); + if (used_precision_size.empty() || used_precision_size.back() != prc_size) { + used_precision_size.push_back(prc_size); + } + } + } else if (ov::is_type(op) || ov::is_type(op)) { + // Softmax always uses 2 FP32 Buffers + const auto prc_size = ov::element::f32.size(); + if (used_precision_size.empty() || used_precision_size.back() != prc_size) { + used_precision_size.push_back(prc_size); + } + } else if (const auto matmul = ov::as_type_ptr(op)) { + // First input check is enough because MatMul requires the same prc size on inputs + if (!ov::is_type(matmul->get_input_node_shared_ptr(0)) || + !ov::is_type(matmul->get_input_node_shared_ptr(1))) { + const auto prc_size = matmul->get_input_element_type(0).size(); + if (used_precision_size.empty() || used_precision_size.back() != prc_size) { + used_precision_size.push_back(prc_size); + } + } + + const auto consumers = matmul->get_output_target_inputs(0); + if (std::none_of(consumers.begin(), consumers.end(), + [](const ov::Input& in) { return ov::is_type(in.get_node()); })) { + const auto prc_size = matmul->get_element_type().size(); + if (used_precision_size.empty() || used_precision_size.back() != prc_size) { + used_precision_size.push_back(prc_size); + } + } + } } - // Domain sensitive ops are decomposed with explicit Loops. 
So, we should explicitly insert Loops in Subgraph if it contains these ops - config.m_explicit_loop_insertion = config.m_has_domain_sensitive_ops; + + return used_precision_size.size(); } snippets::op::Subgraph::Subgraph(const OutputVector& args, std::shared_ptr body) @@ -122,14 +192,14 @@ std::vector snippets::op::Subgraph::reshape_body(const std::vector void snippets::op::Subgraph::validate_and_infer_types() { INTERNAL_OP_SCOPE(Subgraph); - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::validate_and_infer_types") - ngraph::ParameterVector old_parameters; + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::validate_and_infer_types") + ov::ParameterVector old_parameters; for (auto op : body_ptr()->get_parameters()) { old_parameters.push_back(op); } for (size_t i = 0; i < get_input_size(); ++i) { - body_ptr()->replace_parameter(i, std::make_shared(get_input_element_type(i), get_input_partial_shape(i))); + body_ptr()->replace_parameter(i, std::make_shared(get_input_element_type(i), get_input_partial_shape(i))); } body_ptr()->validate_nodes_and_infer_types(); @@ -153,20 +223,20 @@ bool snippets::op::Subgraph::visit_attributes(AttributeVisitor& visitor) { auto snippets::op::Subgraph::wrap_node_as_subgraph(const std::shared_ptr& node) -> std::shared_ptr { INTERNAL_OP_SCOPE(Subgraph); - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::wrap_node_as_subgraph") - ngraph::ParameterVector body_parameters; - ngraph::OutputVector body_inputs; + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::wrap_node_as_subgraph") + ov::ParameterVector body_parameters; + ov::OutputVector body_inputs; - ngraph::OutputVector subgraph_inputs; + ov::OutputVector subgraph_inputs; for (const auto& input : node->input_values()) { - if (ov::is_type(input.get_node_shared_ptr()) && - (ngraph::shape_size(input.get_shape()) == 1 || + if (ov::is_type(input.get_node_shared_ptr()) && + (ov::shape_size(input.get_shape()) == 1 || ov::is_type(node) || constant_input_should_be_inside_body(node))) { body_inputs.push_back(input); } else { - auto parameter = std::make_shared(input.get_element_type(), input.get_partial_shape()); + auto parameter = std::make_shared(input.get_element_type(), input.get_partial_shape()); body_parameters.push_back(parameter); body_parameters.back()->set_friendly_name(input.get_node()->get_friendly_name()); body_inputs.push_back(parameter->output(0)); @@ -185,25 +255,19 @@ auto snippets::op::Subgraph::wrap_node_as_subgraph(const std::shared_ptroutputs()) { - body_results.push_back(std::make_shared(body_node->output(output.get_index()))); + body_results.push_back(std::make_shared(body_node->output(output.get_index()))); } auto body = create_body(node->get_friendly_name(), body_results, body_parameters); auto subgraph = build_subgraph(node, subgraph_inputs, body); - bool need_buffer = false; size_t hidden_data_count = 0lu; if (auto fq_node = ov::as_type_ptr(node)) { hidden_data_count += utils::get_non_scalar_constant_count_for_fq(fq_node); - // Ops that requires Buffer - } else if (ov::is_type(node) || - ov::is_type(node)) { - need_buffer |= true; } subgraph->set_virtual_port_count(hidden_data_count); - subgraph->set_buffer_needed(need_buffer); for (size_t i = 0; i < body->get_parameters().size(); i++) { body->get_parameters()[i]->set_friendly_name(body_parameters[i]->get_friendly_name()); @@ -247,7 +311,7 @@ auto snippets::op::Subgraph::constant_input_should_be_inside_body(const std::sha ov::PartialShape 
snippets::op::Subgraph::canonicalize(const BlockedShapeVector& outputShapes, const BlockedShapeVector& inputShapes) { INTERNAL_OP_SCOPE(Subgraph); - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::canonicalize") + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::canonicalize") NODE_VALIDATION_CHECK(this, inputShapes.size() == body_ptr()->get_parameters().size(), "Number of parameters for snippet doesn't match passed to generate method: ", inputShapes.size(), " vs ", body_ptr()->get_parameters().size(), "."); @@ -266,7 +330,7 @@ ov::PartialShape snippets::op::Subgraph::canonicalize(const BlockedShapeVector& const auto baseRank = baseShape.size(); const bool baseIsBlocked = baseOrder.size() != std::set(baseOrder.begin(), baseOrder.end()).size(); for (size_t i = 0; i < inputShapes.size(); i++) { - const auto &blockedShape = inputShapes[i]; + const auto& blockedShape = inputShapes[i]; PartialShape inShape; AxisVector inOrder; element::Type inType; @@ -294,12 +358,12 @@ ov::PartialShape snippets::op::Subgraph::canonicalize(const BlockedShapeVector& // todo: we need to generalize canonicalization for domain-sensitive ops. E.g. MatMul inputs can't be broadcasted one to another if (!config.m_has_domain_sensitive_ops) NODE_VALIDATION_CHECK(this, - PartialShape::broadcast_merge_into(tmpPShape, inShape, ::ngraph::op::AutoBroadcastType::NUMPY), + PartialShape::broadcast_merge_into(tmpPShape, inShape, ::ov::op::AutoBroadcastType::NUMPY), "Failed to create broadcastable shapes in snippets canonicalization"); const auto paramShape = body_ptr()->get_parameters()[i]->get_partial_shape(); const auto paramType = body_ptr()->get_parameters()[i]->get_element_type(); if (paramShape.size() != inShape.size() || !equal(paramShape.begin(), paramShape.end(), inShape.begin())) - body_ptr()->replace_parameter(i, std::make_shared(paramType, inShape)); + body_ptr()->replace_parameter(i, std::make_shared(paramType, inShape)); } body_ptr()->validate_nodes_and_infer_types(); auto skipStartEndOnes = [](const PartialShape& shape) { @@ -321,8 +385,8 @@ ov::PartialShape snippets::op::Subgraph::canonicalize(const BlockedShapeVector& // todo: we need a slightly more general approach for backward ROI propagation const auto& result_parent = body_results[0]->get_input_node_shared_ptr(0); if (body_results.size() == 1 && - ov::is_type(result_parent) && - ov::is_type(result_parent->get_input_node_shared_ptr(0))) { + ov::is_type(result_parent) && + ov::is_type(result_parent->get_input_node_shared_ptr(0))) { outPShape = result_parent->get_input_partial_shape(0); } else { for (size_t i = 0; i < body_results.size(); i++) { @@ -334,12 +398,12 @@ ov::PartialShape snippets::op::Subgraph::canonicalize(const BlockedShapeVector& PartialShape pShape_i(skipStartEndOnes(shape_i)); bool compatibleWithPassedShape = PartialShape::broadcast_merge_into(pShape_i, skipStartEndOnes(outputShape_i), - ::ngraph::op::AutoBroadcastType::NUMPY); + ::ov::op::AutoBroadcastType::NUMPY); NODE_VALIDATION_CHECK(this, compatibleWithPassedShape, "Inferred and passed results shapes are incompatible for snippet "); // Check that output shapes are broadcastable to each other => can be scheduled bool compatibleWithOtherOutputs = PartialShape::broadcast_merge_into(outPShape, shape_i, - ::ngraph::op::AutoBroadcastType::NUMPY); + ::ov::op::AutoBroadcastType::NUMPY); NODE_VALIDATION_CHECK(this, compatibleWithOtherOutputs, "Snippets output shapes must be numpy broadcastable"); } @@ -368,7 +432,7 @@ void 
snippets::op::Subgraph::align_element_types(const BlockedShapeVector& outpu for (size_t i = 0; i < outputShapes.size(); i++) { const auto needed_out_type = std::get<2>(outputShapes[i]); if (body_results[i]->get_input_element_type(0) != needed_out_type) { - const auto convert = std::make_shared( + const auto convert = std::make_shared( body_results[i]->get_input_node_shared_ptr(0), needed_out_type); body_results[i]->set_argument(0, convert); body_results[i]->validate_and_infer_types(); @@ -382,10 +446,10 @@ void snippets::op::Subgraph::align_element_types(const BlockedShapeVector& outpu const auto& parameter = parameters[i]; if (parameter->get_element_type() != needed_in_type) { const auto parameter_output = parameter->output(0); - const auto convert = std::make_shared( + const auto convert = std::make_shared( parameter_output, parameter_output.get_element_type()); - ngraph::copy_runtime_info(parameter, convert); + ov::copy_runtime_info(parameter, convert); for (const auto input : parameter_output.get_target_inputs()) { const auto& input_node = input.get_node(); @@ -401,188 +465,89 @@ void snippets::op::Subgraph::align_element_types(const BlockedShapeVector& outpu } } -void snippets::op::Subgraph::initialize_buffer_scratchpad_size() { - auto is_transpose_loop = [](const ov::Output& source_output) -> bool { - const auto parent = source_output.get_node_shared_ptr(); - // Transpose op is decomposed into LoopBegin->LoadReshape->Store->LoopEnd subgraph. LoadReshape op can be only - // in Transpose decomposition. So it's enough to verify that this Loop is Transpose pattern. - // We cannot check for non-equality of input and output shape of Transpose Loop because Transpose may have the same - // shapes on input and output. - auto loop_end = ov::as_type_ptr(parent); - if (!loop_end) - return false; - size_t idx = source_output.get_index(); - while (ov::is_type(loop_end->get_input_node_shared_ptr(idx))) { - auto consumer = loop_end->input_value(idx); - idx = consumer.get_index(); - loop_end = ov::as_type_ptr(consumer.get_node_shared_ptr()); - } - - const auto loop_begin = loop_end->get_loop_begin(); - // At the moment Transpose Loops cannot be fused with other Loops, so check for one input and one output is enough - if (loop_begin->get_input_size() != 1 || loop_end->get_output_size() != 1 || loop_begin->get_output_target_inputs(0).size() != 1) - return false; - const auto consumer = loop_begin->get_output_target_inputs(0).begin()->get_node(); - return ov::is_type(consumer); - }; - auto propagate_offset = [](const std::shared_ptr& buffer, const size_t offset) { - // If Buffer has offset We set this offset in the next Load and Store ops - // to correctly read and write data because all buffers have the one register - // Also if user sets offset to a Buffer It means that the Buffer has the corresponding Load and Store ops - - // Propagate to up: in Store. 
Buffer can have only one Store - { - if (buffer->is_intermediate_memory()) { - OPENVINO_ASSERT(buffer->get_input_size() == 1, "Buffer with intermediate memory must have one parent"); - auto parent = buffer->get_input_node_shared_ptr(0); - auto idx = buffer->input(0).get_source_output().get_index(); - while (ov::is_type(parent)) { - const auto source_output = parent->input_value(idx); - parent = source_output.get_node_shared_ptr(); - idx = source_output.get_index(); - } - if (auto memory_access = ov::as_type_ptr(parent)) { - memory_access->set_output_offset(offset, idx); - } else { - OPENVINO_THROW( - "Buffer::set_offset() was called when Buffer didn't have the corresponding MemoryAccess op for offset propagation"); - } - } - } - - // Propagate to down: in Load. Buffer can have several Load and Loops after himself. We should go through all target inputs - { - std::function&)> propagate_down; - propagate_down = [&](const Input& target_input) { - const auto child = target_input.get_node()->shared_from_this(); - // There may be graph with several LoopBegin and LoopEnd between Load/Brgemm and Buffer, - // so we should iterate through LoopBase - // Example: Softmax decomposition with ReduceMax - if (ov::is_type(child)) { - const auto index = target_input.get_index(); - for (const auto loop_target_output : child->output(index).get_target_inputs()) { - propagate_down(loop_target_output); - } - } else if (auto memory_access = ov::as_type_ptr(child)) { - memory_access->set_input_offset(offset, target_input.get_index()); - } else { - OPENVINO_THROW("Buffer::set_offset() was called when Buffer didn't have the corresponding MemoryAccess op for offset propagation"); - } - }; - - for (const auto target_output : buffer->output(0).get_target_inputs()) { - propagate_down(target_output); - } - } - }; - m_buffer_scratchpad = 0; - size_t offset = 0; - const auto ops = body_ptr()->get_ordered_ops(); - for (const auto& op : ops) { - if (const auto buffer = ov::as_type_ptr(op)) { - const auto buffer_size = buffer->get_byte_size(); - // We need to allocate memory for first buffer at least - if (m_buffer_scratchpad == 0) { - m_buffer_scratchpad += buffer_size; - continue; - } - - if (buffer->is_intermediate_memory()) { - // Transpose, MatMul and other non-decomposed ops should have different memories on inputs and outputs to avoid data corruption, - // so after them, we should allocate new memory. Other operations (Eltwises, Convert) can be executed inplace inside Loop. 
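// [Illustrative sketch, not part of the patch] The scratchpad sizing being removed here reduces to a single
// running offset: the first Buffer and every non-inplace Buffer (e.g. one fed by a Transpose Loop or by
// MatMul) claim a new region at the end of the scratchpad, while the remaining Buffers reuse the current
// offset, which is then propagated into the surrounding Load/Store ops. A plain-types sketch under that
// assumption (BufferInfo and needs_new_memory are hypothetical stand-ins for the is_transpose_loop/MatMul
// checks; requires <vector> and <cstddef>):
struct BufferInfo { size_t byte_size; bool needs_new_memory; };
inline size_t compute_scratchpad_size(const std::vector<BufferInfo>& buffers, std::vector<size_t>& offsets) {
    size_t scratchpad = 0;
    size_t offset = 0;
    for (const auto& b : buffers) {                 // Buffers in execution order
        if (scratchpad == 0 || b.needs_new_memory) {
            offset = scratchpad;                    // a new region starts at the current end
            scratchpad += b.byte_size;              // grow the common scratchpad
        }
        offsets.push_back(offset);                  // offset is later written into Load/Store ops
    }
    return scratchpad;
}
// In the new flow this bookkeeping moves to the dedicated buffer-allocation pass of the lowered pipeline
// (see control_flow_transformations below), whose get_scratchpad_size() result fills m_buffer_scratchpad.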
- OPENVINO_ASSERT(buffer->get_input_size() == 1, "Buffer with intermediate memory must have one parent"); - const auto parent = buffer->get_input_node_shared_ptr(0); - if (!ov::is_type(parent) || is_transpose_loop(parent)) { - offset = m_buffer_scratchpad; - propagate_offset(buffer, offset); - m_buffer_scratchpad += buffer_size; - continue; - } - - propagate_offset(buffer, offset); - } else { - // Single Buffer without input should allocate new memory - offset = m_buffer_scratchpad; - propagate_offset(buffer, offset); - m_buffer_scratchpad += buffer_size; - } - } - } -} - -void snippets::op::Subgraph::convert_to_snippet_dialect() { +void snippets::op::Subgraph::data_flow_transformations(ov::pass::Manager& pre_common, + ov::pass::Manager& post_common, + ov::pass::Manager& post_precision) { INTERNAL_OP_SCOPE(Subgraph); - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::convert_to_snippet_dialect") - auto skip_matching_domain = [](const std::shared_ptr& n) -> bool { - const auto& pshape = n->get_input_partial_shape(0); - const auto& last_dim = pshape[pshape.size() - 1]; - return last_dim.is_dynamic() || last_dim.get_length() != 1; - }; - - // At the moment we support only full vector Load/Store and scalar Load/Store so that count is equal to lanes. - // Then we are going to support variadic Load/Store with different element count - const size_t count = m_generator->get_target_machine()->get_lanes(); - const auto & params = body_ptr()->get_parameters(); + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::data_flow_transformations") + const auto& params = body_ptr()->get_parameters(); bool inputs_has_dynamic_last_dims = std::any_of(params.begin(), params.end(), - [](const shared_ptr& p){ + [](const shared_ptr& p) { return p->get_partial_shape().rbegin()->is_dynamic(); }); - const auto allocationRank = static_cast(tileRank); - ngraph::pass::Manager manager; + + pre_common.run_passes(body_ptr()); + + ov::pass::Manager common_manager; if (config.m_has_domain_sensitive_ops) { - manager.register_pass(); - manager.register_pass(); - manager.register_pass(allocationRank); - manager.register_pass(count, allocationRank); - manager.register_pass(); + common_manager.register_pass(); + common_manager.register_pass(); + common_manager.register_pass(); + common_manager.register_pass(); } - manager.register_pass(); - manager.register_pass(); - manager.register_pass(); - manager.register_pass(count); - manager.register_pass(count); + common_manager.register_pass(); + common_manager.register_pass(); + common_manager.register_pass(); // todo: presently dynamic pipeline is activated even if the last two dimension are static // In general, we can use static kernels in this case, but several parameters (src and dst memory pointers for example) // should be passed as run-time args, so it's a mixed mode: kernel is shape-aware, but some additional runtime args are required // Presently Broadcasting is organized in the following way: // * ALL last dims are static => broadcasting is handled via MoveBroadcast and pointer arithmetics (even for dynamic upper dims) if (!inputs_has_dynamic_last_dims) { - manager.register_pass(); - manager.register_pass(); - // Note that, BrodacastMove is typically inserted right after the Load. Such cases are typical for - // simple subgraphs where one of the ngraph::op's inputs is broadcasted to match the larger one. 
However, BroadcastMove - // could also be inserted after the ngraph::op, if the op input don't need broadcasting, but the output does - // (for example, to match the larger output of a child node). In such cases, Loads (and Stores) should be replaced - // with ScalarLoads (ScalarStores) to avoid invalid read in vector Loop. Graph example: - // Parameter_0 Parameter_1 Parameter_2 - // [1,2,5,16] [1,2,5,1] [1,2,5,1] - // Load BroadcastLoad Load* Scalar - // Add Subtract - // \___________ ___________BroadcastMove - // \ / - // Multiply - // Store - // Result - // Note: Load* should be replaced with ScalarLoad in this example to avoid invalid read in vector Loop. - if (master_shape.size() != 0 && master_shape[master_shape.size() - 1] != 1) { - manager.register_pass(); - manager.register_pass(); - manager.get_pass_config()-> - set_callback(skip_matching_domain); - manager.get_pass_config()-> - set_callback(skip_matching_domain); - } - // Note that InsertLoops requires validate_and_infer_types afterwards, so add it manually if - // automatic validation will be disabled in the pass manager - manager.register_pass(master_shape, tileRank, - m_generator->get_target_machine()->get_lanes(), !config.m_explicit_loop_insertion); - if (config.m_has_domain_sensitive_ops) { - manager.register_pass(); - manager.register_pass(); - } + common_manager.register_pass(); } - manager.run_passes(body_ptr()); + common_manager.run_passes(body_ptr()); + + post_common.run_passes(body_ptr()); + + ov::pass::Manager precision_manager; + precision_manager.register_pass(m_generator->get_target_machine()); + precision_manager.register_pass(); + precision_manager.register_pass(); + precision_manager.run_passes(body_ptr()); + + post_precision.run_passes(body_ptr()); +} + +void snippets::op::Subgraph::control_flow_transformations(lowered::LinearIR& linear_ir, + lowered::pass::PassPipeline& target_pipeline) { + INTERNAL_OP_SCOPE(Subgraph); + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::control_flow_transformations") + + const size_t vector_size = get_generator()->get_target_machine()->get_lanes(); + const int32_t buffer_allocation_rank = static_cast(linear_ir.get_config().m_loop_depth); + + // Note: The pass InitLoops uses LoopInfo that contains entry and exit points of the corresponding Loop. 
+ // To avoid the Loop information corruption, we should call the passes with Load/Store work + // (for example, LoadMoveBroadcastToBroadcastLoad()) after explicit Loop insertion (InitLoops()) + lowered::pass::PassPipeline common_pipeline; + common_pipeline.register_pass(vector_size); + common_pipeline.register_pass(vector_size); + common_pipeline.register_pass(); + common_pipeline.register_pass(); + common_pipeline.register_pass(buffer_allocation_rank); + common_pipeline.register_pass(vector_size); + common_pipeline.register_pass(); + common_pipeline.register_pass(); + common_pipeline.register_pass(); + common_pipeline.run(linear_ir); + + target_pipeline.run(linear_ir); + + const auto buffer_allocation_pass = std::make_shared(); + lowered::pass::PassPipeline buffer_pipeline; + buffer_pipeline.register_pass(); + buffer_pipeline.register_pass(); + buffer_pipeline.register_pass(buffer_allocation_pass); + buffer_pipeline.run(linear_ir); + + lowered::pass::PassPipeline final_pipeline; + final_pipeline.register_pass(); + final_pipeline.register_pass(); + final_pipeline.run(linear_ir); + + m_buffer_scratchpad = buffer_allocation_pass->get_scratchpad_size(); } snippets::Schedule snippets::op::Subgraph::generate(const BlockedShapeVector& output_shapes, @@ -594,60 +559,44 @@ snippets::Schedule snippets::op::Subgraph::generate(const BlockedShapeVector& ou snippets::Schedule snippets::op::Subgraph::generate(const BlockedShapeVector& output_shapes, const BlockedShapeVector& input_shapes, - ngraph::pass::Manager& pre_dialect, - ngraph::pass::Manager& post_dialect, - ngraph::pass::Manager& post_precision, + ov::pass::Manager& pre_common, + ov::pass::Manager& post_common, + ov::pass::Manager& post_precision, + lowered::pass::PassPipeline& target_lowered_pipeline, const void* compile_params) { canonicalize(output_shapes, input_shapes); - return generate(pre_dialect, post_dialect, post_precision, compile_params); + return generate(pre_common, post_common, post_precision, target_lowered_pipeline, compile_params); } snippets::Schedule snippets::op::Subgraph::generate(const void* compile_params) { - auto mngr = ngraph::pass::Manager(); - return generate(mngr, mngr, mngr, compile_params); + auto mngr = ov::pass::Manager(); + auto lowered = lowered::pass::PassPipeline(); + return generate(mngr, mngr, mngr, lowered, compile_params); } snippets::Schedule snippets::op::Subgraph::generate( - ngraph::pass::Manager& pre_dialect, - ngraph::pass::Manager& post_dialect, - ngraph::pass::Manager& post_precision, + ov::pass::Manager& pre_common, + ov::pass::Manager& post_common, + ov::pass::Manager& post_precision, + lowered::pass::PassPipeline& target_lowered_pipeline, const void* compile_params) { INTERNAL_OP_SCOPE(Subgraph); - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::generate") + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::generate") NGRAPH_CHECK(m_generator != nullptr, "generate is called while generator is not set"); - pre_dialect.run_passes(body_ptr()); - convert_to_snippet_dialect(); - post_dialect.run_passes(body_ptr()); + data_flow_transformations(pre_common, post_common, post_precision); - ngraph::pass::Manager precision_manager; - precision_manager.register_pass(m_generator->get_target_machine()); - precision_manager.register_pass(); - precision_manager.register_pass(); - precision_manager.run_passes(body_ptr()); - - post_precision.run_passes(body_ptr()); - - // After all passes, when all optimizations are completed and all MemoryAccess ops 
are inserted, - // we can calculate common buffer scratchpad size and propagate offset from Buffer to the corresponding MemoryAccess ops - if (config.m_has_domain_sensitive_ops) - initialize_buffer_scratchpad_size(); + lowered::Config lowering_config; + lowering_config.m_save_expressions = config.m_has_domain_sensitive_ops; + lowering_config.m_need_fill_tail_register = config.m_has_domain_sensitive_ops; + lowering_config.m_loop_depth = tileRank; - std::function& op)> reg_type_mapper = [=](const std::shared_ptr& op) -> Generator::opRegType { - return m_generator->get_op_reg_type(op); - }; - snippets::pass::AssignRegisters(reg_type_mapper).run_on_model(body_ptr()); - - const auto ops = body_ptr()->get_ops(); - ngraph::snippets::Generator::GeneratorConfig generatorConfig; - generatorConfig.m_save_lowered_code = config.m_has_domain_sensitive_ops; - generatorConfig.m_need_fill_tail_register = config.m_has_domain_sensitive_ops; - generatorConfig.m_optimize_single_evaluation = std::none_of(ops.begin(), ops.end(), [](const std::shared_ptr& op) { - return ov::is_type(op); - }); + lowered::LinearIR linear_ir = lowered::LinearIR(body_ptr(), lowering_config); + control_flow_transformations(linear_ir, target_lowered_pipeline); // actual code emission - ngraph::snippets::code ptr = m_generator->generate(body_ptr(), generatorConfig, compile_params); + const auto& lowering_result = m_generator->generate(linear_ir, lowering_config, compile_params); + const auto ptr = lowering_result.binary_code; return {master_shape, false /*canBeLinearized*/, ptr}; } @@ -689,9 +638,9 @@ void snippets::op::Subgraph::print_statistics(bool verbose) { total += output.get_tensor().size(); } - if (auto subgraph = ngraph::as_type_ptr(n)) { + if (auto subgraph = ov::as_type_ptr(n)) { for (auto op : subgraph->body_ptr()->get_ordered_ops()) { - if (ngraph::as_type_ptr(op)) { + if (ov::as_type_ptr(op)) { total += op->output(0).get_tensor().size(); } } @@ -706,9 +655,9 @@ void snippets::op::Subgraph::print_statistics(bool verbose) { // Results and parameters are artificially introduced, // while Constants are already considered if they are inputs of other operation // this should lead to 1:1 inventory for single node operations - if (!ngraph::as_type_ptr(op) - && !ngraph::as_type_ptr(op) - && !ngraph::as_type_ptr(op)) { + if (!ov::as_type_ptr(op) + && !ov::as_type_ptr(op) + && !ov::as_type_ptr(op)) { total += getNodeInventory(op); } } @@ -718,7 +667,7 @@ void snippets::op::Subgraph::print_statistics(bool verbose) { auto countConstants = [](const ov::Model & f) -> size_t { size_t count = 0; for (auto op : f.get_ordered_ops()) { - count += !!ngraph::as_type_ptr(op) ? 1 : 0; + count += !!ov::as_type_ptr(op) ? 
1 : 0; } return count; }; @@ -745,3 +694,6 @@ void snippets::op::Subgraph::serialize() const { auto m_model = xmlFile.str(); std::cout << m_model << std::endl; } + +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/op/vector_buffer.cpp b/src/common/snippets/src/op/vector_buffer.cpp index 1be69a6d9ad678..29afe437e33812 100644 --- a/src/common/snippets/src/op/vector_buffer.cpp +++ b/src/common/snippets/src/op/vector_buffer.cpp @@ -1,27 +1,30 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/op/vector_buffer.hpp" -#include +namespace ov { +namespace snippets { +namespace op { -using namespace std; -using namespace ngraph; - -snippets::op::VectorBuffer::VectorBuffer(const ov::element::Type element_type) : Op(), m_element_type(std::move(element_type)) { +VectorBuffer::VectorBuffer(const ov::element::Type element_type) : Op(), m_element_type(std::move(element_type)) { constructor_validate_and_infer_types(); } -std::shared_ptr snippets::op::VectorBuffer::clone_with_new_inputs(const OutputVector& new_args) const { +std::shared_ptr VectorBuffer::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(VectorBuffer_clone_with_new_inputs); check_new_args_count(this, new_args); return std::make_shared(m_element_type); } -void snippets::op::VectorBuffer::validate_and_infer_types() { +void VectorBuffer::validate_and_infer_types() { INTERNAL_OP_SCOPE(VectorBuffer_validate_and_infer_types); set_output_type(0, m_element_type, Shape{1lu}); } + +} // namespace op +} // namespace snippets +} // namespace ov diff --git a/src/common/snippets/src/pass/broadcast_to_movebroadcast.cpp b/src/common/snippets/src/pass/broadcast_to_movebroadcast.cpp index e5683d949c0c88..65fbbc162a8ada 100644 --- a/src/common/snippets/src/pass/broadcast_to_movebroadcast.cpp +++ b/src/common/snippets/src/pass/broadcast_to_movebroadcast.cpp @@ -1,25 +1,24 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/pass/broadcast_to_movebroadcast.hpp" #include "snippets/pass/insert_movebroadcast.hpp" -#include +#include "openvino/pass/pattern/op/wrap_type.hpp" -#include -#include +#include "openvino/opsets/opset1.hpp" +#include "openvino/core/rt_info.hpp" -using namespace ngraph; -ngraph::snippets::pass::BroadcastToMoveBroadcast::BroadcastToMoveBroadcast() { +ov::snippets::pass::BroadcastToMoveBroadcast::BroadcastToMoveBroadcast() { MATCHER_SCOPE(BroadcastToMoveBroadcast); - auto m_broadcast = ngraph::pattern::wrap_type(); + auto m_broadcast = ov::pass::pattern::wrap_type(); - auto callback = [](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::BroadcastToMoveBroadcast") + auto callback = [](ov::pass::pattern::Matcher &m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::BroadcastToMoveBroadcast") auto root = m.get_match_root(); if (auto broadcast_v1 = ov::as_type_ptr(root)) { if (broadcast_v1->get_broadcast_spec().m_type != ov::op::AutoBroadcastType::NUMPY) @@ -35,15 +34,15 @@ ngraph::snippets::pass::BroadcastToMoveBroadcast::BroadcastToMoveBroadcast() { return false; } - const auto broadcast_node = ngraph::snippets::pass::InsertMoveBroadcast::BroadcastNodeLastDim(root->input_value(0), + const auto broadcast_node = 
ov::snippets::pass::InsertMoveBroadcast::BroadcastNodeLastDim(root->input_value(0), target_shape.get_shape(), value_shape.get_shape()); replace_output_update_name(root->output(0), broadcast_node); - ngraph::copy_runtime_info(root, broadcast_node.get_node_shared_ptr()); + ov::copy_runtime_info(root, broadcast_node.get_node_shared_ptr()); return true; }; - auto m = std::make_shared(m_broadcast, matcher_name); + auto m = std::make_shared(m_broadcast, matcher_name); register_matcher(m, callback); } diff --git a/src/common/snippets/src/pass/collapse_subgraph.cpp b/src/common/snippets/src/pass/collapse_subgraph.cpp index a481d9949795ec..27bc8cd02d06e3 100644 --- a/src/common/snippets/src/pass/collapse_subgraph.cpp +++ b/src/common/snippets/src/pass/collapse_subgraph.cpp @@ -3,7 +3,7 @@ // #include "snippets/remarks.hpp" -#include +#include "snippets/itt.hpp" #include "snippets/pass/collapse_subgraph.hpp" #include "snippets/pass/tokenization.hpp" @@ -12,12 +12,11 @@ #include "snippets/op/subgraph.hpp" #include "snippets/utils.hpp" -#include -#include -#include -#include +#include "openvino/opsets/opset1.hpp" +#include "openvino/core/rt_info.hpp" #include "transformations/utils/utils.hpp" #include "ngraph/op/util/attr_types.hpp" +#include "openvino/core/validation_util.hpp" #include #include @@ -27,7 +26,7 @@ #include -namespace ngraph { +namespace ov { namespace snippets { namespace pass { @@ -47,7 +46,7 @@ auto outputs_are_not_broadcastable(const std::shared_ptr& node) -> b } auto is_supported_op(const std::shared_ptr &n) -> bool { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::is_supported_op") + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::is_supported_op") auto is_supported_matmul = [](const std::shared_ptr& n) -> bool { const auto& matmul = ov::as_type_ptr(n); const auto& out_shape = n->get_output_partial_shape(0); @@ -78,62 +77,62 @@ auto is_supported_op(const std::shared_ptr &n) -> bool { // TODO [92179]: Add support of FakeQuantize with non-constants inputs and with binarization algorithm. 
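// [Illustrative note, not part of the patch] Per the TODO above, the check below admits a FakeQuantize
// for tokenization only when it is not binarization (levels != 2), all four range inputs are Constants,
// and the auto-broadcast mode is NUMPY or NONE. A hypothetical standalone predicate with the same meaning:
inline bool is_tokenizable_fake_quantize(const std::shared_ptr<ov::op::v0::FakeQuantize>& fq) {
    if (!fq || fq->get_levels() == 2)
        return false;                                        // binarization is not supported yet
    for (size_t i = 1; i <= 4; ++i)                          // input_low, input_high, output_low, output_high
        if (!ov::is_type<ov::op::v0::Constant>(fq->get_input_node_shared_ptr(i)))
            return false;                                    // non-constant ranges are not supported yet
    const auto broadcast = fq->get_auto_broadcast().m_type;
    return broadcast == ov::op::AutoBroadcastType::NUMPY || broadcast == ov::op::AutoBroadcastType::NONE;
}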
const auto fq = ov::as_type_ptr(n); return fq && fq->get_levels() != 2 && - is_type(n->get_input_node_shared_ptr(1)) && - is_type(n->get_input_node_shared_ptr(2)) && - is_type(n->get_input_node_shared_ptr(3)) && - is_type(n->get_input_node_shared_ptr(4)) && + is_type(n->get_input_node_shared_ptr(1)) && + is_type(n->get_input_node_shared_ptr(2)) && + is_type(n->get_input_node_shared_ptr(3)) && + is_type(n->get_input_node_shared_ptr(4)) && (fq->get_auto_broadcast() == ov::op::AutoBroadcastType::NUMPY || fq->get_auto_broadcast() == ov::op::AutoBroadcastType::NONE); }; auto is_supported_ternary_eltwise_op = [](const std::shared_ptr &n) -> bool { - return ov::is_type(n); + return ov::is_type(n); }; auto is_supported_binary_eltwise_op = [](const std::shared_ptr &n) -> bool { - return ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n); + return ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n); }; auto is_supported_unary_eltwise_op = [](const std::shared_ptr &n) -> bool { - return ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n) - || ov::is_type(n); + return ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n) + || ov::is_type(n); }; auto is_supported_softmax = [](const std::shared_ptr &n) -> bool { @@ -141,11 +140,11 @@ auto is_supported_op(const std::shared_ptr &n) -> bool { return false; int64_t axis = -1; const auto rank = n->get_input_partial_shape(0).rank(); - if (const auto softmax_v8 = ngraph::as_type_ptr(n)) { + if (const auto softmax_v8 = ov::as_type_ptr(n)) { OPENVINO_SUPPRESS_DEPRECATED_START - axis = ngraph::normalize_axis(n->get_friendly_name(), softmax_v8->get_axis(), rank); + axis = ov::normalize_axis(n->get_friendly_name(), softmax_v8->get_axis(), rank); OPENVINO_SUPPRESS_DEPRECATED_END - } else if (const auto softmax_v1 = ngraph::as_type_ptr(n)) { + } else if (const auto softmax_v1 = ov::as_type_ptr(n)) { axis = softmax_v1->get_axis(); } else { return false; @@ -179,16 +178,16 @@ auto has_supported_in_out(const std::shared_ptr &n) -> bool { // So i32 is supported exclusively for transposes and broadcast return t.get_partial_shape().is_static() && (TokenizeSnippets::supported_element_types.count(t.get_element_type()) != 0 || - (t.get_element_type() == 
ngraph::element::i32 && + (t.get_element_type() == ov::element::i32 && (ov::is_type(n) || ov::is_type(n)))); }; - const auto & inputs = n->inputs(); - const auto & outputs = n->outputs(); + const auto& inputs = n->inputs(); + const auto& outputs = n->outputs(); // todo: Is this check necessary? Remove if not for (const auto& out : outputs) { - for (const auto &in_out : out.get_target_inputs()) { - if (ov::is_type(in_out.get_node()->shared_from_this())) { + for (const auto& in_out : out.get_target_inputs()) { + if (ov::is_type(in_out.get_node()->shared_from_this())) { return false; } } @@ -198,8 +197,8 @@ auto has_supported_in_out(const std::shared_ptr &n) -> bool { } auto has_result_child(const std::shared_ptr &node) -> bool { - for (const auto &child : node->get_users()) { - if (ov::is_type(child)) { + for (const auto& child : node->get_users()) { + if (ov::is_type(child)) { return true; } } @@ -208,8 +207,8 @@ auto has_result_child(const std::shared_ptr &node) -> bool { auto get_num_result_children(const std::shared_ptr &node) -> size_t { size_t result = 0; - for (const auto &child : node->get_users()) { - if (ov::is_type(child)) { + for (const auto& child : node->get_users()) { + if (ov::is_type(child)) { result++; } } @@ -217,8 +216,8 @@ auto get_num_result_children(const std::shared_ptr &node) -> size_t } } // namespace -const std::set ngraph::snippets::pass::TokenizeSnippets::supported_element_types = - { ngraph::element::f32, ngraph::element::bf16, ngraph::element::i8, ngraph::element::u8 }; +const std::set ov::snippets::pass::TokenizeSnippets::supported_element_types = + { ov::element::f32, ov::element::bf16, ov::element::i8, ov::element::u8 }; bool TokenizeSnippets::AppropriateForSubgraph(const std::shared_ptr &node) { return @@ -236,17 +235,17 @@ TokenizeSnippets::TokenizeSnippets() { }; continuation_strategy strategy = continuation_strategy::reset; - auto label = std::make_shared(pattern::any_input(), + auto label = std::make_shared(ov::pass::pattern::any_input(), [](const std::shared_ptr &n) { // todo: MatMul and Transpose ops are always skipped by the SnippetsMarkSkipped pass. // This is a temporary solution. Either modify SnippetsMarkSkipped // or align this with the custom MHA tokenization pass. 
return (GetSnippetsNodeType(n) != SnippetsNodeType::SkippedByPlugin || - ov::is_type(n) || ov::is_type(n)) + ov::is_type(n) || ov::is_type(n)) && AppropriateForSubgraph(n); }); - ngraph::graph_rewrite_callback callback = [&, strategy](ngraph::pattern::Matcher &m) -> bool { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::CreateSubgraph_callback") + ov::graph_rewrite_callback callback = [&, strategy](ov::pass::pattern::Matcher &m) -> bool { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::CreateSubgraph_callback") auto node = m.get_match_root(); if (transformation_callback(node)) { return false; @@ -266,7 +265,7 @@ TokenizeSnippets::TokenizeSnippets() { auto create_single_node_subgraph = [&](const std::shared_ptr &node) { auto subgraph = op::Subgraph::wrap_node_as_subgraph(node); subgraph->get_rt_info()["originalLayersNames"] = getFusedNames(node) + node->get_friendly_name(); - ngraph::replace_node(node, subgraph); + ov::replace_node(node, subgraph); op::update_out_tensor_name(subgraph); }; @@ -304,7 +303,7 @@ TokenizeSnippets::TokenizeSnippets() { * / \ * subgraph--node */ - auto is_recurrent = [&input_values](const ngraph::Output& to_find) -> bool { + auto is_recurrent = [&input_values](const ov::Output& to_find) -> bool { return std::any_of(input_values.begin(), input_values.end(), [&](const ov::Output &in) {return in == to_find;}); }; @@ -314,14 +313,14 @@ TokenizeSnippets::TokenizeSnippets() { */ const auto cyclicDependencyIsIntoduced = [&node](const std::shared_ptr& nodeToExamine, std::pair& currentBounds) -> bool { assert(currentBounds.first < currentBounds.second && "Invalid currentBounds passed"); - const auto &parentNodes = ngraph::as_node_vector(nodeToExamine->input_values()); + const auto& parentNodes = ov::as_node_vector(nodeToExamine->input_values()); const int64_t maxParentOrder = std::accumulate(parentNodes.begin(), parentNodes.end(), currentBounds.first, [](int64_t maxOrder, std::shared_ptr n){ - if (ngraph::op::is_constant(n) || ngraph::op::is_parameter(n)) + if (ov::is_type(n) || ov::is_type(n)) return maxOrder; return std::max(maxOrder, GetTopologicalOrder(n)); }); - const auto &childNodes = nodeToExamine->get_users(); + const auto& childNodes = nodeToExamine->get_users(); // Skip the node being attached, since it will be a part of subgraph and can't introduce loop dependency const int64_t minChildOrder = std::accumulate(childNodes.begin(), childNodes.end(), currentBounds.second, [&node](int64_t minOrder, std::shared_ptr n){ @@ -336,7 +335,7 @@ TokenizeSnippets::TokenizeSnippets() { return true; }; - for (const auto &input_node : ngraph::as_node_vector(input_values)) { + for (const auto& input_node : ov::as_node_vector(input_values)) { if (auto subgraph = ov::as_type_ptr(input_node)) { if (!clones.count(input_node)) { auto f = subgraph->body().clone(); @@ -388,7 +387,7 @@ TokenizeSnippets::TokenizeSnippets() { // Todo: here we rely on friendly_name uniqueness. Propose a different algorithm. size_t current_input_index = body_parameters.size(); for (size_t p_ind = 0; p_ind < body_parameters.size(); p_ind++) { - const auto & p = body_parameters[p_ind]; + const auto& p = body_parameters[p_ind]; // unite two body parameters from two input subgraphs only if: // 1. two input subgraphs are connected to the same parent node/subgraph, // 2. and connected to the same output port of this parent node/subgraph. 
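// [Illustrative note, not part of the patch] The parameter-merging rule listed above compares the external
// producer of each body Parameter: Parameters coming from different input subgraphs are united only when
// they are driven by the same output port of the same parent node/subgraph. A hypothetical predicate
// capturing conditions (1) and (2):
inline bool can_unite_body_parameters(const ov::Output<ov::Node>& external_source_a,
                                      const ov::Output<ov::Node>& external_source_b) {
    return external_source_a.get_node() == external_source_b.get_node() &&     // (1) same parent
           external_source_a.get_index() == external_source_b.get_index();     // (2) same output port
}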
@@ -465,14 +464,14 @@ TokenizeSnippets::TokenizeSnippets() { // if there is Convert with input and output data types that aren't equal to supported exec type, // we can get conversion math errors const auto output_of_subgraph = source_result->get_input_node_shared_ptr(0); - if (!ov::is_type(node) && ov::is_type(output_of_subgraph)) { + if (!ov::is_type(node) && ov::is_type(output_of_subgraph)) { // Also we can add new node after < Parameter -> Convert -> Convert -> Convert > auto grandparent = output_of_subgraph->get_input_node_ptr(0); - while (ov::is_type(grandparent)) { + while (ov::is_type(grandparent)) { grandparent = grandparent->get_input_node_ptr(0); } - if (!ov::is_type(grandparent)) { + if (!ov::is_type(grandparent)) { return abort_with_strategy("Convert supports only as Input and as Result of subgraph. Aborting"); } } @@ -484,14 +483,14 @@ TokenizeSnippets::TokenizeSnippets() { // After ConstantFolding we will move remaining non-scalar Constants from body using ConvertConstantsToParameters pass // [*] We support Transpose with second Constant input (represents order). This Constant will not be scheduled // and will only be used to decompose Transpose into a proper Load, Store and Loop combination. - if (ov::is_type(input_node) && - (ngraph::shape_size(input_value.get_shape()) == 1 || + if (ov::is_type(input_node) && + (ov::shape_size(input_value.get_shape()) == 1 || ov::is_type(node) || op::Subgraph::constant_input_should_be_inside_body(node))) { internal_inputs.push_back(input_node->output(0)); } else { external_inputs.push_back(input_value); - auto new_parameter = std::make_shared(input_value.get_element_type(), input_value.get_partial_shape()); + auto new_parameter = std::make_shared(input_value.get_element_type(), input_value.get_partial_shape()); new_parameter->set_friendly_name(input_node->get_friendly_name()); body_parameters.push_back(new_parameter); internal_inputs.push_back(new_parameter->output(0)); @@ -518,23 +517,23 @@ TokenizeSnippets::TokenizeSnippets() { // To avoid unsupported number of non-scalar Constants in the future (plugin specific limitation) // we should calculate potentional number of non-scalar Constants that will be moved up from body. 
size_t hidden_data_count = 0; - bool need_buffer = false; if (const auto fq_node = ov::as_type_ptr(node)) { - hidden_data_count += ngraph::snippets::utils::get_non_scalar_constant_count_for_fq(fq_node); - // Ops require a Buffer - } else if (ov::is_type(node) || - ov::is_type(node)) { - need_buffer |= true; + hidden_data_count += ov::snippets::utils::get_non_scalar_constant_count_for_fq(fq_node); } ResultVector body_results; std::vector>> subgraph_result_inputs; + ov::NodeVector new_body_ops; for (auto subgraph : input_subgraphs) { // we should summurize additional needed data count (non-scalar Constants and Buffers) from all input subgraphs // because we will collapse them with our node and we should get total count - hidden_data_count += ov::as_type_ptr(subgraph)->get_virtual_port_count(); - need_buffer |= ov::as_type_ptr(subgraph)->is_buffer_needed(); + const auto subgraph_ptr = ov::as_type_ptr(subgraph); + hidden_data_count += subgraph_ptr->get_virtual_port_count(); + if (subgraph_ptr->has_domain_sensitive_ops()) { + const auto ops = subgraph_ptr->body_ptr()->get_ordered_ops(); + new_body_ops.insert(new_body_ops.end(), ops.begin(), ops.end()); + } for (auto output : subgraph->outputs()) { bool first_side_consumer = true; @@ -549,7 +548,8 @@ TokenizeSnippets::TokenizeSnippets() { if (!input_subgraphs.count(target_node) && target_node != node) { if (first_side_consumer) { auto& input_subgraph_body = clones[subgraph]; - body_results.push_back(std::make_shared(input_subgraph_body->get_results()[output.get_index()]->input_value(0))); + body_results.push_back(std::make_shared( + input_subgraph_body->get_results()[output.get_index()]->input_value(0))); subgraph_result_inputs.push_back({}); first_side_consumer = false; @@ -565,8 +565,12 @@ TokenizeSnippets::TokenizeSnippets() { } } + if (op::Subgraph::is_domain_sensitive_op(node)) { + new_body_ops.push_back(node); + } + for (auto output : node->outputs()) { - body_results.push_back(std::make_shared(body_node->output(output.get_index()))); + body_results.push_back(std::make_shared(body_node->output(output.get_index()))); subgraph_result_inputs.push_back(output.get_target_inputs()); } @@ -574,14 +578,18 @@ TokenizeSnippets::TokenizeSnippets() { OPENVINO_THROW("body results and node results size mismatch during subgraph collaps"); } - // todo: move this plugin-specific constraint to the plugin callback - if (body_parameters.size() + body_results.size() + hidden_data_count + static_cast(need_buffer) > 12) { + // The each data node (Parameter (and non-Scalar Constants), Result, Buffers with the same ID) requires the own unique GPR. + // At the moment, CPU Plugin has limitation for GPR registers: there are only 12 available registers. + // This limitation will be resolved once generator supports gprs spills [75622]. + // TODO [75567]: move this plugin-specific constraint to the plugin callback + const auto unique_buffer_count = op::Subgraph::get_estimated_buffer_count(new_body_ops); + if (body_parameters.size() + body_results.size() + hidden_data_count + unique_buffer_count > 12) { const std::string message_reset = "new subgraph is created. 
Impossible to schedule subgraph with " + std::to_string(body_parameters.size()) + " inputs, " + std::to_string(body_results.size()) + " outputs and " + - std::to_string(hidden_data_count) + " non-scalar constants and " + std::to_string(need_buffer) + "buffers."; + std::to_string(hidden_data_count) + " non-scalar constants and " + std::to_string(unique_buffer_count) + "buffers."; const std::string message_abort = "failed to continue subgraph. Impossible to schedule subgraph with " + std::to_string(body_parameters.size()) + " inputs, " + std::to_string(body_results.size()) + " outputs and " + - std::to_string(hidden_data_count) + " non-scalar constants and " + std::to_string(need_buffer) + "buffers."; + std::to_string(hidden_data_count) + " non-scalar constants and " + std::to_string(unique_buffer_count) + "buffers."; return abort_with_strategy(message_reset, message_abort); } @@ -618,7 +626,6 @@ TokenizeSnippets::TokenizeSnippets() { } subgraph->get_rt_info()["originalLayersNames"] = fusedNames; subgraph->set_virtual_port_count(hidden_data_count); - subgraph->set_buffer_needed(need_buffer); remark(1) << "Replacement (merge) done for: " << subgraph->get_friendly_name() @@ -628,9 +635,9 @@ TokenizeSnippets::TokenizeSnippets() { return true; }; - auto matcher = std::make_shared(label, matcher_name); + auto matcher = std::make_shared(label, matcher_name); register_matcher(matcher, callback); } } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/src/pass/common_optimizations.cpp b/src/common/snippets/src/pass/common_optimizations.cpp index 04278526ce8c34..da55629055a6e5 100644 --- a/src/common/snippets/src/pass/common_optimizations.cpp +++ b/src/common/snippets/src/pass/common_optimizations.cpp @@ -1,13 +1,13 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #include "snippets/pass/common_optimizations.hpp" #include -#include +#include "openvino/opsets/opset1.hpp" #include -#include +#include "openvino/pass/pattern/op/wrap_type.hpp" #include "transformations/utils/utils.hpp" #include "snippets/pass/fq_decomposition.hpp" @@ -17,16 +17,14 @@ #include "snippets/utils.hpp" #include "snippets/itt.hpp" -NGRAPH_RTTI_DEFINITION(ngraph::snippets::pass::CommonOptimizations, "Snippets::CommonOptimizations"); - -namespace ngraph { +namespace ov { namespace snippets { namespace pass { // Move up Constants which aren't scalars from body to Subgraph and replace them with Parameters inside body -void ConvertConstantsToParameters(const std::shared_ptr& subgraph) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::ConvertConstantsToParameters"); +void ConvertConstantsToParameters(const std::shared_ptr& subgraph) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::ConvertConstantsToParameters"); auto body = subgraph->body_ptr(); ParameterVector new_parameters; @@ -34,16 +32,16 @@ void ConvertConstantsToParameters(const std::shared_ptrget_ops()) { auto constant = ov::as_type_ptr(op); - if (!constant || ngraph::shape_size(constant->get_shape()) == 1ul) + if (!constant || ov::shape_size(constant->get_shape()) == 1ul) continue; const auto child = constant->get_output_target_inputs(0).begin()->get_node()->shared_from_this(); if (op::Subgraph::constant_input_should_be_inside_body(child)) continue; - auto parameter = std::make_shared(constant->get_element_type(), constant->output(0).get_partial_shape()); + auto parameter = 
std::make_shared(constant->get_element_type(), constant->output(0).get_partial_shape()); parameter->set_friendly_name(constant->get_friendly_name()); - ngraph::copy_runtime_info(constant, parameter); + ov::copy_runtime_info(constant, parameter); constant->output(0).replace(parameter->output(0)); new_external_inputs.push_back(constant); @@ -59,10 +57,10 @@ void ConvertConstantsToParameters(const std::shared_ptr(m.get_match_root()); + auto subgraph = ov::as_type_ptr(m.get_match_root()); if (transformation_callback(subgraph)) { return false; } @@ -72,11 +70,11 @@ CommonOptimizations::CommonOptimizations() { // Firsly we should transform all original Converts inside body to ConvertTruncation to save original behavior. // Then if Subgraph contains FakeQuantize we enable specific transformation for quantized subgraphs. - ngraph::pass::Manager manager; - manager.register_pass(); - manager.register_pass(); + ov::pass::Manager manager; + manager.register_pass(); + manager.register_pass(); if (is_quantized) { - manager.register_pass(); + manager.register_pass(); } manager.register_pass(); manager.run_passes(body); @@ -89,11 +87,11 @@ CommonOptimizations::CommonOptimizations() { return true; }; - auto m = std::make_shared(ngraph::pattern::wrap_type(), + auto m = std::make_shared(ov::pass::pattern::wrap_type(), matcher_name); this->register_matcher(m, callback); } } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/src/pass/convert_constants.cpp b/src/common/snippets/src/pass/convert_constants.cpp index 951f51825c8f5f..b5fb81b77dd98a 100644 --- a/src/common/snippets/src/pass/convert_constants.cpp +++ b/src/common/snippets/src/pass/convert_constants.cpp @@ -1,24 +1,23 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include -#include -#include +#include "snippets/itt.hpp" +#include "openvino/core/rt_info.hpp" -#include "snippets/snippets_isa.hpp" #include "snippets/pass/convert_constants.hpp" #include "snippets/op/subgraph.hpp" +#include "snippets/op/scalar.hpp" -ngraph::snippets::pass::ConvertConstantsToScalars::ConvertConstantsToScalars() { +ov::snippets::pass::ConvertConstantsToScalars::ConvertConstantsToScalars() { MATCHER_SCOPE(ConvertConstantsToScalars); - auto constants = std::make_shared(pattern::any_input(), - [](std::shared_ptr n) { - return ngraph::is_type(n); - }); - ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::ConvertConstantsToScalars") + auto constants = std::make_shared(ov::pass::pattern::any_input(), + [](std::shared_ptr n) { + return ov::is_type(n); + }); + ov::graph_rewrite_callback callback = [](ov::pass::pattern::Matcher &m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::ConvertConstantsToScalars") auto constant = as_type_ptr(m.get_match_root()); if (ov::shape_size(constant->get_output_shape(0)) != 1) return false; @@ -28,8 +27,8 @@ ngraph::snippets::pass::ConvertConstantsToScalars::ConvertConstantsToScalars() { const auto shape = constant->get_output_shape(0).size() == 0 ? 
ov::Shape{} : ov::Shape{1}; auto scalar = std::make_shared(ov::op::v0::Constant(*constant, shape)); scalar->set_friendly_name(constant->get_friendly_name()); - ngraph::copy_runtime_info(constant, scalar); - ngraph::replace_node(constant, scalar); + ov::copy_runtime_info(constant, scalar); + ov::replace_node(constant, scalar); return true; }; register_matcher(std::make_shared(constants, matcher_name), callback); diff --git a/src/common/snippets/src/pass/convert_power_to_powerstatic.cpp b/src/common/snippets/src/pass/convert_power_to_powerstatic.cpp index ef43e677f6f8cb..b6c96e143310eb 100644 --- a/src/common/snippets/src/pass/convert_power_to_powerstatic.cpp +++ b/src/common/snippets/src/pass/convert_power_to_powerstatic.cpp @@ -1,29 +1,29 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/snippets_isa.hpp" +#include "snippets/utils.hpp" #include "snippets/pass/convert_power_to_powerstatic.hpp" -#include -ngraph::snippets::pass::ConvertPowerToPowerStatic::ConvertPowerToPowerStatic() { +ov::snippets::pass::ConvertPowerToPowerStatic::ConvertPowerToPowerStatic() { MATCHER_SCOPE(ConvertPowerToPowerStatic); - auto scalarPower = std::make_shared(pattern::any_input(), - [](std::shared_ptr n) { - return is_type(n) && - is_type(n->get_input_node_shared_ptr(1)); - }); - ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::ConvertPowerToPowerStatic") + auto scalarPower = std::make_shared(ov::pass::pattern::any_input(), + [](std::shared_ptr n) { + return is_type(n) && + is_type(n->get_input_node_shared_ptr(1)); + }); + ov::graph_rewrite_callback callback = [](ov::pass::pattern::Matcher &m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::ConvertPowerToPowerStatic") auto power = ov::as_type_ptr(m.get_match_root()); auto scalar = ov::as_type_ptr(power->get_input_node_shared_ptr(1)); auto value = scalar->cast_vector()[0]; auto power_static = std::make_shared(power->input(0).get_source_output(), value); power_static->set_friendly_name(power->get_friendly_name()); - ngraph::copy_runtime_info(power, power_static); - ngraph::replace_node(power, power_static); + utils::safe_copy_runtime_info(power, power_static); + ov::replace_node(power, power_static); return true; }; diff --git a/src/common/snippets/src/pass/explicit_transpose_matmul_inputs.cpp b/src/common/snippets/src/pass/explicit_transpose_matmul_inputs.cpp index 07e0045d880a5c..6948f6dfcf3476 100644 --- a/src/common/snippets/src/pass/explicit_transpose_matmul_inputs.cpp +++ b/src/common/snippets/src/pass/explicit_transpose_matmul_inputs.cpp @@ -1,33 +1,30 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/pass/explicit_transpose_matmul_inputs.hpp" #include "snippets/pass/transpose_decomposition.hpp" #include "snippets/op/subgraph.hpp" -#include -#include -#include +#include "openvino/core/rt_info.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" - - -ngraph::snippets::pass::ExplicitTransposeMatMulInputs::ExplicitTransposeMatMulInputs() { +ov::snippets::pass::ExplicitTransposeMatMulInputs::ExplicitTransposeMatMulInputs() { MATCHER_SCOPE(ExplicitTransposeMatMulInputs); - auto m_matmul0 = std::make_shared( - 
ngraph::pattern::any_input(ngraph::pattern::has_static_shape()), - ngraph::pattern::any_input(ngraph::pattern::has_static_shape())); + auto m_matmul0 = std::make_shared( + ov::pass::pattern::any_input(ov::pass::pattern::has_static_shape()), + ov::pass::pattern::any_input(ov::pass::pattern::has_static_shape())); - register_matcher(std::make_shared(m_matmul0, matcher_name), - [=](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::ExplicitTransposeMatMulInputs") + register_matcher(std::make_shared(m_matmul0, matcher_name), + [=](ov::pass::pattern::Matcher &m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::ExplicitTransposeMatMulInputs") auto root = m.get_match_root(); bool rewritten = false; - auto matmul0 = ngraph::as_type_ptr(root); + auto matmul0 = ov::as_type_ptr(root); if (!matmul0) return false; @@ -38,24 +35,24 @@ ngraph::snippets::pass::ExplicitTransposeMatMulInputs::ExplicitTransposeMatMulIn continue; auto parent1 = matmul0->get_input_node_shared_ptr(i); - auto transpose1 = ngraph::as_type_ptr(parent1); - while (!transpose1 && !ov::is_type(parent1)) { + auto transpose1 = ov::as_type_ptr(parent1); + while (!transpose1 && !ov::is_type(parent1)) { // We can set supported order and transposed_b(false) only if ops have scalar shapes to avoid shape mismatching const auto parent_count = parent1->inputs().size(); bool are_weights_scalar = true; for (size_t j = 1; j < parent_count; ++j) { - are_weights_scalar = are_weights_scalar && ngraph::shape_size(parent1->get_input_shape(j)) == 1; + are_weights_scalar = are_weights_scalar && ov::shape_size(parent1->get_input_shape(j)) == 1; } if (!are_weights_scalar) break; parent1 = parent1->get_input_node_shared_ptr(0); - transpose1 = ngraph::as_type_ptr(parent1); + transpose1 = ov::as_type_ptr(parent1); } if (!transpose1) continue; - const auto transpose_pattern = ngraph::as_type_ptr(transpose1->get_input_node_shared_ptr(1)); + const auto transpose_pattern = ov::as_type_ptr(transpose1->get_input_node_shared_ptr(1)); if (!transpose_pattern) continue; @@ -64,11 +61,11 @@ ngraph::snippets::pass::ExplicitTransposeMatMulInputs::ExplicitTransposeMatMulIn if (pass::TransposeDecomposition::supported_cases.count(transposed_order) == 0) continue; - auto new_transpose_order = std::make_shared(transpose_pattern->get_element_type(), - ngraph::Shape{4}, + auto new_transpose_order = std::make_shared(transpose_pattern->get_element_type(), + ov::Shape{4}, transposed_order); new_transpose_order->set_friendly_name(transpose_pattern->get_friendly_name()); - ngraph::copy_runtime_info(transpose_pattern, new_transpose_order); + ov::copy_runtime_info(transpose_pattern, new_transpose_order); transpose1->set_argument(1, new_transpose_order); if (i == 0) { matmul0->set_transpose_a(false); diff --git a/src/common/snippets/src/pass/fq_decomposition.cpp b/src/common/snippets/src/pass/fq_decomposition.cpp index 9688e0a0e22940..c5823ae7a96abc 100644 --- a/src/common/snippets/src/pass/fq_decomposition.cpp +++ b/src/common/snippets/src/pass/fq_decomposition.cpp @@ -6,28 +6,28 @@ #include "snippets/op/convert_saturation.hpp" #include "snippets/itt.hpp" -#include -#include -#include -#include -#include -#include +#include "openvino/opsets/opset1.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" +#include "openvino/core/rt_info.hpp" +#include "openvino/pass/constant_folding.hpp" +#include "openvino/pass/validate.hpp" +#include "openvino/pass/manager.hpp" #include #include + #include 
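// [Illustrative summary, not part of the patch] The matcher below rewrites FakeQuantize into the
// elementwise chain spelled out in its callback comments:
//   x_c = min(max(x, input_low), input_high)                                           // clamp
//   q   = round(x_c * (levels - 1) / (input_high - input_low)
//               - input_low * (levels - 1) / (input_high - input_low))                 // quantize
//   y   = q * (output_high - output_low) / (levels - 1) + output_low                   // dequantize
// Rounding is HALF_TO_EVEN, and the rounding/dequantize stages are emitted only when the do_rounding /
// do_dequantize conditions hold. A scalar reference sketch under those assumptions (uses <algorithm>,
// <cmath>; std::nearbyint is half-to-even under the default FE_TONEAREST mode):
inline float fake_quantize_reference(float x, float il, float ih, float ol, float oh, size_t levels) {
    const float x_clamped = std::min(std::max(x, il), ih);
    const float input_scale = (levels - 1) / (ih - il);
    const float q = std::nearbyint(x_clamped * input_scale - il * input_scale);
    return q * (oh - ol) / (levels - 1) + ol;
}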
namespace { - -bool isValidRangesInputs(const std::shared_ptr& fq) { +bool isValidRangesInputs(const std::shared_ptr& fq) { auto il = fq->input_value(1); auto ih = fq->input_value(2); - auto greater_equal = std::make_shared(il, ih); + auto greater_equal = std::make_shared(il, ih); - ngraph::OutputVector result(1); + ov::OutputVector result(1); if (!greater_equal->constant_fold(result, greater_equal->input_values())) return false; - auto res_node = std::dynamic_pointer_cast(result[0].get_node_shared_ptr()); + auto res_node = std::dynamic_pointer_cast(result[0].get_node_shared_ptr()); const std::vector comp_result = res_node->cast_vector(); @@ -35,23 +35,22 @@ bool isValidRangesInputs(const std::shared_ptr& fq return value; }); } - } // namespace -ngraph::snippets::pass::FakeQuantizeDecomposition::FakeQuantizeDecomposition() { +ov::snippets::pass::FakeQuantizeDecomposition::FakeQuantizeDecomposition() { MATCHER_SCOPE(FakeQuantizeDecomposition); - auto fake_quantize = ngraph::pattern::wrap_type( - OutputVector{ngraph::pattern::any_input(), - ngraph::pattern::wrap_type(), - ngraph::pattern::wrap_type(), - ngraph::pattern::wrap_type(), - ngraph::pattern::wrap_type()}); + auto fake_quantize = ov::pass::pattern::wrap_type( + OutputVector{ov::pass::pattern::any_input(), + ov::pass::pattern::wrap_type(), + ov::pass::pattern::wrap_type(), + ov::pass::pattern::wrap_type(), + ov::pass::pattern::wrap_type()}); - ngraph::matcher_pass_callback callback = [=](ngraph::pattern::Matcher& m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::FakeQuantizeDecomposition") + ov::matcher_pass_callback callback = [=](ov::pass::pattern::Matcher& m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::FakeQuantizeDecomposition") auto& pattern_to_output = m.get_pattern_value_map(); - const auto fake_quantize_node = std::dynamic_pointer_cast( + const auto fake_quantize_node = std::dynamic_pointer_cast( pattern_to_output.at(fake_quantize).get_node_shared_ptr()); if (!fake_quantize_node || transformation_callback(fake_quantize_node) || @@ -84,80 +83,80 @@ ngraph::snippets::pass::FakeQuantizeDecomposition::FakeQuantizeDecomposition() { return val == 0.f; })) || out_scales.size() != 0)); - const bool do_rounding = do_dequantize || fake_quantize_node->get_output_element_type(0) == ngraph::element::f32; + const bool do_rounding = do_dequantize || fake_quantize_node->get_output_element_type(0) == ov::element::f32; - ngraph::NodeVector decomp_ops; + ov::NodeVector decomp_ops; if (input_type != input_low.get_element_type()) { input_type = input_low.get_element_type(); - data = std::make_shared(data, input_type); + data = std::make_shared(data, input_type); decomp_ops.push_back(data.get_node_shared_ptr()); } // if we set input_low or input_high in formula we got output = output_low and output = output_high // respectively so we just clamp x - const auto max = std::make_shared(data, input_low); - const auto min = std::make_shared(max, input_high); + const auto max = std::make_shared(data, input_low); + const auto min = std::make_shared(max, input_high); decomp_ops.push_back(max); decomp_ops.push_back(min); - std::shared_ptr result = nullptr; + std::shared_ptr result = nullptr; if (out_scales.size() != 0) { PartialShape scale_shape = input_low.get_partial_shape(); - ngraph::PartialShape::broadcast_merge_into(scale_shape, + ov::PartialShape::broadcast_merge_into(scale_shape, input_high.get_partial_shape(), broadcast_type); const auto scales = - 
std::make_shared(ngraph::element::f32, scale_shape.get_shape(), out_scales); + std::make_shared(ov::element::f32, scale_shape.get_shape(), out_scales); decomp_ops.push_back(scales); - result = std::make_shared(min, scales); + result = std::make_shared(min, scales); decomp_ops.push_back(result); } else { // (levels-1) const auto levels_minus_one = - std::make_shared(input_type, Shape{}, fake_quantize_node->get_levels() - 1); + std::make_shared(input_type, Shape{}, fake_quantize_node->get_levels() - 1); decomp_ops.push_back(levels_minus_one); // (input_high - input_low) - const auto subInHighLow = std::make_shared(input_high, input_low); + const auto subInHighLow = std::make_shared(input_high, input_low); // (levels-1) / (input_high - input_low) - const auto isc = std::make_shared(levels_minus_one, subInHighLow); + const auto isc = std::make_shared(levels_minus_one, subInHighLow); // input_low * (levels-1) / (input_high - input_low) - const auto ish = std::make_shared(input_low, isc); + const auto ish = std::make_shared(input_low, isc); decomp_ops.push_back(subInHighLow); decomp_ops.push_back(isc); decomp_ops.push_back(ish); // x * (levels-1) / (input_high - input_low) - const auto after_isc_apply = std::make_shared(min, isc); + const auto after_isc_apply = std::make_shared(min, isc); // x * (levels-1) / (input_high - input_low) - input_low * (levels-1) / (input_high - input_low) - result = std::make_shared(after_isc_apply, ish); + result = std::make_shared(after_isc_apply, ish); decomp_ops.push_back(after_isc_apply); decomp_ops.push_back(result); } if (do_rounding) { // round(x * (levels-1) / (input_high - input_low) - input_low * (levels-1) / (input_high - input_low)) - result = std::make_shared(result, ngraph::opset5::Round::RoundMode::HALF_TO_EVEN); + result = std::make_shared(result, ov::op::v5::Round::RoundMode::HALF_TO_EVEN); decomp_ops.push_back(result); } if (do_dequantize) { // (levels-1) const auto levels_minus_one = - std::make_shared(input_type, Shape{}, fake_quantize_node->get_levels() - 1); + std::make_shared(input_type, Shape{}, fake_quantize_node->get_levels() - 1); // (output_high - output_low) - const auto sub_out_high_low = std::make_shared(output_high, output_low); + const auto sub_out_high_low = std::make_shared(output_high, output_low); // (output_high - output_low) / (levels-1) - const auto osc = std::make_shared(sub_out_high_low, levels_minus_one); + const auto osc = std::make_shared(sub_out_high_low, levels_minus_one); decomp_ops.push_back(sub_out_high_low); decomp_ops.push_back(osc); // round(x * (levels-1) / (input_high - input_low) - input_low * (levels-1) / (input_high - input_low)) * // (output_high - output_low) / (levels-1) - const auto after_osc_apply = std::make_shared(result, osc); + const auto after_osc_apply = std::make_shared(result, osc); // round(x * (levels-1) / (input_high - input_low) - input_low * (levels-1) / (input_high - input_low)) * // (output_high - output_low) / (levels-1) + output_low - result = std::make_shared(after_osc_apply, output_low); + result = std::make_shared(after_osc_apply, output_low); decomp_ops.push_back(after_osc_apply); decomp_ops.push_back(result); } @@ -168,17 +167,17 @@ ngraph::snippets::pass::FakeQuantizeDecomposition::FakeQuantizeDecomposition() { } result->set_friendly_name(m.get_match_root()->get_friendly_name()); - ngraph::copy_runtime_info(fake_quantize_node, decomp_ops); - ngraph::replace_node(m.get_match_root(), result); + ov::copy_runtime_info(fake_quantize_node, decomp_ops); + 
ov::replace_node(m.get_match_root(), result); return true; }; - auto m = std::make_shared(fake_quantize, matcher_name); + auto m = std::make_shared(fake_quantize, matcher_name); register_matcher(m, callback); } -bool ngraph::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts( - const std::shared_ptr& fq_node, +bool ov::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts( + const std::shared_ptr& fq_node, std::vector& cl, std::vector& ch, std::vector& isc, @@ -186,13 +185,13 @@ bool ngraph::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts( std::vector& osc, std::vector& osh) { auto input_low_constant = - std::dynamic_pointer_cast(fq_node->get_input_node_shared_ptr(1)); + std::dynamic_pointer_cast(fq_node->get_input_node_shared_ptr(1)); auto input_high_constant = - std::dynamic_pointer_cast(fq_node->get_input_node_shared_ptr(2)); + std::dynamic_pointer_cast(fq_node->get_input_node_shared_ptr(2)); auto output_low_constant = - std::dynamic_pointer_cast(fq_node->get_input_node_shared_ptr(3)); + std::dynamic_pointer_cast(fq_node->get_input_node_shared_ptr(3)); auto output_high_constant = - std::dynamic_pointer_cast(fq_node->get_input_node_shared_ptr(4)); + std::dynamic_pointer_cast(fq_node->get_input_node_shared_ptr(4)); if (!input_low_constant || !input_high_constant || !output_low_constant || !output_high_constant) return false; @@ -209,7 +208,7 @@ bool ngraph::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts( auto broadcast_type = fq_node->get_auto_broadcast(); // We have two ways for computations of scales and shifts to avoid model compilation time growth - // because common function "ngraph::runtime::reference::autobroadcast_binop()" is expensive: + // because common function "ov::runtime::reference::autobroadcast_binop()" is expensive: // - A usual case (weights with the same shapes or scalars) - optimal calculations without large broadcasting // - A rare case ("general broadcasting") - common computations using autobroadcast_binop() call with broadcasting support @@ -233,7 +232,7 @@ bool ngraph::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts( PartialShape scale_pshape = input_low_constant->get_output_partial_shape(0); PartialShape::broadcast_merge_into(scale_pshape, input_high_shape, broadcast_type); const auto scale_shape = scale_pshape.get_shape(); - const auto input_size = ngraph::shape_size(scale_shape); + const auto input_size = ov::shape_size(scale_shape); isc.resize(input_size, 0); ish.resize(input_size, 0); ngraph::runtime::reference::autobroadcast_binop(input_high.data(), input_low.data(), isc.data(), @@ -275,7 +274,7 @@ bool ngraph::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts( } else { // general broadcasting PartialShape scale_pshape = output_low_constant->get_output_partial_shape(0); PartialShape::broadcast_merge_into(scale_pshape, output_high_constant->get_output_partial_shape(0), broadcast_type); - const auto output_size = ngraph::shape_size(scale_pshape.get_shape()); + const auto output_size = ov::shape_size(scale_pshape.get_shape()); osc.resize(output_size, 0); ngraph::runtime::reference::autobroadcast_binop(output_high.data(), output_low.data(), osc.data(), output_high_shape, output_low_shape, broadcast_type, @@ -286,7 +285,7 @@ bool ngraph::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts( return true; } -std::vector ngraph::snippets::pass::FakeQuantizeDecomposition::calculateScales(const ngraph::element::Type& out_type, +std::vector 
ov::snippets::pass::FakeQuantizeDecomposition::calculateScales(const ov::element::Type& out_type, const std::vector& cl, const std::vector& ch, const std::vector& isc, @@ -294,7 +293,7 @@ std::vector ngraph::snippets::pass::FakeQuantizeDecomposition::calculateS const std::vector& osc, const std::vector& osh) { std::vector out_scales; - if (out_type == ngraph::element::u8 && + if (out_type == ov::element::u8 && std::all_of(cl.cbegin(), cl.cend(), [](float val) { @@ -317,7 +316,7 @@ std::vector ngraph::snippets::pass::FakeQuantizeDecomposition::calculateS } static const float thr = 0.0001f; - if (out_type == ngraph::element::i8 && + if (out_type == ov::element::i8 && std::all_of(ish.cbegin(), ish.cend(), [](float val) { return std::abs(val - 128.f) < thr; }) && std::all_of(osc.cbegin(), osc.cend(), [](float val) { return val == 1.f; }) && std::all_of(osh.cbegin(), osh.cend(), [](float val) { return std::abs(val + 128.f) < thr; })) { @@ -342,13 +341,13 @@ std::vector ngraph::snippets::pass::FakeQuantizeDecomposition::calculateS return out_scales; } -bool ngraph::snippets::pass::CommonFakeQuantizeDecomposition::run_on_model(const std::shared_ptr& f) { +bool ov::snippets::pass::CommonFakeQuantizeDecomposition::run_on_model(const std::shared_ptr& f) { RUN_ON_FUNCTION_SCOPE(CommonFakeQuantizeDecomposition); - ngraph::pass::Manager manager; + ov::pass::Manager manager; manager.set_per_pass_validation(false); - manager.register_pass(); - manager.register_pass(); - manager.register_pass(); + manager.register_pass(); + manager.register_pass(); + manager.register_pass(); manager.run_passes(f); return false; } diff --git a/src/common/snippets/src/pass/fuse_transpose_brgemm.cpp b/src/common/snippets/src/pass/fuse_transpose_brgemm.cpp index 62dd1292b3ffce..24a4141916e189 100644 --- a/src/common/snippets/src/pass/fuse_transpose_brgemm.cpp +++ b/src/common/snippets/src/pass/fuse_transpose_brgemm.cpp @@ -1,4 +1,4 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // @@ -9,73 +9,91 @@ #include "snippets/utils.hpp" -#include "ngraph/opsets/opset1.hpp" -#include "ngraph/rt_info.hpp" -#include "ngraph/pattern/op/wrap_type.hpp" +#include "openvino/core/rt_info.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" #include "openvino/pass/pattern/op/or.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { + const std::set> FuseTransposeBrgemm::supported_cases = {{0, 2, 1, 3}}; + +bool FuseTransposeBrgemm::is_supported_transpose(const Output& transpose_port) { + const auto transpose_node = transpose_port.get_node_shared_ptr(); + // it's safe to do so because of the patterns we used. 
Alternatively, we can do it through pattern_values_map + const auto& constant = as_type_ptr(transpose_node->get_input_node_shared_ptr(1)); + // if the Transpose in/out layout is not the default one => something was already fused on this port + auto default_layout = std::vector(transpose_port.get_shape().size()); + std::iota(default_layout.begin(), default_layout.end(), 0); // NCHW layout by default + if (lowered::PortDescriptorUtils::get_port_descriptor_ptr(transpose_port)->get_layout() != default_layout || + lowered::PortDescriptorUtils::get_port_descriptor_ptr(transpose_node->input_value(0))->get_layout() != default_layout) + return false; + const auto& transpose_order = constant->cast_vector(); + // TODO: this limitation is due to the fact that offsets are calculated in Kernel, and the only way + // to calculate them in a non-default way is to set the Parameter rt_info field. This limitation can be removed if + // the rt_info is properly propagated to the corresponding parameter + return is_type(transpose_node->get_input_node_shared_ptr(0)) && + supported_cases.count(transpose_order) != 0; +} + FuseTransposeBrgemm::FuseTransposeBrgemm() { MATCHER_SCOPE(FuseTransposeBrgemm); - auto transpose_is_supported = [](const Output& transpose_port) { - const auto transpose_node = transpose_port.get_node_shared_ptr(); - // it's safe to do so because of the patterns we used. alternatively we can do it through pattern_values_map - const auto& constant = as_type_ptr(transpose_node->get_input_node_shared_ptr(1)); - // if Transpose in and out layout is not empty => something was already fused on this port - if (!utils::get_node_output_layout(transpose_node).empty() || - !utils::get_node_output_layout(transpose_node->get_input_node_shared_ptr(0)).empty()) - return false; - const auto& transpose_order = constant->cast_vector(); - // todo: this limitation is due to the fact that offsets are calculated in Kernel, and the only way - // to calc them non-default way is to set Parameter rt_info field.
This limitation can be removed if - // the rt_info is properly propagated to the corresponding parameter - if (!is_type(transpose_node->get_input_node_shared_ptr(0)) || - supported_cases.count(transpose_order) == 0) - return false; - return true; - }; - auto constant = pattern::wrap_type(); - auto transpose = pattern::wrap_type({pattern::any_input(), constant}, transpose_is_supported); - auto transpose_matcher = std::make_shared(transpose); - auto brgemm_any = pattern::wrap_type({pattern::any_input(), pattern::any_input()}); + auto constant = ov::pass::pattern::wrap_type(); + auto transpose = ov::pass::pattern::wrap_type({ov::pass::pattern::any_input(), constant}, is_supported_transpose); + auto transpose_matcher = std::make_shared(transpose); - auto brgemm_in0 = pattern::wrap_type({transpose, pattern::any_input()}); - auto brgemm_in1 = pattern::wrap_type({pattern::any_input(), transpose}); - auto brgemm_out0 = pattern::wrap_type({brgemm_any, constant}); - auto brgemm_or_transpose = std::make_shared(OutputVector{brgemm_in0, brgemm_in1, brgemm_out0}); + // Pattern 0: Transpose on 0-th input of MatMul + auto brgemm_in0 = ov::pass::pattern::wrap_type({transpose, ov::pass::pattern::any_input()}); - auto callback = [=](pattern::Matcher& m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "FuseTransposeBrgemm") - auto brgemm = as_type_ptr(m.get_match_root()); + // Pattern 1: Transpose on 1-st input of MatMul + auto brgemm_in1 = ov::pass::pattern::wrap_type({ov::pass::pattern::any_input(), transpose}); + + // Pattern 2: Transpose on output of MatMul + auto brgemm_out = ov::pass::pattern::wrap_type({ov::pass::pattern::any_input(), ov::pass::pattern::any_input()}); + auto transpose2 = ov::pass::pattern::wrap_type({brgemm_out, constant}); + + auto brgemm_or_transpose = std::make_shared(OutputVector{brgemm_in0, brgemm_in1, transpose2}); + + auto callback = [=](ov::pass::pattern::Matcher& m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "FuseTransposeBrgemm") + auto brgemm = ov::as_type_ptr(m.get_match_root()); // Transpose on the Brgemm's output if (!brgemm) { - brgemm = as_type_ptr(m.get_match_root()->get_input_node_shared_ptr(0)); + brgemm = ov::as_type_ptr(m.get_match_root()->get_input_node_shared_ptr(0)); const auto& brgemm_out = brgemm->output(0); const auto& transpose_out = m.get_match_value(); + const auto& const_order = ov::as_type_ptr(transpose_out.get_node_shared_ptr()->get_input_node_shared_ptr(1)); + const auto& original_port = ov::snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(brgemm_out); + original_port->set_shape(transpose_out.get_shape()); + original_port->set_layout(const_order->cast_vector()); for (const auto& in : transpose_out.get_target_inputs()) in.replace_source_output(brgemm->output(0)); - utils::set_transpose_output_layout(brgemm_out, as_type_ptr(transpose_out.get_node_shared_ptr())); } + for (size_t i = 0; i < brgemm->get_input_size(); i++) { - const auto& in_value = brgemm->input_value(i); + const auto& in = brgemm->input(i); + const auto& in_value = in.get_source_output(); if (transpose_matcher->match(in_value)) { - const auto& transpose = as_type_ptr(in_value.get_node_shared_ptr()); - utils::set_transpose_output_layout(transpose->input_value(0), transpose); + const auto& transpose = as_type_ptr(in_value.get_node_shared_ptr()); + const auto& const_order = ov::as_type_ptr(transpose->get_input_node_shared_ptr(1)); brgemm->set_argument(i, transpose->input_value(0)); + const auto& original_port = 
ov::snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(in); + original_port->set_shape(transpose->get_input_shape(0)); + original_port->set_layout(const_order->cast_vector()); } } + // need to run validate_and_infer_types manually: either input shapes were updated or // output Layout was updated (out shape will be updated in validate_and_infer_types()) brgemm->validate_and_infer_types(); return true; }; - register_matcher(std::make_shared(brgemm_or_transpose, matcher_name), callback); + + register_matcher(std::make_shared(brgemm_or_transpose, matcher_name), callback); } } // namespace pass } // namespace snippets -} // namespace ngraph \ No newline at end of file +} // namespace ov diff --git a/src/common/snippets/src/pass/insert_buffer.cpp b/src/common/snippets/src/pass/insert_buffer.cpp deleted file mode 100644 index 72a8b46f712aff..00000000000000 --- a/src/common/snippets/src/pass/insert_buffer.cpp +++ /dev/null @@ -1,97 +0,0 @@ -// Copyright (C) 2018-2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include -#include "snippets/remarks.hpp" - -#include "snippets/pass/insert_buffer.hpp" -#include "snippets/snippets_isa.hpp" - -#include -#include - -ngraph::snippets::pass::InsertBuffer::InsertBuffer(const int32_t allocation_rank) { - MATCHER_SCOPE(InsertBuffer); - // The list of operations that require Buffers on their Inputs and Outputs - const auto pattern = ngraph::pattern::wrap_type(); - - register_matcher(std::make_shared(pattern, matcher_name), - [allocation_rank](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::InsertBuffer") - auto root = m.get_match_root(); - bool rewritten = false; - - // check if already has Buffer, Parameter or Constant as an input - for (const auto& input : root->inputs()) { - const auto input_node = input.get_source_output().get_node()->shared_from_this(); - if (!ov::is_type(input_node) && - !ov::is_type(input_node) && - !ov::is_type(input_node)) { - const auto buffer = std::make_shared(input_node, allocation_rank); - root->set_argument(input.get_index(), buffer); - rewritten |= true; - } - if (ov::is_type(input.get_source_output().get_node_shared_ptr()) && - input.get_source_output().get_target_inputs().size() != 1) { - OPENVINO_THROW( - "If Buffer is a input for operation output, this Buffer should be a single consumer for this port"); - } - } - - // check if already has Buffer or outputs is Result - for (const auto& output : root->outputs()) { - const auto target_inputs = output.get_target_inputs(); - if (target_inputs.size() > 1) { - for (const auto& consumer : target_inputs) { - const auto output_node = consumer.get_node()->shared_from_this(); - if (ov::is_type(output_node)) { - // If some of children from one common port are different Buffers, - // we should remove them to insert one common Buffer on one common port - replace_output_update_name(output_node->output(0), output_node->input_value(0)); - } else if (ov::is_type(output_node)) { - /* TODO: At this moment operation which is should be wrapped by Buffers doesn't support several childs where one of them is Result - * because Result and Buffer from one root port should have the same register. 
It's not supported at the moment - * For example, - * Buffer - * | - * Softmax - * / \ - * Buffer Result - */ - OPENVINO_THROW( - "Operation which is should be wrapped by Buffers has few children from one output port where one of them is Result"); - } - } - } - - const auto buffer = std::make_shared(output, allocation_rank); - for (const auto& consumer : output.get_target_inputs()) { - const auto output_node = consumer.get_node()->shared_from_this(); - if (output_node != buffer && - !ov::is_type(output_node) && - !ov::is_type(output_node)) { - consumer.replace_source_output(buffer); - rewritten |= true; - } - } - - const auto new_target_inputs = output.get_target_inputs(); - const auto has_buffer_on_output = std::any_of(new_target_inputs.begin(), new_target_inputs.end(), [](const ov::Input& consumer) { - const auto child = consumer.get_node()->shared_from_this(); - // We check for count of target inputs of Buffer output because - // we created Buffer op with root input previously for the next possible insertions - // Thus, if Buffer wasn't inserted, this op doesn't have target inputs on output - return ov::is_type(child) && child->output(0).get_target_inputs().size() > 0; - }); - if (has_buffer_on_output && new_target_inputs.size() != 1) { - OPENVINO_THROW( - "If Buffer is a input for operation output, this Buffer should be a single consumer for this port"); - } - } - return rewritten; - }); -} diff --git a/src/common/snippets/src/pass/insert_load_store.cpp b/src/common/snippets/src/pass/insert_load_store.cpp deleted file mode 100644 index 114393bd872f96..00000000000000 --- a/src/common/snippets/src/pass/insert_load_store.cpp +++ /dev/null @@ -1,81 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include -#include "snippets/remarks.hpp" - -#include "snippets/pass/insert_load_store.hpp" -#include "snippets/snippets_isa.hpp" - -#include -#include -#include - -ngraph::snippets::pass::InsertLoad::InsertLoad(const size_t count) { - MATCHER_SCOPE(InsertLoad); - register_matcher(std::make_shared( - ngraph::pattern::wrap_type(), matcher_name), - [count](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::InsertLoad") - auto root = m.get_match_root(); - - // check if already has Load as an output - for (const auto& output : root->outputs()) { - for (const auto& consumer : output.get_target_inputs()) { - // if a parameter is connected to a Load => we don't need another one - // if a parameter is connected to LoopBegin => there must be Load inside the Loop - // if a parameter is connected to MatMul => we don't need Load (read/write is encapsulated into the brgemm emitter) - // (it's the responsibility of transformation that inserted the Loops) - const auto& consumer_node = consumer.get_node(); - if (ov::is_type(consumer_node) || - ov::is_type(consumer_node) || - ov::is_type(consumer_node) || - ov::is_type(consumer_node)) { - return false; - } - } - } - - auto load = std::make_shared(root, count); - ngraph::copy_runtime_info(root, load); - - bool rewritten = false; - for (const auto& output : root->outputs()) { - for (const auto& consumer : output.get_target_inputs()) { - if (consumer.get_node()->shared_from_this() != load) { - consumer.replace_source_output(load); - rewritten |= true; - } - } - } - - return rewritten; - }); -} - -ngraph::snippets::pass::InsertStore::InsertStore(const size_t count) { - MATCHER_SCOPE(InsertStore); - register_matcher(std::make_shared( - 
ngraph::pattern::wrap_type(), matcher_name), - [count](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::InsertStore") - auto root = m.get_match_root(); - - // check if already has Store as an input - for (const auto& input : root->inputs()) { - const auto& parent_node = input.get_source_output().get_node(); - if (ov::is_type(parent_node) || - ov::is_type(parent_node) || - ov::is_type(parent_node) || - ov::is_type(parent_node)) { - return false; - } - } - - auto store = std::make_shared(root->input_value(0), count); - ngraph::copy_runtime_info(root, store); - root->set_argument(0, store); - return true; - }); -} diff --git a/src/common/snippets/src/pass/insert_loops.cpp b/src/common/snippets/src/pass/insert_loops.cpp deleted file mode 100644 index 56ef7c80647d4d..00000000000000 --- a/src/common/snippets/src/pass/insert_loops.cpp +++ /dev/null @@ -1,285 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include -#include "snippets/pass/insert_loops.hpp" -#include "snippets/pass/loop_helpers.hpp" -#include "snippets/snippets_isa.hpp" -#include "snippets/utils.hpp" - -#include - -namespace ngraph { -namespace snippets { -namespace pass { - -InsertLoops::InsertLoops(ov::PartialShape master_shape, size_t loop_depth, size_t vector_size, bool single_loop_body) - : m_master_shape(std::move(master_shape)), m_loop_depth(loop_depth), m_vector_size(vector_size), m_single_loop_body(single_loop_body) { - if (m_master_shape.size() < m_loop_depth) - OPENVINO_THROW("InsertLoops can't insert loops: master shape rank is too small"); -} - -std::vector InsertLoops::calculate_inner_apply_increments(const ov::PartialShape& master, - const std::vector& shapes) { - // Inner Loop applies increments if a dimension is not broadcasted - std::vector apply_increments; - apply_increments.reserve(shapes.size()); - std::transform(shapes.begin(), shapes.end(), std::back_inserter(apply_increments), - [=](const ov::PartialShape& ps) { return utils::get_inner_dim(ps) != 1 && utils::get_inner_dim(master) != 1; }); - return apply_increments; -} -std::vector InsertLoops::calculate_outer_apply_increments(const std::vector& shapes) { - // Outer Loop applies increments only if a corresponding lower dim was broadcasted (or all lower dims == 1) - std::vector apply_increments; - apply_increments.reserve(shapes.size()); - std::transform(shapes.begin(), shapes.end(), std::back_inserter(apply_increments), - [=](const ov::PartialShape& ps) { return utils::get_outer_dim(ps) != 1 && utils::get_inner_dim(ps) == 1; }); - return apply_increments; -} -std::vector InsertLoops::calculate_finalization_offsets(const ov::PartialShape& master, - const std::vector& shapes) { - const auto inner_work_amount = utils::get_inner_dim(master).get_length(); - std::vector inner_finalization_offsets(shapes.size(), 0); - std::transform(shapes.begin(), shapes.end(), inner_finalization_offsets.begin(), - [=](const ov::PartialShape& ps) { - return utils::get_outer_dim(ps) == 1 && utils::get_inner_dim(ps) != 1 ? 
-inner_work_amount : 0; - }); - return inner_finalization_offsets; -} - -void insert_loops_explicitly(const ov::NodeVector& ops, const size_t vector_size) { - ov::NodeVector body; - ov::NodeVector body_remainder; - ov::OutputVector body_parameters; - std::vector> body_results; - - // check for potential parameters for new Loop - auto add_body_parameters = [](const std::shared_ptr& op, ov::OutputVector& body_parameters) { - for (const auto& input : op->inputs()) { - auto parent = input.get_source_output().get_node_shared_ptr(); - if (ov::is_type(parent) || - ov::is_type(parent) || - ov::is_type(parent) || - ov::is_type(parent)) { - body_parameters.push_back(input.get_source_output()); - } - } - }; - - // check for potential results for new Loop - auto add_body_results = [](const std::shared_ptr& op, std::vector>& body_results) { - for (const auto& output : op->outputs()) { - for (const auto& target_input : output.get_target_inputs()) { - auto child = target_input.get_node(); - if (ov::is_type(child) || - ov::is_type(child) || - ov::is_type(child) || - ov::is_type(child)) { - body_results.push_back(target_input); - } - } - } - }; - - // check for potential missing body ops for new loop - std::function& op, ov::NodeVector& body)> add_missing_body_ops; - add_missing_body_ops = [&](const std::shared_ptr& op, ov::NodeVector& body) { - if (body_remainder.size()) { - for (const auto& input : op->inputs()) { - auto parent = input.get_source_output().get_node_shared_ptr(); - auto iter = std::find(body_remainder.begin(), body_remainder.end(), parent); - if (iter != body_remainder.end()) { - *std::back_inserter(body) = std::move(*iter); - add_missing_body_ops(parent, body); - add_body_parameters(parent, body_parameters); - add_body_results(op, body_results); - } - } - } - }; - - auto wrap_body_by_loop = [&](const ov::NodeVector& body, const ov::OutputVector& body_parameters, const std::vector>& body_results) { - NGRAPH_CHECK(!body_parameters.empty(), "The count of parameters for loop should be more than zero to create loop"); - NGRAPH_CHECK(!body_results.empty(), "The count of results for loop should be more than zero to create loop"); - std::vector body_shapes; - const auto count_io = body_parameters.size() + body_results.size(); - body_shapes.reserve(count_io); - std::transform(body_parameters.begin(), body_parameters.end(), std::back_inserter(body_shapes), - [](const ov::Output& out) { return out.get_partial_shape(); }); - std::transform(body_results.begin(), body_results.end(), std::back_inserter(body_shapes), - [](const ov::Input& in) { return in.get_partial_shape(); }); - - auto body_master_shape = body_shapes.front(); - for (const auto& shape : body_shapes) { - NGRAPH_CHECK(PartialShape::broadcast_merge_into(body_master_shape, shape, ::ngraph::op::AutoBroadcastType::NUMPY), - "Loop input and output must be numpy broadcastable"); - } - const auto inner_work_amount = utils::get_inner_dim(body_master_shape).get_length(); - const auto outer_work_amount = utils::get_outer_dim(body_master_shape).get_length(); - - auto apply_increments = InsertLoops::calculate_inner_apply_increments(body_master_shape, body_shapes); - std::vector inner_finalization_offsets(body_shapes.size(), 0); - if (outer_work_amount > 1) { - inner_finalization_offsets = InsertLoops::calculate_finalization_offsets(body_master_shape, body_shapes); - } - - const auto& inner_loop_begin = op::insertLoopBeginAfterOutputs(body_parameters); - const auto& inner_loop_end = op::insertLoopEndBeforeInputs( - body_results, inner_loop_begin, 
inner_work_amount, vector_size, - apply_increments, inner_finalization_offsets); - // set internal flag to enable scalar vs vector loop optimizations - inner_loop_end->has_outer_loop = outer_work_amount > 1; - // Due to features of topological sort, some Constants (Scalars) may appear right after Parameters in - // sorted ops (so it's between Parameters and LoopBegin). Consequently, ScalarEmitters would be called - // outside the Loop, and only the first Loop iteration would yield correct data (assuming the vector reg - // assigned to scalar will get corrupted inside the loop body). To avoid such cases, we add control dependency - // on LoopBegin to guarantee that the constants are executed inside the Loop. - for (const auto& n : body) { - if (auto c = std::dynamic_pointer_cast(n)) { - c->add_control_dependency(inner_loop_begin); - } - } - - if (outer_work_amount > 1) { - std::vector apply_increments = InsertLoops::calculate_outer_apply_increments(body_shapes); - std::vector outer_finalization_offsets(body_shapes.size(), 0); - const auto& outer_loop_begin = op::insertLoopBegin(body_parameters); - op::insertLoopEnd(body_results, outer_loop_begin, outer_work_amount, 1lu, - apply_increments, outer_finalization_offsets); - } - }; - - auto op_is_outside_loop = [](const std::shared_ptr& op) -> bool { - if (ov::is_type(op) || - ov::is_type(op) || - ov::is_type(op)) - return true; - auto& rt = op->get_rt_info(); - auto outside_rt = rt.find("outside_loop"); - bool is_outside = false; - // If rt info isn't setted it means that op should be inside loop by default - if (outside_rt != rt.end()) { - is_outside = outside_rt->second.as(); - } - return is_outside; - }; - - for (auto iter = ops.begin(); iter < ops.end(); iter++) { - const auto op = *iter; - // Need to check for that op should be inside or outside loop - if (op_is_outside_loop(op)) { - continue; - } - - // If we meet loopBegin or Brgemm, it means that all previous nodes from ordered body - // should be in one body. It's like stop signal - const auto& loop_begin = ov::as_type_ptr(op); - const auto& brgemm = ov::as_type_ptr(op); - if (loop_begin || brgemm) { - if (!body.empty()) { - if (!body_results.empty()) { - wrap_body_by_loop(body, body_parameters, body_results); - } else { - // If there aren't body results, it means that the current body ops are inputs of the next some operations in ordered_ops - // So this set of the current body ops is part of the future body loop. - // We should save them to add in body ops in the future - std::move(body.begin(), body.end(), std::back_inserter(body_remainder)); - } - } - - // we should skip the next existing Loop body - if (loop_begin) { - const auto &loop_end = loop_begin->get_loop_end(); - iter = std::find(iter, ops.end(), loop_end); - } - - // clear loop body to create the next - body.clear(); - body_parameters.clear(); - body_results.clear(); - } else { - add_missing_body_ops(op, body); - add_body_parameters(op, body_parameters); - add_body_results(op, body_results); - - body.push_back(op); - } - } - - if (!body.empty()) { - wrap_body_by_loop(body, body_parameters, body_results); - } -} - -bool InsertLoops::run_on_model(const std::shared_ptr &model) { - RUN_ON_FUNCTION_SCOPE(InsertLoops); - if (m_master_shape.is_dynamic()) - OPENVINO_THROW("InsertLoops doesn't support dynamic shapes yet"); - - const auto inner_work_amount = utils::get_inner_dim(m_master_shape).get_length(); - const auto outer_work_amount = m_loop_depth == 2 ? 
utils::get_outer_dim(m_master_shape).get_length() : 1; - - auto ops = model->get_ordered_ops(); - ParameterVector commonParams = model->get_parameters(); - // Note that topological sort parses node arguments in reversed order, but results are added - in direct order - // So ve need to pass the reversed results to LoopEnd to keep the original traversal order in topological sorter - const auto& orig_results = model->get_results(); - ResultVector commonResults(orig_results.rbegin(), orig_results.rend()); - std::vector ioShapes; - - const auto& body_rt_info = model->get_rt_info(); - const auto& plugin_shapes = body_rt_info.find("PluginShapesOverride"); - if (plugin_shapes == body_rt_info.end()) { - OPENVINO_THROW("InsertLoops requires PluginShapesOverride rt_info field"); - } else { - const auto& new_shapes = plugin_shapes->second.as>>(); - if (new_shapes.size() != commonResults.size() + commonParams.size()) - OPENVINO_THROW("InsertLoops got invalid number of plugin-overriden shapes"); - for (size_t i = 0; i < commonParams.size(); i++) - ioShapes.emplace_back(new_shapes[i]); - // reverse overriden_shapes for results since commonResults are reversed with respect to model->get_parameters() - for (size_t i = 0; i < commonResults.size(); i++) - ioShapes.emplace_back(new_shapes[new_shapes.size() - 1 - i]); - } - - if (inner_work_amount > 0) { - if (m_single_loop_body) { - const auto apply_increments = InsertLoops::calculate_inner_apply_increments(m_master_shape, ioShapes); - std::vector inner_finalization_offsets(ioShapes.size(), 0); - if (outer_work_amount > 1) { - inner_finalization_offsets = InsertLoops::calculate_finalization_offsets(m_master_shape, ioShapes); - } - const auto& inner_loop_begin = op::insertLoopBegin(commonParams); - const auto& inner_loop_end = insertLoopEnd(commonResults, inner_loop_begin, inner_work_amount, - m_vector_size, apply_increments, inner_finalization_offsets); - // set internal flag to enable scalar vs vector loop optimizations - inner_loop_end->has_outer_loop = outer_work_amount > 1; - // Due to features of topological sort, some Constants (Scalars) may appear right after Parameters in - // sorted ops (so it's between Parameters and LoopBegin). Consequently, ScalarEmitters would be called - // outside the Loop, and only the first Loop iteration would yield correct data (assuming the vector reg - // assigned to scalar will get corrupted inside the loop body). To avoid such cases, we add control dependency - // on LoopBegin to guarantee that the constants are executed inside the Loop. 
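The increment and finalization-offset rules used by `calculate_inner_apply_increments`, `calculate_outer_apply_increments` and `calculate_finalization_offsets` above are easy to misread, so a small worked example may help. The sketch below is illustrative only: `Shape2D`, the concrete sizes and the reduction of each shape to its two innermost dimensions are assumptions made for the example, not part of this patch.

```cpp
// Standalone recomputation of the increment/offset rules from InsertLoops above.
// A shape is reduced to {outer, inner}; master is the schedule (master) shape.
#include <cstdint>
#include <iostream>
#include <vector>

struct Shape2D { int64_t outer; int64_t inner; };

int main() {
    const Shape2D master{8, 64};
    const std::vector<Shape2D> inputs{{8, 64},   // regular input
                                      {8, 1},    // broadcast along the inner dim
                                      {1, 64}};  // broadcast along the outer dim
    const int64_t inner_work_amount = master.inner;

    for (const auto& ps : inputs) {
        // Inner loop advances the pointer only if the dimension is not broadcast
        const bool inner_increment = ps.inner != 1 && master.inner != 1;
        // Outer loop advances the pointer only if the lower dim was broadcast
        const bool outer_increment = ps.outer != 1 && ps.inner == 1;
        // Rewind after the inner loop when the same row is re-read by every outer iteration
        const int64_t finalization_offset =
            (ps.outer == 1 && ps.inner != 1) ? -inner_work_amount : 0;
        std::cout << "{" << ps.outer << "," << ps.inner << "}: "
                  << "inner=" << inner_increment << " outer=" << outer_increment
                  << " final_offset=" << finalization_offset << "\n";
    }
    return 0;
}
```

For these three sample inputs the sketch prints increments of (inner=1, outer=0), (inner=0, outer=1) and (inner=1, outer=0), with a finalization offset of -64 only for the input broadcast along the outer dimension, which is exactly the rewind the inner loop needs before the next outer iteration.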
- for (const auto& n : model->get_ordered_ops()) { - if (auto c = std::dynamic_pointer_cast(n)) - c->add_control_dependency(inner_loop_begin); - else if (n == inner_loop_begin) - break; - } - - if (outer_work_amount > 1) { - std::vector apply_increments = InsertLoops::calculate_outer_apply_increments(ioShapes); - const auto& outer_loop_begin = op::insertLoopBegin(commonParams); - op::insertLoopEnd(commonResults, outer_loop_begin, outer_work_amount, 1lu, apply_increments); - } - } else { - insert_loops_explicitly(ops, m_vector_size); - } - } - - return true; -} - -} // namespace pass -} // namespace snippets -} // namespace ngraph diff --git a/src/common/snippets/src/pass/insert_movebroadcast.cpp b/src/common/snippets/src/pass/insert_movebroadcast.cpp index de8d4a9d32a918..5f09dc17855051 100644 --- a/src/common/snippets/src/pass/insert_movebroadcast.cpp +++ b/src/common/snippets/src/pass/insert_movebroadcast.cpp @@ -2,27 +2,23 @@ // SPDX-License-Identifier: Apache-2.0 // -#include "snippets/remarks.hpp" -#include +#include "snippets/itt.hpp" #include "snippets/pass/insert_movebroadcast.hpp" #include "snippets/snippets_isa.hpp" #include "snippets/utils.hpp" -#include -#include -#include +#include "openvino/opsets/opset1.hpp" +#include "openvino/op/util/op_types.hpp" #include -using namespace ngraph; - namespace { std::pair> get_numpy_broadcast_partial_shapes(const std::vector& input_shapes) { ov::PartialShape target_shape = input_shapes.front(); for (size_t i = 1; i < input_shapes.size(); i++) { - if (!ov::PartialShape::broadcast_merge_into(target_shape, input_shapes[i], op::AutoBroadcastType::NUMPY)) + if (!ov::PartialShape::broadcast_merge_into(target_shape, input_shapes[i], ov::op::AutoBroadcastType::NUMPY)) OPENVINO_THROW("InsertMoveBroadcast: Failed broadcast-merge input shapes"); } std::vector normalized_shapes; @@ -37,8 +33,8 @@ std::pair> get_numpy_broadcast_p } // namespace -ngraph::Output ngraph::snippets::pass::InsertMoveBroadcast::BroadcastNodeLastDim( - const ngraph::Output& value, const ov::PartialShape& target_shape, const ov::PartialShape& normalized_shape) { +ov::Output ov::snippets::pass::InsertMoveBroadcast::BroadcastNodeLastDim( + const ov::Output& value, const ov::PartialShape& target_shape, const ov::PartialShape& normalized_shape) { if (target_shape == value.get_partial_shape()) { return value; } @@ -48,11 +44,8 @@ ngraph::Output ngraph::snippets::pass::InsertMoveBroadcast::Broadc if (*target_shape.rbegin() != *normalized_shape.rbegin()) { ov::PartialShape broadcasted_shape = normalized_shape; *broadcasted_shape.rbegin() = *target_shape.rbegin(); - const auto broadcast_node = std::make_shared(value, broadcasted_shape); - // BroadcastMove should be immediately executed after its input op (input op is node with output which should be broadcasted). 
- // For example, to execute Broadcast outside of a Loop We transfer control dependents and copy rt info - broadcast_node->add_node_control_dependents(value.get_node_shared_ptr()); - ov::copy_runtime_info(value.get_node_shared_ptr(), broadcast_node); + const auto broadcast_node = std::make_shared(value, broadcasted_shape); + utils::safe_copy_runtime_info(value.get_node_shared_ptr(), broadcast_node); return broadcast_node->output(0); } @@ -60,10 +53,10 @@ ngraph::Output ngraph::snippets::pass::InsertMoveBroadcast::Broadc return value; } -ngraph::snippets::pass::InsertMoveBroadcast::InsertMoveBroadcast() { +ov::snippets::pass::InsertMoveBroadcast::InsertMoveBroadcast() { MATCHER_SCOPE(InsertMoveBroadcast); - ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::InsertMoveBroadcast") + ov::graph_rewrite_callback callback = [](ov::pass::pattern::Matcher &m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::InsertMoveBroadcast") auto root = m.get_match_root(); const auto& values = root->input_values(); if (values.empty()) { @@ -75,7 +68,7 @@ ngraph::snippets::pass::InsertMoveBroadcast::InsertMoveBroadcast() { // - Scalar has emitter with explicit broadcasting // - VectorBuffer has scalar output shape to avoid broadcast conflicts and manually shape insertion. return utils::is_scalar_constant(v.get_node_shared_ptr()) || - ov::is_type(v.get_node_shared_ptr()); + ov::is_type(v.get_node_shared_ptr()); }; std::vector input_shapes; std::vector is_ignored; @@ -92,18 +85,17 @@ ngraph::snippets::pass::InsertMoveBroadcast::InsertMoveBroadcast() { // find the output tensor's shape, then broadcast all inputs so that they are compatible with respect to the last dim auto bcast_shapes = get_numpy_broadcast_partial_shapes(input_shapes); - ngraph::OutputVector broadcasted_inputs; + ov::OutputVector broadcasted_inputs; for (size_t i = 0; i < values.size(); ++i) { if (is_ignored[i]) { broadcasted_inputs.push_back(values[i]); } else { auto node = BroadcastNodeLastDim(values[i], bcast_shapes.first, bcast_shapes.second[i]); - ngraph::copy_runtime_info(root, node.get_node_shared_ptr()); broadcasted_inputs.push_back(node); } } - auto new_args = ngraph::as_node_vector(broadcasted_inputs); + auto new_args = ov::as_node_vector(broadcasted_inputs); for (size_t i = 0; i < new_args.size(); i++) { root->input(i).replace_source_output(new_args[i]->output(0)); } @@ -111,11 +103,11 @@ ngraph::snippets::pass::InsertMoveBroadcast::InsertMoveBroadcast() { }; // only numpy broadcast type is supported currently - auto any = std::make_shared(pattern::any_input(), - [](std::shared_ptr n) { + auto any = std::make_shared(ov::pass::pattern::any_input(), + [](const std::shared_ptr& n) { // should add supports_auto_broadcast to SquaredDifference - return ((ngraph::op::supports_auto_broadcast(n) || is_type(n) || is_type(n)) && - n->get_autob().m_type == ngraph::op::AutoBroadcastType::NUMPY) || is_type(n); }); + return ((ov::op::util::supports_auto_broadcast(n) || is_type(n) || is_type(n)) && + n->get_autob().m_type == ov::op::AutoBroadcastType::NUMPY) || is_type(n); }); - register_matcher(std::make_shared(any, matcher_name), callback); + register_matcher(std::make_shared(any, matcher_name), callback); } diff --git a/src/common/snippets/src/pass/load_movebroadcast_to_broadcastload.cpp b/src/common/snippets/src/pass/load_movebroadcast_to_broadcastload.cpp deleted file mode 100644 index 
7aa69d65bbde28..00000000000000 --- a/src/common/snippets/src/pass/load_movebroadcast_to_broadcastload.cpp +++ /dev/null @@ -1,46 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include "snippets/remarks.hpp" -#include - -#include "snippets/pass/load_movebroadcast_to_broadcastload.hpp" -#include "snippets/snippets_isa.hpp" - -#include -#include -#include - -ngraph::snippets::pass::LoadMoveBroadcastToBroadcastLoad::LoadMoveBroadcastToBroadcastLoad() { - MATCHER_SCOPE(LoadMoveBroadcastToBroadcastLoad); - auto param_pattern = ngraph::pattern::wrap_type(); - auto load_pattern = ngraph::pattern::wrap_type({param_pattern}); - auto fbn = std::make_shared(load_pattern, Shape{1}); - - register_matcher(std::make_shared(fbn, matcher_name), - [load_pattern, param_pattern](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::LoadMoveBroadcastToBroadcastLoad") - auto root = m.get_match_root(); - - const auto &pm = m.get_pattern_value_map(); - const auto load = ov::as_type_ptr(pm.at(load_pattern).get_node_shared_ptr()); - const auto param = pm.at(param_pattern).get_node_shared_ptr(); - - // Cannot rewrite Broadcast + Load if load has more than 1 user - // or more than one input, or if Broadcast has several inputs - if (load->output(0).get_target_inputs().size() != 1 || - root->inputs().size() != 1 || load->inputs().size() != 1) { - return false; - } - - auto inshape = root->input(0).get_partial_shape(); - auto outshape = root->output(0).get_partial_shape(); - - auto broadcastload = std::make_shared(param, outshape, load->get_offset()); - ngraph::copy_runtime_info(root, broadcastload); - ngraph::replace_node(root, broadcastload); - - return true; - }); -} diff --git a/src/common/snippets/src/pass/loop_fusion.cpp b/src/common/snippets/src/pass/loop_fusion.cpp deleted file mode 100644 index 2291e0746075d9..00000000000000 --- a/src/common/snippets/src/pass/loop_fusion.cpp +++ /dev/null @@ -1,331 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include -#include -#include - -#include "snippets/snippets_isa.hpp" -#include "snippets/pass/loop_fusion.hpp" -#include "snippets/utils.hpp" - -namespace { -using InputSet = std::set>; -using Edge = std::pair, InputSet>; - -auto can_be_merged(const std::shared_ptr& loop_end_up, - const std::shared_ptr& loop_begin_down) -> bool { - if (!loop_end_up || !loop_begin_down) - return false; - - const auto loop_end_down = loop_begin_down->get_loop_end(); - const auto loop_begin_up = loop_end_up->get_loop_begin(); - if (loop_end_down->get_work_amount() != loop_end_up->get_work_amount() || - loop_end_down->get_increment() != loop_end_up->get_increment()) - return false; - - /* If between Loops there are common dependencies (for example, reducing operations), we cannot merge these Loops - * Example, when there is HorizonMax op between Loops: - * Data - * VectorBuffer LoopBegin - * \ Load | \ - * Maximum | / - * / LoopEnd - * HorizonMax | - * \ LoopBegin - * \ Load \ - * Subtract | - * Store / - * LoopEnd - */ - auto up_dependent_ptrs = loop_end_up->get_control_dependents(); - ov::NodeVector up_dependents(up_dependent_ptrs.size(), nullptr); - std::transform(up_dependent_ptrs.begin(), up_dependent_ptrs.end(), up_dependents.begin(), [](ngraph::Node* node) { return node->shared_from_this(); }); - auto down_dependencies = loop_begin_down->get_control_dependencies(); - std::sort(up_dependents.begin(), up_dependents.end()); - 
std::sort(down_dependencies.begin(), down_dependencies.end()); - std::vector> common_nodes; - std::set_intersection(up_dependents.begin(), up_dependents.end(), down_dependencies.begin(), down_dependencies.end(), - std::back_inserter(common_nodes)); - // TODO: Add check for sequence/subgraph of depending nodes between Loops. - // At these moment we should have full list of dependencies and dependents of Loops to find intersection, - // not just first dependent of LoopEnd and first dependency of LoopBegin - return common_nodes.size() == 0; -} - -auto get_buffer_and_loop_end(const std::shared_ptr& loop_begin_down, - std::shared_ptr& loop_end_up, - std::shared_ptr& buffer) -> bool { - size_t fusion_input_num = 0; - for (const auto& parent : loop_begin_down->input_values()) { - const auto parent_shared = parent.get_node_shared_ptr(); - if (ov::is_type(parent_shared) || - ov::is_type(parent_shared) || - ov::is_type(parent_shared)) - continue; - - // We can fuse Loops even LoopBegin has several the same inputs (the common Buffer/LoopEnd) - if ((buffer && buffer == parent_shared) || (!buffer && loop_end_up && loop_end_up == parent_shared)) - continue; - - loop_end_up = ngraph::as_type_ptr(parent_shared); - buffer = ov::as_type_ptr(parent_shared); - if (buffer) { - if (buffer->output(0).get_target_inputs().size() == 0 || - buffer->get_input_source_output(0).get_target_inputs().size() != 1) - return false; - - loop_end_up = ngraph::as_type_ptr(buffer->get_input_node_shared_ptr(0)); - } - if (loop_end_up) - fusion_input_num++; - } - - return fusion_input_num == 1; -} - -auto collect_loop_inputs(const std::shared_ptr& loop_begin, - const std::shared_ptr& buffer, - std::vector& new_loop_inputs, - std::vector& new_ptr_increments, - std::vector& new_finalization_offsets) -> void { - const auto loop_end = loop_begin->get_loop_end(); - const auto ptr_increments = loop_end->get_ptr_increments(); - const auto finalization_offsets = loop_end->get_finalization_offsets(); - for (size_t i = 0; i < loop_begin->get_input_size(); i++) { - const auto input = loop_begin->input(i); - // Skip target Buffer - if (input.get_source_output().get_node_shared_ptr() != buffer) { - const auto edge = Edge{ input.get_source_output(), - loop_begin->output(input.get_index()).get_target_inputs() }; - new_loop_inputs.push_back(edge); - new_ptr_increments.push_back(ptr_increments[i]); - new_finalization_offsets.push_back(finalization_offsets[i]); - // Remove LoopBegin from Parent as target input - input.get_source_output().remove_target_input(input); - } - } -} - -auto collect_loop_outputs(const std::shared_ptr& loop_end, - const std::shared_ptr& buffer, - std::vector& new_loop_outputs, - std::vector& new_ptr_increments, - std::vector& new_finalization_offsets, - const bool reduce_max_case) -> bool { - const auto loop_begin = loop_end->get_loop_begin(); - const auto ptr_increments = loop_end->get_ptr_increments(); - const auto finalization_offsets = loop_end->get_finalization_offsets(); - bool is_current_reduce_max_case = false; - for (size_t i = 0; i < loop_end->get_output_size(); i++) { - // ReduceMax case. When Loop cannot have empty output as ngraph op, - // we should have fake edge through all Loops (LoopBegin->LoopEnd) which connect src and dst data. 
- // If we merge these this Loop and Loop Before, we should remove this fake edge - // because now we have real data for storing - auto new_input_node = loop_end->get_input_node_shared_ptr(i); - if (ov::is_type(new_input_node)) { - // We set temporary boolean variable because this value is for the next LoopEnd (upper), not for the current LoopEnd - is_current_reduce_max_case = true; - // Remove LoopEnd from Parent as target input - loop_end->input_value(i).remove_target_input(loop_end->input(i)); - } else { - const auto output = loop_end->output(i); - // Skip target Buffer - InputSet target_inputs; - for (const auto& input : output.get_target_inputs()) { - if (input.get_node()->shared_from_this() != buffer || reduce_max_case) { - target_inputs.insert(input); - } - } - - if (target_inputs.size()) { - const auto edge = Edge{loop_end->input_value(output.get_index()), target_inputs}; - new_loop_outputs.push_back(edge); - new_ptr_increments.push_back(ptr_increments[loop_begin->get_input_size() + i]); - new_finalization_offsets.push_back(finalization_offsets[loop_begin->get_input_size() + i]); - // Remove LoopEnd from Parent as target input - loop_end->input_value(i).remove_target_input(loop_end->input(i)); - } - } - } - - return is_current_reduce_max_case; -} - -} // namespace - - -bool ngraph::snippets::pass::LoopFusion::Merge(const std::shared_ptr& loop_begin_down) { - if (!loop_begin_down) { - return false; - } - - std::shared_ptr loop_end_up = nullptr; - std::shared_ptr buffer = nullptr; - // Initialize the corresponding upper LoopEnd and Buffer - if (!get_buffer_and_loop_end(loop_begin_down, loop_end_up, buffer)) { - return false; - } - // Check for conditions of fusion - if (!can_be_merged(loop_end_up, loop_begin_down)) { - return false; - } - - const auto loop_end_down = loop_begin_down->get_loop_end(); - const auto loop_begin_up = loop_end_up->get_loop_begin(); - const auto new_input_count = loop_begin_up->get_input_size() + loop_begin_down->get_input_size(); - const auto new_output_count = loop_end_up->get_output_size() + loop_end_down->get_output_size(); - const auto new_io_count = new_input_count + new_output_count; - const auto ptr_increments_up = loop_end_up->get_ptr_increments(); - const auto ptr_increments_down = loop_end_down->get_ptr_increments(); - const auto finalization_offsets_up = loop_end_up->get_finalization_offsets(); - const auto finalization_offsets_down = loop_end_down->get_finalization_offsets(); - std::vector new_ptr_increments, new_finalization_offsets; - new_ptr_increments.reserve(new_io_count); - new_finalization_offsets.reserve(new_io_count); - - // Collect new loop inputs - std::vector loop_inputs; - loop_inputs.reserve(new_input_count); - new_ptr_increments.reserve(new_io_count); - new_finalization_offsets.reserve(new_io_count); - collect_loop_inputs(loop_begin_up, buffer, loop_inputs, new_ptr_increments, new_finalization_offsets); - collect_loop_inputs(loop_begin_down, buffer, loop_inputs, new_ptr_increments, new_finalization_offsets); - - // Collect new Loop outputs - std::vector loop_outputs; - loop_outputs.reserve(new_output_count); - // We can fuse Loop with maximum accumulator pattern only with Smth input - // So firstly, we analyze LoopEnd down (it's possible maximum accumulator pattern), set `reduce_max_case` variable - // if it's really maximum accumulator pattern, and then analyze LoopEnd up using `reduce_max_case` variable - const bool reduce_max_case = collect_loop_outputs(loop_end_down, buffer, loop_outputs, new_ptr_increments, 
new_finalization_offsets, false); - collect_loop_outputs(loop_end_up, buffer, loop_outputs, new_ptr_increments, new_finalization_offsets, reduce_max_case); - if (reduce_max_case) { - const auto target_inputs = loop_begin_down->output(0).get_target_inputs(); - NGRAPH_CHECK(target_inputs.size() == 1, "LoopBegin in ReduceMax should have only one consumer (Load) for out port 0"); - const auto load = ov::as_type_ptr(target_inputs.begin()->get_node()->shared_from_this()); - NGRAPH_CHECK(load != nullptr, "LoopBegin in ReduceMax should have only one consumer for out port 0 - Load"); - - const auto store = ov::as_type_ptr(loop_end_up->get_input_node_shared_ptr(0)); - NGRAPH_CHECK(store != nullptr, "Before LoopEnd should be Store emitter"); - - // Connect vector emitters before Store and after Load - load->output(0).replace(store->get_input_source_output(0)); - } - - const auto new_increment = loop_end_up->get_increment(); - const auto new_work_amount = loop_end_up->get_work_amount(); - - // Create new LoopBegin - OutputVector new_loop_begin_inputs; - new_loop_begin_inputs.reserve(loop_inputs.size()); - for (const auto& loop_input : loop_inputs) { - const auto data_output = loop_input.first; - new_loop_begin_inputs.push_back(data_output); - } - const auto new_loop_begin = std::make_shared(new_loop_begin_inputs); - NGRAPH_CHECK(new_loop_begin->get_input_size() == loop_inputs.size(), "New LoopBegin has incorrect count of inputs."); - - // Connect new LoopBegin to input edges - for (size_t i = 0; i < loop_inputs.size(); i++) { - const auto edge = loop_inputs[i]; - for (auto& target_input : edge.second) { - target_input.replace_source_output(new_loop_begin->output(i)); - } - } - - // Create new LoopEnd - OutputVector new_loop_end_inputs; - new_loop_end_inputs.reserve(loop_outputs.size() + 1); // + 1 - for loop_begin - for (const auto& loop_output : loop_outputs) { - const auto data_output = loop_output.first; - new_loop_end_inputs.push_back(data_output); - } - new_loop_end_inputs.push_back(new_loop_begin->output(new_loop_begin->get_input_size())); - const auto new_loop_end = std::make_shared(new_loop_end_inputs, new_work_amount, new_increment, - new_ptr_increments, new_finalization_offsets); - NGRAPH_CHECK(new_loop_end->get_output_size() == loop_outputs.size(), "New LoopEnd has incorrect count of outputs."); - // Connect new LoopEnd to output edges - for (size_t i = 0; i < loop_outputs.size(); i++) { - const auto edge = loop_outputs[i]; - auto new_output = new_loop_end->output(i); - for (auto& target_input : edge.second) { - target_input.replace_source_output(new_output); - } - } - - if (reduce_max_case) { - loop_end_down->output(0).replace(buffer->output(0)); - } else { - // Remove old Loops and Load/Store if there are around Buffer - for (size_t i = 0; i < loop_end_up->get_input_size() - 1; i++) { - auto new_output = loop_end_up->input_value(i); - loop_end_up->output(i).replace(new_output); - new_output.remove_target_input(loop_end_up->input(i)); - } - for (size_t i = 0; i < loop_begin_down->get_input_size(); i++) { - const auto output_target_inputs = loop_begin_down->output(i).get_target_inputs(); - const auto new_output = loop_begin_down->input_value(i); - for (const auto &target_input : output_target_inputs) { - target_input.replace_source_output(new_output); - } - - // Clear old Buffer children - new_output.remove_target_input(loop_begin_down->input(i)); - } - } - - new_loop_end->has_outer_loop = loop_end_down->has_outer_loop || loop_end_up->has_outer_loop; - - 
loop_begin_up->transfer_control_dependents(new_loop_begin); - loop_begin_down->transfer_control_dependents(new_loop_begin); - loop_end_up->transfer_control_dependents(new_loop_end); - loop_end_down->transfer_control_dependents(new_loop_end); - new_loop_begin->add_node_control_dependencies(loop_begin_up); - new_loop_begin->add_node_control_dependencies(loop_begin_down); - new_loop_end->add_node_control_dependencies(loop_end_up); - new_loop_end->add_node_control_dependencies(loop_end_down); - - // If there was Buffer between Loops, after Loop fusion - // we should remove the Buffer node and MemoryAccess nodes if it's needed - if (buffer) { - const auto buffer_input = buffer->get_input_node_shared_ptr(0); - const auto buffer_output = buffer->output(0).get_target_inputs().begin()->get_node()->shared_from_this(); - - // If after merging there are Load and Store, we should remove them - if (const auto store = ov::as_type_ptr(buffer_input)) { - store->output(0).replace(store->input_value(0)); - } - if (const auto load = ov::as_type_ptr(buffer_output)) { - load->output(0).replace(load->input_value(0)); - } - - // Remove Buffer if there are no Loops and MatMul after Loop fusion - // because only these operations can have Buffer node on inputs and outputs. - // So if there aren't, it means that Buffer is extra, and we can remove it - if (!ov::is_type(buffer_output) && !ov::is_type(buffer_input) && - !ov::is_type(buffer_output) && !ov::is_type(buffer_input)) { - buffer->output(0).replace(buffer->input_value(0)); - } - } - - return true; -} - -ngraph::snippets::pass::LoopFusion::LoopFusion() { - MATCHER_SCOPE(LoopFusion); - - auto m_loop_begin = ngraph::pattern::wrap_type(); - - auto callback = [=](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::LoopFusion") - auto& pattern_to_output = m.get_pattern_value_map(); - const auto loop_begin = ngraph::as_type_ptr(pattern_to_output.at(m_loop_begin).get_node_shared_ptr()); - const auto status = Merge(loop_begin); - return status; - }; - - auto matcher = std::make_shared(m_loop_begin, matcher_name); - register_matcher(matcher, callback); -} diff --git a/src/common/snippets/src/pass/loop_helpers.cpp b/src/common/snippets/src/pass/loop_helpers.cpp deleted file mode 100644 index 2efa12c0d1d042..00000000000000 --- a/src/common/snippets/src/pass/loop_helpers.cpp +++ /dev/null @@ -1,48 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include "ngraph/op/op.hpp" -#include "snippets/pass/loop_helpers.hpp" - -namespace ngraph { -namespace snippets { -namespace op { -std::shared_ptr insertLoopBeginAfterOutputs(const OutputVector& originalOutputs) { - std::vector>> originalChildInputs; - for (const auto& out : originalOutputs) { - originalChildInputs.push_back(out.get_target_inputs()); - } - - auto loop_begin = std::make_shared(originalOutputs); - - for (size_t i = 0; i < originalChildInputs.size(); i++) { - for (auto& input : originalChildInputs[i]) { - input.replace_source_output(loop_begin->output(i)); - } - } - return loop_begin; -} - -std::shared_ptr insertLoopEndBeforeInputs(const std::vector>& originalInputs, - const std::shared_ptr& loopBegin, - size_t work_amount, size_t increment, - std::vector apply_increment, - std::vector finalization_offsets) { - OutputVector originalParentOutputs; - for (const auto& in : originalInputs) { - originalParentOutputs.push_back(in.get_source_output()); - } - 
originalParentOutputs.push_back(loopBegin->output(loopBegin->get_output_size() - 1)); - auto loop_end = std::make_shared(originalParentOutputs, work_amount, increment, - std::move(apply_increment), std::move(finalization_offsets)); - - for (size_t i = 0; i < originalInputs.size(); i++) { - originalInputs[i].replace_source_output(loop_end->output(i)); - } - return loop_end; -} - -} // namespace op -} // namespace snippets -} // namespace ngraph \ No newline at end of file diff --git a/src/common/snippets/src/pass/matmul_to_brgemm.cpp b/src/common/snippets/src/pass/matmul_to_brgemm.cpp index add672b0fef3ea..ba2d8c6311abe6 100644 --- a/src/common/snippets/src/pass/matmul_to_brgemm.cpp +++ b/src/common/snippets/src/pass/matmul_to_brgemm.cpp @@ -1,30 +1,42 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include "snippets/itt.hpp" - #include "snippets/pass/matmul_to_brgemm.hpp" +#include "snippets/itt.hpp" #include "snippets/snippets_isa.hpp" +#include "snippets/lowered/port_descriptor.hpp" -#include "ngraph/opsets/opset1.hpp" -#include "ngraph/rt_info.hpp" -#include "ngraph/pattern/op/wrap_type.hpp" +#include "openvino/core/rt_info.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { +void MatMulToBrgemm::init_ports(const std::shared_ptr& brgemm) const { + auto get_subtensor = [](const ov::Shape& shape) { + return std::vector{ lowered::PortDescriptor::ServiceDimensions::FULL_DIM, lowered::PortDescriptor::ServiceDimensions::FULL_DIM }; + }; + for (const auto& input : brgemm->inputs()) { + const auto tensor = input.get_shape(); + const auto subtensor = get_subtensor(tensor); + lowered::PortDescriptorUtils::set_port_descriptor_ptr(input, std::make_shared(tensor, subtensor)); + } + const auto tensor = brgemm->get_output_shape(0); + const auto subtensor = get_subtensor(tensor); + lowered::PortDescriptorUtils::set_port_descriptor_ptr(brgemm->output(0), std::make_shared(tensor, subtensor)); +} + MatMulToBrgemm::MatMulToBrgemm() { MATCHER_SCOPE(MatMulToBrgemm); - auto matmul_pattern = ngraph::pattern::wrap_type({ngraph::pattern::any_input(), - ngraph::pattern::any_input()}); + auto matmul_pattern = ov::pass::pattern::wrap_type({ov::pass::pattern::any_input(), ov::pass::pattern::any_input()}); - auto callback = [=](ngraph::pattern::Matcher& m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::MatMulToBrgemm") + auto callback = [=](ov::pass::pattern::Matcher& m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::MatMulToBrgemm") auto& pm = m.get_pattern_value_map(); - const auto matmul = as_type_ptr(pm.at(matmul_pattern).get_node_shared_ptr()); + const auto matmul = as_type_ptr(pm.at(matmul_pattern).get_node_shared_ptr()); // Brgemm doesn't support transposed inputs currently, so we don't convert such matmuls if (matmul->get_transpose_a() || matmul->get_transpose_b()) return false; @@ -35,15 +47,16 @@ MatMulToBrgemm::MatMulToBrgemm() { nodes.emplace_back(std::make_shared(brgemm, matmul->get_output_element_type(0))); } brgemm->set_friendly_name(matmul->get_friendly_name()); - ngraph::copy_runtime_info(matmul, nodes); - ngraph::replace_node(matmul, nodes.back()); + ov::copy_runtime_info(matmul, nodes); + ov::replace_node(matmul, nodes.back()); + init_ports(brgemm); return true; }; - auto m = std::make_shared(matmul_pattern, matcher_name); + auto m = 
std::make_shared(matmul_pattern, matcher_name); register_matcher(m, callback); } } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/src/pass/mha_tokenization.cpp b/src/common/snippets/src/pass/mha_tokenization.cpp index 4a8ab99e89ff18..6898a423b5fdd0 100644 --- a/src/common/snippets/src/pass/mha_tokenization.cpp +++ b/src/common/snippets/src/pass/mha_tokenization.cpp @@ -1,38 +1,40 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include - #include "snippets/pass/mha_tokenization.hpp" + + +#include "snippets/itt.hpp" #include "snippets/pass/tokenization.hpp" #include "snippets/op/subgraph.hpp" -#include -#include -#include -#include +#include "openvino/core/rt_info.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" +#include "openvino/core/validation_util.hpp" namespace { -auto is_supported_tensor(const ngraph::descriptor::Tensor& t) -> bool { +auto is_supported_tensor(const ov::descriptor::Tensor& t) -> bool { // TODO: Add support of all supported by common tokenization element types - // return ngraph::snippets::pass::TokenizeSnippets::supported_element_types.count(input.get_element_type()) != 0; + // return ov::snippets::pass::TokenizeSnippets::supported_element_types.count(input.get_element_type()) != 0; // Also only 4D is supported at the moment - return t.get_element_type() == ngraph::element::f32 && t.get_partial_shape().is_static() && t.get_shape().size() == 4; + return t.get_element_type() == ov::element::f32 && t.get_partial_shape().is_static() && t.get_shape().size() == 4; } // TODO: Add support of FQ, Reshape? -auto is_supported_op(const std::shared_ptr& node) -> bool { - return ngraph::snippets::pass::TokenizeSnippets::AppropriateForSubgraph(node) && - (ngraph::is_type(node) || - ngraph::is_type(node) || - ngraph::is_type(node)); +auto is_supported_intermediate_op(const std::shared_ptr& node) -> bool { + const auto is_intermediate_op = [](const std::shared_ptr& node) { + return ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node); + }; + return is_intermediate_op(node) && ov::snippets::pass::TokenizeSnippets::AppropriateForSubgraph(node); } -auto is_valid_transpose(const std::shared_ptr& node, std::vector expected_order) -> bool { - auto valid_transpose_order = [expected_order](const std::shared_ptr& node) -> bool { - const auto transpose_pattern = ngraph::as_type_ptr(node); +auto is_valid_transpose(const std::shared_ptr& node, std::vector expected_order) -> bool { + auto valid_transpose_order = [expected_order](const std::shared_ptr& node) -> bool { + const auto transpose_pattern = ov::as_type_ptr(node); if (!transpose_pattern) return false; return transpose_pattern->cast_vector() == expected_order; @@ -61,7 +63,7 @@ auto tokenize_broadcast(const std::shared_ptr& interm_op, ov::NodeVect }; for (auto input : interm_op->inputs()) { - auto broadcast = ov::as_type_ptr(input.get_source_output().get_node_shared_ptr()); + auto broadcast = ov::as_type_ptr(input.get_source_output().get_node_shared_ptr()); // TODO: Can we reuse AppropriateForSubgraph here? 
Seems like it's huge check for Broadcast if (broadcast && broadcast->get_broadcast_spec().m_type == ov::op::AutoBroadcastType::NUMPY && broadcast->get_output_target_inputs(0).size() == 1) { @@ -71,14 +73,14 @@ auto tokenize_broadcast(const std::shared_ptr& interm_op, ov::NodeVect if (pshape.rank().is_static() && pshape.size() > 2) { ov::PartialShape::broadcast_merge_into(new_output_shape, skip_last_dim(pshape), - ::ngraph::op::AutoBroadcastType::NUMPY); + ::ov::op::AutoBroadcastType::NUMPY); } } else { const auto pshape = input.get_partial_shape(); if (pshape.rank().is_static() && pshape.size() > 2) { ov::PartialShape::broadcast_merge_into(new_output_shape, skip_last_dim(pshape), - ::ngraph::op::AutoBroadcastType::NUMPY); + ::ov::op::AutoBroadcastType::NUMPY); } } } @@ -91,9 +93,9 @@ auto tokenize_broadcast(const std::shared_ptr& interm_op, ov::NodeVect } auto tokenize_reshape_around_softmax(std::shared_ptr& interm_op, - std::shared_ptr& reshape, - ngraph::NodeVector& ordered_ops) -> bool { - reshape = ngraph::as_type_ptr(interm_op); + std::shared_ptr& reshape, + ov::NodeVector& ordered_ops) -> bool { + reshape = ov::as_type_ptr(interm_op); if (reshape) { const auto shape = reshape->get_input_shape(0); if (shape.back() != reshape->get_output_shape(0).back() || reshape->get_output_target_inputs(0).size() != 1) @@ -104,9 +106,25 @@ auto tokenize_reshape_around_softmax(std::shared_ptr& interm_op, return true; }; -auto update_intermediate_supported_ops(std::shared_ptr& interm_op, ngraph::NodeVector& ordered_ops) -> bool { +auto get_potential_body_params(const std::shared_ptr& op) -> size_t { + size_t count = 0; + for (size_t i = 1; i < op->get_input_size(); ++i) { + const auto input = op->input_value(i); + const auto parent = input.get_node_shared_ptr(); + const auto constant = ov::as_type_ptr(parent); + if (!(constant && (ov::shape_size(input.get_shape()) == 1 || + ov::is_type(op)|| + ov::snippets::op::Subgraph::constant_input_should_be_inside_body(op)))) { + count++; + } + } + return count; +} + +auto update_intermediate_supported_ops(std::shared_ptr& interm_op, ov::NodeVector& ordered_ops, + size_t& hidden_virtual_ports_count, size_t& potential_body_params_count) -> bool { // TODO: Add Reshape, FQ support - while (is_supported_op(interm_op)) { + while (is_supported_intermediate_op(interm_op)) { // All supported intermediate ops have only one output port // To verify output element type is enough because all supported intermediate ops have the same output element type as input type if (interm_op->get_output_target_inputs(0).size() != 1 || !is_supported_tensor(interm_op->get_output_tensor(0))) @@ -117,6 +135,36 @@ auto update_intermediate_supported_ops(std::shared_ptr& interm_op, ngr tokenize_broadcast(interm_op, ordered_ops); } + auto is_supported_branch_op = [&ordered_ops](const std::shared_ptr& op) { + return is_supported_intermediate_op(op) && + ov::snippets::pass::GetSnippetsNodeType(op) != ov::snippets::pass::SnippetsNodeType::SkippedByPlugin && + std::find(ordered_ops.begin(), ordered_ops.end(), op) == ordered_ops.end(); + }; + + for (size_t i = 0; i < interm_op->get_input_size(); ++i) { + const size_t shift = ordered_ops.size(); + auto parent = interm_op->get_input_node_shared_ptr(i); + while (is_supported_branch_op(parent)) { + // All supported ops have only one output port + if (parent->get_output_target_inputs(0).size() != 1) + break; + + // Add node only if there are scalar constants on inputs because of plugin-specific limitation + bool are_weights_scalar = true; + const auto 
parent_count = parent->get_input_size(); + for (size_t i = 1; i < parent_count; ++i) { + are_weights_scalar = are_weights_scalar && ov::shape_size(parent->get_input_shape(i)) == 1; + } + + ordered_ops.insert(ordered_ops.begin() + shift, parent); + // We think that sequence of ops goes through input port 0 + // But can be Select here? If it can be, parent shouldn't be on input port 0. Need another way? + parent = parent->get_input_node_shared_ptr(0); + } + } + + potential_body_params_count += get_potential_body_params(interm_op); + ordered_ops.push_back(interm_op); interm_op = interm_op->get_output_target_inputs(0).begin()->get_node()->shared_from_this(); } @@ -124,15 +172,15 @@ auto update_intermediate_supported_ops(std::shared_ptr& interm_op, ngr }; } // namespace -ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { +ov::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { MATCHER_SCOPE(TokenizeMHASnippets); - auto m_matmul0 = std::make_shared(ngraph::pattern::any_input(ngraph::pattern::has_static_shape()), - ngraph::pattern::any_input(ngraph::pattern::has_static_shape())); + auto m_matmul0 = std::make_shared(ov::pass::pattern::any_input(ov::pass::pattern::has_static_shape()), + ov::pass::pattern::any_input(ov::pass::pattern::has_static_shape())); - register_matcher(std::make_shared(m_matmul0, matcher_name), - [=](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::TokenizeMHASnippets") + register_matcher(std::make_shared(m_matmul0, matcher_name), + [=](ov::pass::pattern::Matcher &m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::TokenizeMHASnippets") auto& pattern_to_output = m.get_pattern_value_map(); // After some transformations, a different number of Constants for some operations may be created @@ -141,10 +189,26 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { // we should calculate potential number of non-scalar Constants that will be moved up from body. // TODO: Need update this variable when FQ will be supported size_t hidden_virtual_ports_count = 0; - // Default value is True because MHA pattern always requires Buffer op - bool need_buffer = true; + // Queries + Key + Values = 3 standard inputs of MHA + size_t potential_body_params_count = 3; + // The count of potential unique Buffers - it's hidden virtual ports as well + // We should go through Subgraph and calculate potential non-inplace Buffers count. + // Example: + // Buffer - i32 [32, 128] -> ~ Loop ~ -> Buffer - i8 [32, 128] + // After each Loop iteration we should increment pointers of Buffers: accordingly on 4 byte and 1 byte for scalar case. + // It means that these Buffers cannot be inplace => Each Buffer should have the own register + // For that we can just check the following "branches": + // - Between MatMul0 and MatMul1 - Softmax is sync point. The operations between MatMul0 -> Softmax and Softmax -> MatMul1 + // will be fused into one loop after conversion to snippet dialect (Because it's just FQ, Eltwise nodes) + // - Between MatMul0 and Transpose1 - At the moment operations after Transpose1 cannot be fused in Transpose Loop (to avoid performance regressions). + // But operations after Transpose1 and before MatMul0 will be fused into one loop as well (look at first point) + // Note: If the pass is updated, need to check the new possible branches for potential non-inplace Buffers! 
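The comment above motivates the resource check performed later in this pass: every body Parameter, Result, hidden virtual port and non-inplace Buffer ends up holding its own data pointer in a general-purpose register, so tokenization must refuse to grow a subgraph past that budget. A minimal standalone sketch of the check (the limit of 12 mirrors the constant used below; the real budget is target-specific):

    #include <cstddef>

    // Each counted entity needs its own GPR to hold a data pointer inside the generated kernel.
    constexpr std::size_t kAssumedGPRBudget = 12;

    constexpr bool fits_register_budget(std::size_t body_params,
                                        std::size_t results,
                                        std::size_t hidden_virtual_ports,
                                        std::size_t unique_buffers) {
        return body_params + results + hidden_virtual_ports + unique_buffers <= kAssumedGPRBudget;
    }

    // Typical MHA pattern: Q, K, V inputs, one output, one Buffer for Softmax.
    static_assert(fits_register_budget(3, 1, 0, 1), "baseline MHA pattern fits the budget");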
+ // Default value is 1 because + // - Firstly Softmax always need to have Buffers + // - Secondly Softmax need 2 Buffer but they can be inplace - One virtual port is enough for Softmax + size_t buffer_count = 1; std::string fused_names; - ngraph::NodeVector ordered_ops; + ov::NodeVector ordered_ops; /* ======== Matcher Pass ========== */ @@ -161,7 +225,7 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { * \ / * MatMul1 */ - const auto matmul0 = ngraph::as_type_ptr(pattern_to_output.at(m_matmul0).get_node_shared_ptr()); + const auto matmul0 = ov::as_type_ptr(pattern_to_output.at(m_matmul0).get_node_shared_ptr()); if (!matmul0 || matmul0->get_output_target_inputs(0).size() != 1 || matmul0->get_transpose_a() || !is_supported_tensor(matmul0->get_input_tensor(0)) || !is_supported_tensor(matmul0->get_input_tensor(1))) return false; @@ -174,20 +238,20 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { auto interm_op = matmul0->get_output_target_inputs(0).begin()->get_node()->shared_from_this(); // Add supported operations which are between MatMul0 and Softmax to ordered_ops - if (!update_intermediate_supported_ops(interm_op, ordered_ops)) + if (!update_intermediate_supported_ops(interm_op, ordered_ops, hidden_virtual_ports_count, potential_body_params_count)) return false; - std::shared_ptr reshape0 = nullptr; + std::shared_ptr reshape0 = nullptr; if (!tokenize_reshape_around_softmax(interm_op, reshape0, ordered_ops)) return false; int64_t axis = 0; const auto rank = interm_op->get_input_partial_shape(0).rank(); - if (const auto softmax_v8 = ngraph::as_type_ptr(interm_op)) { + if (const auto softmax_v8 = ov::as_type_ptr(interm_op)) { OPENVINO_SUPPRESS_DEPRECATED_START - axis = ngraph::normalize_axis(interm_op->get_friendly_name(), softmax_v8->get_axis(), rank); + axis = ov::normalize_axis(interm_op->get_friendly_name(), softmax_v8->get_axis(), rank); OPENVINO_SUPPRESS_DEPRECATED_END - } else if (const auto softmax_v1 = ngraph::as_type_ptr(interm_op)) { + } else if (const auto softmax_v1 = ov::as_type_ptr(interm_op)) { axis = softmax_v1->get_axis(); } else { return false; @@ -198,7 +262,7 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { ordered_ops.push_back(interm_op); interm_op = interm_op->get_output_target_inputs(0).begin()->get_node()->shared_from_this(); - std::shared_ptr reshape1 = nullptr; + std::shared_ptr reshape1 = nullptr; if (!tokenize_reshape_around_softmax(interm_op, reshape1, ordered_ops)) return false; @@ -207,10 +271,10 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { return false; // Add supported operations which are between Softmax and MatMul1 to ordered_ops - if (!update_intermediate_supported_ops(interm_op, ordered_ops)) + if (!update_intermediate_supported_ops(interm_op, ordered_ops, hidden_virtual_ports_count, potential_body_params_count)) return false; - const auto matmul1 = ngraph::as_type_ptr(interm_op); + const auto matmul1 = ov::as_type_ptr(interm_op); if (!matmul1 || matmul1->get_output_target_inputs(0).size() != 1 || matmul1->get_transpose_a() || matmul1->get_transpose_b() || !is_supported_tensor(matmul1->get_input_tensor(0)) || !is_supported_tensor(matmul1->get_input_tensor(1))) return false; @@ -226,7 +290,7 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { // so firstly we insert Transpose1 on the beginning of ordered_ops and then Transpose1 bool are_weights_scalar = true; auto parent = matmul0->get_input_node_shared_ptr(1); - while 
(is_supported_op(parent)) { + while (is_supported_intermediate_op(parent)) { // All supported ops have only one output port // To verify output element type is enough because all supported ops have the same output element type as input type if (parent->get_output_target_inputs(0).size() != 1 || !is_supported_tensor(parent->get_output_tensor(0))) @@ -234,15 +298,16 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { const auto parent_count = parent->inputs().size(); for (size_t i = 1; i < parent_count; ++i) { - are_weights_scalar = are_weights_scalar && ngraph::shape_size(parent->get_input_shape(i)) == 1; + are_weights_scalar = are_weights_scalar && ov::shape_size(parent->get_input_shape(i)) == 1; } + potential_body_params_count += get_potential_body_params(parent); ordered_ops.insert(ordered_ops.begin(), parent); // We think that sequence of ops goes through input port 0 // But can be Select here? If it can be, parent shouldn't be on input port 0. Need another way? parent = parent->get_input_node_shared_ptr(0); } - auto transpose1 = ngraph::as_type_ptr(parent); + auto transpose1 = ov::as_type_ptr(parent); if (matmul0->get_transpose_b()) { if (is_valid_transpose(transpose1, {0, 2, 1, 3})) { // We can support several ops between MatMul0 with transposed_b and Transpose1 with 0213 order @@ -265,14 +330,14 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { // TODO: Add Reshape Support for all Transposes // Add 3D support for all Transposes - const auto transpose0 = ngraph::as_type_ptr(matmul0->get_input_node_shared_ptr(0)); + const auto transpose0 = ov::as_type_ptr(matmul0->get_input_node_shared_ptr(0)); if (is_valid_transpose(transpose0, {0, 2, 1, 3})) { ordered_ops.insert(ordered_ops.begin(), transpose0); } else if (matmul0->get_transpose_b()) { return false; } - const auto transpose2 = ngraph::as_type_ptr(matmul1->get_input_node_shared_ptr(1)); + const auto transpose2 = ov::as_type_ptr(matmul1->get_input_node_shared_ptr(1)); if (is_valid_transpose(transpose2, {0, 2, 1, 3})) { ordered_ops.push_back(transpose2); } @@ -285,7 +350,7 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { // ordered_ops.push_back(child); // } - auto transpose3 = ngraph::as_type_ptr(child); + auto transpose3 = ov::as_type_ptr(child); if (is_valid_transpose(transpose3, {0, 2, 1, 3})) { ordered_ops.push_back(transpose3); } @@ -296,17 +361,23 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { /* ====== Subgraph creation ======= */ - ngraph::OutputVector body_inputs, subgraph_inputs; - ngraph::ParameterVector body_parameters; - ngraph::ResultVector body_results; + // TODO: move this plugin-specific constraint to the plugin callback + const auto last_node = ordered_ops.back(); + if (potential_body_params_count + last_node->get_output_size() + hidden_virtual_ports_count + buffer_count > 12) { + return false; + } + + ov::OutputVector body_inputs, subgraph_inputs; + ov::ParameterVector body_parameters; + ov::ResultVector body_results; std::vector>> subgraph_result_inputs; - auto create_body_inputs = [&](const std::shared_ptr& node) -> void { + auto create_body_inputs = [&](const std::shared_ptr& node) -> void { for (size_t i = 0; i < node->get_input_size(); ++i) { const auto input = node->input(i); const auto parent = input.get_source_output().get_node_shared_ptr(); const auto constant = ov::as_type_ptr(parent); - if (constant && (ngraph::shape_size(input.get_shape()) == 1 || op::Subgraph::constant_input_should_be_inside_body(node))) { + if 
(constant && (ov::shape_size(input.get_shape()) == 1 || op::Subgraph::constant_input_should_be_inside_body(node))) { // If Constant has one consumer - target node, we add Constant to body_inputs // If Constant has several consumers, we should check that all these consumers are inside Subgraph body // and if all of them are inside body, we can explicitly add Constant to the body_inputs, otherwise we should @@ -318,7 +389,7 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { } else { const auto constant_consumers = constant->get_output_target_inputs(0); bool all_consumers_are_inside = std::all_of(constant_consumers.begin(), constant_consumers.end(), - [&ordered_ops](const ngraph::Input& input) { + [&ordered_ops](const ov::Input& input) { return std::find(ordered_ops.begin(), ordered_ops.end(), input.get_node()->shared_from_this()) != ordered_ops.end(); }); @@ -331,7 +402,7 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { } } } else if (std::find(ordered_ops.begin(), ordered_ops.end(), parent) == ordered_ops.end()) { - auto parameter = std::make_shared(input.get_element_type(), input.get_partial_shape()); + auto parameter = std::make_shared(input.get_element_type(), input.get_partial_shape()); body_parameters.push_back(parameter); body_parameters.back()->set_friendly_name(input.get_node()->get_friendly_name()); body_inputs.push_back(parameter->output(0)); @@ -349,23 +420,17 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { fused_names += op->get_friendly_name() + ","; } - const auto last_node = ordered_ops.back(); for (const auto& output : last_node->outputs()) { subgraph_result_inputs.push_back(output.get_target_inputs()); } for (const auto& output : last_node->outputs()) { - body_results.push_back(std::make_shared(last_node->output(output.get_index()))); + body_results.push_back(std::make_shared(last_node->output(output.get_index()))); } if (body_results.size() != subgraph_result_inputs.size()) { OPENVINO_THROW("body results and node results size mismatch during subgraph collapse"); } - // todo: move this plugin-specific constraint to the plugin callback - if (body_parameters.size() + body_results.size() + hidden_virtual_ports_count > 12) { - return false; - } - auto body = op::create_body(last_node->get_friendly_name(), body_results, body_parameters); auto subgraph = std::make_shared(subgraph_inputs, body); // Copy runtime info from last node to subgraph - to copy topological order @@ -387,7 +452,6 @@ ngraph::snippets::pass::TokenizeMHASnippets::TokenizeMHASnippets() { } subgraph->get_rt_info()["originalLayersNames"] = fused_names; subgraph->set_virtual_port_count(hidden_virtual_ports_count); - subgraph->set_buffer_needed(need_buffer); return true; diff --git a/src/common/snippets/src/pass/propagate_precision.cpp b/src/common/snippets/src/pass/propagate_precision.cpp index b4cae390be798b..8ee73a31b8167a 100644 --- a/src/common/snippets/src/pass/propagate_precision.cpp +++ b/src/common/snippets/src/pass/propagate_precision.cpp @@ -4,23 +4,23 @@ #include "snippets/pass/propagate_precision.hpp" -#include -#include #include "ov_ops/type_relaxed.hpp" #include "snippets/itt.hpp" -#include "ngraph/rt_info.hpp" +#include "snippets/utils.hpp" + +#include +#include -using namespace ngraph; -ngraph::snippets::pass::PropagatePrecision::PropagatePrecision( +ov::snippets::pass::PropagatePrecision::PropagatePrecision( const std::shared_ptr& target_machine) : target_machine(target_machine) { } -bool 
ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ptr& f) { +bool ov::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ptr& f) { RUN_ON_MODEL_SCOPE(PropagatePrecision); - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::PropagatePrecision") + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::PropagatePrecision") - std::unordered_map, element::Type> result_types; + std::unordered_map, element::Type> result_types; auto results = f->get_results(); for (auto& result : results) { result_types.emplace(result, result->get_input_element_type(0)); @@ -29,12 +29,20 @@ bool ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ bool was_updated = false; for (const auto& op : f->get_ordered_ops()) { auto type_info = op->get_type_info(); - OPENVINO_ASSERT( - target_machine->has(type_info), - "operation '" + std::string(type_info.version_id) + "::" + std::string(type_info.name) + "' was not found in target machine"); + std::set supported_precisions; + // TODO: At the moment Softmax is decomposed on Linear IR level. + // When Softmax will be decomposed on NGraph level, remove it + if (type_info.is_castable(ov::op::v1::Softmax::get_type_info_static())) { + supported_precisions = {{ov::element::f32}}; + } else { + OPENVINO_ASSERT( + target_machine->has(type_info), + "operation '" + std::string(type_info.version_id) + "::" + std::string(type_info.name) + "' was not found in target machine"); + + auto exec = target_machine->get_supported_precisions(type_info); + supported_precisions = exec(op); + } - auto exec = target_machine->get_supported_precisions(type_info); - const auto supported_precisions = exec(op); if (supported_precisions.empty()) { continue; } @@ -45,7 +53,7 @@ bool ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ // 2) Type relaxed based operations. Will be resolved by snippet opset. 
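Around this point the pass compares the precision tuples an emitter supports with the precisions actually arriving on the op inputs and inserts ConvertSaturation where they differ. A simplified standalone sketch of that selection logic (string tags stand in for element types; this is not the OpenVINO API):

    #include <string>
    #include <vector>

    using Precisions = std::vector<std::string>;  // e.g. {"f32", "f32"}

    // Prefer an exactly matching supported tuple; otherwise fall back to the first
    // supported one, in which case Converts will be inserted on mismatching inputs.
    Precisions choose_precisions(const Precisions& actual,
                                 const std::vector<Precisions>& supported) {
        for (const auto& candidate : supported)
            if (candidate == actual)
                return candidate;                                  // nothing to convert
        return supported.empty() ? actual : supported.front();
    }

    // For each input, report whether a Convert is needed in front of it.
    std::vector<bool> inputs_needing_convert(const Precisions& actual, const Precisions& required) {
        std::vector<bool> need(actual.size());
        for (std::size_t i = 0; i < actual.size(); ++i)
            need[i] = actual[i] != required[i];
        return need;
    }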
for (const auto& input : op->inputs()) { - const auto convert = ngraph::as_type(input.get_source_output().get_node()); + const auto convert = ov::as_type(input.get_source_output().get_node()); if (convert == nullptr) { continue; } @@ -87,10 +95,10 @@ bool ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ "there are no supported precisions for operation '" + std::string(type_info.version_id) + "::" + std::string(type_info.name) + "'"); auto find_convert = []( - const ngraph::Output parent_output, - const ngraph::element::Type convert_type) -> snippets::op::ConvertSaturation* { + const ov::Output parent_output, + const ov::element::Type convert_type) -> snippets::op::ConvertSaturation* { for (const auto& input : parent_output.get_target_inputs()) { - const auto child = ngraph::as_type(input.get_node()); + const auto child = ov::as_type(input.get_node()); if ((child != nullptr) && (child->get_output_element_type(0) == convert_type)) { return child; } @@ -105,7 +113,7 @@ bool ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ const auto actual_before = parent_output.get_element_type(); if (actual_before != required_after) { was_updated = true; - auto existing_convert = ngraph::as_type( + auto existing_convert = ov::as_type( parent_output.get_node()); if (existing_convert == nullptr) { @@ -119,10 +127,10 @@ bool ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ if (existing_convert == nullptr) { // create new Convert - auto convert = std::make_shared( + auto convert = std::make_shared( parent_output, required_after); - ngraph::copy_runtime_info(parent_output.get_node_shared_ptr(), convert); + utils::safe_copy_runtime_info(parent_output.get_node_shared_ptr(), convert); op->set_argument(op_input.get_index(), convert); continue; } @@ -138,19 +146,19 @@ bool ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ if (can_be_fused(actual_after, required_after)) { // fuse existing convert - auto convert = std::make_shared( + auto convert = std::make_shared( existing_convert->get_input_node_shared_ptr(0), required_after); - ngraph::copy_runtime_info(parent_output.get_node_shared_ptr(), convert); + utils::safe_copy_runtime_info(parent_output.get_node_shared_ptr(), convert); op->set_argument(op_input.get_index(), convert); continue; } // create new convert - auto convert = std::make_shared( + auto convert = std::make_shared( existing_convert->output(0), required_after); - ngraph::copy_runtime_info(existing_convert->output(0).get_node()->shared_from_this(), convert); + utils::safe_copy_runtime_info(existing_convert->output(0).get_node()->shared_from_this(), convert); op->set_argument(op_input.get_index(), convert); } } @@ -169,10 +177,10 @@ bool ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ const auto expected_type = it->second; if (actual_type != it->second) { was_updated = true; - auto convert = std::make_shared( + auto convert = std::make_shared( result->get_input_node_shared_ptr(0), expected_type); - ngraph::copy_runtime_info(result->get_input_node_shared_ptr(0), convert); + utils::safe_copy_runtime_info(result->get_input_node_shared_ptr(0), convert); result->set_argument(0, convert); } } @@ -180,7 +188,7 @@ bool ngraph::snippets::pass::PropagatePrecision::run_on_model(const std::shared_ return was_updated; } -bool ngraph::snippets::pass::PropagatePrecision::validate_and_infer_types_and_restore_outputs(const std::shared_ptr& op) { +bool 
ov::snippets::pass::PropagatePrecision::validate_and_infer_types_and_restore_outputs(const std::shared_ptr& op) { bool was_updated = false; // update output precision @@ -212,10 +220,10 @@ bool ngraph::snippets::pass::PropagatePrecision::validate_and_infer_types_and_re if (output.get_element_type() != op_output_types[i]) { was_updated = true; - auto convert = std::make_shared( + auto convert = std::make_shared( output, op_output_types[i]); - ngraph::copy_runtime_info(output.get_node_shared_ptr(), convert); + utils::safe_copy_runtime_info(output.get_node_shared_ptr(), convert); for (auto& input : output.get_target_inputs()) { auto child = input.get_node(); @@ -226,7 +234,7 @@ bool ngraph::snippets::pass::PropagatePrecision::validate_and_infer_types_and_re input.replace_source_output(convert->output(0)); - if (ngraph::is_type(input.get_node())) { + if (ov::is_type(input.get_node())) { // Result input tensor name was changed, the name has to be restored // task #107826 input.get_tensor_ptr()->add_names(output.get_tensor_ptr()->get_names()); @@ -243,7 +251,7 @@ bool ngraph::snippets::pass::PropagatePrecision::validate_and_infer_types_and_re return was_updated; } -bool ngraph::snippets::pass::PropagatePrecision::can_be_removed( +bool ov::snippets::pass::PropagatePrecision::can_be_removed( const element::Type& actual_before, const element::Type& actual_after, const element::Type& required_after) noexcept { @@ -254,7 +262,7 @@ bool ngraph::snippets::pass::PropagatePrecision::can_be_removed( return can_be_fused(actual_after, actual_before); } -bool ngraph::snippets::pass::PropagatePrecision::can_be_fused( +bool ov::snippets::pass::PropagatePrecision::can_be_fused( const element::Type& actual, const element::Type& required) noexcept { if (actual == required) { @@ -278,7 +286,7 @@ bool ngraph::snippets::pass::PropagatePrecision::can_be_fused( (actual.bitwidth() > required.bitwidth()); } -std::vector ngraph::snippets::pass::PropagatePrecision::get_precisions( +std::vector ov::snippets::pass::PropagatePrecision::get_precisions( const std::vector& input_precisions, const std::set>& supported_precisions_pack) noexcept { bool was_found = false; diff --git a/src/common/snippets/src/pass/reset_buffer.cpp b/src/common/snippets/src/pass/reset_buffer.cpp deleted file mode 100644 index 54bdfef03f7f13..00000000000000 --- a/src/common/snippets/src/pass/reset_buffer.cpp +++ /dev/null @@ -1,114 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include -#include -#include - -#include "snippets/snippets_isa.hpp" -#include "snippets/pass/reset_buffer.hpp" -#include "snippets/op/subgraph.hpp" - - -namespace { -void normalize_ptr_and_offsets(const ov::NodeVector &io, std::vector &ptr_increments, std::vector &finalization_offsets) { - bool there_is_buffer = false; - // Iterations are from end because before we correct finalization offsets for Loop outputs (io = inputs + outputs) - for (int i = static_cast(io.size()) - 1; i >= 0; --i) { - if (ov::is_type(io[i])) { - if (there_is_buffer) { - ptr_increments[i] = 0; - finalization_offsets[i] = 0; - } else { - there_is_buffer = true; - } - } - } -} -} // namespace - -int64_t ngraph::snippets::pass::ResetBufferState::calculate_required_finalization_offsets(const size_t back_step, const size_t target_work_amount) { - return target_work_amount != 1 ? 
-static_cast(back_step) : 0; -} - -ngraph::snippets::pass::ResetBufferState::ResetBufferState() { - MATCHER_SCOPE(ResetBufferState); - - // Match on LoopEnd is enough at the moment because Buffer op may be only after MatMul and LoopEnd, but - // MatMul doesn't change Buffer memory pointer after execution - auto m_loop_end = ngraph::pattern::wrap_type(); - - auto callback = [=](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::ResetBufferState") - auto& pattern_to_output = m.get_pattern_value_map(); - - const auto loop_end = ngraph::as_type_ptr(pattern_to_output.at(m_loop_end).get_node_shared_ptr()); - const auto loop_begin = loop_end->get_loop_begin(); - - const auto i_size = loop_begin->get_input_size(); - const auto o_size = loop_end->get_output_size(); - const auto count_io = i_size + o_size; - std::vector body_shapes(count_io); - ov::NodeVector io(count_io); - for (size_t i = 0; i < i_size; ++i) { - body_shapes[i] = loop_begin->input_value(i).get_partial_shape(); - io[i] = loop_begin->input_value(i).get_node_shared_ptr(); - auto port_idx = loop_begin->input_value(i).get_index(); - while (std::dynamic_pointer_cast(io[i])) { - const auto source_output = io[i]->input_value(port_idx); - io[i] = source_output.get_node_shared_ptr(); - port_idx = source_output.get_index(); - } - } - for (size_t i = 0; i < o_size; ++i) { - body_shapes[i_size + i] = loop_end->output(i).get_partial_shape(); - // check for first target input is enough for Buffer searching because operations can have only single Buffer per each output port as op - auto consumer = *loop_end->output(i).get_target_inputs().begin(); - auto port_idx = consumer.get_index(); - io[i_size + i] = consumer.get_node()->shared_from_this(); - while (std::dynamic_pointer_cast(io[i_size + i])) { - auto consumer = *io[i_size + i]->output(port_idx).get_target_inputs().begin(); - port_idx = consumer.get_index(); - io[i_size + i] = consumer.get_node()->shared_from_this(); - } - } - - auto ptr_increments = loop_end->get_ptr_increments(); - auto finalization_offsets = loop_end->get_finalization_offsets(); - - // If after Loop there is immediately Buffer, we should reset the Buffer ptr for the next calculations - for (size_t i = 0; i < o_size; ++i) { - // check for first target input is enough for Buffer searching because operations can have only single Buffer per each output port as op - const auto consumer = loop_end->output(i).get_target_inputs().begin()->get_node(); - if (const auto buffer = ov::as_type_ptr(consumer->shared_from_this())) { - // To calculate finalization offset we should know index of nesting Loop - auto loop_index = 0lu; - auto loop = loop_end->input_value(i).get_node_shared_ptr(); - auto port_idx = loop_end->input_value(i).get_index(); - while (std::dynamic_pointer_cast(loop)) { - const auto source_output = loop->input_value(port_idx); - loop = source_output.get_node_shared_ptr(); - port_idx = source_output.get_index(); - loop_index++; - } - const auto result_shape = buffer->get_allocation_shape(); - NGRAPH_CHECK(loop_index < result_shape.size(), "Buffer has invalid Loop index and allocation shape rank"); - const auto work_amount = std::accumulate(result_shape.rbegin(), result_shape.rbegin() + loop_index + 1, size_t(1), std::multiplies()); - finalization_offsets[i_size + i] = - calculate_required_finalization_offsets(work_amount, *(result_shape.rbegin() + loop_index)); - } - } - - // If there are several Buffers on I/O we should remember that all Buffer have the 
register, - // so we should update ptr for only one Buffer - normalize_ptr_and_offsets(io, ptr_increments, finalization_offsets); - loop_end->set_finalization_offsets(finalization_offsets); - loop_end->set_ptr_increments(ptr_increments); - - return true; - }; - - auto m = std::make_shared(m_loop_end, matcher_name); - register_matcher(m, callback); -} diff --git a/src/common/snippets/src/pass/set_softmax_ports.cpp b/src/common/snippets/src/pass/set_softmax_ports.cpp new file mode 100644 index 00000000000000..1651a6d6217495 --- /dev/null +++ b/src/common/snippets/src/pass/set_softmax_ports.cpp @@ -0,0 +1,57 @@ +// Copyright (C) 2018-2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/pass/set_softmax_ports.hpp" + +#include "snippets/itt.hpp" +#include "snippets/lowered/port_descriptor.hpp" + +#include "openvino/op/softmax.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" +#include "openvino/pass/pattern/op/or.hpp" +#include "openvino/core/validation_util.hpp" + + +ov::snippets::pass::SetSoftmaxPorts::SetSoftmaxPorts() { + MATCHER_SCOPE(SetSoftmaxPorts); + + auto m_softmax_v1 = ov::pass::pattern::wrap_type(); + auto m_softmax_v8 = ov::pass::pattern::wrap_type(); + auto m_softmax = std::make_shared(OutputVector{m_softmax_v1, m_softmax_v8}); + + auto callback = [](ov::pass::pattern::Matcher &m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::SetSoftmaxPorts") + auto root = m.get_match_root(); + + const auto& pshape = root->get_input_partial_shape(0); + if (pshape.is_dynamic()) + return false; + + const auto shape = pshape.get_shape(); + const auto rank = shape.size(); + + int64_t axis; + if (const auto softmax_v8 = ov::as_type_ptr(root)) { + OPENVINO_SUPPRESS_DEPRECATED_START + axis = ov::normalize_axis(root->get_friendly_name(), softmax_v8->get_axis(), rank); + OPENVINO_SUPPRESS_DEPRECATED_END + } else if (const auto softmax_v1 = ov::as_type_ptr(root)) { + axis = softmax_v1->get_axis(); + } else { + return false; + } + + OPENVINO_ASSERT(axis < static_cast(rank), "Softmax has incorrect axis"); + std::vector subtensor(rank, 1); + for (size_t i = axis; i < rank; ++i) + subtensor[i] = lowered::PortDescriptor::ServiceDimensions::FULL_DIM; + + lowered::PortDescriptorUtils::set_port_descriptor_ptr(root->input(0), std::make_shared(root->input(0), subtensor)); + lowered::PortDescriptorUtils::set_port_descriptor_ptr(root->output(0), std::make_shared(root->output(0), subtensor)); + + return true; + }; + + register_matcher(std::make_shared(m_softmax, matcher_name), callback); +} diff --git a/src/common/snippets/src/pass/softmax_decomposition.cpp b/src/common/snippets/src/pass/softmax_decomposition.cpp deleted file mode 100644 index a4a7ad77773327..00000000000000 --- a/src/common/snippets/src/pass/softmax_decomposition.cpp +++ /dev/null @@ -1,219 +0,0 @@ -// Copyright (C) 2018-2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include "snippets/remarks.hpp" -#include - -#include "snippets/pass/softmax_decomposition.hpp" -#include "snippets/pass/reset_buffer.hpp" -#include "snippets/pass/insert_loops.hpp" -#include "snippets/pass/loop_helpers.hpp" -#include "snippets/snippets_isa.hpp" - -#include -#include -#include -#include -#include - - -ngraph::snippets::pass::SoftmaxDecomposition::SoftmaxDecomposition(const size_t vector_size, const int32_t buffer_allocation_rank) { - MATCHER_SCOPE(SoftmaxDecomposition); - - auto m_softmax = ngraph::pattern::wrap_type(); - - auto callback = [=](ngraph::pattern::Matcher &m) { 
- OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::SoftmaxDecomposition") - auto root = m.get_match_root(); - const auto master_pshape = root->get_input_partial_shape(0); - const auto rank = master_pshape.rank(); - if (rank.is_dynamic() || master_pshape.is_dynamic()) - return false; - - int64_t axis = 0; - if (const auto softmax_v8 = ngraph::as_type_ptr(root)) { - OPENVINO_SUPPRESS_DEPRECATED_START - axis = ngraph::normalize_axis(root->get_friendly_name(), softmax_v8->get_axis(), rank); - OPENVINO_SUPPRESS_DEPRECATED_END - } else if (const auto softmax_v1 = ngraph::as_type_ptr(root)) { - axis = softmax_v1->get_axis(); - } else { - return false; - } - - const auto shape_rank = rank.get_length(); - if (axis != shape_rank - 1) - return false; - - const auto data = root->get_input_node_shared_ptr(0); - - const auto master_shape = master_pshape.get_shape(); - const auto dimension = shape_rank - 1; - const auto work_amount = master_shape[dimension]; - const auto increment = vector_size; - const auto inner_dim = shape_rank - 1; - const auto inner_master_work_amount = static_cast(master_shape[inner_dim]); - const int outer_dim = shape_rank > 1 ? static_cast(shape_rank - 2) : -1; - const auto has_outer_loop = outer_dim >= 0 && master_shape[outer_dim] > 1; - - /* ====== ReduceMax decomposition ====== */ - - /* We have to have fake edge Data -> Loop[ReduceMax] -> Loop[Sub + Exp + ReduceSum] because ReduceMax is - * accumulator which finds maximum of elements and save it to vector register. Loop works only with GPR (data) but ReduceMax Loop - * doesn't save maximum to data. Seems like, LoopEnd shouldn't have outputs: - * Data - * VectorBuffer LoopBegin \ - * \ Load \ | - * Maximum / | - * / LoopEnd | - * HorizonMax / - * \ LoopBegin[Sub + Exp + ReduceSum] - * But nGraph doesn't allow to have 0 outputs for Node (at least 1 output). 
- * Thus, we propagate data through Loop[ReduceMax] using fake edge because of that Loop[ReduceMax] has two inputs "Data" - * Data - * VectorBuffer LoopBegin - * \ Load | \ - * Maximum | / - * / LoopEnd - * HorizonMax | - * \ LoopBegin[Sub + Exp + ReduceSum] - */ - const auto vector_buffer_max = std::make_shared(); - const auto loop_max_begin = ngraph::snippets::op::insertLoopBegin(ngraph::OutputVector{data, data}); - - const auto load_max = std::make_shared(loop_max_begin->output(0), increment); - const auto max = std::make_shared(load_max, vector_buffer_max); - - auto apply_increments_max = - InsertLoops::calculate_inner_apply_increments(master_shape, {data->get_shape(), data->get_shape(), data->get_shape()}); - // Input of softmax is Input and Output of this loop, which isn't used inside (it's just to have one output in Loop at least) - // So we shouldn't increment pointer after each loop iteration - apply_increments_max[0] = false; - apply_increments_max[1] = false; - // we should always reset data ptr after this loop because in the next Loop this ptr is used - // Although output isn't a Buffer op, we set finalization offset and ptr increment for output, because ResetBufferState pass - // normalizes offsets and increments starting from outputs - const auto finalization_offsets_max = - std::vector{ 0, 0, ResetBufferState::calculate_required_finalization_offsets(inner_master_work_amount, data->get_shape()[inner_dim]) }; - const auto loop_max_end = std::make_shared(ngraph::OutputVector{loop_max_begin->output(1), loop_max_begin->output(2)}, - work_amount, increment, apply_increments_max, finalization_offsets_max); - - const auto horizon_max = std::make_shared(max); - - /* =========================================== */ - - /* === Sub + Exp + ReduceSum decomposition === */ - - const auto vector_buffer_sum = std::make_shared(); - const auto loop_sum_begin = ngraph::snippets::op::insertLoopBegin(ngraph::OutputVector{loop_max_end->output(0)}); - - const auto load_sub = std::make_shared(loop_sum_begin->output(0), increment); - const auto sub = std::make_shared(load_sub, horizon_max); - const auto exp = std::make_shared(sub); - const auto sum = std::make_shared(exp, vector_buffer_sum); - const auto store_exp = std::make_shared(exp, increment); - - auto apply_increments_sum = - InsertLoops::calculate_inner_apply_increments(master_shape, {load_sub->get_shape(), store_exp->get_shape()}); - std::vector finalization_offsets_sum(2, 0); - if (has_outer_loop) { - finalization_offsets_sum = - InsertLoops::calculate_finalization_offsets(master_shape, {load_sub->get_shape(), store_exp->get_shape()}); - } - // we should always reset buffer ptr after loop because in the next Loop this buffer ptr is used - finalization_offsets_sum[1] = ResetBufferState::calculate_required_finalization_offsets(inner_master_work_amount, store_exp->get_shape()[inner_dim]); - const auto loop_sum_end = std::make_shared( - ngraph::OutputVector{store_exp, loop_sum_begin->output(1)}, work_amount, increment, - apply_increments_sum, finalization_offsets_sum); - - const auto horizon_sum = std::make_shared(sum); - const auto buffer_exp = std::make_shared(loop_sum_end->output(0), buffer_allocation_rank); - - /* =========================================== */ - - /* ================== Div ==================== */ - - // Divide is expensive operation, so we decompose it into 1 / x * y, where 1 / x is executed outside loop - const auto pow = std::make_shared(horizon_sum, - ngraph::op::Constant::create(ov::element::f32, ngraph::Shape{}, {-1})); 
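The comment above notes that division is expensive, so the decomposition computes `1 / sum` once (via Power with exponent -1) outside the loop and multiplies by it inside the loop. The same idea in plain C++, for illustration only:

    #include <vector>

    // Normalize exponentiated values by the reduced sum using a hoisted reciprocal.
    void normalize(std::vector<float>& exp_values, float horizon_sum) {
        const float reciprocal = 1.0f / horizon_sum;   // pow(sum, -1), computed once outside the loop
        for (auto& v : exp_values)
            v *= reciprocal;                            // cheap per-element multiply inside the loop
    }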
- - const auto loop_div_begin = op::insertLoopBegin(ngraph::OutputVector{buffer_exp}); - - const auto load_div = std::make_shared(loop_div_begin->output(0), increment); - const auto mul = std::make_shared(load_div, pow); - const auto store_div = std::make_shared(mul, increment); - - auto apply_increments_div = - InsertLoops::calculate_inner_apply_increments(master_shape, {load_div->get_shape(), store_div->get_shape()}); - std::vector finalization_offsets_div(2, 0); - if (has_outer_loop) { - finalization_offsets_div = - InsertLoops::calculate_finalization_offsets(master_shape, {load_div->get_shape(), store_div->get_shape()}); - } - const auto loop_div_end = std::make_shared( - ngraph::OutputVector{store_div, loop_div_begin->output(1)}, work_amount, increment, - apply_increments_div, finalization_offsets_div); - - /* =========================================== */ - - /* ========== Control dependency ============= */ - - loop_max_begin->add_control_dependency(vector_buffer_max); - loop_max_end->add_control_dependency(max); - horizon_max->add_control_dependency(loop_max_end); - loop_sum_begin->add_control_dependency(vector_buffer_sum); - loop_sum_begin->add_control_dependency(horizon_max); - loop_sum_end->add_control_dependency(sum); - horizon_sum->add_control_dependency(loop_sum_end); - loop_div_begin->add_control_dependency(horizon_sum); - loop_div_begin->add_control_dependency(pow); - - /* =========================================== */ - - /* ============= Runtime Info ================ */ - - // For tail loop we should fill input of Max by float min and - // input of Sum by zero to avoid math incorrect calculations - max->input(0).get_rt_info()["set_fill"] = uint32_t(0xff7fffff); - sum->input(0).get_rt_info()["set_fill"] = uint32_t(0x00000000); - - // These nodes should be executed outside loops - ov::NodeVector ops_outside_loop = { vector_buffer_max, horizon_max, vector_buffer_sum, horizon_sum, pow, buffer_exp }; - for (const auto& op : ops_outside_loop) { - op->get_rt_info()["outside_loop"] = true; - } - - ngraph::copy_runtime_info(root, - {vector_buffer_max, loop_max_begin, load_max, max, horizon_max, loop_max_end, - vector_buffer_sum, loop_sum_begin, load_sub, sub, exp, sum, store_exp, horizon_sum, loop_sum_end, buffer_exp, pow, - loop_div_begin, load_div, mul, store_div, loop_div_end}); - - /* =========================================== */ - - ngraph::replace_node(root, loop_div_end); - - /* ============== Outer loop ================= */ - if (has_outer_loop) { - std::vector apply_increments = - InsertLoops::calculate_outer_apply_increments({root->get_input_shape(0), root->get_output_shape(0)}); - const auto softmax_parameters = - std::vector>{loop_max_begin->input(0).get_source_output()}; - const auto output_set = loop_div_end->output(0).get_target_inputs(); - const auto softmax_results = std::vector>{output_set.begin(), output_set.end()}; - const auto& outer_loop_begin = ngraph::snippets::op::insertLoopBegin(softmax_parameters); - const auto outer_loop_end = ngraph::snippets::op::insertLoopEndBeforeInputs( - softmax_results, outer_loop_begin, master_shape[outer_dim], 1, apply_increments); - - vector_buffer_max->add_control_dependency(outer_loop_begin); - - ngraph::copy_runtime_info(root, {outer_loop_begin, outer_loop_end}); - } - /* =========================================== */ - - return true; - }; - - auto m = std::make_shared(m_softmax, matcher_name); - register_matcher(m, callback); -} diff --git a/src/common/snippets/src/pass/softmax_reshape_elimination.cpp 
b/src/common/snippets/src/pass/softmax_reshape_elimination.cpp index 7229b29cc54ccd..2f60f1e1155c76 100644 --- a/src/common/snippets/src/pass/softmax_reshape_elimination.cpp +++ b/src/common/snippets/src/pass/softmax_reshape_elimination.cpp @@ -1,26 +1,26 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/remarks.hpp" #include "snippets/pass/softmax_reshape_elimination.hpp" #include "snippets/snippets_isa.hpp" -#include -#include +#include "openvino/core/rt_info.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" #include -ngraph::snippets::pass::SoftmaxReshapeElimination::SoftmaxReshapeElimination() { +ov::snippets::pass::SoftmaxReshapeElimination::SoftmaxReshapeElimination() { MATCHER_SCOPE(SoftmaxReshapeElimination); - const auto m_reshape0 = pattern::wrap_type(pattern::has_static_shape()); - const auto m_softmax = pattern::wrap_type({m_reshape0}); - const auto m_reshape1 = pattern::wrap_type({m_softmax, pattern::wrap_type()}); + const auto m_reshape0 = ov::pass::pattern::wrap_type(ov::pass::pattern::has_static_shape()); + const auto m_softmax = ov::pass::pattern::wrap_type({m_reshape0}); + const auto m_reshape1 = ov::pass::pattern::wrap_type({m_softmax, ov::pass::pattern::wrap_type()}); - register_matcher(std::make_shared(m_reshape1, matcher_name), - [=](ngraph::pattern::Matcher &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::SoftmaxReshapeElimination") + register_matcher(std::make_shared(m_reshape1, matcher_name), + [=](ov::pass::pattern::Matcher &m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::SoftmaxReshapeElimination") auto& pattern_to_output = m.get_pattern_value_map(); auto reshape0 = pattern_to_output[m_reshape0].get_node_shared_ptr(); auto softmax = pattern_to_output[m_softmax].get_node_shared_ptr(); @@ -33,11 +33,11 @@ ngraph::snippets::pass::SoftmaxReshapeElimination::SoftmaxReshapeElimination() { const auto softmax_rank = softmax->get_input_partial_shape(0).rank(); int64_t axis = 0; - if (const auto softmax_v8 = ngraph::as_type_ptr(softmax)) { + if (const auto softmax_v8 = ov::as_type_ptr(softmax)) { OPENVINO_SUPPRESS_DEPRECATED_START - axis = ngraph::normalize_axis(softmax->get_friendly_name(), softmax_v8->get_axis(), softmax_rank); + axis = ov::normalize_axis(softmax->get_friendly_name(), softmax_v8->get_axis(), softmax_rank); OPENVINO_SUPPRESS_DEPRECATED_END - } else if (const auto softmax_v1 = ngraph::as_type_ptr(softmax)) { + } else if (const auto softmax_v1 = ov::as_type_ptr(softmax)) { axis = softmax_v1->get_axis(); } else { return false; @@ -61,9 +61,9 @@ ngraph::snippets::pass::SoftmaxReshapeElimination::SoftmaxReshapeElimination() { // update axis const auto new_axis = input_shape.rank().get_length() - 1; - if (auto softmax_v8 = ngraph::as_type_ptr(softmax)) { + if (auto softmax_v8 = ov::as_type_ptr(softmax)) { softmax_v8->set_axis(new_axis); - } else if (auto softmax_v1 = ngraph::as_type_ptr(softmax)) { + } else if (auto softmax_v1 = ov::as_type_ptr(softmax)) { softmax_v1->set_axis(new_axis); } diff --git a/src/common/snippets/src/pass/tokenization.cpp b/src/common/snippets/src/pass/tokenization.cpp index e0aa78b4e2dad7..bdf684ef3fba6c 100644 --- a/src/common/snippets/src/pass/tokenization.cpp +++ b/src/common/snippets/src/pass/tokenization.cpp @@ -1,25 +1,26 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel 
Corporation // SPDX-License-Identifier: Apache-2.0 // -#include +#include "snippets/itt.hpp" #include "snippets/pass/tokenization.hpp" #include "snippets/pass/common_optimizations.hpp" +#include "openvino/pass/manager.hpp" -namespace ngraph { +namespace ov { namespace snippets { namespace pass { void SetSnippetsNodeType(const std::shared_ptr &node, SnippetsNodeType nodeType) { - auto &rt = node->get_rt_info(); + auto& rt = node->get_rt_info(); rt["SnippetsNodeType"] = nodeType; } SnippetsNodeType GetSnippetsNodeType(const std::shared_ptr &node) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::GetSnippetsNodeType") - auto &rt = node->get_rt_info(); + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::GetSnippetsNodeType") + auto& rt = node->get_rt_info(); const auto rinfo = rt.find("SnippetsNodeType"); if (rinfo == rt.end()) return SnippetsNodeType::NotSet; @@ -27,13 +28,13 @@ SnippetsNodeType GetSnippetsNodeType(const std::shared_ptr &node) { } void SetTopologicalOrder(const std::shared_ptr &node, int64_t order) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::SetTopologicalOrder") - auto &rt = node->get_rt_info(); + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::SetTopologicalOrder") + auto& rt = node->get_rt_info(); rt["TopologicalOrder"] = order; } int64_t GetTopologicalOrder(const std::shared_ptr &node) { - auto &rt = node->get_rt_info(); + auto& rt = node->get_rt_info(); const auto rinfo = rt.find("TopologicalOrder"); if (rinfo == rt.end()) OPENVINO_THROW("Topological order is required, but not set."); @@ -41,10 +42,10 @@ int64_t GetTopologicalOrder(const std::shared_ptr &node) { } bool EnumerateNodes::run_on_model(const std::shared_ptr &m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::EnumerateNodes") + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::EnumerateNodes") int64_t order = 0; // Todo: We don't really have to set order for every node, just for subgraph parents and children would be enough - for (auto &node : m->get_ordered_ops()) { + for (auto& node : m->get_ordered_ops()) { SetTopologicalOrder(node, order++); } return true; @@ -53,7 +54,7 @@ bool EnumerateNodes::run_on_model(const std::shared_ptr &m) { bool SnippetsTokenization::run_on_model(const std::shared_ptr& m) { RUN_ON_FUNCTION_SCOPE(SnippetsTokenization); - ngraph::pass::Manager manager(get_pass_config()); + ov::pass::Manager manager(get_pass_config()); manager.set_per_pass_validation(false); manager.register_pass(); @@ -69,4 +70,4 @@ bool SnippetsTokenization::run_on_model(const std::shared_ptr& m) { } // namespace pass } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/src/pass/transform_convert.cpp b/src/common/snippets/src/pass/transform_convert.cpp index b959b2f608bc5a..b6ea483b81b779 100644 --- a/src/common/snippets/src/pass/transform_convert.cpp +++ b/src/common/snippets/src/pass/transform_convert.cpp @@ -2,35 +2,33 @@ // SPDX-License-Identifier: Apache-2.0 // -#include "snippets/remarks.hpp" -#include +#include "snippets/itt.hpp" #include "snippets/pass/transform_convert.hpp" #include "snippets/snippets_isa.hpp" -#include -#include -#include +#include "openvino/core/rt_info.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" -ngraph::snippets::pass::TransformConvertToConvertTruncation::TransformConvertToConvertTruncation() { 
+ov::snippets::pass::TransformConvertToConvertTruncation::TransformConvertToConvertTruncation() {
     MATCHER_SCOPE(TransformConvertToConvertTruncation);
-    auto convert = std::make_shared(pattern::any_input(),
+    auto convert = std::make_shared(ov::pass::pattern::any_input(),
            [](const std::shared_ptr &n) {
-               return ov::is_type(n) &&
+               return ov::is_type(n) &&
                       !ov::is_type(n) &&
                       !ov::is_type(n);
            });
-    register_matcher(std::make_shared(
-        ngraph::pattern::wrap_type(), matcher_name), [](ngraph::pattern::Matcher &m) {
-        OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::TransformConvertToConvertTruncation")
+    register_matcher(std::make_shared(
+        ov::pass::pattern::wrap_type(), matcher_name), [](ov::pass::pattern::Matcher &m) {
+        OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::TransformConvertToConvertTruncation")
         const auto root = m.get_match_root();
-        const auto convert = ngraph::as_type_ptr(root);
+        const auto convert = ov::as_type_ptr(root);
         auto convert_truncation = std::make_shared(convert->get_input_source_output(0), convert->get_destination_type());
         convert_truncation->set_friendly_name(convert->get_friendly_name());
-        ngraph::copy_runtime_info(convert, convert_truncation);
-        ngraph::replace_node(convert, convert_truncation);
+        ov::copy_runtime_info(convert, convert_truncation);
+        ov::replace_node(convert, convert_truncation);
         return true;
     });
diff --git a/src/common/snippets/src/pass/transpose_decomposition.cpp b/src/common/snippets/src/pass/transpose_decomposition.cpp
index 5dc6960b2fd71a..bb581105a7523a 100644
--- a/src/common/snippets/src/pass/transpose_decomposition.cpp
+++ b/src/common/snippets/src/pass/transpose_decomposition.cpp
@@ -1,35 +1,36 @@
-// Copyright (C) 2018-2022 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
-#include
-#include
-#include
-#include
+#include "snippets/pass/transpose_decomposition.hpp"
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-const std::set> ngraph::snippets::pass::TransposeDecomposition::supported_cases = {{0, 2, 3, 1}};
-ngraph::snippets::pass::TransposeDecomposition::TransposeDecomposition() {
+#include "snippets/itt.hpp"
+#include "snippets/snippets_isa.hpp"
+#include "snippets/lowered/port_descriptor.hpp"
+#include "openvino/pass/pattern/op/wrap_type.hpp"
+
+namespace ov {
+namespace snippets {
+namespace pass {
+using namespace lowered;
+
+const std::set> TransposeDecomposition::supported_cases = {{0, 2, 3, 1}};
+
+TransposeDecomposition::TransposeDecomposition() {
     MATCHER_SCOPE(TransposeDecomposition);
-    // todo: we need a special transformation that detects and propagates data access pattern to Parameters and Results
-    // this is needed to communicate access pattern to the plugin node and op::Kernel
-    // This is the reason we match only to Parameter, this limitation could be relaxed if we propagate access pattern
-    // to the appropriate parameter
-    auto match_data = ngraph::pattern::wrap_type();
-    auto match_order = ngraph::pattern::wrap_type();
-    auto match_transpose = ngraph::pattern::wrap_type({match_data, match_order});
+    // Todo: we need a special transformation that detects and propagates data access pattern to Parameters and Results
+    // this is needed to communicate access pattern to the plugin node and op::Kernel
+    // This is the reason we match only to Parameter, this limitation could be relaxed if we propagate access pattern
+    // to the appropriate parameter
+    auto match_data = ov::pass::pattern::wrap_type();
+    auto match_order = ov::pass::pattern::wrap_type();
+    auto match_transpose = ov::pass::pattern::wrap_type({match_data, match_order});
-    ngraph::matcher_pass_callback callback = [=](ngraph::pattern::Matcher& m) {
-        OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::TransposeDecomposition")
+    ov::matcher_pass_callback callback = [=](ov::pass::pattern::Matcher& m) {
+        OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::TransposeDecomposition")
         auto& pattern_to_output = m.get_pattern_value_map();
-        const auto transpose = ov::as_type_ptr(
-            pattern_to_output.at(match_transpose).get_node_shared_ptr());
+        const auto& data_input = pattern_to_output.at(match_data);
+        const auto transpose = ov::as_type_ptr(pattern_to_output.at(match_transpose).get_node_shared_ptr());
         const auto order = ov::as_type_ptr(pattern_to_output.at(match_order).get_node_shared_ptr());
         if (transformation_callback(transpose) || transpose->is_dynamic())
@@ -39,43 +40,31 @@ ngraph::snippets::pass::TransposeDecomposition::TransposeDecomposition() {
         if (supported_cases.count(order_value) == 0)
             return false;
-        auto data_input = pattern_to_output.at(match_data);
-        const auto& data_node = pattern_to_output.at(match_data).get_node_shared_ptr();
-        auto &param_rt = data_node->get_rt_info();
-        // Note: store and usage inside emitters as size_t is more convenient, so static_cast here
-        const auto& access_pattern = order->cast_vector();
-        param_rt["Layout"] = access_pattern;
+        // number of elements that can be processed on every iteration. For 0,1,2,3 -> 0,2,3,1 we can guarantee only scalar access
+        const auto subtensor = std::vector{1};
+        const auto& layout = order->cast_vector();
-        // The line below is Ok, since we ensured that transpose is static above
-        auto data_shape = data_input.get_shape();
-        // dim indexes with respect to SRC
-        const auto dim_C_idx = data_shape.size() - 3;
-        const auto dim_H_idx = data_shape.size() - 2;
-        const auto dim_W_idx = data_shape.size() - 1;
-        const auto size_C = static_cast(data_shape[dim_C_idx]);
-        const auto size_W = static_cast(data_shape[dim_W_idx]);
-        const auto size_H = static_cast(data_shape[dim_H_idx]);
-
-        auto loop_W_begin = std::make_shared(OutputVector{data_input});
-        auto loop_C_begin = std::make_shared(OutputVector{loop_W_begin->output(0)});
         // todo: LoadReshape used here is essentially Load + an easy way to maintain correct shape propagation
         // fix this in future and develop a more consistent shape propagation approach.
-        auto load = std::make_shared(loop_C_begin->output(0), 1, 0, access_pattern);
-        auto store = std::make_shared(load, 1);
-        const std::vector ptr_increments_C {size_H * size_W, 1};
-        const std::vector finalization_offsets_C {1 - size_H * size_W * size_C, 0};
-        auto loop_C_end = std::make_shared(OutputVector{store->output(0), loop_C_begin->output(1)},
-            size_C, 1, ptr_increments_C, finalization_offsets_C);
-        auto loop_W_end = std::make_shared(OutputVector{loop_C_end->output(0), loop_W_begin->output(1)},
-            size_W, 1, std::vector{0, 0}, std::vector{0, 0});
+        auto load = std::make_shared(data_input, subtensor[0], 0, layout);
+        auto store = std::make_shared(load, subtensor[0]);
+
+        PortDescriptorUtils::set_port_descriptor_ptr(load->input(0), std::make_shared(load->get_input_shape(0), subtensor, layout));
+        PortDescriptorUtils::set_port_descriptor_ptr(load->output(0), std::make_shared(load->get_output_shape(0), subtensor));
+        PortDescriptorUtils::set_port_descriptor_ptr(store->input(0), std::make_shared(store->get_input_shape(0), subtensor));
+        PortDescriptorUtils::set_port_descriptor_ptr(store->output(0), std::make_shared(store->get_output_shape(0), subtensor));
         for (auto& input : transpose->output(0).get_target_inputs()) {
-            input.replace_source_output(loop_W_end->output(0));
+            input.replace_source_output(store->output(0));
         }
         return true;
     };
-    auto m = std::make_shared(match_transpose, matcher_name);
+    auto m = std::make_shared(match_transpose, matcher_name);
     register_matcher(m, callback);
 }
+
+} // namespace pass
+} // namespace snippets
+} // namespace ov
diff --git a/src/common/snippets/src/pass/vector_to_scalar.cpp b/src/common/snippets/src/pass/vector_to_scalar.cpp
deleted file mode 100644
index 4f98a49de4eedd..00000000000000
--- a/src/common/snippets/src/pass/vector_to_scalar.cpp
+++ /dev/null
@@ -1,49 +0,0 @@
-// Copyright (C) 2018-2023 Intel Corporation
-// SPDX-License-Identifier: Apache-2.0
-//
-
-#include
-
-#include "snippets/pass/vector_to_scalar.hpp"
-#include "snippets/snippets_isa.hpp"
-
-#include
-#include
-
-ngraph::snippets::pass::SetScalarCountForLoad::SetScalarCountForLoad() {
-    MATCHER_SCOPE(SetScalarCountForLoad);
-    register_matcher(std::make_shared(
-        ngraph::pattern::wrap_type(), matcher_name),
-        [this](ngraph::pattern::Matcher &m) {
-            OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::SetScalarCountForLoad_callback")
-            auto root = m.get_match_root();
-            if (transformation_callback(root))
-                return false;
-
-            const auto load = ov::as_type_ptr(root);
-            if (!load)
-                return false;
-
-            load->set_input_count(1lu, 0);
-            return true;
-        });
-}
-
-ngraph::snippets::pass::SetScalarCountForStore::SetScalarCountForStore() {
-    MATCHER_SCOPE(SetScalarCountForStore);
-    register_matcher(std::make_shared(
-        ngraph::pattern::wrap_type(), matcher_name),
-        [this](ngraph::pattern::Matcher &m) {
-            OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::SetScalarCountForStore_callback")
-            auto root = m.get_match_root();
-            if (transformation_callback(root))
-                return false;
-
-            const auto store = ov::as_type_ptr(root);
-            if (!store)
-                return false;
-
-            store->set_output_count(1lu, 0);
-            return true;
-        });
-}
diff --git a/src/common/snippets/src/utils.cpp b/src/common/snippets/src/utils.cpp
index 8d294455150628..02ec54af2d8dbe 100644
--- a/src/common/snippets/src/utils.cpp
+++ b/src/common/snippets/src/utils.cpp
@@ -5,17 +5,19 @@
 #include "snippets/utils.hpp"
 #include "snippets/pass/fq_decomposition.hpp"
+#include "openvino/core/rt_info.hpp"
-namespace ngraph {
+
+namespace ov {
 namespace snippets {
 namespace utils {
-auto get_non_scalar_constant_count_for_fq(const std::shared_ptr& fq) -> size_t {
+auto get_non_scalar_constant_count_for_fq(const std::shared_ptr& fq) -> size_t {
     std::vector cl, ch, isc, ish, osc, osh;
-    const bool status = ngraph::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts(fq, cl, ch, isc, ish, osc, osh);
+    const bool status = ov::snippets::pass::FakeQuantizeDecomposition::getScalesAndShifts(fq, cl, ch, isc, ish, osc, osh);
     bool is_optimized = false;  // The case when we can calculate only scales
     if (status) {
-        const auto out_scales = ngraph::snippets::pass::FakeQuantizeDecomposition::calculateScales(fq->get_output_element_type(0), cl, ch, isc, ish, osc, osh);
+        const auto out_scales = ov::snippets::pass::FakeQuantizeDecomposition::calculateScales(fq->get_output_element_type(0), cl, ch, isc, ish, osc, osh);
         is_optimized = out_scales.size() != 0;
     }
@@ -24,10 +26,10 @@ auto get_non_scalar_constant_count_for_fq(const std::shared_ptrinput(1).get_shape()) != 1lu;
-    const bool ih = ngraph::shape_size(fq->input(2).get_shape()) != 1lu;
-    const bool ol = !only_quantized && ngraph::shape_size(fq->input(3).get_shape()) != 1lu;
-    const bool oh = !only_quantized && ngraph::shape_size(fq->input(4).get_shape()) != 1lu;
+    const bool il = ov::shape_size(fq->input(1).get_shape()) != 1lu;
+    const bool ih = ov::shape_size(fq->input(2).get_shape()) != 1lu;
+    const bool ol = !only_quantized && ov::shape_size(fq->input(3).get_shape()) != 1lu;
+    const bool oh = !only_quantized && ov::shape_size(fq->input(4).get_shape()) != 1lu;
     // FakeQuantize decomposition has the following formula:
     // round(x * (levels-1) / (ih - il) - il * (levels-1) / (ih - il)) * (oh - ol) / (levels-1) + ol
@@ -67,27 +69,6 @@ auto get_non_scalar_constant_count_for_fq(const std::shared_ptr get_node_output_layout(const std::shared_ptr& node) {
-    return get_node_output_layout(node.get());
-}
-std::vector get_node_output_layout(const Node* node) {
-    if (!node)
-        return {};
-    if (node->is_dynamic())
-        OPENVINO_THROW("It's illegal to call get_node_output_layout for dynamic nodes");
-    auto &rt = node->get_rt_info();
-    const auto rinfo = rt.find("Layout");
-    if (rinfo != rt.end()) {
-        std::vector layout(rinfo->second.as>());
-        // This might be a little costy, but still useful sanity check. Remove if proved to be unacceptably heavy.
- std::set unique_elements(layout.begin(), layout.end()); - if (unique_elements.size() < layout.size()) - OPENVINO_THROW("Layout must contain only unique dimension indexes"); - return layout; - } else { - return {}; - } -} ov::PartialShape get_reordered_planar_shape(const ov::PartialShape& shape, const std::vector& layout) { if (layout.empty()) @@ -106,26 +87,21 @@ ov::PartialShape get_reordered_planar_shape(const ov::PartialShape& shape, const return reordered_shape; } -ov::PartialShape get_port_planar_shape(const Output& out) { - std::vector layout = get_node_output_layout(out.get_node_shared_ptr()); - const auto& tensor = out.get_tensor_ptr(); - if (!tensor) - OPENVINO_THROW("get_port_planar_shape can't be called for an uninitialized output tensor"); - auto tensor_shape = tensor->get_partial_shape(); - return get_reordered_planar_shape(tensor_shape, layout); +ov::PartialShape get_port_planar_shape(const Input& in) { + const auto& port = lowered::PortDescriptorUtils::get_port_descriptor_ptr(in); + return utils::get_reordered_planar_shape(ov::Shape{port->get_shape()}, port->get_layout()); } -void set_transpose_output_layout(const ov::Output& port, const std::shared_ptr& node) { - const auto& const_order = as_type_ptr(node->get_input_node_shared_ptr(1)); - OPENVINO_ASSERT(const_order != nullptr, "Transpose order must be Constant to set layout!"); - set_output_layout(port, const_order->cast_vector()); +ov::PartialShape get_port_planar_shape(const Output& out) { + const auto& port = lowered::PortDescriptorUtils::get_port_descriptor_ptr(out); + return utils::get_reordered_planar_shape(ov::Shape{port->get_shape()}, port->get_layout()); } -void set_output_layout(const ov::Output& port, const std::vector& layout) { - auto& rt_info = port.get_node_shared_ptr()->get_rt_info(); - rt_info["Layout"] = layout; +void safe_copy_runtime_info(const std::shared_ptr& from, const std::shared_ptr& to) { + ov::copy_runtime_info(from, to); + lowered::PortDescriptorUtils::clean(to); } } // namespace utils } // namespace snippets -} // namespace ngraph +} // namespace ov diff --git a/src/common/snippets/tests/CMakeLists.txt b/src/common/snippets/tests/CMakeLists.txt index 9feeb9399322fc..4bf75dbf74d7b6 100644 --- a/src/common/snippets/tests/CMakeLists.txt +++ b/src/common/snippets/tests/CMakeLists.txt @@ -19,6 +19,9 @@ addIeTargetTest( IE OV SNIPPETS ) +# LTO +set_target_properties(${TARGET_NAME} PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELEASE ${ENABLE_LTO}) + ie_faster_build(${TARGET_NAME} UNITY PCH PRIVATE "src/precomp.hpp" diff --git a/src/common/snippets/tests/include/lowering_utils.hpp b/src/common/snippets/tests/include/lowering_utils.hpp index 7dfa71a4b6a7f7..64a0e762fa8713 100644 --- a/src/common/snippets/tests/include/lowering_utils.hpp +++ b/src/common/snippets/tests/include/lowering_utils.hpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // @@ -11,12 +11,12 @@ namespace ov { namespace test { namespace snippets { -using BlockedShapeVector = ngraph::snippets::op::Subgraph::BlockedShapeVector; +using BlockedShapeVector = ov::snippets::op::Subgraph::BlockedShapeVector; -class DummyEmitter : public ngraph::snippets::Emitter { +class DummyEmitter : public ov::snippets::Emitter { public: // Here I pass Add to Emitter, but could be any other op, since it's ignored anyway. 
- DummyEmitter(const std::vector& custom_opset = {}) : ngraph::snippets::Emitter(std::make_shared()) {} + DummyEmitter(const std::vector& custom_opset = {}) : ov::snippets::Emitter(std::make_shared()) {} void emit_code(const std::vector&, const std::vector&, const std::vector&, @@ -24,18 +24,18 @@ class DummyEmitter : public ngraph::snippets::Emitter { void emit_data() const override {} }; -class DummyTargetMachine : public ngraph::snippets::TargetMachine { +class DummyTargetMachine : public ov::snippets::TargetMachine { public: DummyTargetMachine(const std::vector& custom_opset = {}); bool is_supported() const override { return true; } - ngraph::snippets::code get_snippet() const override { return nullptr; } + ov::snippets::code get_snippet() const override { return nullptr; } size_t get_lanes() const override { return 10; } }; -class DummyGenerator : public ngraph::snippets::Generator { +class DummyGenerator : public ov::snippets::Generator { public: - DummyGenerator() : ngraph::snippets::Generator(std::make_shared()) {} - DummyGenerator(const std::shared_ptr& t) : ngraph::snippets::Generator(t) {} + DummyGenerator() : ov::snippets::Generator(std::make_shared()) {} + DummyGenerator(const std::shared_ptr& t) : ov::snippets::Generator(t) {} protected: opRegType get_specific_op_reg_type(const std::shared_ptr& op) const override { return vec2vec; }; @@ -49,14 +49,15 @@ class LoweringTests : public TransformationTestsF { void TearDown() override; protected: - static std::shared_ptr getSubgraph(const std::shared_ptr& f); - static std::shared_ptr getLoweredSubgraph(const std::shared_ptr& f, + static std::shared_ptr getSubgraph(const std::shared_ptr& f); + static std::shared_ptr getLoweredSubgraph(const std::shared_ptr& f, const ov::PartialShape& master_shape, ov::pass::Manager pre_dialect = {}, ov::pass::Manager post_dialect = {}, ov::pass::Manager post_precision = {}, - const std::shared_ptr generator = nullptr); - static std::shared_ptr getTokenizedSubgraph(const std::shared_ptr& f); + ov::snippets::lowered::pass::PassPipeline lowered_pipeline = {}, + const std::shared_ptr generator = nullptr); + static std::shared_ptr getTokenizedSubgraph(const std::shared_ptr& f); ov::PartialShape master_shape{}; }; diff --git a/src/common/snippets/tests/include/pass/broadcast_to_movebroadcast.hpp b/src/common/snippets/tests/include/pass/broadcast_to_movebroadcast.hpp deleted file mode 100644 index 15a1f5a98463dc..00000000000000 --- a/src/common/snippets/tests/include/pass/broadcast_to_movebroadcast.hpp +++ /dev/null @@ -1,29 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include "lowering_utils.hpp" -#include "snippets_helpers.hpp" - -namespace ov { -namespace test { -namespace snippets { -typedef std::tuple< - Shape, // Input shape 0 - Shape, // Input shape 1 - Shape // Broadcast shape -> BroadcastParams; - -class BroadcastToMoveBroadcastTests : public LoweringTests, public testing::WithParamInterface { -public: - static std::string getTestCaseName(testing::TestParamInfo obj); -protected: - void SetUp() override; - std::shared_ptr snippets_function; -}; - -} // namespace snippets -} // namespace test -} // namespace ov diff --git a/src/common/snippets/tests/include/pass/canonicalization.hpp b/src/common/snippets/tests/include/pass/canonicalization.hpp index 7a57a146c3a8b1..b0d2785810d066 100644 --- a/src/common/snippets/tests/include/pass/canonicalization.hpp +++ b/src/common/snippets/tests/include/pass/canonicalization.hpp @@ -1,4 +1,4 @@ 
-// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // @@ -11,8 +11,8 @@ namespace ov { namespace test { namespace snippets { -using BlockedShape = ngraph::snippets::op::Subgraph::BlockedShape; -using BlockedShapeVector = ngraph::snippets::op::Subgraph::BlockedShapeVector; +using BlockedShape = ov::snippets::op::Subgraph::BlockedShape; +using BlockedShapeVector = ov::snippets::op::Subgraph::BlockedShapeVector; // todo: implement tests with 3 inputs and two outputs (aka SnippetsCanonicalizationParams3Inputs) // Note that the expected output shape isn't necessary equal to one of the output blocked_shapes. diff --git a/src/common/snippets/tests/include/pass/collapse_subgraph.hpp b/src/common/snippets/tests/include/pass/collapse_subgraph.hpp index 6a7fd359870a78..37f4924889a8e7 100644 --- a/src/common/snippets/tests/include/pass/collapse_subgraph.hpp +++ b/src/common/snippets/tests/include/pass/collapse_subgraph.hpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // diff --git a/src/common/snippets/tests/include/pass/fuse_transpose_brgemm.hpp b/src/common/snippets/tests/include/pass/fuse_transpose_brgemm.hpp index 8b886ef9876b06..2e161d7dc4ab6a 100644 --- a/src/common/snippets/tests/include/pass/fuse_transpose_brgemm.hpp +++ b/src/common/snippets/tests/include/pass/fuse_transpose_brgemm.hpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // diff --git a/src/common/snippets/tests/include/pass/insert_load_store.hpp b/src/common/snippets/tests/include/pass/insert_load_store.hpp deleted file mode 100644 index 2bc13f3290b30c..00000000000000 --- a/src/common/snippets/tests/include/pass/insert_load_store.hpp +++ /dev/null @@ -1,40 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include "lowering_utils.hpp" -#include "snippets_helpers.hpp" - -/* The main purpose is to test that: - * - Load/Store ops are inserted - * - Load + BroadcastMove fuses to BroadcastLoad (not the main focus, but still had to cover; overlays with insert_movebroadcast.cpp) - * - Proper Load/Stores are converted to scalar form to avoid invalid memory access by vector tile - * (temporary disabled, since corresponding PR is not merged yet) - */ - -namespace ov { -namespace test { -namespace snippets { - -typedef std::tuple< - Shape, // Input shape 0 - Shape, // Input shape 1 - Shape, // Input shape 2 - Shape, // Broadcast shape 0 - Shape, // Broadcast shape 1 - Shape // Broadcast shape 2 -> insertLoadStoreParams; - -class InsertLoadStoreTests : public LoweringTests, public testing::WithParamInterface { -public: - static std::string getTestCaseName(testing::TestParamInfo obj); -protected: - void SetUp() override; - std::shared_ptr snippets_function; -}; - -} // namespace snippets -} // namespace test -} // namespace ov diff --git a/src/common/snippets/tests/include/pass/insert_movebroadcast.hpp b/src/common/snippets/tests/include/pass/insert_movebroadcast.hpp deleted file mode 100644 index 98c890b9dc2f8b..00000000000000 --- a/src/common/snippets/tests/include/pass/insert_movebroadcast.hpp +++ /dev/null @@ -1,36 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include "lowering_utils.hpp" -#include "snippets_helpers.hpp" - -/* The main purpose is to test whether 
BroadcastMove ops are inserted. - * Conversion of Load + BroadcastMove to LoadBroadcastLoad is covered in insert_load_store.cpp - */ - -namespace ov { -namespace test { -namespace snippets { - -typedef std::tuple< - Shape, // Input shape 0 - Shape, // Input shape 1 - Shape, // Broadcast shape 0 - Shape // Broadcast shape 1 -> insertMoveBroadcastParams; - -using ngraph::snippets::op::Subgraph; -class InsertMoveBroadcastTests : public LoweringTests, public testing::WithParamInterface { -public: - static std::string getTestCaseName(testing::TestParamInfo obj); -protected: - void SetUp() override; - std::shared_ptr snippets_function; -}; - -} // namespace snippets -} // namespace test -} // namespace ov diff --git a/src/common/snippets/tests/include/pass/mha_tokenization.hpp b/src/common/snippets/tests/include/pass/mha_tokenization.hpp index 60e06d591ca13d..6b092209e9817a 100644 --- a/src/common/snippets/tests/include/pass/mha_tokenization.hpp +++ b/src/common/snippets/tests/include/pass/mha_tokenization.hpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // diff --git a/src/common/snippets/tests/include/pass/set_scalar_count_for_load_and_store.hpp b/src/common/snippets/tests/include/pass/set_scalar_count_for_load_and_store.hpp deleted file mode 100644 index 2bc13f3290b30c..00000000000000 --- a/src/common/snippets/tests/include/pass/set_scalar_count_for_load_and_store.hpp +++ /dev/null @@ -1,40 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include "lowering_utils.hpp" -#include "snippets_helpers.hpp" - -/* The main purpose is to test that: - * - Load/Store ops are inserted - * - Load + BroadcastMove fuses to BroadcastLoad (not the main focus, but still had to cover; overlays with insert_movebroadcast.cpp) - * - Proper Load/Stores are converted to scalar form to avoid invalid memory access by vector tile - * (temporary disabled, since corresponding PR is not merged yet) - */ - -namespace ov { -namespace test { -namespace snippets { - -typedef std::tuple< - Shape, // Input shape 0 - Shape, // Input shape 1 - Shape, // Input shape 2 - Shape, // Broadcast shape 0 - Shape, // Broadcast shape 1 - Shape // Broadcast shape 2 -> insertLoadStoreParams; - -class InsertLoadStoreTests : public LoweringTests, public testing::WithParamInterface { -public: - static std::string getTestCaseName(testing::TestParamInfo obj); -protected: - void SetUp() override; - std::shared_ptr snippets_function; -}; - -} // namespace snippets -} // namespace test -} // namespace ov diff --git a/src/common/snippets/tests/include/pass/softmax_decomposition.hpp b/src/common/snippets/tests/include/pass/softmax_decomposition.hpp deleted file mode 100644 index 3943bd641bf8bb..00000000000000 --- a/src/common/snippets/tests/include/pass/softmax_decomposition.hpp +++ /dev/null @@ -1,43 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include "lowering_utils.hpp" -#include "snippets_helpers.hpp" - -namespace ov { -namespace test { -namespace snippets { - -typedef std::tuple< - Shape, // Input shape 0 - int // Axis -> SoftmaxParams; - -typedef std::tuple< - Shape, // Input shape 0 - Shape, // Input shape 1 - int // Axis -> AddSoftmaxParams; - -class SoftmaxTests : public LoweringTests, public testing::WithParamInterface { -public: - static std::string getTestCaseName(testing::TestParamInfo obj); -protected: - void SetUp() 
override; - std::shared_ptr snippets_function; -}; - -class AddSoftmaxTests : public LoweringTests, public testing::WithParamInterface { -public: - static std::string getTestCaseName(testing::TestParamInfo obj); -protected: - void SetUp() override; - std::shared_ptr snippets_function; -}; - -} // namespace snippets -} // namespace test -} // namespace ov diff --git a/src/common/snippets/tests/src/broadcast_fusion.cpp b/src/common/snippets/tests/src/broadcast_fusion.cpp deleted file mode 100644 index d448a7fa86e670..00000000000000 --- a/src/common/snippets/tests/src/broadcast_fusion.cpp +++ /dev/null @@ -1,103 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include - -#include -#include - -#include -#include - -#include - -#include "common_test_utils/ngraph_test_utils.hpp" - -using namespace testing; -using namespace ngraph; - -// todo: Rewrite this test using Snippets test infrastructure. See ./include/canonicalization.hpp for example - -TEST(TransformationTests, FuseLoadWithBroadcastMoveByX) { - std::shared_ptr f(nullptr), f_ref(nullptr); - { - auto data0 = std::make_shared(element::f32, Shape{2, 1}); - auto data1 = std::make_shared(element::f32, Shape{2, 2}); - auto load0 = std::make_shared(data0); - auto load1 = std::make_shared(data1); - auto bct = std::make_shared(load0, load1->get_shape()); - auto add = std::make_shared(bct, load1); - auto store = std::make_shared(add); - f = std::make_shared(NodeVector{store}, ParameterVector{data0, data1}); - - pass::Manager m; - m.register_pass(); - m.register_pass(); - m.run_passes(f); - ASSERT_NO_THROW(check_rt_info(f)); - } - - { - auto data0 = std::make_shared(element::f32, Shape{2, 1}); - auto data1 = std::make_shared(element::f32, Shape{2, 2}); - auto load0 = std::make_shared(data0, data1->get_shape()); - auto load1 = std::make_shared(data1); - auto add = std::make_shared(load0, load1); - auto store = std::make_shared(add); - f_ref = std::make_shared(NodeVector{store}, ParameterVector{data0, data1}); - } - - auto res = compare_functions(f, f_ref); - ASSERT_TRUE(res.first) << res.second; -} - -TEST(TransformationTests, NoFuseLoadWithBroadcastMoveMultipleUsers) { - std::shared_ptr f(nullptr), f_ref(nullptr); - { - auto data0 = std::make_shared(element::f32, Shape{2, 2}); - auto data1 = std::make_shared(element::f32, Shape{2, 1}); - auto data2 = std::make_shared(element::f32, Shape{2, 1}); - - auto load0 = std::make_shared(data0); - auto load1 = std::make_shared(data1); - auto load2 = std::make_shared(data2); - - auto bct1 = std::make_shared(load1, load0->get_shape()); - - auto add = std::make_shared(load0, bct1); - auto mul = std::make_shared(load1, load2); - - auto store0 = std::make_shared(add); - auto store1 = std::make_shared(mul); - f = std::make_shared(NodeVector{store0, store1}, ParameterVector{data0, data1, data2}); - - pass::Manager m; - m.register_pass(); - m.register_pass(); - m.run_passes(f); - ASSERT_NO_THROW(check_rt_info(f)); - } - - { - auto data0 = std::make_shared(element::f32, Shape{2, 2}); - auto data1 = std::make_shared(element::f32, Shape{2, 1}); - auto data2 = std::make_shared(element::f32, Shape{2, 1}); - - auto load0 = std::make_shared(data0); - auto load1 = std::make_shared(data1); - auto load2 = std::make_shared(data2); - - auto bct1 = std::make_shared(load1, load0->get_shape()); - - auto add = std::make_shared(load0, bct1); - auto mul = std::make_shared(load1, load2); - - auto store0 = std::make_shared(add); - auto store1 = std::make_shared(mul); - f_ref = 
std::make_shared(NodeVector{store0, store1}, ParameterVector{data0, data1, data2}); - } - - auto res = compare_functions(f, f_ref); - ASSERT_TRUE(res.first) << res.second; -} diff --git a/src/common/snippets/tests/src/lowering_utils.cpp b/src/common/snippets/tests/src/lowering_utils.cpp index 55480e95dae510..ba3a4f91d43e33 100644 --- a/src/common/snippets/tests/src/lowering_utils.cpp +++ b/src/common/snippets/tests/src/lowering_utils.cpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // @@ -12,9 +12,9 @@ namespace test { namespace snippets { DummyTargetMachine::DummyTargetMachine(const std::vector&custom_opset) { - auto dummy_functor = ngraph::snippets::jitters_value { - [](const std::shared_ptr& n) { return std::make_shared(); }, - [](const std::shared_ptr& n) { return std::set>{};} + auto dummy_functor = ov::snippets::jitters_value { + [](const std::shared_ptr& n) { return std::make_shared(); }, + [](const std::shared_ptr& n) { return std::set>{};} }; jitters[op::v0::Parameter::get_type_info_static()] = dummy_functor; @@ -26,23 +26,23 @@ DummyTargetMachine::DummyTargetMachine(const std::vector& jitters[op::v1::Divide::get_type_info_static()] = dummy_functor; jitters[op::v1::Maximum::get_type_info_static()] = dummy_functor; jitters[op::v0::Exp::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::PowerStatic::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::HorizonMax::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::HorizonSum::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::Load::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::BroadcastLoad::get_type_info_static()] = dummy_functor; - - jitters[ngraph::snippets::op::Store::get_type_info_static()] = dummy_functor; - - jitters[ngraph::snippets::op::Scalar::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::BroadcastMove::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::Kernel::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::LoopBegin::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::LoopEnd::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::Brgemm::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::Buffer::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::VectorBuffer::get_type_info_static()] = dummy_functor; - jitters[ngraph::snippets::op::Fill::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::PowerStatic::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::HorizonMax::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::HorizonSum::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::Load::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::BroadcastLoad::get_type_info_static()] = dummy_functor; + + jitters[ov::snippets::op::Store::get_type_info_static()] = dummy_functor; + + jitters[ov::snippets::op::Scalar::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::BroadcastMove::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::Kernel::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::LoopBegin::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::LoopEnd::get_type_info_static()] = dummy_functor; + 
jitters[ov::snippets::op::Brgemm::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::Buffer::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::VectorBuffer::get_type_info_static()] = dummy_functor; + jitters[ov::snippets::op::Fill::get_type_info_static()] = dummy_functor; for (const auto& elem : custom_opset) { jitters[elem] = dummy_functor; @@ -61,7 +61,8 @@ void LoweringTests::SetUp() { } void LoweringTests::TearDown() { - auto cloned_function = ngraph::clone_function(*function); + ASSERT_TRUE(function); + auto cloned_function = function->clone(); if (!function_ref) { function_ref = cloned_function; } @@ -79,14 +80,14 @@ void LoweringTests::TearDown() { ASSERT_TRUE(res.valid) << res.message; } -std::shared_ptr LoweringTests::getSubgraph(const std::shared_ptr& f) { - std::shared_ptr subgraph; - for (const auto &op : f->get_ops()) { - bool is_subgraph = is_type(op); +std::shared_ptr LoweringTests::getSubgraph(const std::shared_ptr& f) { + std::shared_ptr subgraph; + for (const auto& op : f->get_ops()) { + bool is_subgraph = is_type(op); if (is_subgraph) { NGRAPH_CHECK(subgraph.use_count() == 0, "Functions provided for lowering tests contains more than one subgraph."); - subgraph = as_type_ptr(op); + subgraph = as_type_ptr(op); } NGRAPH_CHECK(is_subgraph || is_type(op) || @@ -97,12 +98,13 @@ std::shared_ptr LoweringTests::getSubgraph(const return subgraph; } -std::shared_ptr LoweringTests::getLoweredSubgraph(const std::shared_ptr &f, +std::shared_ptr LoweringTests::getLoweredSubgraph(const std::shared_ptr &f, const ov::PartialShape& master_shape, ov::pass::Manager pre_dialect, ov::pass::Manager post_dialect, ov::pass::Manager post_precision, - const std::shared_ptr generator) { + ov::snippets::lowered::pass::PassPipeline lowered_pipeline, + const std::shared_ptr generator) { auto subgraph = getTokenizedSubgraph(f); subgraph->set_generator(generator == nullptr ? std::make_shared() : generator); subgraph->set_master_shape(master_shape); @@ -123,15 +125,15 @@ std::shared_ptr LoweringTests::getLoweredSubgrap } body_rt_info["PluginShapesOverride"] = new_shapes; subgraph->set_tile_rank(2); - subgraph->generate(pre_dialect, post_precision, post_precision); + subgraph->generate(pre_dialect, post_precision, post_precision, lowered_pipeline); return subgraph; } -std::shared_ptr LoweringTests::getTokenizedSubgraph(const std::shared_ptr &f) { +std::shared_ptr LoweringTests::getTokenizedSubgraph(const std::shared_ptr &f) { // Perform tokenization - ngraph::pass::Manager m; - m.register_pass(); - m.register_pass(); + ov::pass::Manager m; + m.register_pass(); + m.register_pass(); m.run_passes(f); // Perform lowering return getSubgraph(f); diff --git a/src/common/snippets/tests/src/memory_ops.cpp b/src/common/snippets/tests/src/memory_ops.cpp deleted file mode 100644 index 5a458702fc4619..00000000000000 --- a/src/common/snippets/tests/src/memory_ops.cpp +++ /dev/null @@ -1,96 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include - -#include -#include - -#include -#include - -#include - -#include "common_test_utils/ngraph_test_utils.hpp" - -using namespace testing; -using namespace ngraph; - -// todo: Rewrite this test using Snippets test infrastructure. 
See ./include/canonicalization.hpp for example - -TEST(TransformationTests, InsertLoadStore) { - std::shared_ptr f(nullptr), f_ref(nullptr); - { - auto data = std::make_shared(element::f32, Shape{2, 2}); - auto neg = std::make_shared(data); - f = std::make_shared(NodeVector{neg}, ParameterVector{data}); - - pass::Manager m; - m.register_pass(); - m.register_pass(); - m.register_pass(); - m.run_passes(f); - ASSERT_NO_THROW(check_rt_info(f)); - } - { - auto data = std::make_shared(element::f32, Shape{2, 2}); - auto load = std::make_shared(data); - auto neg = std::make_shared(load); - auto store = std::make_shared(neg); - f_ref = std::make_shared(NodeVector{store}, ParameterVector{data}); - } - - auto res = compare_functions(f, f_ref); - ASSERT_TRUE(res.first) << res.second; -} - -TEST(TransformationTests, InsertLoadTwise) { - std::shared_ptr f(nullptr), f_ref(nullptr); - { - auto data = std::make_shared(element::f32, Shape{2, 2}); - auto neg = std::make_shared(data); - f = std::make_shared(NodeVector{neg}, ParameterVector{data}); - - pass::Manager m; - m.register_pass(); - m.register_pass(); - m.register_pass(); - m.run_passes(f); - ASSERT_NO_THROW(check_rt_info(f)); - } - { - auto data = std::make_shared(element::f32, Shape{2, 2}); - auto load = std::make_shared(data); - auto neg = std::make_shared(load); - f_ref = std::make_shared(NodeVector{neg}, ParameterVector{data}); - } - - auto res = compare_functions(f, f_ref); - ASSERT_TRUE(res.first) << res.second; -} - -TEST(TransformationTests, InsertStoreTwise) { - std::shared_ptr f(nullptr), f_ref(nullptr); - { - auto data = std::make_shared(element::f32, Shape{2, 2}); - auto neg = std::make_shared(data); - f = std::make_shared(NodeVector{neg}, ParameterVector{data}); - - pass::Manager m; - m.register_pass(); - m.register_pass(); - m.register_pass(); - m.run_passes(f); - ASSERT_NO_THROW(check_rt_info(f)); - } - { - auto data = std::make_shared(element::f32, Shape{2, 2}); - auto neg = std::make_shared(data); - auto store = std::make_shared(neg); - f_ref = std::make_shared(NodeVector{store}, ParameterVector{data}); - } - - auto res = compare_functions(f, f_ref); - ASSERT_TRUE(res.first) << res.second; -} diff --git a/src/common/snippets/tests/src/movebroadcast.cpp b/src/common/snippets/tests/src/movebroadcast.cpp deleted file mode 100644 index 91cd39ac1e220f..00000000000000 --- a/src/common/snippets/tests/src/movebroadcast.cpp +++ /dev/null @@ -1,38 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include - -#include -#include - -#include -#include - -#include - -#include "common_test_utils/ngraph_test_utils.hpp" - -using namespace testing; -using namespace ngraph; - -// todo: Rewrite this test using Snippets test infrastructure. 
See ./include/canonicalization.hpp for example - -TEST_F(TransformationTestsF, InsertBroadcastMove) { - { - auto data0 = std::make_shared(element::f32, Shape{2, 3}); - auto data1 = std::make_shared(element::f32, Shape{1, 2, 1}); - auto add = std::make_shared(data0, data1); - function = std::make_shared(NodeVector{add}, ParameterVector{data0, data1}); - - manager.register_pass(); - } - { - auto data0 = std::make_shared(element::f32, Shape{2, 3}); - auto data1 = std::make_shared(element::f32, Shape{1, 2, 1}); - auto move1 = std::make_shared(data1, Shape{1, 2, 3}); - auto add = std::make_shared(data0, move1); - function_ref = std::make_shared(NodeVector{add}, ParameterVector{data0, data1}); - } -} diff --git a/src/common/snippets/tests/src/pass/broadcast_to_movebroadcast.cpp b/src/common/snippets/tests/src/pass/broadcast_to_movebroadcast.cpp deleted file mode 100644 index 36e89b8eac63a1..00000000000000 --- a/src/common/snippets/tests/src/pass/broadcast_to_movebroadcast.cpp +++ /dev/null @@ -1,59 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include -#include "pass/broadcast_to_movebroadcast.hpp" -#include "common_test_utils/common_utils.hpp" -#include - -namespace ov { -namespace test { -namespace snippets { - - -std::string BroadcastToMoveBroadcastTests::getTestCaseName(testing::TestParamInfo obj) { - std::vector inputShapes(2); - Shape broadcast_shape; - std::tie(inputShapes[0], inputShapes[1], broadcast_shape) = obj.param; - std::ostringstream result; - for (size_t i = 0; i < inputShapes.size(); i++) - result << "IS[" << i << "]=" << CommonTestUtils::vec2str(inputShapes[i]) << "_"; - result << "BS=" << CommonTestUtils::vec2str(broadcast_shape) << "_"; - return result.str(); -} - -void BroadcastToMoveBroadcastTests::SetUp() { - TransformationTestsF::SetUp(); - std::vector inputShapes(2); - PartialShape broadcast_shape; - std::tie(inputShapes[0], inputShapes[1], broadcast_shape) = this->GetParam(); - snippets_function = std::make_shared(inputShapes, broadcast_shape); - master_shape = {}; - for (size_t i = 0; i < inputShapes[0].size(); i++) - master_shape.push_back(static_cast(std::max(inputShapes[0].get_shape()[i], inputShapes[1].get_shape()[i]))); -} - -TEST_P(BroadcastToMoveBroadcastTests, BroadcastSelect) { - PartialShape scheduler_shape({master_shape[master_shape.size() - 2], - master_shape[master_shape.size() - 1]}); - auto subgraph = getLoweredSubgraph(snippets_function->getOriginal(), scheduler_shape); - function = subgraph->body_ptr(); - function_ref = snippets_function->getLowered(); -} - -namespace BroadcastToMoveBroadcastTestsInstantiation { -using ov::Shape; -std::vector inputShapes0 {{1, 8, 2, 10}, {1, 8, 2, 1}, {1, 1, 1, 1}}; -std::vector inputShapes1 {{1, 8, 2, 10}, {1, 8, 2, 1}, {1, 1, 1, 1}}; -Shape broadcastShape {1, 8, 2, 10}; -INSTANTIATE_TEST_SUITE_P(smoke_Snippets_Broadcast, BroadcastToMoveBroadcastTests, - ::testing::Combine( - ::testing::ValuesIn(inputShapes0), - ::testing::ValuesIn(inputShapes1), - ::testing::Values(broadcastShape)), - BroadcastToMoveBroadcastTests::getTestCaseName); -} // namespace BroadcastToMoveBroadcastTestsInstantiation -} // namespace snippets -} // namespace test -} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/tests/src/pass/canonicalization.cpp b/src/common/snippets/tests/src/pass/canonicalization.cpp index 7b687bad226443..97e1ddac468dd7 100644 --- a/src/common/snippets/tests/src/pass/canonicalization.cpp +++ 
b/src/common/snippets/tests/src/pass/canonicalization.cpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // @@ -10,7 +10,7 @@ namespace ov { namespace test { namespace snippets { -using ngraph::snippets::op::Subgraph; +using ov::snippets::op::Subgraph; std::string CanonicalizationTests::getTestCaseName(testing::TestParamInfo obj) { std::vector> inputs(2); @@ -19,7 +19,7 @@ std::string CanonicalizationTests::getTestCaseName(testing::TestParamInfo(inputs[i]); + const auto& blockedshape = std::get<1>(inputs[i]); // input shape result << "IS[" << i << "]=" << CommonTestUtils::vec2str(std::get<0>(inputs[i])) << "_"; // input blocked shape @@ -56,7 +56,7 @@ TEST_P(CanonicalizationTests, Add) { } namespace CanonicalizationTestsInstantiation { -using ngraph::snippets::op::Subgraph; +using ov::snippets::op::Subgraph; std::vector input_shapes; Shape expected_output_shape; Subgraph::BlockedShapeVector input_blocked_shapes; diff --git a/src/common/snippets/tests/src/pass/collapse_subgraph.cpp b/src/common/snippets/tests/src/pass/collapse_subgraph.cpp index 8f5fc36f1f051a..48ce19052827c8 100644 --- a/src/common/snippets/tests/src/pass/collapse_subgraph.cpp +++ b/src/common/snippets/tests/src/pass/collapse_subgraph.cpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // @@ -16,66 +16,66 @@ namespace snippets { void CollapseSubgraphTests::run() { ASSERT_TRUE(function); std::string name; - manager.register_pass(); - manager.register_pass(); + manager.register_pass(); + manager.register_pass(); // todo: This is a temporary work-around. remove when MatMul tokenization is supported through general pipeline - manager.get_pass_config()->set_callback( + manager.get_pass_config()->set_callback( [](const std::shared_ptr& n) -> bool { return ov::is_type(n); }); } TEST_F(CollapseSubgraphTests, smoke_Snippets_Eltwise) { - const auto &f = EltwiseFunction(std::vector {{2, 3}, {1, 3}}); + const auto& f = EltwiseFunction(std::vector {{2, 3}, {1, 3}}); function = f.getOriginal(); function_ref = f.getReference(); run(); } TEST_F(CollapseSubgraphTests, smoke_Snippets_MatMulWithEltwise) { - const auto &f = MatMulEltwiseBranchesFunction(std::vector {{1, 3, 4, 4}, {1, 3, 4, 4}}); + const auto& f = MatMulEltwiseBranchesFunction(std::vector {{1, 3, 4, 4}, {1, 3, 4, 4}}); function = f.getOriginal(); function_ref = f.getReference(); run(); } TEST_F(CollapseSubgraphTests, smoke_Snippets_AvoidLoopEltwise) { - const auto &f = EltwiseLogLoopFunction(std::vector {{2, 5}, {2, 1}}); + const auto& f = EltwiseLogLoopFunction(std::vector {{2, 5}, {2, 1}}); function = f.getOriginal(); function_ref = f.getReference(); run(); } TEST_F(CollapseSubgraphTests, smoke_Snippets_OneConvert) { - const auto &f = ConvertFunction(std::vector{{2, 5}}); + const auto& f = ConvertFunction(std::vector{{2, 5}}); function = f.getOriginal(); function_ref = f.getReference(); run(); } TEST_F(CollapseSubgraphTests, smoke_Snippets_ConvertInput) { - const auto &f = ConvertInputFunction(std::vector{{2, 5}, {1, 5}}); + const auto& f = ConvertInputFunction(std::vector{{2, 5}, {1, 5}}); function = f.getOriginal(); function_ref = f.getReference(); run(); } TEST_F(CollapseSubgraphTests, smoke_Snippets_ConvertOutput) { - const auto &f = ConvertOutputFunction(std::vector{{2, 5}, {1, 5}}); + const auto& f = ConvertOutputFunction(std::vector{{2, 5}, {1, 5}}); function = f.getOriginal(); function_ref = 
f.getReference(); run(); } TEST_F(CollapseSubgraphTests, smoke_Snippets_ConvertStub) { - const auto &f = ConvertStubFunction(std::vector{{2, 5, 2}, {1, 5, 1}}); + const auto& f = ConvertStubFunction(std::vector{{2, 5, 2}, {1, 5, 1}}); function = f.getOriginal(); function_ref = f.getReference(); run(); } TEST_F(CollapseSubgraphTests, smoke_Snippets_ConvertPartialInputsAndResults) { - const auto &f = ConvertPartialInputsAndResultsFunction(std::vector{{2, 5, 1}, {1, 5, 1}, {2, 1, 10}}, + const auto& f = ConvertPartialInputsAndResultsFunction(std::vector{{2, 5, 1}, {1, 5, 1}, {2, 1, 10}}, std::vector{ov::element::i8, ov::element::bf16, ov::element::f32}, std::vector{ov::element::f32, ov::element::i8}); function = f.getOriginal(); @@ -84,7 +84,7 @@ TEST_F(CollapseSubgraphTests, smoke_Snippets_ConvertPartialInputsAndResults) { } TEST_F(CollapseSubgraphTests, smoke_Snippets_EltwiseTwoResultsFunction) { - const auto &f = EltwiseTwoResultsFunction(std::vector{{2, 5}, {2, 1}}); + const auto& f = EltwiseTwoResultsFunction(std::vector{{2, 5}, {2, 1}}); function = f.getOriginal(); function_ref = f.getReference(); comparator.enable(FunctionsComparator::CmpValues::NAMES); @@ -92,7 +92,7 @@ TEST_F(CollapseSubgraphTests, smoke_Snippets_EltwiseTwoResultsFunction) { } TEST_F(CollapseSubgraphTests, smoke_Snippets_ThreeFQFunction) { - const auto &f = ThreeFQFunction(std::vector{}); + const auto& f = ThreeFQFunction(std::vector{}); function = f.getOriginal(); function_ref = f.getReference(); run(); diff --git a/src/common/snippets/tests/src/pass/fake_quantize_decomposition_test.cpp b/src/common/snippets/tests/src/pass/fake_quantize_decomposition_test.cpp index 885e5c2304a7b6..529e00c811e243 100644 --- a/src/common/snippets/tests/src/pass/fake_quantize_decomposition_test.cpp +++ b/src/common/snippets/tests/src/pass/fake_quantize_decomposition_test.cpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // @@ -17,17 +17,17 @@ namespace snippets { class FakeQuantizeDecompositionTest : public TransformationTestsF { public: void register_passes() { - manager.register_pass(); + manager.register_pass(); } void TearDown() override { TransformationTestsF::TearDown(); auto subgraph = FunctionHelper::getSubgraph(function); - auto body = subgraph == nullptr ? nullptr : std::dynamic_pointer_cast(subgraph)->body_ptr(); + auto body = subgraph == nullptr ? nullptr : std::dynamic_pointer_cast(subgraph)->body_ptr(); auto subgraph_ref = FunctionHelper::getSubgraph(function_ref); - auto body_ref = subgraph_ref == nullptr ? nullptr : std::dynamic_pointer_cast(subgraph_ref)->body_ptr(); + auto body_ref = subgraph_ref == nullptr ? 
nullptr : std::dynamic_pointer_cast(subgraph_ref)->body_ptr(); auto res = comparator.compare(body, body_ref); ASSERT_TRUE(res.valid) << res.message; diff --git a/src/common/snippets/tests/src/pass/fuse_transpose_brgemm.cpp b/src/common/snippets/tests/src/pass/fuse_transpose_brgemm.cpp index 22936ca0c6208d..c5bba0725bc3fb 100644 --- a/src/common/snippets/tests/src/pass/fuse_transpose_brgemm.cpp +++ b/src/common/snippets/tests/src/pass/fuse_transpose_brgemm.cpp @@ -1,4 +1,4 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // diff --git a/src/common/snippets/tests/src/pass/insert_load_store.cpp b/src/common/snippets/tests/src/pass/insert_load_store.cpp deleted file mode 100644 index 929697852cbe5f..00000000000000 --- a/src/common/snippets/tests/src/pass/insert_load_store.cpp +++ /dev/null @@ -1,64 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include -#include "pass/insert_load_store.hpp" -#include "common_test_utils/common_utils.hpp" -#include - -namespace ov { -namespace test { -namespace snippets { - -std::string InsertLoadStoreTests::getTestCaseName(testing::TestParamInfo obj) { - std::vector inputShapes(3); - std::vector broadcastShapes(3); - std::tie(inputShapes[0], inputShapes[1], inputShapes[2], - broadcastShapes[0], broadcastShapes[1], broadcastShapes[2]) = obj.param; - std::ostringstream result; - for (size_t i = 0; i < inputShapes.size(); i++) - result << "IS[" << i << "]=" << CommonTestUtils::vec2str(inputShapes[i]) << "_"; - for (size_t i = 0; i < broadcastShapes.size(); i++) - result << "BS[" << i << "]=" << CommonTestUtils::vec2str(broadcastShapes[i]) << "_"; - return result.str(); -} - -void InsertLoadStoreTests::SetUp() { - LoweringTests::SetUp(); - std::vector inputShapes(3); - std::vector broadcastShapes(3); - std::tie(inputShapes[0], inputShapes[1], inputShapes[2], - broadcastShapes[0], broadcastShapes[1], broadcastShapes[2]) = this->GetParam(); - snippets_function = std::make_shared( - std::vector {inputShapes[0], inputShapes[1], inputShapes[2]}, broadcastShapes); - master_shape = inputShapes[0]; -} - -TEST_P(InsertLoadStoreTests, ThreeInputsEltwise) { - auto subgraph = getLoweredSubgraph(snippets_function->getOriginal(), master_shape); - function = subgraph->body_ptr(); - function_ref = snippets_function->getLowered(); -} - -namespace InsertLoadStoreTestsInstantiation { -using ov::Shape; -std::vector inputShapes{{1, 4, 1, 5, 1}, {1, 4, 2, 5, 1}}; -std::vector broadcastShapes{{1, 4, 1, 5, 16}, {1, 4, 2, 5, 16}}; -Shape exec_domain{1, 4, 2, 5, 16}; -Shape emptyShape{}; - -INSTANTIATE_TEST_SUITE_P(smoke_Snippets_BroadcastLoad, InsertLoadStoreTests, - ::testing::Combine( - ::testing::Values(exec_domain), - ::testing::Values(inputShapes[0]), - ::testing::Values(inputShapes[1]), - ::testing::Values(emptyShape), - ::testing::Values(broadcastShapes[0]), - ::testing::Values(broadcastShapes[1])), - InsertLoadStoreTests::getTestCaseName); - -} // namespace InsertLoadStoreTestsInstantiation -} // namespace snippets -} // namespace test -} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/tests/src/pass/insert_movebroadcast.cpp b/src/common/snippets/tests/src/pass/insert_movebroadcast.cpp deleted file mode 100644 index 45ec71588f2951..00000000000000 --- a/src/common/snippets/tests/src/pass/insert_movebroadcast.cpp +++ /dev/null @@ -1,90 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - 
-#include -#include "pass/insert_movebroadcast.hpp" -#include "common_test_utils/common_utils.hpp" -#include - -namespace ov { -namespace test { -namespace snippets { - -std::string InsertMoveBroadcastTests::getTestCaseName(testing::TestParamInfo obj) { - std::vector inputShapes(2); - std::vector broadcastShapes(2); - std::tie(inputShapes[0], inputShapes[1], broadcastShapes[0], broadcastShapes[1]) = obj.param; - std::ostringstream result; - for (size_t i = 0; i < inputShapes.size(); i++) - result << "IS[" << i << "]=" << CommonTestUtils::vec2str(inputShapes[i]) << "_"; - for (size_t i = 0; i < broadcastShapes.size(); i++) - result << "BS[" << i << "]=" << CommonTestUtils::vec2str(broadcastShapes[i]) << "_"; - return result.str(); -} - -void InsertMoveBroadcastTests::SetUp() { - LoweringTests::SetUp(); - std::vector inputShapes(2); - std::vector broadcastShapes(2); - std::tie(inputShapes[0], inputShapes[1], broadcastShapes[0], broadcastShapes[1]) = this->GetParam(); - snippets_function = std::make_shared(std::vector {inputShapes[0], inputShapes[1]}, broadcastShapes); - if (inputShapes[0].size() != inputShapes[1].size()) - IE_THROW() << "Expected input shapes of the same size"; - master_shape = {}; - for (size_t i = 0; i < inputShapes[0].size(); i++) - master_shape.push_back(static_cast(std::max(inputShapes[0][i], inputShapes[1][i]))); -} - -TEST_P(InsertMoveBroadcastTests, AddBroadcast) { - PartialShape scheduler_shape({master_shape[master_shape.size() - 2], - master_shape[master_shape.size() - 1]}); - auto subgraph = getLoweredSubgraph(snippets_function->getOriginal(), scheduler_shape); - function = subgraph->body_ptr(); - function_ref = snippets_function->getLowered(); -} - -namespace InsertMoveBroadcastTestsInstantiation { -using ov::Shape; -std::vector inputShapes0 {{1, 8, 2, 1}}; -std::vector inputShapes1 {{1, 8, 2, 3}}; -Shape broadcastShape {1, 8, 2, 3}; -Shape emptyShape {}; -INSTANTIATE_TEST_SUITE_P(smoke_Snippets_BroadcastOn0, InsertMoveBroadcastTests, - ::testing::Combine( - ::testing::ValuesIn(inputShapes0), - ::testing::ValuesIn(inputShapes1), - ::testing::Values(broadcastShape), - ::testing::Values(emptyShape)), - InsertMoveBroadcastTests::getTestCaseName); - -INSTANTIATE_TEST_SUITE_P(smoke_Snippets_BroadcastOn1, InsertMoveBroadcastTests, - ::testing::Combine( - ::testing::ValuesIn(inputShapes1), - ::testing::ValuesIn(inputShapes0), - ::testing::Values(emptyShape), - ::testing::Values(broadcastShape)), - InsertMoveBroadcastTests::getTestCaseName); - -std::vector inputShapesBoth0 {{4, 1, 2, 1}, {1, 8, 1, 1}, {1, 1, 2, 3}}; -std::vector inputShapesBoth1 {{4, 8, 2, 3}, {4, 1, 2, 3}, {4, 8, 1, 1}}; -std::vector broadcastShapeBoth{{4, 1, 2, 3}, {1, 8, 1, 3}, {4, 8, 1, 3}}; -std::vector params = {std::make_tuple(inputShapesBoth0[0], inputShapesBoth1[0], broadcastShapeBoth[0], emptyShape), - std::make_tuple(inputShapesBoth0[1], inputShapesBoth1[1], broadcastShapeBoth[1], emptyShape), - std::make_tuple(inputShapesBoth0[2], inputShapesBoth1[2], emptyShape, broadcastShapeBoth[2])}; - -INSTANTIATE_TEST_SUITE_P(smoke_Snippets_BroadcastOnBoth, InsertMoveBroadcastTests, - ::testing::ValuesIn(params), - InsertMoveBroadcastTests::getTestCaseName); - -std::vector paramsNo = {std::make_tuple(inputShapesBoth0[0], inputShapesBoth0[0], emptyShape, emptyShape), - std::make_tuple(inputShapesBoth0[1], inputShapesBoth0[1], emptyShape, emptyShape), - std::make_tuple(inputShapesBoth0[2], inputShapesBoth0[2], emptyShape, emptyShape)}; - -INSTANTIATE_TEST_SUITE_P(smoke_Snippets_NoBroadcast, 
InsertMoveBroadcastTests, - ::testing::ValuesIn(paramsNo), - InsertMoveBroadcastTests::getTestCaseName); -} // namespace InsertMoveBroadcastTestsInstantiation -} // namespace snippets -} // namespace test -} // namespace ov \ No newline at end of file diff --git a/src/common/snippets/tests/src/pass/merge_loops.cpp b/src/common/snippets/tests/src/pass/merge_loops.cpp deleted file mode 100644 index 048b3e52a76b1b..00000000000000 --- a/src/common/snippets/tests/src/pass/merge_loops.cpp +++ /dev/null @@ -1,169 +0,0 @@ -// Copyright (C) 2018-2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include - -#include -#include - -#include -#include - -#include - -#include "common_test_utils/ngraph_test_utils.hpp" - -using namespace testing; -using namespace ngraph; - -TEST(TransformationTests, UnaryEltwisesLoops) { - std::shared_ptr f(nullptr), f_ref(nullptr); - auto shape = Shape{2, 3, 240}; - const size_t vector_size = 16; - const std::vector inner_ptr_increments(2, vector_size); - const std::vector inner_finalization_offsets(2, 0); - { - auto data = std::make_shared(element::f32, shape); - - auto outer_loop_begin_up = std::make_shared(OutputVector{data}); - auto inner_loop_begin_up = std::make_shared(OutputVector{outer_loop_begin_up}); - auto load_up = std::make_shared(inner_loop_begin_up->output(0)); - auto relu = std::make_shared(load_up); - auto store_up = std::make_shared(relu); - auto inner_loop_end_up = std::make_shared( - OutputVector{store_up, inner_loop_begin_up->output(1)}, shape[shape.size() - 1], vector_size, - inner_ptr_increments, inner_finalization_offsets); - auto outer_loop_end_up = std::make_shared( - OutputVector{inner_loop_end_up->output(0), outer_loop_begin_up->output(1)}, shape[shape.size() - 2], 1, - std::vector{0, 0}, std::vector{0, 0}); - - auto buffer = std::make_shared(outer_loop_end_up); - - auto outer_loop_begin_down = std::make_shared(OutputVector{buffer}); - auto inner_loop_begin_down = std::make_shared(OutputVector{outer_loop_begin_down}); - auto load_down = std::make_shared(inner_loop_begin_down->output(0)); - auto hswish = std::make_shared(load_down); - auto store_down = std::make_shared(hswish); - auto inner_loop_end_down = std::make_shared( - OutputVector{store_down, inner_loop_begin_down->output(1)}, shape[shape.size() - 1], vector_size, - inner_ptr_increments, inner_finalization_offsets); - auto outer_loop_end_down = std::make_shared( - OutputVector{inner_loop_end_down->output(0), outer_loop_begin_down->output(1)}, shape[shape.size() - 2], 1, - std::vector{0, 0}, std::vector{0, 0}); - - f = std::make_shared(OutputVector{outer_loop_end_down->output(0)}, ParameterVector{data}); - - pass::Manager m; - m.register_pass(); - m.register_pass(); - m.run_passes(f); - } - { - auto data = std::make_shared(element::f32, shape); - - auto outer_loop_begin = std::make_shared(OutputVector{data}); - auto inner_loop_begin = std::make_shared(OutputVector{outer_loop_begin}); - auto load = std::make_shared(inner_loop_begin->output(0)); - auto relu = std::make_shared(load); - auto hswish = std::make_shared(relu); - auto store = std::make_shared(hswish); - auto inner_loop_end = std::make_shared( - OutputVector{store, inner_loop_begin->output(1)}, shape[shape.size() - 1], vector_size, - inner_ptr_increments, inner_finalization_offsets); - auto outer_loop_end = std::make_shared( - OutputVector{inner_loop_end->output(0), outer_loop_begin->output(1)}, shape[shape.size() - 2], 1, - std::vector{0, 0}, std::vector{0, 0}); - - f_ref = 
std::make_shared(OutputVector{outer_loop_end->output(0)}, ParameterVector{data}); - } - - auto res = compare_functions(f, f_ref); - ASSERT_TRUE(res.first) << res.second; -} - -TEST(TransformationTests, BinaryEltwisesLoops) { - std::shared_ptr f(nullptr), f_ref(nullptr); - auto shape = Shape{2, 3, 240}; - const size_t vector_size = 16; - { - const std::vector inner_ptr_increments(3, vector_size); - const std::vector inner_finalization_offsets(3, 0); - - auto data0 = std::make_shared(element::f32, shape); - auto data1 = std::make_shared(element::f32, shape); - - auto outer_loop_begin_up = std::make_shared(OutputVector{data0, data1}); - auto inner_loop_begin_up = std::make_shared(OutputVector{outer_loop_begin_up->output(0), - outer_loop_begin_up->output(1)}); - auto load0_up = std::make_shared(inner_loop_begin_up->output(0)); - auto load1_up = std::make_shared(inner_loop_begin_up->output(1)); - auto add = std::make_shared(load0_up, load1_up); - auto relu = std::make_shared(add); - auto store_up = std::make_shared(relu); - auto inner_loop_end_up = std::make_shared( - OutputVector{store_up, inner_loop_begin_up->output(2)}, shape[shape.size() - 1], vector_size, - inner_ptr_increments, inner_finalization_offsets); - auto outer_loop_end_up = std::make_shared( - OutputVector{inner_loop_end_up->output(0), outer_loop_begin_up->output(2)}, shape[shape.size() - 2], 1, - std::vector{0, 0, 0}, std::vector{0, 0, 0}); - - auto buffer = std::make_shared(outer_loop_end_up); - - auto data2 = std::make_shared(element::f32, shape); - - auto outer_loop_begin_down = std::make_shared(OutputVector{buffer, data2}); - auto inner_loop_begin_down = std::make_shared(OutputVector{outer_loop_begin_down->output(0), - outer_loop_begin_down->output(1)}); - auto load0_down = std::make_shared(inner_loop_begin_down->output(0)); - auto load1_down = std::make_shared(inner_loop_begin_down->output(1)); - auto mul = std::make_shared(load0_down, load1_down); - auto hswish = std::make_shared(mul); - auto store_down = std::make_shared(hswish); - auto inner_loop_end_down = std::make_shared( - OutputVector{store_down, inner_loop_begin_down->output(2)}, shape[shape.size() - 1], vector_size, - inner_ptr_increments, inner_finalization_offsets); - auto outer_loop_end_down = std::make_shared( - OutputVector{inner_loop_end_down->output(0), outer_loop_begin_down->output(2)}, shape[shape.size() - 2], 1, - std::vector{0, 0, 0}, std::vector{0, 0, 0}); - - f = std::make_shared(OutputVector{outer_loop_end_down->output(0)}, ParameterVector{data0, data1, data2}); - - pass::Manager m; - m.register_pass(); - m.register_pass(); - m.run_passes(f); - } - { - const std::vector inner_ptr_increments(4, vector_size); - const std::vector inner_finalization_offsets(4, 0); - - auto data0 = std::make_shared(element::f32, shape); - auto data1 = std::make_shared(element::f32, shape); - auto data2 = std::make_shared(element::f32, shape); - - auto outer_loop_begin = std::make_shared(OutputVector{data0, data1, data2}); - auto inner_loop_begin = std::make_shared(OutputVector{outer_loop_begin->output(0), - outer_loop_begin->output(1), - outer_loop_begin->output(2)}); - auto load0 = std::make_shared(inner_loop_begin->output(0)); - auto load1 = std::make_shared(inner_loop_begin->output(1)); - auto load2 = std::make_shared(inner_loop_begin->output(2)); - auto add = std::make_shared(load0, load1); - auto relu = std::make_shared(add); - auto mul = std::make_shared(relu, load2); - auto hswish = std::make_shared(mul); - auto store = std::make_shared(hswish); - auto 
inner_loop_end = std::make_shared(
-                OutputVector{store, inner_loop_begin->output(3)}, shape[shape.size() - 1], vector_size,
-                inner_ptr_increments, inner_finalization_offsets);
-        auto outer_loop_end = std::make_shared(
-                OutputVector{inner_loop_end->output(0), outer_loop_begin->output(3)}, shape[shape.size() - 2], 1,
-                std::vector{0, 0, 0, 0}, std::vector{0, 0, 0, 0});
-
-        f_ref = std::make_shared(OutputVector{outer_loop_end->output(0)}, ParameterVector{data0, data1, data2});
-    }
-
-    auto res = compare_functions(f, f_ref);
-    ASSERT_TRUE(res.first) << res.second;
-}
diff --git a/src/common/snippets/tests/src/pass/mha_tokenization.cpp b/src/common/snippets/tests/src/pass/mha_tokenization.cpp
index 4c3d967be5f310..19e1453c463825 100644
--- a/src/common/snippets/tests/src/pass/mha_tokenization.cpp
+++ b/src/common/snippets/tests/src/pass/mha_tokenization.cpp
@@ -1,4 +1,4 @@
-// Copyright (C) 2022 Intel Corporation
+// Copyright (C) 2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 
@@ -15,19 +15,19 @@ namespace snippets {
 void TokenizeMHASnippetsTests::run() {
     ASSERT_TRUE(function);
     std::string name;
-    manager.register_pass();
-    manager.register_pass();
+    manager.register_pass();
+    manager.register_pass();
 }
 
 TEST_F(TokenizeMHASnippetsTests, smoke_Snippets_MHA) {
-    const auto &f = MHAFunction(std::vector{{1, 128, 12, 64}, {1, 128, 12, 64}, {1, 12, 128, 128}, {1, 128, 12, 64}});
+    const auto& f = MHAFunction(std::vector{{1, 128, 12, 64}, {1, 128, 12, 64}, {1, 12, 128, 128}, {1, 128, 12, 64}});
     function = f.getOriginal();
     function_ref = f.getReference();
     run();
 }
 
 TEST_F(TokenizeMHASnippetsTests, smoke_Snippets_MHA_with_MatMul0_Transpose) {
-    const auto &f = MHAMatMul0TransposeFunction(std::vector{{1, 128, 12, 64}, {1, 128, 12, 64}, {1, 12, 128, 128}, {1, 128, 12, 64}});
+    const auto& f = MHAMatMul0TransposeFunction(std::vector{{1, 128, 12, 64}, {1, 128, 12, 64}, {1, 12, 128, 128}, {1, 128, 12, 64}});
     function = f.getOriginal();
     function_ref = f.getReference();
     run();
diff --git a/src/common/snippets/tests/src/pass/movebroadcast.cpp b/src/common/snippets/tests/src/pass/movebroadcast.cpp
new file mode 100644
index 00000000000000..ad96c6d77971cb
--- /dev/null
+++ b/src/common/snippets/tests/src/pass/movebroadcast.cpp
@@ -0,0 +1,38 @@
+// Copyright (C) 2018-2023 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include
+
+#include "ngraph/function.hpp"
+#include "ngraph/pass/manager.hpp"
+
+#include "snippets/snippets_isa.hpp"
+#include "snippets/pass/insert_movebroadcast.hpp"
+
+#include "transformations/init_node_info.hpp"
+
+#include "common_test_utils/ngraph_test_utils.hpp"
+
+using namespace testing;
+using namespace ov;
+
+// todo: Rewrite this test using Snippets test infrastructure. See ./include/canonicalization.hpp for example
+
+TEST_F(TransformationTestsF, InsertBroadcastMove) {
+    {
+        auto data0 = std::make_shared(element::f32, Shape{2, 3});
+        auto data1 = std::make_shared(element::f32, Shape{1, 2, 1});
+        auto add = std::make_shared(data0, data1);
+        function = std::make_shared(NodeVector{add}, ParameterVector{data0, data1});
+
+        manager.register_pass();
+    }
+    {
+        auto data0 = std::make_shared(element::f32, Shape{2, 3});
+        auto data1 = std::make_shared(element::f32, Shape{1, 2, 1});
+        auto move1 = std::make_shared(data1, Shape{1, 2, 3});
+        auto add = std::make_shared(data0, move1);
+        function_ref = std::make_shared(NodeVector{add}, ParameterVector{data0, data1});
+    }
+}
diff --git a/src/common/snippets/tests/src/pass/precision_propagation.cpp b/src/common/snippets/tests/src/pass/precision_propagation.cpp
index 3c7da4d06aa165..11a98157097e5b 100644
--- a/src/common/snippets/tests/src/pass/precision_propagation.cpp
+++ b/src/common/snippets/tests/src/pass/precision_propagation.cpp
@@ -23,17 +23,17 @@ class DummyPrecisionPropagationTargetMachine : public DummyTargetMachine {
         const std::set>& op1_supported_precisions,
         const std::set>& op2_supported_precisions) : DummyTargetMachine() {
-        jitters[DummyAdd::get_type_info_static()] = ngraph::snippets::jitters_value {
-            [](const std::shared_ptr& n) { return std::make_shared(); },
-            [op1_supported_precisions](const std::shared_ptr& n) { return op1_supported_precisions; }};
-        jitters[op::v1::Maximum::get_type_info_static()] = ngraph::snippets::jitters_value{
-            [](const std::shared_ptr& n) { return std::make_shared(); },
-            [op2_supported_precisions](const std::shared_ptr&n) { return op2_supported_precisions; }};
+        jitters[DummyAdd::get_type_info_static()] = ov::snippets::jitters_value {
+            [](const std::shared_ptr& n) { return std::make_shared(); },
+            [op1_supported_precisions](const std::shared_ptr& n) { return op1_supported_precisions; }};
+        jitters[op::v1::Maximum::get_type_info_static()] = ov::snippets::jitters_value{
+            [](const std::shared_ptr& n) { return std::make_shared(); },
+            [op2_supported_precisions](const std::shared_ptr&n) { return op2_supported_precisions; }};
 
-        auto default_jitter = ngraph::snippets::jitters_value{
-            [](const std::shared_ptr& n) { return std::make_shared(); },
-            [](const std::shared_ptr& n) { return std::set>{};} };
-        jitters[ngraph::snippets::op::ConvertSaturation::get_type_info_static()] = default_jitter;
+        auto default_jitter = ov::snippets::jitters_value{
+            [](const std::shared_ptr& n) { return std::make_shared(); },
+            [](const std::shared_ptr& n) { return std::set>{};} };
+        jitters[ov::snippets::op::ConvertSaturation::get_type_info_static()] = default_jitter;
     }
 };
 
@@ -97,7 +97,7 @@ TEST_P(PrecisionPropagationTest, CompareFunctions) {
         test_values.actual.op1_supported_precisions,
         test_values.actual.op2_supported_precisions);
 
-    manager.register_pass(target_machine);
+    manager.register_pass(target_machine);
 
     function_ref = function_stub.getReference();
 }
diff --git a/src/common/snippets/tests/src/pass/precision_propagation_convert_test.cpp b/src/common/snippets/tests/src/pass/precision_propagation_convert_test.cpp
index 2292001d755e6b..88882879d7b469 100644
--- a/src/common/snippets/tests/src/pass/precision_propagation_convert_test.cpp
+++ b/src/common/snippets/tests/src/pass/precision_propagation_convert_test.cpp
@@ -53,7 +53,7 @@ TEST_F(PrecisionPropagationConvertTest, smoke_Snippets_PrecisionPropagation_can_
     };
 
     for (const auto& precisions : precisions_set) {
-
ASSERT_TRUE(ngraph::snippets::pass::PropagatePrecision::can_be_fused( + ASSERT_TRUE(ov::snippets::pass::PropagatePrecision::can_be_fused( precisions.first, precisions.second)) << precisions.second << " can replace " << precisions.first; @@ -61,7 +61,7 @@ TEST_F(PrecisionPropagationConvertTest, smoke_Snippets_PrecisionPropagation_can_ continue; } - ASSERT_FALSE(ngraph::snippets::pass::PropagatePrecision::can_be_fused( + ASSERT_FALSE(ov::snippets::pass::PropagatePrecision::can_be_fused( precisions.second, precisions.first)) << precisions.second << " can not replace " << precisions.first; } @@ -138,7 +138,7 @@ TEST_F(PrecisionPropagationConvertTest, smoke_Snippets_PrecisionPropagation_can_ }; for (const auto& precisions : precisions_set) { - ASSERT_FALSE(ngraph::snippets::pass::PropagatePrecision::can_be_fused( + ASSERT_FALSE(ov::snippets::pass::PropagatePrecision::can_be_fused( precisions.first, precisions.second)) << precisions.second << " can not replace " << precisions.first; } @@ -182,7 +182,7 @@ TEST_F(PrecisionPropagationConvertTest, smoke_Snippets_PrecisionPropagation_can_ const auto actual_before = std::get<0>(precisions); const auto actual_after = std::get<1>(precisions); const auto required_after = std::get<2>(precisions); - ASSERT_TRUE(ngraph::snippets::pass::PropagatePrecision::can_be_removed( + ASSERT_TRUE(ov::snippets::pass::PropagatePrecision::can_be_removed( actual_before, actual_after, required_after)) << "can_be_removed: " << actual_before << " => " << actual_after << " => " << required_after; diff --git a/src/common/snippets/tests/src/pass/precision_propagation_get_precisions.cpp b/src/common/snippets/tests/src/pass/precision_propagation_get_precisions.cpp index 9e97fcc8ad4aa1..8b9b69c1a064b7 100644 --- a/src/common/snippets/tests/src/pass/precision_propagation_get_precisions.cpp +++ b/src/common/snippets/tests/src/pass/precision_propagation_get_precisions.cpp @@ -13,13 +13,13 @@ namespace snippets { class PrecisionPropagationGetPrecisionsTest : public testing::Test {}; TEST_F(PrecisionPropagationGetPrecisionsTest, empty) { - ASSERT_EQ(std::vector{}, ngraph::snippets::pass::PropagatePrecision::get_precisions({}, {})); + ASSERT_EQ(std::vector{}, ov::snippets::pass::PropagatePrecision::get_precisions({}, {})); } TEST_F(PrecisionPropagationGetPrecisionsTest, selected) { ASSERT_EQ( std::vector({element::f32, element::f32}), - ngraph::snippets::pass::PropagatePrecision::get_precisions( + ov::snippets::pass::PropagatePrecision::get_precisions( { element::f32, element::f32 }, { {element::bf16, element::bf16}, @@ -31,7 +31,7 @@ TEST_F(PrecisionPropagationGetPrecisionsTest, selected) { TEST_F(PrecisionPropagationGetPrecisionsTest, first) { ASSERT_EQ( std::vector({ element::bf16, element::bf16 }), - ngraph::snippets::pass::PropagatePrecision::get_precisions( + ov::snippets::pass::PropagatePrecision::get_precisions( { element::i32, element::i32 }, { {element::bf16, element::bf16}, diff --git a/src/common/snippets/tests/src/pass/set_scalar_count_for_load_and_store.cpp b/src/common/snippets/tests/src/pass/set_scalar_count_for_load_and_store.cpp deleted file mode 100644 index 50448be3a5c38f..00000000000000 --- a/src/common/snippets/tests/src/pass/set_scalar_count_for_load_and_store.cpp +++ /dev/null @@ -1,74 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include - -#include -#include - -#include -#include - -#include - -#include "common_test_utils/ngraph_test_utils.hpp" - -using namespace testing; -using namespace ngraph; - -// 
todo: Rewrite this test using Snippets test infrastructure. See ./include/canonicalization.hpp for example - -size_t get_count(const std::shared_ptr& f, const std::string& name, bool is_load = true) { - size_t count = std::numeric_limits::max(); - for (auto op : f->get_ops()) { - if (op->get_friendly_name() == name) { - if (const auto memory_access = std::dynamic_pointer_cast(op)) { - count = is_load ? memory_access->get_input_offset(0) - : memory_access->get_output_offset(0); - } - } - } - return count; -} - -TEST(TransformationTests, SetScalarCountForLoadStore) { - std::shared_ptr f(nullptr), f_ref(nullptr); - const auto count = 16; - { - auto data = std::make_shared(element::f32, Shape{2, 2}); - auto load = std::make_shared(data, count); - load->set_friendly_name("load"); - auto neg = std::make_shared(load); - auto store = std::make_shared(neg, count); - store->set_friendly_name("store"); - f = std::make_shared(NodeVector{store}, ParameterVector{data}); - - pass::Manager m; - m.register_pass(); - m.register_pass(); - m.register_pass(); - m.run_passes(f); - ASSERT_NO_THROW(check_rt_info(f)); - } - { - auto data = std::make_shared(element::f32, Shape{2, 2}); - auto load = std::make_shared(data, 1lu); - load->set_friendly_name("load_ref"); - auto neg = std::make_shared(load); - auto store = std::make_shared(neg, 1lu); - store->set_friendly_name("store_ref"); - f_ref = std::make_shared(NodeVector{store}, ParameterVector{data}); - } - - auto res = compare_functions(f, f_ref); - ASSERT_TRUE(res.first) << res.second; - - auto load_count = get_count(f, "load"); - auto load_count_ref = get_count(f_ref, "load_ref"); - ASSERT_EQ(load_count, load_count_ref); - - auto store_count = get_count(f, "store", false); - auto store_count_ref = get_count(f_ref, "store_ref", false); - ASSERT_EQ(store_count, store_count_ref); -} diff --git a/src/common/snippets/tests/src/pass/softmax_decomposition.cpp b/src/common/snippets/tests/src/pass/softmax_decomposition.cpp deleted file mode 100644 index e3330bd69ded9f..00000000000000 --- a/src/common/snippets/tests/src/pass/softmax_decomposition.cpp +++ /dev/null @@ -1,122 +0,0 @@ -// Copyright (C) 2022 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include - -#include "pass/softmax_decomposition.hpp" -#include "common_test_utils/common_utils.hpp" -#include "subgraph_softmax.hpp" -#include "subgraph_lowered.hpp" - -#include "snippets/pass/softmax_decomposition.hpp" -#include "snippets/pass/insert_load_store.hpp" -#include "snippets/pass/insert_movebroadcast.hpp" -#include "snippets/pass/insert_buffer.hpp" -#include "snippets/pass/convert_power_to_powerstatic.hpp" - - -namespace ov { -namespace test { -namespace snippets { - -std::string SoftmaxTests::getTestCaseName(testing::TestParamInfo obj) { - Shape inputShape; - int axis; - std::tie(inputShape, axis) = obj.param; - std::ostringstream result; - result << "IS=" << CommonTestUtils::vec2str(inputShape) << "_"; - result << "Axis=" << axis << "_"; - return result.str(); -} - -void SoftmaxTests::SetUp() { - LoweringTests::SetUp(); - - const size_t count = 10; - manager.register_pass(count); - manager.register_pass(); - manager.register_pass(count); - manager.register_pass(count); - manager.register_pass(); - Shape inputShape; - int axis; - std::tie(inputShape, axis) = this->GetParam(); - snippets_function = std::make_shared(std::vector{inputShape}, axis); - master_shape = inputShape; -} - -std::string AddSoftmaxTests::getTestCaseName(testing::TestParamInfo obj) { - Shape inputShape0, inputShape1; - int 
axis; - std::tie(inputShape0, inputShape1, axis) = obj.param; - std::ostringstream result; - result << "IS[0]=" << CommonTestUtils::vec2str(inputShape0) << "_"; - result << "IS[1]=" << CommonTestUtils::vec2str(inputShape1) << "_"; - result << "Axis=" << axis << "_"; - return result.str(); -} - -void AddSoftmaxTests::SetUp() { - LoweringTests::SetUp(); - - const size_t count = 10; - manager.register_pass(); - manager.register_pass(count); - manager.register_pass(); - manager.register_pass(count); - manager.register_pass(count); - manager.register_pass(); - Shape inputShape0, inputShape1; - int axis; - std::tie(inputShape0, inputShape1, axis) = this->GetParam(); - snippets_function = std::make_shared(std::vector{inputShape0, inputShape1}, axis); - - ov::PartialShape master_pshape(inputShape0); - ov::PartialShape::broadcast_merge_into(master_pshape, inputShape1, op::AutoBroadcastType::NUMPY); - master_shape = master_pshape.get_shape(); -} - -TEST_P(SoftmaxTests, SoftmaxDecomposition) { - PartialShape scheduler_shape({master_shape[master_shape.size() - 2], - master_shape[master_shape.size() - 1]}); - auto subgraph = getLoweredSubgraph(snippets_function->getOriginal(), scheduler_shape); - function = subgraph->body_ptr(); - function_ref = snippets_function->getLowered(); -} - -TEST_P(AddSoftmaxTests, AddSoftmaxDecomposition) { - PartialShape scheduler_shape({master_shape[master_shape.size() - 2], - master_shape[master_shape.size() - 1]}); - auto subgraph = getLoweredSubgraph(snippets_function->getOriginal(), scheduler_shape); - function = subgraph->body_ptr(); - function_ref = snippets_function->getLowered(); -} - -namespace SoftmaxTestsInstantiation { -std::vector inputShape{{12, 4, 12, 12, 127}, {12, 4, 12, 12, 1}}; - -INSTANTIATE_TEST_SUITE_P(smoke_Snippets_SoftmaxDecomposition, SoftmaxTests, - ::testing::Combine( - ::testing::ValuesIn(inputShape), - ::testing::Values(-1)), - SoftmaxTests::getTestCaseName); - -} // namespace SoftmaxTestsInstantiation - -namespace AddSoftmaxTestsInstantiation { -std::vector inputShape0{{12, 4, 12, 12, 17}, {12, 4, 12, 12, 1}}; -std::vector inputShape1{{12, 4, 12, 12, 17}, {12, 4, 12, 12, 1}}; - -INSTANTIATE_TEST_SUITE_P(smoke_Snippets_AddSoftmaxDecomposition, AddSoftmaxTests, - ::testing::Combine( - ::testing::ValuesIn(inputShape0), - ::testing::ValuesIn(inputShape1), - ::testing::Values(-1)), - AddSoftmaxTests::getTestCaseName); - -} // namespace AddSoftmaxTestsInstantiation - -} // namespace snippets -} // namespace test -} // namespace ov diff --git a/src/common/snippets/tests/src/pass/softmax_reshape_elimination.cpp b/src/common/snippets/tests/src/pass/softmax_reshape_elimination.cpp index f8b51924a025ae..0547bcb476b49f 100644 --- a/src/common/snippets/tests/src/pass/softmax_reshape_elimination.cpp +++ b/src/common/snippets/tests/src/pass/softmax_reshape_elimination.cpp @@ -1,4 +1,4 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // @@ -15,55 +15,55 @@ #include "common_test_utils/ngraph_test_utils.hpp" using namespace testing; -using namespace ngraph; +using namespace ov; TEST_F(TransformationTestsF, SoftmaxV1ReshapeElimination) { { - auto data = std::make_shared(element::f32, Shape{2, 3, 240}); + auto data = std::make_shared(element::f32, Shape{2, 3, 240}); auto shape0 = std::make_shared(ov::element::i32, ov::Shape{2}, std::vector{6, 240}); auto reshape0 = std::make_shared(data, shape0, false); auto softmax_v1 = std::make_shared(reshape0, 1); auto shape1 = 
std::make_shared(ov::element::i32, ov::Shape{3}, std::vector{2, 3, 240}); auto reshape1 = std::make_shared(softmax_v1, shape1, false); - function = std::make_shared(NodeVector{reshape1}, ParameterVector{data}); + function = std::make_shared(NodeVector{reshape1}, ParameterVector{data}); manager.register_pass(); } { - auto data = std::make_shared(element::f32, Shape{2, 3, 240}); + auto data = std::make_shared(element::f32, Shape{2, 3, 240}); auto softmax_v1 = std::make_shared(data, 2); - function_ref = std::make_shared(NodeVector{softmax_v1}, ParameterVector{data}); + function_ref = std::make_shared(NodeVector{softmax_v1}, ParameterVector{data}); } } TEST_F(TransformationTestsF, SoftmaxV8ReshapeElimination) { { - auto data = std::make_shared(element::f32, Shape{1, 2, 340, 240}); + auto data = std::make_shared(element::f32, Shape{1, 2, 340, 240}); auto shape0 = std::make_shared(ov::element::i32, ov::Shape{2}, std::vector{680, 240}); auto reshape0 = std::make_shared(data, shape0, false); auto softmax_v1 = std::make_shared(reshape0, -1); auto shape1 = std::make_shared(ov::element::i32, ov::Shape{4}, std::vector{1, 2, 340, 240}); auto reshape1 = std::make_shared(softmax_v1, shape1, false); - function = std::make_shared(NodeVector{reshape1}, ParameterVector{data}); + function = std::make_shared(NodeVector{reshape1}, ParameterVector{data}); manager.register_pass(); } { - auto data = std::make_shared(element::f32, Shape{1, 2, 340, 240}); + auto data = std::make_shared(element::f32, Shape{1, 2, 340, 240}); auto softmax_v1 = std::make_shared(data, 3); - function_ref = std::make_shared(NodeVector{softmax_v1}, ParameterVector{data}); + function_ref = std::make_shared(NodeVector{softmax_v1}, ParameterVector{data}); } } TEST_F(TransformationTestsF, SoftmaxReshapeElimination_IncorrectReshape) { { - auto data = std::make_shared(element::f32, Shape{1, 2, 340, 240}); + auto data = std::make_shared(element::f32, Shape{1, 2, 340, 240}); auto shape0 = std::make_shared(ov::element::i32, ov::Shape{2}, std::vector{2, 81600}); auto reshape0 = std::make_shared(data, shape0, false); auto softmax_v1 = std::make_shared(reshape0, -1); auto shape1 = std::make_shared(ov::element::i32, ov::Shape{4}, std::vector{1, 2, 340, 240}); auto reshape1 = std::make_shared(softmax_v1, shape1, false); - function = std::make_shared(NodeVector{reshape1}, ParameterVector{data}); + function = std::make_shared(NodeVector{reshape1}, ParameterVector{data}); manager.register_pass(); } diff --git a/src/common/snippets/tests/src/precomp.hpp b/src/common/snippets/tests/src/precomp.hpp index de1a6039b3d8c9..84796af134517a 100644 --- a/src/common/snippets/tests/src/precomp.hpp +++ b/src/common/snippets/tests/src/precomp.hpp @@ -1,11 +1,11 @@ -// Copyright (C) 2022 Intel Corporation +// Copyright (C) 2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // #pragma once #include -#include +#include "openvino/core/node.hpp" #include #include diff --git a/src/common/snippets/tests/src/registers.cpp b/src/common/snippets/tests/src/registers.cpp deleted file mode 100644 index e9d7c503802142..00000000000000 --- a/src/common/snippets/tests/src/registers.cpp +++ /dev/null @@ -1,175 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include - -#include -#include - -#include -#include - -#include - -#include "common_test_utils/ngraph_test_utils.hpp" -#include "lowering_utils.hpp" - -using namespace testing; -using namespace ngraph; - -// todo: Rewrite this test using Snippets test infrastructure. 
See ./include/canonicalization.hpp for example - -TEST(TransformationTests, AssignRegisters) { - const auto generator = std::make_shared(); - std::shared_ptr f(nullptr); - { - auto p0 = std::make_shared(element::f32, Shape(1)); - auto p1 = std::make_shared(element::f32, Shape(1)); - p0->set_friendly_name("p00"); - p1->set_friendly_name("p01"); - auto y00 = std::make_shared(p0); y00->set_friendly_name("y00"); - auto y01 = std::make_shared(p1); y01->set_friendly_name("y01"); - auto y02 = std::make_shared(y00, y01); y02->set_friendly_name("y02"); - auto s00 = std::make_shared(y02); s00->set_friendly_name("y03"); - s00->set_friendly_name("s00"); - f = std::make_shared(NodeVector{s00}, ParameterVector{p0, p1}); - // Note that testing the result is not strictly necessary, since the Result doesn't emit any code - f->get_result()->set_friendly_name("r00"); - - pass::Manager m; - m.register_pass(); - std::function& op)> reg_type_mapper = - [=](const std::shared_ptr& op) -> snippets::Generator::opRegType { - return generator->get_op_reg_type(op); - }; - m.register_pass(reg_type_mapper); - - m.run_passes(f); - ASSERT_NO_THROW(check_rt_info(f)); - } - - /* Instead of comparing to a reference function check that registers are correctly assigned and stored to runtime - * info. Note that Parameters and Store rt_info contains gpr indexes, while general op's rt_info contain vector - * indexes */ - { - std::map ref_registers { - {"p00", 0}, // gpr - {"p01", 1}, // gpr - {"y00", 0}, - {"y01", 1}, - {"y02", 2}, - {"s00", 2}, // gpr - {"r00", 2} // gpr - }; - - auto total_ops = 0; - for (auto& op : f->get_ordered_ops()) { - for (const auto& output : op->outputs()) { - const auto& rt = output.get_tensor_ptr()->get_rt_info(); - auto it_rt = rt.find("reginfo"); - if (it_rt != rt.end()) { - auto reg = it_rt->second.as(); - ASSERT_TRUE(ref_registers[op->get_friendly_name()] == reg); - total_ops++; - } - } - } - ASSERT_EQ(total_ops, ref_registers.size()); - } -} - -TEST(TransformationTests, AssignRegisters2) { - const auto generator = std::make_shared(); - std::shared_ptr f(nullptr); - { - auto p0 = std::make_shared(ngraph::element::f32, Shape()); - auto p1 = std::make_shared(ngraph::element::f32, Shape()); - auto p2 = std::make_shared(ngraph::element::f32, Shape()); - auto p3 = std::make_shared(ngraph::element::f32, Shape()); - auto p4 = std::make_shared(ngraph::element::f32, Shape()); - auto p5 = std::make_shared(ngraph::element::f32, Shape()); - auto p6 = std::make_shared(ngraph::element::f32, Shape()); - auto p7 = std::make_shared(ngraph::element::f32, Shape()); - p0->set_friendly_name("p00"); - p1->set_friendly_name("p01"); - p2->set_friendly_name("p02"); - p3->set_friendly_name("p03"); - p4->set_friendly_name("p04"); - p5->set_friendly_name("p05"); - p6->set_friendly_name("p06"); - p7->set_friendly_name("p07"); - - auto c0 = std::make_shared(ngraph::element::f32, Shape(), 3.14f); c0->set_friendly_name("r00"); - auto c1 = std::make_shared(ngraph::element::f32, Shape(), 6.6260701e-34f); c1->set_friendly_name("r01"); - - auto y00 = std::make_shared(p0); y00->set_friendly_name("r02"); - auto y01 = std::make_shared(p1); y01->set_friendly_name("r03"); - auto y02 = std::make_shared(y00, c0); y02->set_friendly_name("r04"); - auto y03 = std::make_shared(y01, c1); y03->set_friendly_name("r05"); - auto y04 = std::make_shared(p2); y04->set_friendly_name("r06"); - auto y05 = std::make_shared(p3); y05->set_friendly_name("r07"); - auto y06 = std::make_shared(y02, y03); y06->set_friendly_name("r08"); - auto y07 = 
std::make_shared(y04, c0); y07->set_friendly_name("r09"); - auto y08 = std::make_shared(y05, c1); y08->set_friendly_name("r10"); - auto y09 = std::make_shared(p4); y09->set_friendly_name("r11"); - auto y10 = std::make_shared(p5); y10->set_friendly_name("r12"); - auto y11 = std::make_shared(y07, y08); y11->set_friendly_name("r13"); - auto y12 = std::make_shared(y09, c0); y12->set_friendly_name("r14"); - auto y13 = std::make_shared(y10, c1); y13->set_friendly_name("r15"); - auto y14 = std::make_shared(p6); y14->set_friendly_name("r16"); - auto y15 = std::make_shared(y12, y13); y15->set_friendly_name("r17"); - auto y16 = std::make_shared(p7); y16->set_friendly_name("r18"); - auto y17 = std::make_shared(y14, c0); y17->set_friendly_name("r19"); - auto y18 = std::make_shared(y16, c1); y18->set_friendly_name("r20"); - auto y19 = std::make_shared(y06, y11); y19->set_friendly_name("r21"); - auto y20 = std::make_shared(y17, y18); y20->set_friendly_name("r22"); - auto y21 = std::make_shared(y15, y19); y21->set_friendly_name("r23"); - auto y22 = std::make_shared(y20, y21); y22->set_friendly_name("r24"); - auto s00 = std::make_shared(y22); - s00->set_friendly_name("s00"); - - f = std::make_shared(NodeVector{s00}, ParameterVector{p0, p1, p2, p3, p4, p5, p6, p7}); - f->get_result()->set_friendly_name("res00"); - - pass::Manager m; - m.register_pass(); - std::function& op)> reg_type_mapper = - [=](const std::shared_ptr& op) -> snippets::Generator::opRegType { - return generator->get_op_reg_type(op); - }; - m.register_pass(reg_type_mapper); - m.run_passes(f); - ASSERT_NO_THROW(check_rt_info(f)); - } - - // instead of comparing to a reference function check that registers are correctly assigned - // and stored to runtime info - { - std::map ref_registers { - {"p00", 0}, {"p01", 1}, {"p02", 2}, {"p03", 3}, {"p04", 4}, {"p05", 5}, - {"p06", 6}, {"p07", 7}, - {"r00", 1}, {"r01", 3}, {"r02", 5}, {"r03", 5}, {"r04", 2}, {"r05", 6}, - {"r06", 6}, {"r07", 6}, {"r08", 5}, {"r09", 2}, {"r10", 1}, {"r11", 4}, - {"r12", 4}, {"r13", 6}, {"r14", 2}, {"r15", 5}, {"r16", 0}, {"r17", 4}, - {"r18", 0}, {"r19", 2}, {"r20", 4}, {"r21", 1}, {"r22", 0}, {"r23", 6}, - {"r24", 1}, - {"s00", 8}, - {"res00", 8} - }; - - auto total_ops = 0; - for (auto& op : f->get_ordered_ops()) { - for (const auto& output : op->outputs()) { - const auto& rt = output.get_tensor_ptr()->get_rt_info(); - auto it_rt = rt.find("reginfo"); - if (it_rt != rt.end()) { - auto reg = it_rt->second.as(); - ASSERT_TRUE(ref_registers[op->get_friendly_name()] == reg); - total_ops++; - } - } - } - ASSERT_EQ(total_ops, ref_registers.size()); - } -} diff --git a/src/plugins/intel_cpu/src/emitters/x64/cpu_generator.cpp b/src/plugins/intel_cpu/src/emitters/x64/cpu_generator.cpp index bc3ce1c2d3f677..1244fac99ad2fb 100644 --- a/src/plugins/intel_cpu/src/emitters/x64/cpu_generator.cpp +++ b/src/plugins/intel_cpu/src/emitters/x64/cpu_generator.cpp @@ -20,16 +20,15 @@ #include "transformations/snippets/x64/op/fused_mul_add.hpp" #include "transformations/snippets/x64/op/brgemm_copy_b.hpp" #include "transformations/snippets/x64/op/brgemm_cpu.hpp" -#include "snippets/op/brgemm.hpp" #include "transformations/cpu_opset/common/op/swish_cpu.hpp" +#include "transformations/snippets/x64/pass/lowered/fuse_load_store_and_convert.hpp" #include using namespace std; -using namespace ngraph::snippets; #define CREATE_EMITTER(e_type) { \ - [this](const std::shared_ptr& n) -> std::shared_ptr { \ + [this](const std::shared_ptr& n) -> std::shared_ptr { \ return std::make_shared(h.get(), isa, 
n); \ }, \ [](const std::shared_ptr& n) -> std::set> { \ @@ -53,99 +52,99 @@ class jit_snippet : public dnnl::impl::cpu::x64::jit_generator { ov::intel_cpu::CPUTargetMachine::CPUTargetMachine(dnnl::impl::cpu::x64::cpu_isa_t host_isa) : TargetMachine(), h(new jit_snippet()), isa(host_isa) { // data movement - jitters[ngraph::opset1::Parameter::get_type_info_static()] = CREATE_EMITTER(NopEmitter); - jitters[ngraph::opset1::Result::get_type_info_static()] = CREATE_EMITTER(NopEmitter); - jitters[ngraph::snippets::op::Buffer::get_type_info_static()] = CREATE_EMITTER(NopEmitter); - jitters[ngraph::snippets::op::VectorBuffer::get_type_info_static()] = CREATE_EMITTER(VectorBufferEmitter); - // jitters[ngraph::opset1::Constant::get_type_info_static()] = CREATE_EMITTER(); // Not supported - - jitters[ngraph::snippets::op::Load::get_type_info_static()] = CREATE_EMITTER(LoadEmitter); - jitters[ngraph::snippets::op::LoadReshape::get_type_info_static()] = CREATE_EMITTER(LoadEmitter); - jitters[ngraph::snippets::op::BroadcastLoad::get_type_info_static()] = CREATE_EMITTER(BroadcastLoadEmitter); + jitters[ov::op::v0::Parameter::get_type_info_static()] = CREATE_EMITTER(NopEmitter); + jitters[ov::op::v0::Result::get_type_info_static()] = CREATE_EMITTER(NopEmitter); + jitters[snippets::op::Buffer::get_type_info_static()] = CREATE_EMITTER(NopEmitter); + jitters[snippets::op::VectorBuffer::get_type_info_static()] = CREATE_EMITTER(VectorBufferEmitter); + // jitters[ov::op::v1::Constant::get_type_info_static()] = CREATE_EMITTER(); // Not supported + + jitters[snippets::op::Load::get_type_info_static()] = CREATE_EMITTER(LoadEmitter); + jitters[snippets::op::LoadReshape::get_type_info_static()] = CREATE_EMITTER(LoadEmitter); + jitters[snippets::op::BroadcastLoad::get_type_info_static()] = CREATE_EMITTER(BroadcastLoadEmitter); jitters[ov::intel_cpu::LoadConvertSaturation::get_type_info_static()] = CREATE_EMITTER(LoadConvertEmitter); jitters[ov::intel_cpu::LoadConvertTruncation::get_type_info_static()] = CREATE_EMITTER(LoadConvertEmitter); - jitters[ngraph::snippets::op::Store::get_type_info_static()] = CREATE_EMITTER(StoreEmitter); + jitters[snippets::op::Store::get_type_info_static()] = CREATE_EMITTER(StoreEmitter); jitters[ov::intel_cpu::StoreConvertSaturation::get_type_info_static()] = CREATE_EMITTER(StoreConvertEmitter); jitters[ov::intel_cpu::StoreConvertTruncation::get_type_info_static()] = CREATE_EMITTER(StoreConvertEmitter); - jitters[ngraph::snippets::op::Scalar::get_type_info_static()] = CREATE_EMITTER(ScalarEmitter); - jitters[ngraph::snippets::op::BroadcastMove::get_type_info_static()] = CREATE_EMITTER(BroadcastMoveEmitter); - // jitters[ngraph::snippets::op::Nop::get_type_info_static()] = CREATE_EMITTER(NopEmitter); // Not supported - // jitters[ngraph::opset1::Broadcast::get_type_info_static()] = CREATE_EMITTER(); // Not supported + jitters[snippets::op::Scalar::get_type_info_static()] = CREATE_EMITTER(ScalarEmitter); + jitters[snippets::op::BroadcastMove::get_type_info_static()] = CREATE_EMITTER(BroadcastMoveEmitter); + // jitters[snippets::op::Nop::get_type_info_static()] = CREATE_EMITTER(NopEmitter); // Not supported + // jitters[ov::op::v1::Broadcast::get_type_info_static()] = CREATE_EMITTER(); // Not supported - jitters[ngraph::snippets::op::ConvertTruncation::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_convert_truncation_emitter); - jitters[ngraph::snippets::op::ConvertSaturation::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_convert_saturation_emitter); - // 
jitters[ngraph::opset1::FakeQuantize::get_type_info_static()] = CREATE_EMITTER(); // not supported + jitters[snippets::op::ConvertTruncation::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_convert_truncation_emitter); + jitters[snippets::op::ConvertSaturation::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_convert_saturation_emitter); + // jitters[ov::op::v1::FakeQuantize::get_type_info_static()] = CREATE_EMITTER(); // not supported // ternary - jitters[ngraph::opset1::Select::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_select_emitter); + jitters[ov::op::v1::Select::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_select_emitter); jitters[ov::intel_cpu::FusedMulAdd::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_mul_add_emitter); // binary - jitters[ngraph::opset1::Add::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_add_emitter); - jitters[ngraph::opset1::Divide::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_divide_emitter); - jitters[ngraph::opset1::Equal::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_equal_emitter); - jitters[ngraph::opset1::FloorMod::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_floor_mod_emitter); - jitters[ngraph::opset1::Greater::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_greater_emitter); - jitters[ngraph::opset1::GreaterEqual::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_greater_equal_emitter); - jitters[ngraph::opset1::Less::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_less_emitter); - jitters[ngraph::opset1::LessEqual::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_less_equal_emitter); - jitters[ngraph::opset1::LogicalAnd::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_and_emitter); - jitters[ngraph::opset1::LogicalOr::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_or_emitter); - jitters[ngraph::opset1::LogicalXor::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_xor_emitter); - jitters[ngraph::opset1::Maximum::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_maximum_emitter); - jitters[ngraph::opset1::Minimum::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_minimum_emitter); - jitters[ngraph::opset1::Mod::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_mod_emitter); - jitters[ngraph::opset1::Multiply::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_multiply_emitter); - jitters[ngraph::opset1::NotEqual::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_not_equal_emitter); - jitters[ngraph::snippets::op::PowerStatic::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_power_static_emitter); - jitters[ngraph::opset1::Power::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_power_dynamic_emitter); - jitters[ngraph::opset1::PRelu::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_prelu_emitter); - jitters[ngraph::opset1::SquaredDifference::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_squared_difference_emitter); - jitters[ngraph::opset1::Subtract::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_subtract_emitter); - jitters[ngraph::opset1::Xor::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_xor_emitter); + jitters[ov::op::v1::Add::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_add_emitter); + jitters[ov::op::v1::Divide::get_type_info_static()] = 
CREATE_EMITTER(ov::intel_cpu::jit_divide_emitter); + jitters[ov::op::v1::Equal::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_equal_emitter); + jitters[ov::op::v1::FloorMod::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_floor_mod_emitter); + jitters[ov::op::v1::Greater::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_greater_emitter); + jitters[ov::op::v1::GreaterEqual::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_greater_equal_emitter); + jitters[ov::op::v1::Less::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_less_emitter); + jitters[ov::op::v1::LessEqual::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_less_equal_emitter); + jitters[ov::op::v1::LogicalAnd::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_and_emitter); + jitters[ov::op::v1::LogicalOr::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_or_emitter); + jitters[ov::op::v1::LogicalXor::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_xor_emitter); + jitters[ov::op::v1::Maximum::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_maximum_emitter); + jitters[ov::op::v1::Minimum::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_minimum_emitter); + jitters[ov::op::v1::Mod::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_mod_emitter); + jitters[ov::op::v1::Multiply::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_multiply_emitter); + jitters[ov::op::v1::NotEqual::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_not_equal_emitter); + jitters[snippets::op::PowerStatic::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_power_static_emitter); + jitters[ov::op::v1::Power::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_power_dynamic_emitter); + jitters[ov::op::v0::PRelu::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_prelu_emitter); + jitters[ov::op::v0::SquaredDifference::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_squared_difference_emitter); + jitters[ov::op::v1::Subtract::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_subtract_emitter); + jitters[ov::op::v0::Xor::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_xor_emitter); // unary - jitters[ngraph::opset1::Abs::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_abs_emitter); - // jitters[ngraph::opset1::Acos::get_type_info_static()] = CREATE_EMITTER(); // not supported - // jitters[ngraph::opset1::Asin::get_type_info_static()] = CREATE_EMITTER(); // not supported - // jitters[ngraph::opset1::Atan::get_type_info_static()] = CREATE_EMITTER(); // not supported - jitters[ngraph::opset1::Ceiling::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_ceiling_emitter); - jitters[ngraph::opset1::Clamp::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_clamp_emitter); - // jitters[ngraph::opset1::Cos::get_type_info_static()] = CREATE_EMITTER(); // not supported - // jitters[ngraph::opset1::Cosh::get_type_info_static()] = CREATE_EMITTER(); // not supported - jitters[ngraph::opset1::Elu::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_elu_emitter); - jitters[ngraph::opset1::Erf::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_erf_emitter); - jitters[ngraph::opset1::Exp::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_exp_emitter); - jitters[ngraph::opset1::Floor::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_floor_emitter); + 
jitters[ov::op::v0::Abs::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_abs_emitter); + // jitters[ov::op::v1::Acos::get_type_info_static()] = CREATE_EMITTER(); // not supported + // jitters[ov::op::v1::Asin::get_type_info_static()] = CREATE_EMITTER(); // not supported + // jitters[ov::op::v1::Atan::get_type_info_static()] = CREATE_EMITTER(); // not supported + jitters[ov::op::v0::Ceiling::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_ceiling_emitter); + jitters[ov::op::v0::Clamp::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_clamp_emitter); + // jitters[ov::op::v1::Cos::get_type_info_static()] = CREATE_EMITTER(); // not supported + // jitters[ov::op::v1::Cosh::get_type_info_static()] = CREATE_EMITTER(); // not supported + jitters[ov::op::v0::Elu::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_elu_emitter); + jitters[ov::op::v0::Erf::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_erf_emitter); + jitters[ov::op::v0::Exp::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_exp_emitter); + jitters[ov::op::v0::Floor::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_floor_emitter); jitters[ngraph::opset5::Round::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_round_emitter); - // jitters[ngraph::opset1::Log::get_type_info_static()] = CREATE_EMITTER(); // not supported - jitters[ngraph::opset1::LogicalNot::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_not_emitter); - jitters[ngraph::opset1::Negative::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_negative_emitter); - jitters[ngraph::opset1::Relu::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_relu_emitter); - // jitters[ngraph::opset1::Sign::get_type_info_static()] = CREATE_EMITTER(); // not supported - jitters[ngraph::opset1::Sigmoid::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_sigmoid_emitter); - // jitters[ngraph::opset1::Sin::get_type_info_static()] = CREATE_EMITTER(); // not supported - // jitters[ngraph::opset1::Sinh::get_type_info_static()] = CREATE_EMITTER(); // not supported - jitters[ngraph::opset1::Sqrt::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_sqrt_emitter); - // jitters[ngraph::opset1::Tan::get_type_info_static()] = CREATE_EMITTER(); // not supported - jitters[ngraph::opset1::Tanh::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_tanh_emitter); + // jitters[ov::op::v1::Log::get_type_info_static()] = CREATE_EMITTER(); // not supported + jitters[ov::op::v1::LogicalNot::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_logical_not_emitter); + jitters[ov::op::v0::Negative::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_negative_emitter); + jitters[ov::op::v0::Relu::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_relu_emitter); + // jitters[ov::op::v1::Sign::get_type_info_static()] = CREATE_EMITTER(); // not supported + jitters[ov::op::v0::Sigmoid::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_sigmoid_emitter); + // jitters[ov::op::v1::Sin::get_type_info_static()] = CREATE_EMITTER(); // not supported + // jitters[ov::op::v1::Sinh::get_type_info_static()] = CREATE_EMITTER(); // not supported + jitters[ov::op::v0::Sqrt::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_sqrt_emitter); + // jitters[ov::op::v1::Tan::get_type_info_static()] = CREATE_EMITTER(); // not supported + jitters[ov::op::v0::Tanh::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_tanh_emitter); 
jitters[ov::intel_cpu::SwishNode::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_swish_emitter); jitters[ngraph::op::v4::HSwish::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_hswish_emitter); - // jitters[ngraph::opset1::HardSigmoid::get_type_info_static()] = CREATE_EMITTER(); // not supported - // jitters[ngraph::opset1::Selu::get_type_info_static()] = CREATE_EMITTER(); // not supported + // jitters[ov::op::v1::HardSigmoid::get_type_info_static()] = CREATE_EMITTER(); // not supported + // jitters[ov::op::v1::Selu::get_type_info_static()] = CREATE_EMITTER(); // not supported jitters[ngraph::op::v0::Gelu::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_gelu_v0_emitter); jitters[ngraph::op::v7::Gelu::get_type_info_static()] = CREATE_EMITTER(ov::intel_cpu::jit_gelu_v7_emitter); - jitters[ngraph::snippets::op::Fill::get_type_info_static()] = CREATE_EMITTER(FillEmitter); + jitters[snippets::op::Fill::get_type_info_static()] = CREATE_EMITTER(FillEmitter); - jitters[ngraph::snippets::op::HorizonMax::get_type_info_static()] = CREATE_EMITTER(HorizonMaxEmitter); - jitters[ngraph::snippets::op::HorizonSum::get_type_info_static()] = CREATE_EMITTER(HorizonSumEmitter); + jitters[snippets::op::HorizonMax::get_type_info_static()] = CREATE_EMITTER(HorizonMaxEmitter); + jitters[snippets::op::HorizonSum::get_type_info_static()] = CREATE_EMITTER(HorizonSumEmitter); - jitters[ngraph::snippets::op::Kernel::get_type_info_static()] = CREATE_EMITTER(KernelEmitter); - jitters[ngraph::snippets::op::LoopBegin::get_type_info_static()] = CREATE_EMITTER(LoopBeginEmitter); - jitters[ngraph::snippets::op::LoopEnd::get_type_info_static()] = CREATE_EMITTER(LoopEndEmitter); + jitters[snippets::op::Kernel::get_type_info_static()] = CREATE_EMITTER(KernelEmitter); + jitters[snippets::op::LoopBegin::get_type_info_static()] = CREATE_EMITTER(LoopBeginEmitter); + jitters[snippets::op::LoopEnd::get_type_info_static()] = CREATE_EMITTER(LoopEndEmitter); jitters[ov::intel_cpu::BrgemmCPU::get_type_info_static()] = CREATE_EMITTER(BrgemmEmitter); jitters[ov::intel_cpu::BrgemmCopyB::get_type_info_static()] = CREATE_EMITTER(BrgemmCopyBEmitter); } @@ -163,7 +162,7 @@ bool ov::intel_cpu::CPUTargetMachine::is_supported() const { return dnnl::impl::cpu::x64::mayiuse(isa); } -code ov::intel_cpu::CPUTargetMachine::get_snippet() const { +ov::snippets::code ov::intel_cpu::CPUTargetMachine::get_snippet() const { if (h->create_kernel() != dnnl::impl::status::success) { IE_THROW() << "Failed to create jit_kernel in get_snippet()"; } @@ -173,7 +172,7 @@ code ov::intel_cpu::CPUTargetMachine::get_snippet() const { ov::intel_cpu::CPUGenerator::CPUGenerator(dnnl::impl::cpu::x64::cpu_isa_t isa_) : Generator(std::make_shared(isa_)) { } -ngraph::snippets::Generator::opRegType ov::intel_cpu::CPUGenerator::get_specific_op_reg_type(const std::shared_ptr& op) const { +ov::snippets::Generator::opRegType ov::intel_cpu::CPUGenerator::get_specific_op_reg_type(const std::shared_ptr& op) const { if (std::dynamic_pointer_cast(op) || std::dynamic_pointer_cast(op)) return gpr2gpr; diff --git a/src/plugins/intel_cpu/src/emitters/x64/cpu_generator.hpp b/src/plugins/intel_cpu/src/emitters/x64/cpu_generator.hpp index b624d2c0b093bf..96ccbb4b0db97f 100644 --- a/src/plugins/intel_cpu/src/emitters/x64/cpu_generator.hpp +++ b/src/plugins/intel_cpu/src/emitters/x64/cpu_generator.hpp @@ -7,17 +7,18 @@ #include #include +#include "snippets/target_machine.hpp" #include "snippets/generator.hpp" namespace ov { namespace intel_cpu { -class 
CPUTargetMachine : public ngraph::snippets::TargetMachine { +class CPUTargetMachine : public snippets::TargetMachine { public: CPUTargetMachine(dnnl::impl::cpu::x64::cpu_isa_t host_isa); bool is_supported() const override; - ngraph::snippets::code get_snippet() const override; + snippets::code get_snippet() const override; size_t get_lanes() const override; private: @@ -25,7 +26,7 @@ class CPUTargetMachine : public ngraph::snippets::TargetMachine { dnnl::impl::cpu::x64::cpu_isa_t isa; }; -class CPUGenerator : public ngraph::snippets::Generator { +class CPUGenerator : public snippets::Generator { public: CPUGenerator(dnnl::impl::cpu::x64::cpu_isa_t isa); diff --git a/src/plugins/intel_cpu/src/emitters/x64/jit_dnnl_ext_emitters.hpp b/src/plugins/intel_cpu/src/emitters/x64/jit_dnnl_ext_emitters.hpp index 6e31515fa46c22..67ff928321c7fe 100644 --- a/src/plugins/intel_cpu/src/emitters/x64/jit_dnnl_ext_emitters.hpp +++ b/src/plugins/intel_cpu/src/emitters/x64/jit_dnnl_ext_emitters.hpp @@ -56,7 +56,7 @@ class jit_elu_emitter : public jit_dnnl_emitter { InferenceEngine::Precision exec_prc = InferenceEngine::Precision::FP32) : jit_dnnl_emitter(host, host_isa, n, exec_prc) { kind = dnnl_eltwise_elu; - alpha = ngraph::as_type_ptr(n)->get_alpha(); + alpha = ngraph::as_type_ptr(n)->get_alpha(); beta = 0.f; set_injector(); @@ -95,7 +95,7 @@ class jit_clamp_emitter : public jit_dnnl_emitter { InferenceEngine::Precision exec_prc = InferenceEngine::Precision::FP32) : jit_dnnl_emitter(host, host_isa, n, exec_prc) { kind = dnnl_eltwise_clip; - auto op = ngraph::as_type_ptr(n); + auto op = ngraph::as_type_ptr(n); alpha = op->get_min(); beta = op->get_max(); diff --git a/src/plugins/intel_cpu/src/emitters/x64/jit_eltwise_emitters.cpp b/src/plugins/intel_cpu/src/emitters/x64/jit_eltwise_emitters.cpp index 150d524ac04ce7..0ba374b68b93be 100644 --- a/src/plugins/intel_cpu/src/emitters/x64/jit_eltwise_emitters.cpp +++ b/src/plugins/intel_cpu/src/emitters/x64/jit_eltwise_emitters.cpp @@ -1462,7 +1462,7 @@ size_t jit_logical_not_emitter::aux_vecs_count() const { /// POWER_STATIC /// jit_power_static_emitter::jit_power_static_emitter(x64::jit_generator *host, x64::cpu_isa_t host_isa, const std::shared_ptr& node, Precision exec_prc) : jit_emitter(host, host_isa, node, exec_prc) { - auto powerStaticNode = ov::as_type_ptr(node); + auto powerStaticNode = ov::as_type_ptr(node); if (powerStaticNode == nullptr) { IE_THROW() << "Can't cast to snippets::op::PowerStatic"; } diff --git a/src/plugins/intel_cpu/src/emitters/x64/jit_emitter.hpp b/src/plugins/intel_cpu/src/emitters/x64/jit_emitter.hpp index eb3309de32d8c5..d2e3a33b914406 100644 --- a/src/plugins/intel_cpu/src/emitters/x64/jit_emitter.hpp +++ b/src/plugins/intel_cpu/src/emitters/x64/jit_emitter.hpp @@ -28,7 +28,7 @@ struct emitter_params { virtual size_t hash() const = 0; }; -class jit_emitter : public ngraph::snippets::Emitter { +class jit_emitter : public ov::snippets::Emitter { public: jit_emitter(dnnl::impl::cpu::x64::jit_generator* host, dnnl::impl::cpu::x64::cpu_isa_t host_isa, InferenceEngine::Precision exec_prc = InferenceEngine::Precision::FP32, emitter_in_out_map in_out_type = emitter_in_out_map::vec_to_vec) diff --git a/src/plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.cpp b/src/plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.cpp index c58dfb595069b9..ed341ae1309caa 100644 --- a/src/plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.cpp +++ b/src/plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.cpp @@ -2,18 +2,17 @@ // 
SPDX-License-Identifier: Apache-2.0 // -#include +#include "jit_snippets_emitters.hpp" + #include -#include "jit_snippets_emitters.hpp" -#include "snippets/op/subgraph.hpp" -#include "snippets/utils.hpp" +#include "snippets/snippets_isa.hpp" +#include "snippets/lowered/expression.hpp" +#include "snippets/lowered/port_connector.hpp" #include "transformations/snippets/x64/op/brgemm_copy_b.hpp" #include "transformations/snippets/x64/op//brgemm_cpu.hpp" using namespace InferenceEngine; -using ngraph::snippets::op::Subgraph; -using ngraph::snippets::AllocatedEmitter; using namespace Xbyak; using namespace dnnl::impl; using namespace dnnl::impl::cpu::x64; @@ -36,8 +35,8 @@ jit_container_emitter::jit_container_emitter(dnnl::impl::cpu::x64::jit_generator } void jit_container_emitter::map_abstract_registers(mapping_info& gpr_map_pool, mapping_info& vec_map_pool, - std::vector& allocated_emitters) const { - if (allocated_emitters.empty()) + snippets::lowered::LinearIR::container& expressions) const { + if (expressions.empty()) IE_THROW() << "Cannot map registers when there is no allocated_emitters provided"; auto map_regs = [](const std::vector& abstract_regs, mapping_info& mapping) { auto& abstract_to_physical = mapping.first; @@ -59,25 +58,14 @@ void jit_container_emitter::map_abstract_registers(mapping_info& gpr_map_pool, return physical_regs; }; - for (auto& code : allocated_emitters) { - const auto& emitter = code.first; + for (const auto& expression : expressions) { + const auto& emitter = expression->get_emitter(); std::vector in_abstract_regs, out_abstract_regs; - std::tie(in_abstract_regs, out_abstract_regs) = code.second; + std::tie(in_abstract_regs, out_abstract_regs) = expression->get_reg_info(); std::vector in_physical_regs, out_physical_regs; switch (std::dynamic_pointer_cast(emitter)->get_in_out_type()) { case gpr_to_gpr: - // Note that gpr_to_gpr is used for high-level utility operations like Kernel/Loop. - // Input registers are not mapped in this case, since they contain utility info - // (num_params, loop increment, etc.), but not reg indexes. 
- // todo: Note that LoopBeginEmitter and LoopEndEmitter demonstrate new paradigm, - // where all utility emitters align with conventional Op emitters - if (std::dynamic_pointer_cast(emitter) || - std::dynamic_pointer_cast(emitter) || - std::dynamic_pointer_cast(emitter) || - std::dynamic_pointer_cast(emitter)) - in_physical_regs = map_regs(in_abstract_regs, gpr_map_pool); - else - in_physical_regs = std::move(in_abstract_regs); + in_physical_regs = map_regs(in_abstract_regs, gpr_map_pool); out_physical_regs = map_regs(out_abstract_regs, gpr_map_pool); break; case gpr_to_vec: @@ -98,9 +86,9 @@ void jit_container_emitter::map_abstract_registers(mapping_info& gpr_map_pool, default: IE_THROW() << "Unhandled in_out type"; } - code.second = std::make_pair(in_physical_regs, out_physical_regs); - if (auto container = std::dynamic_pointer_cast(code.first)) - container->map_abstract_registers(gpr_map_pool, vec_map_pool, allocated_emitters); + expression->set_reg_info({in_physical_regs, out_physical_regs}); + if (auto container = std::dynamic_pointer_cast(expression->get_emitter())) + container->map_abstract_registers(gpr_map_pool, vec_map_pool, expressions); } } @@ -109,7 +97,7 @@ KernelEmitter::KernelEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl: jit_container_emitter(h, isa, n), reg_indexes_idx(abi_param1.getIdx()), reg_const_params_idx(abi_param2.getIdx()) { - const auto kernel = ov::as_type_ptr(n); + const auto kernel = ov::as_type_ptr(n); if (!kernel) IE_THROW() << "KernelEmitter invoked with invalid op argument"; if (kernel->region.empty()) @@ -118,48 +106,33 @@ KernelEmitter::KernelEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl: IE_THROW() << "KernelEmitter invoked with op::Kernel that contains no compile_params"; body = kernel->region; jcp = *reinterpret_cast(kernel->compile_params); - // calc data access pattern. 
we'll need it for offsets calculation - const auto& model = kernel->model; - const auto get_data_layout = [](const Output& out, std::vector& shape) { - const auto& layout = ngraph::snippets::utils::get_node_output_layout(out.get_node_shared_ptr()); - // default access pattern - if (!layout.empty()) { - const auto layout_shape_diff = static_cast(shape.size()) - static_cast(layout.size()); - // Plugin can (and usually does) prepend shapes with 1's to facilitate scheduling, here we can safely remove leading 1's - if (layout_shape_diff > 0) { - if (std::any_of(shape.begin(), shape.begin() + layout_shape_diff, [](size_t x){return x != 1;})) - IE_THROW() << "KernelEmitter detected shape vs access pattern conflict: only leading 1's can be removed from the shape"; - shape.erase(shape.begin(), shape.begin() + layout_shape_diff); + const auto& io_exprs = body.get_IO_ops(); + num_inputs = 0; + num_outputs = 0; + for (const auto& expr : io_exprs) { + snippets::lowered::PortDescriptorPtr desc = nullptr; + element::Type etype; + switch (expr->get_type()) { + case snippets::lowered::IOExpression::io_type::INPUT: { + desc = expr->get_output_port_descriptor(0); + etype = expr->get_node()->get_output_element_type(0); + num_inputs++; + break; + } + case snippets::lowered::IOExpression::io_type::OUTPUT: { + num_outputs++; + desc = expr->get_input_port_descriptor(0); + etype = expr->get_node()->get_input_element_type(0); + break; + } default : { + IE_THROW() << "Kernel detected unsupported io_type"; } } - return layout; - }; - const auto& ops = model->get_ordered_ops(); - auto params = model->get_parameters(); - auto results = model->get_results(); - num_inputs = params.size(); - num_outputs = results.size(); - is_buffer_needed = std::any_of(ops.begin(), ops.end(), - [](const std::shared_ptr& node) { return ov::is_type(node); } ); - NodeVector io_nodes; - std::copy(params.begin(), params.end(), std::back_inserter(io_nodes)); - std::copy(results.begin(), results.end(), std::back_inserter(io_nodes)); - - const auto& model_rt_info = model->get_rt_info(); - const auto& plugin_shapes = model_rt_info.find("PluginShapesOverride"); - if (plugin_shapes == model_rt_info.end()) { - IE_THROW() << "JIT KernelEmitter requires plugin-overriden shapes in model rt_info"; - } else { - const auto& new_shapes = plugin_shapes->second.as>>(); - if (new_shapes.size() != num_inputs + num_outputs) - IE_THROW() << "JIT KernelEmitter detected invalid plugin-overriden shapes"; - io_shapes = new_shapes; - } - for (int i = 0; i < io_nodes.size(); i++) { - const auto& out = i < num_inputs ? 
io_nodes[i]->output(0) : io_nodes[i]->input_value(0); - data_layout.push_back(get_data_layout(out, io_shapes[i])); - io_data_size.push_back(out.get_element_type().size()); + io_shapes.push_back(desc->get_shape()); + io_data_layouts.push_back(desc->get_layout()); + io_data_sizes.push_back(etype.size()); } + // Initialize pools of gp and vec registers gp_regs_pool.resize(16); vec_regs_pool.resize(16); @@ -180,28 +153,37 @@ KernelEmitter::KernelEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl: mapping_info gpr_map_pool({}, gp_regs_pool); mapping_info vec_map_pool({}, vec_regs_pool); - std::vector data_io_emitters; - std::copy_if(body.begin(), body.end(), std::back_inserter(data_io_emitters), - [](const AllocatedEmitter& code){ - const auto& emitter = code.first; - const auto emitter_type = std::dynamic_pointer_cast(emitter)->get_in_out_type(); - // todo: how this will be handled if Brgemm in & out are op::Buffer - // Brgemm is a special case since it incorporates input and output (we use onednn kernel) - // Just like Load & Store it requires offsets calculation - const auto is_brgemm = std::dynamic_pointer_cast(emitter) || - std::dynamic_pointer_cast(emitter); - return emitter_type == gpr_to_vec || emitter_type == vec_to_gpr || is_brgemm; - }); + snippets::lowered::LinearIR::container mem_access_exprs; + snippets::lowered::LinearIR::container general_exprs; + std::set unique_buffers; + + for (const auto& expr : body) { + // Brgemm is a special case since it incorporates input and output (we use onednn kernel) + // Just like Load & Store it requires offsets calculation + if (std::dynamic_pointer_cast(expr)) { + mem_access_exprs.emplace_back(expr); + } else if (const auto buffer = ov::as_type_ptr(expr->get_node())) { + const auto buffer_id = buffer->get_id(); + if (unique_buffers.count(buffer_id) == 0) { + mem_access_exprs.push_back(expr); + unique_buffers.insert(buffer_id); + } + } else { + general_exprs.emplace_back(expr); + } + } + num_unique_buffer = unique_buffers.size(); + // Note that we can't use reg_indexes_idx or reg_const_params_idx to store data pointers because these two // regs are used to calculate offsets for the data pointers - map_abstract_registers(gpr_map_pool, vec_map_pool, data_io_emitters); + map_abstract_registers(gpr_map_pool, vec_map_pool, mem_access_exprs); for (const auto& abstract_to_physical : gpr_map_pool.first) data_ptr_regs_idx.push_back(abstract_to_physical.second); // However we can use reg_indexes_idx and reg_const_params_idx for other operations since we won't need them // after offsets calculation gpr_map_pool.second.push_back(reg_indexes_idx); gpr_map_pool.second.push_back(reg_const_params_idx); - map_abstract_registers(gpr_map_pool, vec_map_pool, body); + map_abstract_registers(gpr_map_pool, vec_map_pool, general_exprs); } void KernelEmitter::emit_code(const std::vector &in, @@ -216,21 +198,19 @@ void KernelEmitter::validate_arguments(const std::vector &in, IE_THROW() << "KernelEmitter got invalid number of inputs. Expected 0, got " << in.size(); if (!out.empty()) IE_THROW() << "KernelEmitter got invalid number of outputs. 
Expected 0, got " << out.size(); - const auto num_params = num_inputs + num_outputs + static_cast(is_buffer_needed); + const auto num_params = num_inputs + num_outputs + num_unique_buffer; // The number of used gpr may be >= num_params since LoopBegin+LoopEnd could also use gpr to store work_amount if (data_ptr_regs_idx.size() != num_params) IE_THROW() << "KernelEmitter: number of inputs and outputs is inconsistent with the number of allocated registers" << num_params << " data_ptr_regs_idx.size() = " << data_ptr_regs_idx.size(); } -void KernelEmitter::init_data_pointers(size_t num_inputs, size_t num_params, bool is_buffer_needed, +void KernelEmitter::init_data_pointers(size_t num_inputs, size_t num_params, size_t num_buffer, const Reg64& reg_indexes, const Reg64& reg_const_params, const std::vector& data_ptr_regs) const { // Note that we don't need offset for the last dim, since it's handled directly by Tile emitter const size_t offset_rank = jcp.master_shape.size() - 1; - //const size_t tile_rank = jcp.tile_rank; std::vector> data_offsets(num_params, std::vector{}); - auto offset_calculation = [=](const std::vector& shape, - const std::vector& layout, const size_t data_size) { + auto offset_calculation = [=](const std::vector& shape, const std::vector& layout, const size_t data_size) { // Strides represent distance between consecutive elements of corresponding dimension. // If a dim size == 1, then the next dim starts immediately and the stride is 0 // case 1: @@ -265,7 +245,7 @@ void KernelEmitter::init_data_pointers(size_t num_inputs, size_t num_params, boo return strides; }; for (size_t i = 0; i < num_params; i++) { - data_offsets[i] = offset_calculation(io_shapes[i], data_layout[i], io_data_size[i]); + data_offsets[i] = offset_calculation(io_shapes[i], io_data_layouts[i], io_data_sizes[i]); } // master_shape size must be valid in both static and dynamic cases std::function&, Reg64)> init_ptr_with_offset; @@ -287,8 +267,8 @@ void KernelEmitter::init_data_pointers(size_t num_inputs, size_t num_params, boo // Vector "data_ptr_regs" is sorted by abstract regs.
// It means that the vector contains the physical registers in order [src, .., src, dst, .., dst, buffer] // So we can initialize buffer register firstly as last value of vector "data_ptr_regs" - if (is_buffer_needed) { - h->mov(data_ptr_regs[num_params], h->ptr[reg_const_params + GET_OFF(buffer_scratchpad_ptr)]); + for (size_t i = 0; i < num_buffer; ++i) { + h->mov(data_ptr_regs[num_params + i], h->ptr[reg_const_params + GET_OFF(buffer_scratchpad_ptr)]); } size_t i = 0; for (; i < num_params - last_iter_explicitly; i++) { @@ -319,11 +299,11 @@ void KernelEmitter::emit_impl(const std::vector& in, std::vector data_ptr_regs; transform_idxs_to_regs(data_ptr_regs_idx, data_ptr_regs); - init_data_pointers(num_inputs, num_inputs + num_outputs, is_buffer_needed, reg_indexes, reg_const_params, data_ptr_regs); - for (const auto& c : body) { - const auto& emitter = c.first; + init_data_pointers(num_inputs, num_inputs + num_outputs, num_unique_buffer, reg_indexes, reg_const_params, data_ptr_regs); + for (const auto& expression : body) { + const auto& emitter = expression->get_emitter(); std::vector in_regs, out_regs; - std::tie(in_regs, out_regs) = c.second; + std::tie(in_regs, out_regs) = expression->get_reg_info(); emitter->emit_code(in_regs, out_regs, vec_regs_pool, gp_regs_pool); } h->postamble(); @@ -332,19 +312,18 @@ void KernelEmitter::emit_impl(const std::vector& in, LoopBeginEmitter::LoopBeginEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n) : jit_emitter(h, isa, n) { - loop_begin = ov::as_type_ptr(n); + loop_begin = ov::as_type_ptr(n); if (!loop_begin) IE_THROW() << "LoopBeginEmitter invoked with invalid op argument"; const auto& target_inputs = loop_begin->output(loop_begin->get_output_size() - 1).get_target_inputs(); // todo: this check could be excessive, since we check for it in validate_and_infer_types() if (target_inputs.size() != 1) IE_THROW() << "LoopBeginEmitter invoked with invalid configuration: the last output must have exactly one input attached"; - const auto loop_end = ov::as_type_ptr(target_inputs.begin()->get_node()->shared_from_this()); + const auto loop_end = ov::as_type_ptr(target_inputs.begin()->get_node()->shared_from_this()); if (!loop_end) IE_THROW() << "LoopBeginEmitter invoked with invalid configuration: the last output must be LoopEnd"; - work_amount = loop_begin->get_work_amount(); - evaluate_once = loop_begin->get_evaluate_once(); - num_inputs = loop_begin->get_input_size(); + work_amount = loop_end->get_work_amount(); + evaluate_once = loop_end->get_evaluate_once(); in_out_type_ = emitter_in_out_map::gpr_to_gpr; } @@ -355,17 +334,17 @@ void LoopBeginEmitter::emit_code(const std::vector &in, } void LoopBeginEmitter::validate_arguments(const std::vector &in, - const std::vector &out) const { - if (in.size() != num_inputs) - IE_THROW() << "Invalid inputs size: expected " << num_inputs << " got " << in.size(); - if (out.size() != num_inputs + 1) - IE_THROW() << "Invalid outputs size: expected " << num_inputs + 1 << " got " << out.size(); + const std::vector &out) const { + if (!in.empty()) + IE_THROW() << "Invalid inputs size: expected 0 got " << in.size(); + if (out.size() != 1) + IE_THROW() << "Invalid outputs size: expected 1 got " << out.size(); } void LoopBeginEmitter::emit_impl(const std::vector& in, const std::vector& out) const { // todo: In dynamic case we will also need to set broadcasting info here - Reg64 reg_work_amount = Reg64(out.back()); + Reg64 reg_work_amount = 
Reg64(static_cast(out.back())); Label for_body; // save previous register state (if there is an outer loop that uses this reg for example) if (!evaluate_once) { @@ -375,12 +354,11 @@ void LoopBeginEmitter::emit_impl(const std::vector& in, // or ready(), but they both set internal flags and that's not a desired way to use them. // So the most obvious WA is just to use current address manually loop_begin->begin_address = h->getCurr(); - loop_begin->input_regs = in; } LoopEndEmitter::LoopEndEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n) : jit_emitter(h, isa, n) { - loop_end = ov::as_type_ptr(n); + loop_end = ov::as_type_ptr(n); if (!loop_end) IE_THROW() << "LoopEndEmitter invoked with invalid op argument"; loop_begin = loop_end->get_loop_begin(); @@ -388,17 +366,14 @@ LoopEndEmitter::LoopEndEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::imp if (!loop_begin) IE_THROW() << "LoopEndEmitter invoked with invalid configuration: the last arg must be LoopBegin"; // Note that 1 edge connects LoopBegin and LoopEnd - num_inputs = loop_begin->get_input_size(); - num_outputs = loop_end->get_output_size(); - wa_increment = loop_end->get_increment(); - work_amount = loop_end->get_work_amount(); + num_inputs = loop_end->get_input_num(); + num_outputs = loop_end->get_output_num(); + wa_increment = static_cast(loop_end->get_increment()); + work_amount = static_cast(loop_end->get_work_amount()); ptr_increments = loop_end->get_ptr_increments(); finalization_offsets = loop_end->get_finalization_offsets(); evaluate_once = loop_end->get_evaluate_once(); - for (int i = 0; i < num_inputs; i++) - io_data_size.push_back(static_cast(loop_begin->get_input_element_type(i).size())); - for (int i = 0; i < num_outputs; i++) - io_data_size.push_back(static_cast(loop_end->get_output_element_type(i).size())); + io_data_size = loop_end->get_element_type_sizes(); in_out_type_ = emitter_in_out_map::gpr_to_gpr; } @@ -411,31 +386,30 @@ void LoopEndEmitter::emit_code(const std::vector &in, void LoopEndEmitter::validate_arguments(const std::vector &in, const std::vector &out) const { - if (loop_begin->input_regs.size() != num_inputs) - IE_THROW() << "Invalid loop_begin->input_regs size: expected " << num_inputs << " got " << loop_begin->input_regs.size(); if (out.size() != num_outputs) IE_THROW() << "Invalid number of out arguments: expected " << num_outputs << " got " << out.size(); - if (in.size() != num_outputs + 1) - IE_THROW() << "Invalid number of in arguments: expected " << num_inputs + 1 << " got " << in.size(); - const auto io_size = num_inputs + num_outputs; + if (in.size() != num_inputs) + IE_THROW() << "Invalid number of in arguments: expected " << num_inputs << " got " << in.size(); + const auto io_size = num_inputs - 1; if (ptr_increments.size() != io_size) - IE_THROW() << "Invalid apply_increments size: expected " << io_size << " got " << ptr_increments.size(); + IE_THROW() << "Invalid ptr_increments size: expected " << io_size << " got " << ptr_increments.size(); if (finalization_offsets.size() != io_size) IE_THROW() << "Invalid finalization_offsets size: expected: " << io_size << " got " << finalization_offsets.size(); } void LoopEndEmitter::emit_impl(const std::vector& in, const std::vector& out) const { - std::vector data_ptr_reg_idxs(loop_begin->input_regs); - data_ptr_reg_idxs.reserve(num_inputs + num_outputs); - std::copy(out.begin(), out.end(), std::back_inserter(data_ptr_reg_idxs)); + std::vector data_ptr_reg_idxs; + // the last input is 
actually a work_amount reg + data_ptr_reg_idxs.reserve(num_inputs - 1); + std::copy(in.begin(), in.end() - 1, std::back_inserter(data_ptr_reg_idxs)); std::vector data_ptr_regs; transform_idxs_to_regs(data_ptr_reg_idxs, data_ptr_regs); Reg64 reg_work_amount = Reg64(in.back()); if (!evaluate_once) { for (int idx = 0; idx < data_ptr_regs.size(); idx++) { if (ptr_increments[idx] != 0) - h->add(data_ptr_regs[idx], ptr_increments[idx] * io_data_size[idx]); + h->add(data_ptr_regs[idx], ptr_increments[idx] * wa_increment * io_data_size[idx]); } h->sub(reg_work_amount, wa_increment); h->cmp(reg_work_amount, wa_increment); @@ -448,6 +422,16 @@ void LoopEndEmitter::emit_impl(const std::vector& in, } } +ParameterEmitter::ParameterEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, + const std::shared_ptr& n) : NopEmitter(h, isa, n) { + in_out_type_ = emitter_in_out_map::gpr_to_gpr; +} + +ResultEmitter::ResultEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, + const std::shared_ptr& n) : NopEmitter(h, isa, n) { + in_out_type_ = emitter_in_out_map::gpr_to_gpr; +} + BroadcastMoveEmitter::BroadcastMoveEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n) : jit_emitter(h, isa, n) { if (n->get_input_element_type(0) != n->get_output_element_type(0)) @@ -537,7 +521,7 @@ StoreEmitter::StoreEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::c if (src_prc != dst_prc) IE_THROW() << "StoreEmitter supports only equal input and output types but gets: " << src_prc.name() << " and " << dst_prc.name(); - const auto store = ov::as_type_ptr(n); + const auto store = ov::as_type_ptr(n); count = store->get_count(); byte_offset = store->get_offset(); in_out_type_ = emitter_in_out_map::vec_to_gpr; @@ -573,7 +557,7 @@ LoadEmitter::LoadEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu if (src_prc != dst_prc) IE_THROW() << "LoadEmitter supports only equal input and output types but gets: " << src_prc.name() << " and " << dst_prc.name(); - const auto load = std::dynamic_pointer_cast(n); + const auto load = std::dynamic_pointer_cast(n); count = load->get_count(); byte_offset = load->get_offset(); in_out_type_ = emitter_in_out_map::gpr_to_vec; @@ -609,7 +593,7 @@ BroadcastLoadEmitter::BroadcastLoadEmitter(dnnl::impl::cpu::x64::jit_generator* if (src_prc != dst_prc) IE_THROW() << "BroadcastEmitters support only equal input and output types but gets: " << src_prc.name() << " and " << dst_prc.name(); - const auto broadcast_load = std::dynamic_pointer_cast(n); + const auto broadcast_load = std::dynamic_pointer_cast(n); byte_offset = broadcast_load->get_offset(); in_out_type_ = emitter_in_out_map::gpr_to_vec; } @@ -646,7 +630,7 @@ void BroadcastLoadEmitter::emit_isa(const std::vector &in, const std::ve LoadConvertEmitter::LoadConvertEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n) : MemoryEmitter(h, isa, n) { - const auto load = ov::as_type_ptr(n); + const auto load = ov::as_type_ptr(n); count = load->get_count(); byte_offset = load->get_offset(); in_out_type_ = emitter_in_out_map::gpr_to_vec; @@ -679,7 +663,7 @@ void LoadConvertEmitter::emit_data() const { StoreConvertEmitter::StoreConvertEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n) : MemoryEmitter(h, isa, n) { - const auto store = ov::as_type_ptr(n); + const auto store = ov::as_type_ptr(n); count = 
store->get_count(); byte_offset = store->get_offset(); in_out_type_ = emitter_in_out_map::vec_to_gpr; @@ -724,14 +708,11 @@ BrgemmEmitter::BrgemmEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl: if (brgemm_node->is_dynamic()) IE_THROW() << "Snippets don't support code generation for dynamic Brgemm"; const auto brgemm_copy = brgemm_node->is_with_data_repacking() ? brgemm_node->get_brgemm_copy() : nullptr; - const OutputVector io_values {brgemm_node->input_value(0), - brgemm_copy ? brgemm_copy->input_value(0) : brgemm_node->input_value(1), - brgemm_node->output(0)}; + std::vector leading_dimensions; std::vector> io_layouts; - for (const auto& val : io_values) { - const auto& layout = ngraph::snippets::utils::get_node_output_layout(val.get_node_shared_ptr()); - const auto& io_shape = val.get_shape(); + + auto init_scheduling_params = [&](const std::vector& layout, const ov::Shape& io_shape) { if (layout.empty()) { // empty value indicates a planar layout leading_dimensions.push_back(io_shape.back()); @@ -744,17 +725,25 @@ BrgemmEmitter::BrgemmEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl: // counting from the end since shape could be prepended with ones const int64_t num_last_dims = layout.end() - std::find(layout.begin(), layout.end(), layout.size() - 2) - 1; if (layout.back() != layout.size() - 1 || num_last_dims < 1) - IE_THROW() << "BrgemmEmitter detected invalid layout values: " << - "check that this shape + layout combination is schedulable"; + IE_THROW() << "BrgemmEmitter detected invalid layout values: check that this shape + layout combination is schedulable"; leading_dimensions.emplace_back( std::accumulate(io_shape.end() - num_last_dims, io_shape.end(), 1, std::multiplies())); io_layouts.push_back(layout); } + }; + + std::vector> brgemm_inputs = {brgemm_node->input(0), + brgemm_copy ? 
brgemm_copy->input(0) : brgemm_node->input(1)}; + for (const auto& input : brgemm_inputs) { + init_scheduling_params(snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(input)->get_layout(), + input.get_shape()); } + init_scheduling_params(snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(brgemm_node->output(0))->get_layout(), + brgemm_node->output(0).get_shape()); - const auto& A_shape = io_values[0].get_shape(); + const auto& A_shape = brgemm_node->get_input_shape(0); const auto& A_layout = io_layouts[0]; - const auto& C_shape = io_values[2].get_shape(); + const auto& C_shape = brgemm_node->get_output_shape(0); const auto& C_layout = io_layouts[2]; // We need find original M,N,K having layouts and ordered shapes @@ -1106,7 +1095,7 @@ BrgemmCopyBEmitter::BrgemmCopyBEmitter(dnnl::impl::cpu::x64::jit_generator* h, d if (m_with_comp) m_comp_offset = brgemm_repack->get_offset_compensations(); - auto layout = ngraph::snippets::utils::get_node_output_layout(brgemm_repack->get_input_node_shared_ptr(0)); + const auto& layout = snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(brgemm_repack->input(0))->get_layout(); const auto& original_shape = brgemm_repack->get_input_shape(0); auto transposed_shape = original_shape; size_t leading_dimension = *(original_shape.rbegin()); @@ -1452,7 +1441,7 @@ void VectorBufferEmitter::emit_isa(const std::vector &in, const std::vec FillEmitter::FillEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n) : jit_emitter(h, isa, n, Precision::FP32, emitter_in_out_map::vec_to_vec) { - const auto fill = ov::as_type_ptr(n); + const auto fill = ov::as_type_ptr(n); if (fill->get_element_type().size() != 4) { IE_THROW() << "Fill emitter supports only 4 Byte element types but gets: " << fill->get_element_type(); } diff --git a/src/plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.hpp b/src/plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.hpp index 4fc69a1d731ed7..f41340a9223fc7 100644 --- a/src/plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.hpp +++ b/src/plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.hpp @@ -4,9 +4,11 @@ #pragma once -#include +#include #include +#include "snippets/lowered/linear_ir.hpp" + #include "jit_emitter.hpp" #include "jit_load_store_emitters.hpp" @@ -50,8 +52,8 @@ class jit_container_emitter: public jit_emitter { // maps gpr and vec abstract registers to physical ones. Physical reg indexes are taken from the provided pools // (the first 2 args). All the used gpr and vec registers are also stored in the provided sets (the second 2 args). void map_abstract_registers(mapping_info& gpr_map_pool, mapping_info& vec_map_pool, - std::vector& allocated_emitters) const; - std::vector body; + snippets::lowered::LinearIR::container& expressions) const; + snippets::lowered::LinearIR body; }; /// /// \brief Kernel is the only entry point to Codogen Jit compilation. 
Kernel performs abstract-to-physical register @@ -85,19 +87,19 @@ class KernelEmitter : public jit_container_emitter { const std::vector &out) const override; void emit_impl(const std::vector& in, const std::vector& out) const override; - void init_data_pointers(size_t, size_t, bool, const Xbyak::Reg64&, const Xbyak::Reg64&, const std::vector&) const; + void init_data_pointers(size_t, size_t, size_t, const Xbyak::Reg64&, const Xbyak::Reg64&, const std::vector&) const; jit_snippets_compile_args jcp; std::vector gp_regs_pool; size_t num_inputs; size_t num_outputs; - bool is_buffer_needed; + size_t num_unique_buffer; // Vector of indices (length = input tensor rank) per every input and output that describes in which order // corresponding tensor dimensions are accessed (default: consecutive dense, e.g. 0,1,2,3 for 4D tensor). // Needed to calc i/o offsets. - std::vector> data_layout; + std::vector> io_data_layouts; std::vector> io_shapes = {}; - std::vector io_data_size {}; + std::vector io_data_sizes {}; // gpr's used to store data pointers, track them to apply offsets in Kernel std::vector data_ptr_regs_idx; @@ -122,8 +124,7 @@ class LoopBeginEmitter : public jit_emitter { void emit_impl(const std::vector& in, const std::vector& out) const override; - std::shared_ptr loop_begin; - size_t num_inputs = 0; + std::shared_ptr loop_begin; bool evaluate_once = false; size_t work_amount = 0; // need to store work_amount explicitly, since two loops can work on the same dim (e.g. vector + scalar) }; @@ -144,25 +145,25 @@ class LoopEndEmitter : public jit_emitter { void emit_impl(const std::vector& in, const std::vector& out) const override; - std::shared_ptr loop_begin; - std::shared_ptr loop_end; + std::shared_ptr loop_begin; + std::shared_ptr loop_end; size_t num_inputs = 0; size_t num_outputs = 0; // keep data_size int64_t to avoid conversion to size_t (and overflow) when multiplied by negative increments or offsets std::vector io_data_size {}; - size_t wa_increment = 0; - size_t work_amount = 0; + int64_t wa_increment = 0; + int64_t work_amount = 0; bool evaluate_once = false; std::vector ptr_increments; std::vector finalization_offsets; }; - class NopEmitter : public jit_emitter { public: NopEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n) : jit_emitter(h, isa, n) { + in_out_type_ = emitter_in_out_map::gpr_to_gpr; } size_t get_inputs_num() const override {return 0;} @@ -173,6 +174,20 @@ class NopEmitter : public jit_emitter { } }; +class ParameterEmitter : public NopEmitter { +public: + ParameterEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, + const std::shared_ptr& n); + + size_t get_inputs_num() const override { return 0; } +}; + +class ResultEmitter : public NopEmitter { +public: + ResultEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n); + size_t get_inputs_num() const override {return 1;} +}; + class BroadcastMoveEmitter : public jit_emitter { public: BroadcastMoveEmitter(dnnl::impl::cpu::x64::jit_generator* h, dnnl::impl::cpu::x64::cpu_isa_t isa, const std::shared_ptr& n); diff --git a/src/plugins/intel_cpu/src/extension.cpp b/src/plugins/intel_cpu/src/extension.cpp index 78096952240480..71c84045e12848 100644 --- a/src/plugins/intel_cpu/src/extension.cpp +++ b/src/plugins/intel_cpu/src/extension.cpp @@ -137,26 +137,26 @@ std::map Extension::getOpSets() { ngraph::OpSet opset; #define NGRAPH_OP(NAME, NAMESPACE) opset.insert(); -
NGRAPH_OP(Brgemm, ngraph::snippets::op) - NGRAPH_OP(Buffer, ngraph::snippets::op) - NGRAPH_OP(BroadcastLoad, ngraph::snippets::op) - NGRAPH_OP(BroadcastMove, ngraph::snippets::op) - NGRAPH_OP(ConvertSaturation, ngraph::snippets::op) - NGRAPH_OP(ConvertTruncation, ngraph::snippets::op) - NGRAPH_OP(Fill, ngraph::snippets::op) - NGRAPH_OP(HorizonMax, ngraph::snippets::op) - NGRAPH_OP(HorizonSum, ngraph::snippets::op) - NGRAPH_OP(Kernel, ngraph::snippets::op) - NGRAPH_OP(Load, ngraph::snippets::op) - NGRAPH_OP(LoadReshape, ngraph::snippets::op) - NGRAPH_OP(LoopBegin, ngraph::snippets::op) - NGRAPH_OP(LoopEnd, ngraph::snippets::op) - NGRAPH_OP(Nop, ngraph::snippets::op) - NGRAPH_OP(PowerStatic, ngraph::snippets::op) - NGRAPH_OP(Scalar, ngraph::snippets::op) - NGRAPH_OP(Store, ngraph::snippets::op) - NGRAPH_OP(Subgraph, ngraph::snippets::op) - NGRAPH_OP(VectorBuffer, ngraph::snippets::op) + NGRAPH_OP(Brgemm, ov::snippets::op) + NGRAPH_OP(Buffer, ov::snippets::op) + NGRAPH_OP(BroadcastLoad, ov::snippets::op) + NGRAPH_OP(BroadcastMove, ov::snippets::op) + NGRAPH_OP(ConvertSaturation, ov::snippets::op) + NGRAPH_OP(ConvertTruncation, ov::snippets::op) + NGRAPH_OP(Fill, ov::snippets::op) + NGRAPH_OP(HorizonMax, ov::snippets::op) + NGRAPH_OP(HorizonSum, ov::snippets::op) + NGRAPH_OP(Kernel, ov::snippets::op) + NGRAPH_OP(Load, ov::snippets::op) + NGRAPH_OP(LoadReshape, ov::snippets::op) + NGRAPH_OP(LoopBegin, ov::snippets::op) + NGRAPH_OP(LoopEnd, ov::snippets::op) + NGRAPH_OP(Nop, ov::snippets::op) + NGRAPH_OP(PowerStatic, ov::snippets::op) + NGRAPH_OP(Scalar, ov::snippets::op) + NGRAPH_OP(Store, ov::snippets::op) + NGRAPH_OP(Subgraph, ov::snippets::op) + NGRAPH_OP(VectorBuffer, ov::snippets::op) NGRAPH_OP_X64(LoadConvertSaturation, ov::intel_cpu) NGRAPH_OP_X64(LoadConvertTruncation, ov::intel_cpu) NGRAPH_OP_X64(StoreConvertSaturation, ov::intel_cpu) diff --git a/src/plugins/intel_cpu/src/nodes/subgraph.cpp b/src/plugins/intel_cpu/src/nodes/subgraph.cpp index 1fee035c077ec2..d7ed25d06e4075 100644 --- a/src/plugins/intel_cpu/src/nodes/subgraph.cpp +++ b/src/plugins/intel_cpu/src/nodes/subgraph.cpp @@ -24,7 +24,7 @@ #include "snippets/pass/matmul_to_brgemm.hpp" #include "utils/cpu_utils.hpp" #include "emitters/x64/cpu_generator.hpp" -#include "transformations/snippets/x64/pass/fuse_load_store_and_convert.hpp" +#include "transformations/snippets/x64/pass/lowered/fuse_load_store_and_convert.hpp" #include "transformations/snippets/x64/pass/mul_add_to_fma.hpp" #include "transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.hpp" #include "transformations/snippets/x64/pass/remove_converts.hpp" @@ -74,25 +74,25 @@ class SnippetShapeInferFactory : public ShapeInferFactory { }; } // namespace -Snippet::Snippet(const std::shared_ptr& op, const GraphContext::CPtr context) +Snippet::Snippet(const std::shared_ptr& op, const GraphContext::CPtr context) : Node(op, context, SnippetShapeInferFactory(this)) { host_isa = dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core) ? 
dnnl::impl::cpu::x64::avx512_core : dnnl::impl::cpu::x64::avx2; - original_snippet = ov::as_type_ptr(op); + original_snippet = ov::as_type_ptr(op); if (!original_snippet) { IE_THROW(NotImplemented) << "Node is not an instance of snippets::op::Subgraph"; } } void Snippet::copy_snippet() { - ngraph::OutputVector subgraph_node_inputs; + ov::OutputVector subgraph_node_inputs; for (const auto &input : original_snippet->input_values()) { - auto new_input = std::make_shared(input.get_element_type(), input.get_partial_shape()); + auto new_input = std::make_shared(input.get_element_type(), input.get_partial_shape()); subgraph_node_inputs.push_back(new_input); } std::shared_ptr new_body = original_snippet->body_ptr()->clone(); - snippet = std::make_shared(subgraph_node_inputs, new_body); - ngraph::copy_runtime_info(original_snippet, snippet); + snippet = std::make_shared(subgraph_node_inputs, new_body); + ov::copy_runtime_info(original_snippet, snippet); snippet->set_friendly_name(original_snippet->get_friendly_name()); #if defined(OPENVINO_ARCH_X86_64) snippet->set_generator(std::make_shared(host_isa)); @@ -323,14 +323,14 @@ ov::PartialShape Snippet::canonicalizeBody() { // if blockDim == Shape::UNDEFINED_DIM, then it's a dynamic dimension, and we need to recreate a proper dynamic Dim for (const auto& d : blockedDesc->getBlockDims()) dims.emplace_back(d == Shape::UNDEFINED_DIM ? -1 : d); - ngraph::PartialShape shape(dims); - ngraph::AxisVector blocking(blockedDesc->getOrder()); - ngraph::element::Type precision = InferenceEngine::details::convertPrecision(blockedDesc->getPrecision()); - return ngraph::snippets::op::Subgraph::BlockedShape{shape, blocking, precision}; + ov::PartialShape shape(dims); + ov::AxisVector blocking(blockedDesc->getOrder()); + ov::element::Type precision = InferenceEngine::details::convertPrecision(blockedDesc->getPrecision()); + return snippets::op::Subgraph::BlockedShape{shape, blocking, precision}; }; inputShapeIsBlocked.resize(inputShapes.size(), false); masterShapeIsBlocked = false; - ngraph::snippets::op::Subgraph::BlockedShapeVector input_blocked_shapes; + snippets::op::Subgraph::BlockedShapeVector input_blocked_shapes; for (size_t i = 0; i < inputShapes.size(); i++) { auto blockedShape = edgeToBlockedShape(getParentEdgesAtPort(i)[0]); inputShapeIsBlocked[i] = std::get<0>(blockedShape).size() != std::get<1>(blockedShape).size(); @@ -339,7 +339,7 @@ ov::PartialShape Snippet::canonicalizeBody() { } outputShapeIsBlocked.resize(outputShapes.size(), false); - ngraph::snippets::op::Subgraph::BlockedShapeVector output_blocked_shapes; + ov::snippets::op::Subgraph::BlockedShapeVector output_blocked_shapes; for (size_t i = 0; i < outputShapes.size(); i++) { auto blockedShape = edgeToBlockedShape(getChildEdgesAtPort(i)[0]); outputShapeIsBlocked[i] = std::get<0>(blockedShape).size() != std::get<1>(blockedShape).size(); @@ -448,17 +448,21 @@ std::vector Snippet::shapeInfer() { void Snippet::prepareParams() { masterShape = getNormalizedDimsBySize(masterShape, tensorRank); - for (auto& pshape : normInputShapes) + std::vector original_input_shape_ranks; + for (auto& pshape : normInputShapes) { + original_input_shape_ranks.push_back(pshape.size()); pshape = getNormalizedDimsBySize(pshape, tensorRank); + } for (auto& pshape : normOutputShapes) pshape = getNormalizedDimsBySize(pshape, tensorRank); tileRank = 1; + bool dims_collapsed = false; fullWorkAmount = std::accumulate(masterShape.begin(), masterShape.end(), 1, std::multiplies()); if (snippet->has_domain_sensitive_ops()) { tileRank 
= 2; } else { - optimizeExecDomain(normInputShapes, normOutputShapes, masterShape, tileRank); + dims_collapsed = optimizeExecDomain(normInputShapes, normOutputShapes, masterShape, tileRank); } exec_domain = masterShape; @@ -495,10 +499,16 @@ void Snippet::prepareParams() { dim = 1; } - auto& body_rt_info = snippet->body_ptr()->get_rt_info(); - std::vector> new_shapes(normInputShapes); - std::copy(normOutputShapes.begin(), normOutputShapes.end(), std::back_inserter(new_shapes)); - body_rt_info["PluginShapesOverride"] = new_shapes; + if (dims_collapsed) { + std::vector new_shapes; + for (int i = 0; i < normInputShapes.size(); i++) { + const auto norm_shape = normInputShapes[i]; + size_t ndims_to_skip = norm_shape.size() - original_input_shape_ranks[i]; + new_shapes.emplace_back(norm_shape.begin() + ndims_to_skip, norm_shape.end()); + } + snippet->reshape_body(new_shapes); + } + snippet->set_master_shape(ov::PartialShape(masterShape)); snippet->set_tile_rank(tileRank); } @@ -543,7 +553,7 @@ void Snippet::generate(const jit_snippets_compile_args* jcp) { // enforce BF16 precisions to supported operations // MatMul has to be decomposed to Brgemm operations before enforcement // Note, MatMul decomposition will be ran later again for case if BF16 enforcement is not happened - CPU_REGISTER_PASS_X64(pre_dialect, ngraph::snippets::pass::MatMulToBrgemm); + CPU_REGISTER_PASS_X64(pre_dialect, ov::snippets::pass::MatMulToBrgemm); CPU_REGISTER_PASS_X64(pre_dialect, pass::EnforcePrecision, element::f32, element::bf16); } @@ -552,28 +562,16 @@ void Snippet::generate(const jit_snippets_compile_args* jcp) { ov::pass::Manager post_precision; CPU_REGISTER_PASS_X64(post_precision, ov::intel_cpu::pass::RemoveConverts); - CPU_REGISTER_PASS_X64(post_precision, ov::intel_cpu::pass::FuseLoadConvert); - CPU_REGISTER_PASS_X64(post_precision, ov::intel_cpu::pass::FuseStoreConvert); - // LoadConvert uses Load emitter that support conversion from any type to only f32 - post_precision.get_pass_config()->set_callback( - [](const std::shared_ptr& n) -> bool { - if (const auto& convert = std::dynamic_pointer_cast(n)) - return convert->get_destination_type() != ov::element::f32; - return true; - }); - // StoreConvert uses Store emitter that support conversion from only f32 to any types - post_precision.get_pass_config()->set_callback( - [](const std::shared_ptr& n) -> bool { - if (const auto& convert = std::dynamic_pointer_cast(n)) - return convert->get_input_element_type(0) != ov::element::f32; - return true; - }); CPU_REGISTER_PASS_X64(post_precision, ov::intel_cpu::pass::MulAddToFMA); + ov::snippets::lowered::pass::PassPipeline control_flow_pipeline; + CPU_REGISTER_PASS_X64(control_flow_pipeline, ov::intel_cpu::pass::FuseLoadStoreConvert); + schedule = snippet->generate( pre_dialect, post_dialect, post_precision, + control_flow_pipeline, reinterpret_cast(jcp)); } diff --git a/src/plugins/intel_cpu/src/nodes/subgraph.h b/src/plugins/intel_cpu/src/nodes/subgraph.h index f1afc1ea43805f..435b709b492f74 100644 --- a/src/plugins/intel_cpu/src/nodes/subgraph.h +++ b/src/plugins/intel_cpu/src/nodes/subgraph.h @@ -69,12 +69,12 @@ class Snippet : public Node { void schedule_nt(); // Original subgraph node - std::shared_ptr original_snippet; + std::shared_ptr original_snippet; // Local copy of subgraph node for canonization & code generation - std::shared_ptr snippet; + std::shared_ptr snippet; // Holds generated snippet with information about how to schedule it - ngraph::snippets::Schedule schedule; + snippets::Schedule schedule; // 
Holds ISA version used is codeGeneration target dnnl::impl::cpu::x64::cpu_isa_t host_isa; diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_copy_b.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_copy_b.cpp index 0e4004395e188a..07ff18b167c8f5 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_copy_b.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_copy_b.cpp @@ -9,21 +9,18 @@ #include "utils/general_utils.h" -using namespace std; using namespace ov; intel_cpu::BrgemmCopyB::BrgemmCopyB(const Output& x, const element::Type src_type, const Type type, - const size_t offset_in, const size_t offset_out0, const size_t offset_out1) - : ngraph::snippets::op::MemoryAccess({x}, 1, type == Type::WithCompensations ? 2 : 1), m_type(type), m_src_type(src_type) { - set_output_size(get_output_port_count()); - m_input_ports.resize(get_input_size()); - m_output_ports.resize(get_output_size()); + const size_t offset_in, const size_t offset_out0, const size_t offset_out1, std::vector layout_input) + : snippets::op::MemoryAccess({x}, 1, type == Type::WithCompensations ? 2 : 1), m_type(type), m_src_type(src_type) { + set_output_size(type == Type::WithCompensations ? 2 : 1); set_input_port_descriptor({0, offset_in}, 0); set_output_port_descriptor({0, offset_out0}, 0); if (is_with_compensations()) { set_output_port_descriptor({0, offset_out1}, 1); } - constructor_validate_and_infer_types(); + custom_constructor_validate_and_infer_types(std::move(layout_input)); } bool intel_cpu::BrgemmCopyB::visit_attributes(AttributeVisitor& visitor) { @@ -33,14 +30,27 @@ bool intel_cpu::BrgemmCopyB::visit_attributes(AttributeVisitor& visitor) { return true; } +void intel_cpu::BrgemmCopyB::custom_constructor_validate_and_infer_types(std::vector layout_input) { + INTERNAL_OP_SCOPE(BrgemmRepack_ctor_validate_and_infer_types); + // During ctor call, BrgemmCopyB doesn't know his port descriptors. + // So we use port descs from source inputs + const auto element_type = get_input_element_type(0); + const auto pshape = snippets::utils::get_reordered_planar_shape(get_input_partial_shape(0), layout_input); + validate(pshape, element_type); +} + void intel_cpu::BrgemmCopyB::validate_and_infer_types() { INTERNAL_OP_SCOPE(BrgemmRepack_validate_and_infer_types); const auto element_type = get_input_element_type(0); + const auto pshape = snippets::utils::get_port_planar_shape(input(0)); + validate(pshape, element_type); +} + +void intel_cpu::BrgemmCopyB::validate(const ov::PartialShape& pshape, const ov::element::Type& element_type) { NGRAPH_CHECK(one_of(element_type, element::bf16, element::i8), - "BrgemmCopyB doesn't support element type" + element_type.get_type_name()); + "BrgemmCopyB doesn't support element type" + element_type.get_type_name()); - const auto pshape = ngraph::snippets::utils::get_port_planar_shape(input_value(0)); if (pshape.is_dynamic()) { set_output_type(0, element_type, ov::PartialShape{ov::Dimension::dynamic()}); if (is_with_compensations()) { @@ -68,7 +78,8 @@ std::shared_ptr intel_cpu::BrgemmCopyB::clone_with_new_inputs(const Output return std::make_shared(new_args.at(0), m_src_type, m_type, get_offset_in(), get_offset_out(), - is_with_compensations() ? get_offset_compensations() : 0); + is_with_compensations() ? 
get_offset_compensations() : 0, + snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(input(0))->get_layout()); } size_t intel_cpu::BrgemmCopyB::get_offset_compensations() const { diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_copy_b.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_copy_b.hpp index d8db828b4a3e56..aa27296abf81b0 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_copy_b.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_copy_b.hpp @@ -16,9 +16,9 @@ namespace intel_cpu { OneDNN requiers data repacking for second input of Brgemm with input non-fp32 precisions. * @ingroup snippets */ -class BrgemmCopyB : public ngraph::snippets::op::MemoryAccess { +class BrgemmCopyB : public snippets::op::MemoryAccess { public: - OPENVINO_OP("BrgemmCopyB", "SnippetsOpset", MemoryAccess); + OPENVINO_OP("BrgemmCopyB", "SnippetsOpset", snippets::op::MemoryAccess); enum Type { OnlyRepacking, // Just data repacking - one output @@ -26,7 +26,8 @@ class BrgemmCopyB : public ngraph::snippets::op::MemoryAccess { }; BrgemmCopyB(const Output& x, const element::Type src_type, const Type type = Type::OnlyRepacking, - const size_t offset_in = 0lu, const size_t offset_out0 = 0lu, const size_t offset_out1 = 0lu); + const size_t offset_in = 0lu, const size_t offset_out0 = 0lu, const size_t offset_out1 = 0lu, + std::vector layout_input = {}); BrgemmCopyB() = default; size_t get_offset_in() const { return get_input_offset(0); } @@ -43,6 +44,9 @@ class BrgemmCopyB : public ngraph::snippets::op::MemoryAccess { std::shared_ptr clone_with_new_inputs(const OutputVector& new_args) const override; private: + void custom_constructor_validate_and_infer_types(std::vector layout_input = {}); + void validate(const ov::PartialShape& pshape, const ov::element::Type& element_type); + Type m_type = Type::OnlyRepacking; element::Type m_src_type = ov::element::undefined; // src element type of the corresponding BRGEMM }; diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_cpu.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_cpu.cpp index 5f52a3ad9da98e..1a378616819293 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_cpu.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_cpu.cpp @@ -1,12 +1,11 @@ -// Copyright (C) 2018-2022 Intel Corporation +// Copyright (C) 2018-2023 Intel Corporation // SPDX-License-Identifier: Apache-2.0 // -#include "snippets/itt.hpp" #include "brgemm_cpu.hpp" -#include "ngraph/runtime/host_tensor.hpp" -#include "openvino/core/rt_info.hpp" +#include "snippets/itt.hpp" #include "snippets/utils.hpp" +#include "snippets/lowered/port_descriptor.hpp" #include "utils/general_utils.h" @@ -14,56 +13,65 @@ namespace ov { namespace intel_cpu { BrgemmCPU::BrgemmCPU(const Output& A, const Output& B, const Type type, - const size_t offset_a, const size_t offset_b, const size_t offset_c) + const size_t offset_a, const size_t offset_b, const size_t offset_c, + std::vector layout_a, std::vector layout_b, std::vector layout_c) : Brgemm(), m_type(type) { // We call default ctor of Brgemm class to avoid incorrect shape infer in constructor_validate_and_type_infer() call set_arguments({A, B}); set_output_size(1); - m_input_ports.resize(get_input_size()); - m_output_ports.resize(get_output_size()); + ctor_initialize(std::set{0, 1}, std::set{0}); set_input_port_descriptor({0, offset_a}, 0); 
set_input_port_descriptor({0, offset_b}, 1); set_output_port_descriptor({0, offset_c}, 0); - constructor_validate_and_infer_types(); + custom_constructor_validate_and_infer_types(std::move(layout_a), std::move(layout_b), std::move(layout_c)); } BrgemmCPU::BrgemmCPU(const Output& A, const Output& B, const Output& scratch, const Type type, - const size_t offset_a, const size_t offset_b, const size_t offset_scratch, const size_t offset_c) + const size_t offset_a, const size_t offset_b, const size_t offset_scratch, const size_t offset_c, + std::vector layout_a, std::vector layout_b, std::vector layout_c) : Brgemm(), m_type(type) { set_arguments({A, B, scratch}); set_output_size(1); - m_input_ports.resize(get_input_size()); - m_output_ports.resize(get_output_size()); + ctor_initialize(std::set{0, 1, 2}, std::set{0}); set_input_port_descriptor({0, offset_a}, 0); set_input_port_descriptor({0, offset_b}, 1); set_output_port_descriptor({0, offset_c}, 0); set_input_port_descriptor({0, offset_scratch}, 2); - constructor_validate_and_infer_types(); + custom_constructor_validate_and_infer_types(std::move(layout_a), std::move(layout_b), std::move(layout_c)); +} + +void BrgemmCPU::custom_constructor_validate_and_infer_types(std::vector layout_a, std::vector layout_b, std::vector layout_c) { + INTERNAL_OP_SCOPE(BrgemmCPU_constructor_validate_and_infer_types); + validate_inputs(); + + // During ctor call, BrgemmCPU doesn't know his port descriptors. + // So we use port descs from source inputs + const auto brgemm_copy = is_with_data_repacking() ? get_brgemm_copy() : nullptr; + const auto planar_input_shapes = + std::vector{ snippets::utils::get_reordered_planar_shape(get_input_partial_shape(0), layout_a), + brgemm_copy ? snippets::utils::get_port_planar_shape(brgemm_copy->input(0)) + : snippets::utils::get_reordered_planar_shape(get_input_partial_shape(1), layout_b) }; + auto output_shape = get_output_partial_shape(planar_input_shapes); + set_output_type(0, get_output_type(), snippets::utils::get_reordered_planar_shape(output_shape, layout_c)); + + //Additional check for 3rd input + validate_with_scratchpad(planar_input_shapes[1].get_shape()); } void BrgemmCPU::validate_and_infer_types() { INTERNAL_OP_SCOPE(BrgemmCPU_validate_and_infer_types); - // If no leading dimensions are provided, assume dense row-major inputs-outputs - NODE_VALIDATION_CHECK(this, get_input_partial_shape(0).is_static() && get_input_partial_shape(1).is_static(), - "BrgemmCPU currently supports only static shapes."); - - OPENVINO_ASSERT(implication(one_of(m_type, Type::Floating, Type::WithDataRepacking), get_input_size() == 2), - "BrgemmCPU expects 2 inputs in cases, when input precisions are f32|f32, u8|i8 or bf16|bf16 (non-AMX system)"); - OPENVINO_ASSERT(implication(one_of(m_type, Type::WithCompensations, Type::AMX), get_input_size() == 3), - "BrgemmCPU expects 3 inputs with input precisions i8|i8 and bf16|bf16 on AMX system"); + validate_inputs(); const auto brgemm_copy = is_with_data_repacking() ? get_brgemm_copy() : nullptr; - std::vector planar_input_shapes = { - ngraph::snippets::utils::get_port_planar_shape(input_value(0)), - ngraph::snippets::utils::get_port_planar_shape(brgemm_copy ? brgemm_copy->input_value(0) : input_value(1)) - }; - + const auto planar_input_shapes = get_planar_input_shapes({input(0), brgemm_copy ? 
brgemm_copy->input(0) : input(1)}); auto output_shape = get_output_partial_shape(planar_input_shapes); - const auto& output_layout = ngraph::snippets::utils::get_node_output_layout(this); - set_output_type(0, - get_output_type(), - ngraph::snippets::utils::get_reordered_planar_shape(output_shape, output_layout)); + set_output_type(0, get_output_type(), get_planar_output_shape(output_shape)); + + //Additional check for 3rd input + validate_with_scratchpad(planar_input_shapes[1].get_shape()); +} +void BrgemmCPU::validate_with_scratchpad(const ov::Shape& shape_b) const { //Additional check for 3rd input if (one_of(m_type, Type::WithCompensations, Type::AMX)) { const auto shape = get_input_partial_shape(2); @@ -71,7 +79,6 @@ void BrgemmCPU::validate_and_infer_types() { const auto type = get_input_element_type(2); if (is_with_compensations()) { const auto element_type_b = get_input_element_type(0); - const auto shape_b = planar_input_shapes[1].get_shape(); const auto N = *shape_b.rbegin(); const auto N_blk = element_type_b == element::f32 ? N : element_type_b == element::bf16 ? 32 : 64; @@ -86,23 +93,39 @@ void BrgemmCPU::validate_and_infer_types() { } } +void BrgemmCPU::validate_inputs() const { + // If no leading dimensions are provided, assume dense row-major inputs-outputs + NODE_VALIDATION_CHECK(this, get_input_partial_shape(0).is_static() && get_input_partial_shape(1).is_static(), + "BrgemmCPU currently supports only static shapes."); + OPENVINO_ASSERT(implication(one_of(m_type, Type::Floating, Type::WithDataRepacking), get_input_size() == 2), + "BrgemmCPU expects 2 inputs in cases, when input precisions are f32|f32, u8|i8 or bf16|bf16 (non-AMX system)"); + OPENVINO_ASSERT(implication(one_of(m_type, Type::WithCompensations, Type::AMX), get_input_size() == 3), + "BrgemmCPU expects 3 inputs with input precisions i8|i8 and bf16|bf16 on AMX system"); +} + std::shared_ptr BrgemmCPU::clone_with_new_inputs(const OutputVector& new_args) const { INTERNAL_OP_SCOPE(BrgemmCPU_clone_with_new_inputs); check_new_args_count(this, new_args); std::shared_ptr new_node = nullptr; if (!is_with_scratchpad()) { new_node = std::make_shared(new_args.at(0), new_args.at(1), m_type, - get_offset_a(), get_offset_b(), get_offset_c()); + get_offset_a(), get_offset_b(), get_offset_c(), + snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(input(0))->get_layout(), + snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(input(1))->get_layout(), + snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(output(0))->get_layout()); } else { new_node = std::make_shared(new_args.at(0), new_args.at(1), new_args.at(2), m_type, - get_offset_a(), get_offset_b(), get_offset_scratch(), get_offset_c()); + get_offset_a(), get_offset_b(), get_offset_scratch(), get_offset_c(), + snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(input(0))->get_layout(), + snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(input(1))->get_layout(), + snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(output(0))->get_layout()); } return new_node; } std::shared_ptr BrgemmCPU::get_brgemm_copy() const { OPENVINO_ASSERT(one_of(m_type, Type::WithDataRepacking, Type::WithCompensations, Type::AMX), "Brgemm doesn't need BrgemmCopyB"); - if (const auto buffer = ov::as_type_ptr(get_input_node_shared_ptr(1))) { + if (const auto buffer = ov::as_type_ptr(get_input_node_shared_ptr(1))) { return ov::as_type_ptr(buffer->get_input_node_shared_ptr(0)); } OPENVINO_THROW("BrgemmCopyB hasn't been found!"); 
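// --- Illustrative sketch (editor's addition, not part of the patch above) ---
// The BrgemmCPU/BrgemmCopyB changes derive planar shapes and leading dimensions from per-port
// layout vectors instead of node rt_info. The snippet below is a simplified, self-contained
// approximation of that idea. The helper names (reorder_by_layout, leading_dimension) are
// hypothetical, and the layout convention assumed here (layout[i] = index of the planar dim
// stored at memory position i) is an assumption; the real logic lives in snippets::utils and
// BrgemmEmitter::init_scheduling_params.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Reorder a planar shape into memory order according to a layout vector.
// An empty layout denotes an already planar (dense row-major) tensor.
static std::vector<size_t> reorder_by_layout(const std::vector<size_t>& shape,
                                             const std::vector<size_t>& layout) {
    if (layout.empty())
        return shape;
    std::vector<size_t> reordered(layout.size());
    for (size_t i = 0; i < layout.size(); ++i)
        reordered[i] = shape[layout[i]];  // assumption: layout[i] indexes the planar shape
    return reordered;
}

// Derive the leading dimension of a 2D tile from a shape and its layout,
// mirroring the accumulate over trailing dims in BrgemmEmitter::init_scheduling_params.
static size_t leading_dimension(const std::vector<size_t>& shape,
                                const std::vector<size_t>& layout) {
    if (layout.empty())
        return shape.back();  // planar case: leading dimension is the innermost dim
    // Count the dims laid out after the second-to-last planar dim ...
    const auto it = std::find(layout.begin(), layout.end(), layout.size() - 2);
    const auto num_last_dims = static_cast<size_t>(layout.end() - it) - 1;
    // ... and multiply them to obtain that dim's stride in elements.
    return std::accumulate(shape.end() - num_last_dims, shape.end(), size_t(1), std::multiplies<size_t>());
}
// --- end of illustrative sketch ---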
diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_cpu.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_cpu.hpp index 2081ca25c7528f..a5f11f1625b3ea 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_cpu.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/brgemm_cpu.hpp @@ -7,6 +7,8 @@ #include "snippets/op/brgemm.hpp" #include "brgemm_copy_b.hpp" +#include "snippets/lowered/port_descriptor.hpp" + namespace ov { namespace intel_cpu { @@ -16,9 +18,9 @@ namespace intel_cpu { * with support of several precisions on plugin level * @ingroup snippets */ -class BrgemmCPU : public ngraph::snippets::op::Brgemm { +class BrgemmCPU : public snippets::op::Brgemm { public: - OPENVINO_OP("BrgemmCPU", "SnippetsOpset", ngraph::snippets::op::Brgemm); + OPENVINO_OP("BrgemmCPU", "SnippetsOpset", snippets::op::Brgemm); enum Type { Floating, // f32|f32 @@ -28,9 +30,11 @@ class BrgemmCPU : public ngraph::snippets::op::Brgemm { }; BrgemmCPU(const Output& A, const Output& B, const Type type, - const size_t offset_a = 0, const size_t offset_b = 0, const size_t offset_c = 0); + const size_t offset_a = 0, const size_t offset_b = 0, const size_t offset_c = 0, + std::vector layout_a = {}, std::vector layout_b = {}, std::vector layout_c = {}); BrgemmCPU(const Output& A, const Output& B, const Output& scratch, const Type type, - const size_t offset_a = 0, const size_t offset_b = 0, const size_t offset_scratch = 0, const size_t offset_c = 0); + const size_t offset_a = 0, const size_t offset_b = 0, const size_t offset_scratch = 0, const size_t offset_c = 0, + std::vector layout_a = {}, std::vector layout_b = {}, std::vector layout_c = {}); BrgemmCPU() = default; void validate_and_infer_types() override; @@ -48,7 +52,11 @@ class BrgemmCPU : public ngraph::snippets::op::Brgemm { constexpr static size_t SCRATCH_BYTE_SIZE = 32 * 1024; private: - Type m_type = Type::Floating; + void custom_constructor_validate_and_infer_types(std::vector layout_a, std::vector layout_b, std::vector layout_c); + void validate_with_scratchpad(const ov::Shape& shape_b) const; + void validate_inputs() const; + + Type m_type = Type::Floating; }; } // namespace intel_cpu diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/fused_mul_add.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/fused_mul_add.cpp index 6b5a29d767e55d..64db193d7773f1 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/fused_mul_add.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/fused_mul_add.cpp @@ -6,8 +6,6 @@ #include "snippets/itt.hpp" #include "ngraph/op/util/elementwise_args.hpp" -#include -#include #include diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/load_convert.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/load_convert.cpp index a71181e8e2b666..08b8a1765bd6e6 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/load_convert.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/load_convert.cpp @@ -6,8 +6,6 @@ #include "load_convert.hpp" -#include "ngraph/runtime/host_tensor.hpp" - using namespace std; using namespace ov; @@ -19,7 +17,7 @@ intel_cpu::LoadConvertSaturation::LoadConvertSaturation(const Output& x, c bool intel_cpu::LoadConvertSaturation::visit_attributes(AttributeVisitor& visitor) { INTERNAL_OP_SCOPE(LoadConvert_visit_attributes); - MemoryAccess::visit_attributes(visitor); + Load::visit_attributes(visitor); 
visitor.on_attribute("destination_type", m_destination_type); return true; } diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/load_convert.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/load_convert.hpp index 9931a6f057d4d5..38892c0b6ade6b 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/load_convert.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/load_convert.hpp @@ -4,7 +4,6 @@ #pragma once -#include "ngraph/op/op.hpp" #include "snippets/op/load.hpp" namespace ov { @@ -16,9 +15,9 @@ namespace intel_cpu { * The operation is used for peephole optimization during subgraph lowering. * @ingroup snippets */ -class LoadConvertSaturation : public ngraph::snippets::op::Load { +class LoadConvertSaturation : public snippets::op::Load { public: - OPENVINO_OP("LoadConvertSaturation", "SnippetsOpset", ngraph::snippets::op::Load); + OPENVINO_OP("LoadConvertSaturation", "SnippetsOpset", snippets::op::Load); LoadConvertSaturation(const Output& x, const ov::element::Type& destination_type, const size_t count = 1lu, const size_t offset = 0lu); LoadConvertSaturation() = default; @@ -43,9 +42,9 @@ class LoadConvertSaturation : public ngraph::snippets::op::Load { * The operation is used for peephole optimization during subgraph lowering. * @ingroup snippets */ -class LoadConvertTruncation : public ngraph::snippets::op::Load { +class LoadConvertTruncation : public snippets::op::Load { public: - OPENVINO_OP("LoadConvertTruncation", "SnippetsOpset", ngraph::snippets::op::Load); + OPENVINO_OP("LoadConvertTruncation", "SnippetsOpset", snippets::op::Load); LoadConvertTruncation(const Output& x, const ov::element::Type& destination_type, const size_t count = 1lu, const size_t offset = 0lu); LoadConvertTruncation() = default; diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/store_convert.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/store_convert.cpp index d7e1c9e4b0530c..4a1db1f2605122 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/store_convert.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/store_convert.cpp @@ -6,8 +6,6 @@ #include "store_convert.hpp" -#include "ngraph/runtime/host_tensor.hpp" - using namespace std; using namespace ov; @@ -19,7 +17,7 @@ intel_cpu::StoreConvertSaturation::StoreConvertSaturation(const Output& x, bool intel_cpu::StoreConvertSaturation::visit_attributes(AttributeVisitor& visitor) { INTERNAL_OP_SCOPE(StoreConvert_visit_attributes); - MemoryAccess::visit_attributes(visitor); + Store::visit_attributes(visitor); visitor.on_attribute("destination_type", m_destination_type); return true; } diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/store_convert.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/store_convert.hpp index ee6410682b871c..41ad5d8a45c408 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/op/store_convert.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/op/store_convert.hpp @@ -4,7 +4,6 @@ #pragma once -#include "ngraph/op/op.hpp" #include "snippets/op/store.hpp" namespace ov { @@ -16,9 +15,9 @@ namespace intel_cpu { * The operation is used for peephole optimization during subgraph lowering. 
* @ingroup snippets */ -class StoreConvertSaturation : public ngraph::snippets::op::Store { +class StoreConvertSaturation : public snippets::op::Store { public: - OPENVINO_OP("StoreConvertSaturation", "SnippetsOpset", ngraph::snippets::op::Store); + OPENVINO_OP("StoreConvertSaturation", "SnippetsOpset", snippets::op::Store); StoreConvertSaturation(const Output& x, const ov::element::Type& destination_type, const size_t count = 1lu, const size_t offset = 0lu); StoreConvertSaturation() = default; @@ -43,9 +42,9 @@ class StoreConvertSaturation : public ngraph::snippets::op::Store { * The operation is used for peephole optimization during subgraph lowering. * @ingroup snippets */ -class StoreConvertTruncation : public ngraph::snippets::op::Store { +class StoreConvertTruncation : public snippets::op::Store { public: - OPENVINO_OP("StoreConvertTruncation", "SnippetsOpset", ngraph::snippets::op::Store); + OPENVINO_OP("StoreConvertTruncation", "SnippetsOpset", snippets::op::Store); StoreConvertTruncation(const Output& x, const ov::element::Type& destination_type, const size_t count = 1lu, const size_t offset = 0lu); StoreConvertTruncation() = default; diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.cpp index 345b8d8f2c92e1..0c492498af6ff3 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.cpp @@ -5,13 +5,15 @@ #include "snippets/itt.hpp" #include "brgemm_to_brgemm_cpu.hpp" -#include "snippets/snippets_isa.hpp" + #include "snippets/utils.hpp" +#include "snippets/op/brgemm.hpp" #include "transformations/snippets/x64/op/brgemm_copy_b.hpp" #include "transformations/snippets/x64/op/brgemm_cpu.hpp" -#include "ngraph/rt_info.hpp" -#include "ngraph/pattern/op/wrap_type.hpp" +#include "openvino/core/rt_info.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" +#include "openvino/pass/pattern/matcher.hpp" #include @@ -22,15 +24,32 @@ namespace ov { namespace intel_cpu { +using namespace snippets::lowered; + +namespace { +std::vector make_subtensor(const ov::Shape& tensor) { + return std::vector(std::min(tensor.size(), size_t(2)), PortDescriptor::ServiceDimensions::FULL_DIM); +} +template +void set_full_port_desc(const T& port) { + const auto& shape = port.get_shape(); + PortDescriptorUtils::set_port_descriptor_ptr(port, std::make_shared(shape, make_subtensor(shape))); +} +template +void set_port_desc(const T& port, Args... 
params) { + PortDescriptorUtils::set_port_descriptor_ptr(port, std::make_shared(params...)); +} +} // namespace + pass::BrgemmToBrgemmCPU::BrgemmToBrgemmCPU() { MATCHER_SCOPE(BrgemmToBrgemmCPU); - auto m_brgemm = ngraph::pattern::wrap_type(); + auto m_brgemm = ov::pass::pattern::wrap_type(); - auto callback = [=](ngraph::pattern::Matcher& m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::BrgemmToBrgemmCPU") + auto callback = [=](ov::pass::pattern::Matcher& m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::BrgemmToBrgemmCPU") const auto node = m.get_match_root(); - const auto brgemm = ov::as_type_ptr(node); + const auto brgemm = ov::as_type_ptr(node); const auto brgemm_plugin = ov::as_type_ptr(node); if (!brgemm || brgemm_plugin) OPENVINO_THROW("BrgemmCPU cannot be in body before BrgemmToBrgemmCPU pass"); @@ -39,8 +58,12 @@ pass::BrgemmToBrgemmCPU::BrgemmToBrgemmCPU() { return false; } - const auto dimsMatMulIn0 = ngraph::snippets::utils::get_port_planar_shape(brgemm->input_value(0)).get_shape(); - const auto dimsMatMulIn1 = ngraph::snippets::utils::get_port_planar_shape(brgemm->input_value(1)).get_shape(); + const auto& brgemm_in0_desc = PortDescriptorUtils::get_port_descriptor_ptr(brgemm->input(0)); + const auto& brgemm_in1_desc = PortDescriptorUtils::get_port_descriptor_ptr(brgemm->input(1)); + const auto& brgemm_out_desc = PortDescriptorUtils::get_port_descriptor_ptr(brgemm->output(0)); + + const auto dimsMatMulIn0 = snippets::utils::get_port_planar_shape(brgemm->input_value(0)).get_shape(); + const auto dimsMatMulIn1 = snippets::utils::get_port_planar_shape(brgemm->input_value(1)).get_shape(); const auto K = *dimsMatMulIn0.rbegin(); const auto N = *dimsMatMulIn1.rbegin(); @@ -56,40 +79,68 @@ pass::BrgemmToBrgemmCPU::BrgemmToBrgemmCPU() { const auto offset_c = brgemm->get_offset_c(); std::shared_ptr brgemm_cpu = nullptr; + std::shared_ptr brgemm_repacking = nullptr; if (element_type_a == ov::element::f32) { brgemm_cpu = std::make_shared(brgemm->input_value(0), brgemm->input_value(1), BrgemmCPU::Type::Floating, - offset_a, offset_b, offset_c); + offset_a, offset_b, offset_c, + brgemm_in0_desc->get_layout(), brgemm_in1_desc->get_layout(), brgemm_out_desc->get_layout()); } else { - const auto layoutIn1 = ngraph::snippets::utils::get_node_output_layout(brgemm->input_value(1).get_node_shared_ptr()); const auto copy_b_type = with_comp ? 
BrgemmCopyB::WithCompensations : BrgemmCopyB::OnlyRepacking; - const auto brgemmRepackIn1 = std::make_shared(brgemm->input_value(1), element_type_a, copy_b_type, offset_b); - const auto buffer = std::make_shared(brgemmRepackIn1->output(0)); + brgemm_repacking = std::make_shared(brgemm->input_value(1), element_type_a, copy_b_type, offset_b, 0, 0, + brgemm_in1_desc->get_layout()); + const auto buffer = std::make_shared(brgemm_repacking->output(0)); + set_port_desc(brgemm_repacking->input(0), brgemm_in1_desc->get_shape(), brgemm_in1_desc->get_subtensor(), brgemm_in1_desc->get_layout()); + set_full_port_desc(brgemm_repacking->output(0)); + set_full_port_desc(buffer->input(0)); + set_full_port_desc(buffer->output(0)); if (with_amx) { - const auto scratch = std::make_shared(ov::Shape{BrgemmCPU::SCRATCH_BYTE_SIZE}); + const auto scratch = std::make_shared(ov::Shape{BrgemmCPU::SCRATCH_BYTE_SIZE}); brgemm_cpu = std::make_shared(brgemm->input_value(0), buffer, scratch, BrgemmCPU::Type::AMX, - offset_a, offset_b, offset_c); + offset_a, offset_b, 0, offset_c, + brgemm_in0_desc->get_layout(), std::vector{}, brgemm_out_desc->get_layout()); + set_full_port_desc(scratch->output(0)); + set_full_port_desc(brgemm_cpu->input(2)); } else if (with_comp) { - const auto scratch = std::make_shared(brgemmRepackIn1->output(1)); + const auto scratch = std::make_shared(brgemm_repacking->output(1)); brgemm_cpu = std::make_shared(brgemm->input_value(0), buffer, scratch, BrgemmCPU::Type::WithCompensations, - offset_a, offset_b, offset_c); + offset_a, offset_b, 0, offset_c, + brgemm_in0_desc->get_layout(), std::vector{}, brgemm_out_desc->get_layout()); + set_full_port_desc(brgemm_repacking->output(1)); + set_full_port_desc(scratch->input(0)); + set_full_port_desc(scratch->output(0)); + set_full_port_desc(brgemm_cpu->input(2)); } else if (one_of(element_type_a, ov::element::u8, ov::element::bf16)) { brgemm_cpu = std::make_shared(brgemm->input_value(0), buffer, BrgemmCPU::Type::WithDataRepacking, - offset_a, offset_b, offset_c); + offset_a, offset_b, offset_c, + brgemm_in0_desc->get_layout(), std::vector{}, brgemm_out_desc->get_layout()); } else { IE_THROW() << "Invalid configuration for BRGEMM CPU"; } } brgemm_cpu->set_friendly_name(brgemm->get_friendly_name()); - ngraph::snippets::utils::set_output_layout(brgemm_cpu->output(0), ngraph::snippets::utils::get_node_output_layout(brgemm)); - ngraph::copy_runtime_info(brgemm, brgemm_cpu); ngraph::replace_node(brgemm, brgemm_cpu); + // Transfer ports + set_port_desc(brgemm_cpu->input(0), brgemm_in0_desc->get_shape(), brgemm_in0_desc->get_subtensor(), brgemm_in0_desc->get_layout()); + if (brgemm_repacking) { + set_full_port_desc(brgemm_cpu->input(1)); + } else { + set_port_desc(brgemm_cpu->input(1), brgemm_in1_desc->get_shape(), brgemm_in1_desc->get_subtensor(), brgemm_in1_desc->get_layout()); + } + set_port_desc(brgemm_cpu->output(0), brgemm_out_desc->get_shape(), brgemm_out_desc->get_subtensor(), brgemm_out_desc->get_layout()); + + // need to run validate_and_infer_types manually: either input shapes were updated or + // output Layout was updated (out shape will be updated in validate_and_infer_types()) + if (brgemm_repacking) + brgemm_repacking->validate_and_infer_types(); + brgemm_cpu->validate_and_infer_types(); + return true; }; - auto m = std::make_shared(m_brgemm, matcher_name); + auto m = std::make_shared(m_brgemm, matcher_name); register_matcher(m, callback); } } // namespace intel_cpu diff --git 
a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.hpp index c400d2d0790035..c28db2c32a5375 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/brgemm_to_brgemm_cpu.hpp @@ -4,8 +4,7 @@ #pragma once -#include "ngraph/pass/graph_rewrite.hpp" -#include "ngraph/pattern/matcher.hpp" +#include "openvino/pass/graph_rewrite.hpp" namespace ov { namespace intel_cpu { @@ -33,7 +32,7 @@ namespace pass { * BrgemmCPU * @ingroup snippets */ -class BrgemmToBrgemmCPU: public ngraph::pass::MatcherPass { +class BrgemmToBrgemmCPU: public ov::pass::MatcherPass { public: OPENVINO_RTTI("BrgemmToBrgemmCPU", "0"); BrgemmToBrgemmCPU(); diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/enforce_precision.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/enforce_precision.cpp index d2d28defdc8e0e..064db31ed49bef 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/enforce_precision.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/enforce_precision.cpp @@ -4,22 +4,20 @@ #include "enforce_precision.hpp" -#include #include #include "ov_ops/type_relaxed.hpp" #include "snippets/itt.hpp" -#include "ngraph/rt_info.hpp" +#include "openvino/core/rt_info.hpp" #include "snippets/pass/propagate_precision.hpp" #include "cpu/x64/cpu_isa_traits.hpp" -using namespace ngraph; using namespace ov::intel_cpu::pass; EnforcePrecision::EnforcePrecision( - const element::Type source, - const element::Type target, - std::function>(const std::shared_ptr& op)> get_supported_precisions) : + const ov::element::Type source, + const ov::element::Type target, + std::function>(const std::shared_ptr& op)> get_supported_precisions) : source(source), target(target), get_supported_precisions(get_supported_precisions == nullptr ? 
get_supported_precisions_default : get_supported_precisions) { @@ -28,7 +26,7 @@ EnforcePrecision::EnforcePrecision( bool EnforcePrecision::run_on_model(const std::shared_ptr& f) { RUN_ON_MODEL_SCOPE(EnforcePrecision); - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::EnforcePrecision") + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::EnforcePrecision") bool was_updated = false; for (const auto& op : f->get_ordered_ops()) { @@ -82,16 +80,16 @@ bool EnforcePrecision::run_on_model(const std::shared_ptr& f) { const std::shared_ptr& op, const size_t input_index, const element::Type& target) { - auto convert = std::make_shared( + auto convert = std::make_shared( parent_output, target); - ngraph::copy_runtime_info(parent_output.get_node_shared_ptr(), convert); + ov::copy_runtime_info(parent_output.get_node_shared_ptr(), convert); op->set_argument(input_index, convert); }; for (auto index = 0ull; index < supported_precisions_to_enforce.size(); ++index) { if ((supported_precisions_to_enforce[index] == target) || (actual_precisions[index] == source)) { - const auto op_parent = ov::as_type_ptr(op->get_input_node_shared_ptr(index)); + const auto op_parent = ov::as_type_ptr(op->get_input_node_shared_ptr(index)); if ((op_parent != nullptr) && (op_parent->get_input_element_type(0) == target) && // we can remove existing convertion only if precisions before and after are appropriate for removal @@ -111,7 +109,7 @@ bool EnforcePrecision::run_on_model(const std::shared_ptr& f) { auto type_relaxed_node = std::dynamic_pointer_cast(op); if (was_updated || (type_relaxed_node != nullptr)) { - const bool res = ngraph::snippets::pass::PropagatePrecision::validate_and_infer_types_and_restore_outputs(op); + const bool res = snippets::pass::PropagatePrecision::validate_and_infer_types_and_restore_outputs(op); was_updated = was_updated || res; } } @@ -119,9 +117,9 @@ bool EnforcePrecision::run_on_model(const std::shared_ptr& f) { return was_updated; } -std::set> EnforcePrecision::get_supported_precisions_default( - const std::shared_ptr&op) noexcept { - if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core_bf16) && ov::is_type(op)) { +std::set> EnforcePrecision::get_supported_precisions_default( + const std::shared_ptr&op) noexcept { + if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core_bf16) && ov::is_type(op)) { return {{element::bf16, element::bf16}}; } return {}; diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/enforce_precision.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/enforce_precision.hpp index d32aad1e7f4edb..bbe4c686dfc374 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/enforce_precision.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/enforce_precision.hpp @@ -4,15 +4,16 @@ #pragma once -#include -#include #include "snippets/generator.hpp" +#include "openvino/pass/graph_rewrite.hpp" + +#include namespace ov { namespace intel_cpu { namespace pass { -class EnforcePrecision: public ngraph::pass::FunctionPass { +class EnforcePrecision: public ov::pass::ModelPass { public: OPENVINO_RTTI("EnforcePrecision", "0"); diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/fuse_load_store_and_convert.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/fuse_load_store_and_convert.cpp deleted file mode 100644 index 3d66f78310bd14..00000000000000 --- 
a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/fuse_load_store_and_convert.cpp +++ /dev/null @@ -1,113 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#include "snippets/itt.hpp" - -#include "fuse_load_store_and_convert.hpp" -#include "snippets/snippets_isa.hpp" - -#include "transformations/snippets/x64/op/load_convert.hpp" -#include "transformations/snippets/x64/op/store_convert.hpp" - -#include "ngraph/rt_info.hpp" -#include "ngraph/pattern/op/wrap_type.hpp" - -ov::intel_cpu::pass::FuseLoadConvert::FuseLoadConvert() { - MATCHER_SCOPE(FuseLoadConvert); - auto load_pattern = ngraph::pattern::wrap_type(); - auto convert_pattern = ngraph::pattern::wrap_type({load_pattern}); - - auto callback = [=](ngraph::pattern::Matcher& m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::FuseLoadConvert") - auto& pm = m.get_pattern_value_map(); - const auto load_shared = pm.at(load_pattern).get_node_shared_ptr(); - if (!load_shared || load_shared->output(0).get_target_inputs().size() != 1) { - return false; - } - - const auto load = std::dynamic_pointer_cast(load_shared); - if (!load) - return false; - - const auto convert = pm.at(convert_pattern).get_node_shared_ptr(); - if (transformation_callback(convert)) - return false; - - std::shared_ptr load_convert = nullptr; - if (const auto convert_saturation = - std::dynamic_pointer_cast(convert)) { - load_convert = std::make_shared(load->input_value(0), - convert_saturation->get_destination_type(), - load->get_count(), load->get_offset()); - } else if (const auto convert_truncation = - std::dynamic_pointer_cast(convert)) { - load_convert = std::make_shared(load->input_value(0), - convert_truncation->get_destination_type(), - load->get_count(), load->get_offset()); - } else { - OPENVINO_THROW( - "Type of Convert op is undefined. 
Supports only fusing Load and ConvertTruncation or ConvertSaturation ops"); - } - - if (!load_convert) - return false; - - ngraph::copy_runtime_info(convert, load_convert); - ngraph::replace_node(convert, load_convert); - - return true; - }; - - auto m = std::make_shared(convert_pattern, matcher_name); - register_matcher(m, callback); -} - - -ov::intel_cpu::pass::FuseStoreConvert::FuseStoreConvert() { - MATCHER_SCOPE(FuseStoreConvert); - auto input_pattern = ngraph::pattern::any_input(); - auto convert_pattern = ngraph::pattern::wrap_type({input_pattern}); - auto store_pattern = ngraph::pattern::wrap_type({convert_pattern}); - - auto callback = [=](ngraph::pattern::Matcher& m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::FuseStoreConvert") - auto& pm = m.get_pattern_value_map(); - const auto input = pm.at(input_pattern).get_node_shared_ptr(); - - const auto store = std::dynamic_pointer_cast(pm.at(store_pattern).get_node_shared_ptr()); - if (!store) - return false; - - const auto convert = pm.at(convert_pattern).get_node_shared_ptr(); - if (convert->output(0).get_target_inputs().size() != 1 || transformation_callback(convert)) - return false; - - std::shared_ptr store_convert = nullptr; - if (const auto convert_saturation = - std::dynamic_pointer_cast(convert)) { - store_convert = std::make_shared(input, - convert_saturation->get_destination_type(), - store->get_count(), store->get_offset()); - } else if (const auto convert_truncation = - std::dynamic_pointer_cast(convert)) { - store_convert = std::make_shared(input, - convert_truncation->get_destination_type(), - store->get_count(), store->get_offset()); - } else { - OPENVINO_THROW( - "Type of Convert op is undefined. Supports only fusing Store and ConvertTruncation or ConvertSaturation ops"); - } - - if (!store_convert) - return false; - - ngraph::copy_runtime_info(store, store_convert); - ngraph::replace_node(store, store_convert); - - return true; - }; - - auto m = std::make_shared(store_pattern, matcher_name); - register_matcher(m, callback); -} diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/fuse_load_store_and_convert.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/fuse_load_store_and_convert.hpp deleted file mode 100644 index 6d49bd65983802..00000000000000 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/fuse_load_store_and_convert.hpp +++ /dev/null @@ -1,40 +0,0 @@ -// Copyright (C) 2018-2023 Intel Corporation -// SPDX-License-Identifier: Apache-2.0 -// - -#pragma once - -#include "ngraph/pass/graph_rewrite.hpp" -#include "ngraph/pattern/matcher.hpp" - -namespace ov { -namespace intel_cpu { -namespace pass { - -/** - * @interface FuseLoadConvert - * @brief Fuse Load and ConvertSaturation into one op LoadConvertSaturation - * Fuse Load and ConvertTruncation into one op LoadConvertTruncation - * @ingroup snippets - */ -class FuseLoadConvert: public ngraph::pass::MatcherPass { -public: - OPENVINO_RTTI("FuseLoadConvert", "0"); - FuseLoadConvert(); -}; - -/** - * @interface FuseStoreConvert - * @brief Fuse Store and ConvertSaturation into one op StoreConvertSaturation - * Fuse Store and ConvertTruncation into one op StoreConvertTruncation - * @ingroup snippets - */ -class FuseStoreConvert: public ngraph::pass::MatcherPass { -public: - OPENVINO_RTTI("FuseStoreConvert", "0"); - FuseStoreConvert(); -}; - -} // namespace pass -} // namespace intel_cpu -} // namespace ov diff --git 
a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/lowered/fuse_load_store_and_convert.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/lowered/fuse_load_store_and_convert.cpp new file mode 100644 index 00000000000000..a5b99245fd789a --- /dev/null +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/lowered/fuse_load_store_and_convert.cpp @@ -0,0 +1,127 @@ +// Copyright (C) 2018-2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#include "snippets/itt.hpp" + +#include "fuse_load_store_and_convert.hpp" +#include "snippets/snippets_isa.hpp" + +#include "transformations/snippets/x64/op/load_convert.hpp" +#include "transformations/snippets/x64/op/store_convert.hpp" + + +bool ov::intel_cpu::pass::FuseLoadStoreConvert::fuse_load_convert(snippets::lowered::LinearIR& linear_ir, + snippets::lowered::LinearIR::constExprIt& convert_it) { + const auto& convert_expr = *convert_it; + const auto& convert = ov::as_type_ptr(convert_expr->get_node()); + const auto& input_connector = convert_expr->get_input_port_connector(0); + if (convert->get_destination_type() != ov::element::f32 && convert->get_destination_type() != ov::element::i32) + return false; + + const auto& load_output = input_connector->get_source(); + const auto& load_expr = load_output.get_expr(); + const auto load = ov::as_type_ptr(load_expr->get_node()); + if (!load || + ov::is_type(load_expr->get_node()) || + ov::is_type(load_expr->get_node())) + return false; + + const auto consumers = input_connector->get_consumers(); + if (consumers.size() != 1) + return false; + + std::shared_ptr load_convert = nullptr; + if (const auto convert_saturation = ov::as_type_ptr(convert)) { + load_convert = std::make_shared(load->input_value(0), + convert_saturation->get_destination_type(), + load->get_count(), load->get_offset()); + } else if (const auto convert_truncation = ov::as_type_ptr(convert)) { + load_convert = std::make_shared(load->input_value(0), + convert_truncation->get_destination_type(), + load->get_count(), load->get_offset()); + } else { + OPENVINO_THROW("Type of Convert op is undefined. 
Supports only fusing Load and ConvertTruncation or ConvertSaturation ops"); + } + + const auto out_port = convert_expr->get_output_port(0); + const auto convert_consumers = out_port.get_connected_ports(); + snippets::lowered::PortDescriptorUtils::set_port_descriptor_ptr(load_convert->output(0), out_port.get_descriptor_ptr()->clone()); + const auto load_convert_expr = linear_ir.create_expression(load_convert, { load_expr->get_input_port_connector(0) }); + const auto convert_expr_it = convert_it; + const auto insertion_pos = std::next(convert_it); + convert_it = linear_ir.insert(insertion_pos, load_convert_expr); + linear_ir.erase(std::find(linear_ir.cbegin(), convert_expr_it, load_expr)); + linear_ir.erase(convert_expr_it); + linear_ir.replace_input(convert_consumers, load_convert_expr->get_output_port_connector(0)); + return true; +} + +bool ov::intel_cpu::pass::FuseLoadStoreConvert::fuse_store_convert(snippets::lowered::LinearIR& linear_ir, + snippets::lowered::LinearIR::constExprIt& convert_it) { + const auto& convert_expr = *convert_it; + const auto& convert = convert_expr->get_node(); + const auto& input_connector = convert_expr->get_input_port_connector(0); + const auto& output_connector = convert_expr->get_output_port_connector(0); + if (convert->get_input_element_type(0) != ov::element::f32 && convert->get_input_element_type(0) != ov::element::i32) + return false; + + const auto consumers = output_connector->get_consumers(); + if (consumers.size() != 1) + return false; + + const auto store_input = *(consumers.begin()); + const auto& store_expr = store_input.get_expr(); + const auto store = ov::as_type_ptr(store_expr->get_node()); + if (!store) + return false; + + std::shared_ptr store_convert = nullptr; + if (const auto convert_saturation = ov::as_type_ptr(convert)) { + store_convert = std::make_shared(convert->input_value(0), + convert_saturation->get_destination_type(), + store->get_count(), store->get_offset()); + } else if (const auto convert_truncation = ov::as_type_ptr(convert)) { + store_convert = std::make_shared(convert->input_value(0), + convert_truncation->get_destination_type(), + store->get_count(), store->get_offset()); + } else { + OPENVINO_THROW("Type of Convert op is undefined. 
Supports only fusing Store and ConvertTruncation or ConvertSaturation ops"); + } + + const auto out_port = store_expr->get_output_port(0); + const auto store_consumers = out_port.get_connected_ports(); + snippets::lowered::PortDescriptorUtils::set_port_descriptor_ptr(store_convert->output(0), out_port.get_descriptor_ptr()->clone()); + const auto store_convert_expr = linear_ir.create_expression(store_convert, { input_connector }); + const auto convert_expr_it = convert_it; + const auto insertion_pos = std::next(convert_it); + convert_it = linear_ir.insert(insertion_pos, store_convert_expr); + linear_ir.erase(std::find(convert_expr_it, linear_ir.cend(), store_expr)); + linear_ir.erase(convert_expr_it); + linear_ir.replace_input(store_consumers, store_convert_expr->get_output_port_connector(0)); + return true; +} + +bool ov::intel_cpu::pass::FuseLoadStoreConvert::run(snippets::lowered::LinearIR& linear_ir) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::FuseLoadStoreConvert") + + bool modified = false; + + for (auto expr_it = linear_ir.cbegin(); expr_it != linear_ir.cend(); expr_it++) { + const auto& expr = *expr_it; + const auto& convert = expr->get_node(); + if (!ov::is_type(convert)) + continue; + + if (fuse_load_convert(linear_ir, expr_it)) { + modified = true; + continue; + } + if (fuse_store_convert(linear_ir, expr_it)) { + modified = true; + continue; + } + } + + return modified; +} diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/lowered/fuse_load_store_and_convert.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/lowered/fuse_load_store_and_convert.hpp new file mode 100644 index 00000000000000..ed31897b77c703 --- /dev/null +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/lowered/fuse_load_store_and_convert.hpp @@ -0,0 +1,36 @@ +// Copyright (C) 2018-2023 Intel Corporation +// SPDX-License-Identifier: Apache-2.0 +// + +#pragma once + +#include "snippets/lowered/pass/pass.hpp" + +namespace ov { +namespace intel_cpu { +namespace pass { + +/** + * @interface FuseLoadStoreConvert + * @brief Fuse Load and ConvertSaturation into one op LoadConvertSaturation + * Fuse Load and ConvertTruncation into one op LoadConvertTruncation + * Fuse Store and ConvertSaturation into one op StoreConvertSaturation + * Fuse Store and ConvertTruncation into one op StoreConvertTruncation + * @ingroup snippets + */ +class FuseLoadStoreConvert: public snippets::lowered::pass::Pass { +public: + FuseLoadStoreConvert() = default; + OPENVINO_RTTI("FuseLoadStoreConvert", "LinearIRTransformation"); + bool run(snippets::lowered::LinearIR& linear_ir) override; + +private: + bool fuse_load_convert(snippets::lowered::LinearIR& linear_ir, + snippets::lowered::LinearIR::constExprIt& convert_it); + bool fuse_store_convert(snippets::lowered::LinearIR& linear_ir, + snippets::lowered::LinearIR::constExprIt& convert_it); +}; + +} // namespace pass +} // namespace intel_cpu +} // namespace ov diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/mul_add_to_fma.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/mul_add_to_fma.cpp index f3be754294b003..1d179b1ace1675 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/mul_add_to_fma.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/mul_add_to_fma.cpp @@ -8,22 +8,21 @@ #include "snippets/snippets_isa.hpp" #include "transformations/snippets/x64/op/fused_mul_add.hpp" -#include -#include -#include +#include 
"openvino/pass/pattern/op/wrap_type.hpp" +#include "openvino/pass/pattern/matcher.hpp" +#include "openvino/core/rt_info.hpp" -using namespace ngraph; ov::intel_cpu::pass::MulAddToFMA::MulAddToFMA() { MATCHER_SCOPE(MulAddToFMA); - auto mul_input_1 = pattern::any_input(); - auto mul_input_2 = pattern::any_input(); - auto mul_m = pattern::wrap_type({ mul_input_1, mul_input_2 }, pattern::consumers_count(1)); - auto add_input_2 = pattern::any_input(); - auto add_m = pattern::wrap_type({ mul_m, add_input_2 }); - - ngraph::matcher_pass_callback callback = [=](pattern::Matcher& m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "Snippets::op::MulAddToFMA_callback") + auto mul_input_1 = ov::pass::pattern::any_input(); + auto mul_input_2 = ov::pass::pattern::any_input(); + auto mul_m = ov::pass::pattern::wrap_type({ mul_input_1, mul_input_2 }, ov::pass::pattern::consumers_count(1)); + auto add_input_2 = ov::pass::pattern::any_input(); + auto add_m = ov::pass::pattern::wrap_type({ mul_m, add_input_2 }); + + matcher_pass_callback callback = [=](ov::pass::pattern::Matcher& m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::MulAddToFMA_callback") const auto& pattern_map = m.get_pattern_value_map(); const auto multiply = pattern_map.at(mul_m).get_node_shared_ptr(); const auto add = pattern_map.at(add_m).get_node_shared_ptr(); @@ -37,13 +36,13 @@ ov::intel_cpu::pass::MulAddToFMA::MulAddToFMA() { const auto& c = pattern_map.at(add_input_2); const auto fma = std::make_shared(a, b, c); - ngraph::copy_runtime_info({ a.get_node_shared_ptr(), b.get_node_shared_ptr(), c.get_node_shared_ptr() }, fma); + ov::copy_runtime_info({ a.get_node_shared_ptr(), b.get_node_shared_ptr(), c.get_node_shared_ptr() }, fma); fma->set_friendly_name(add->get_friendly_name()); - ngraph::replace_node(add, fma); + ov::replace_node(add, fma); return true; }; - auto m = std::make_shared(add_m, "MulAddToFMA"); + auto m = std::make_shared(add_m, "MulAddToFMA"); register_matcher(m, callback); } diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/mul_add_to_fma.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/mul_add_to_fma.hpp index d205d0acefea10..a4b65bcba14c65 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/mul_add_to_fma.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/mul_add_to_fma.hpp @@ -4,7 +4,7 @@ #pragma once -#include "ngraph/pass/graph_rewrite.hpp" +#include "openvino/pass/graph_rewrite.hpp" namespace ov { namespace intel_cpu { @@ -15,7 +15,7 @@ namespace pass { * @brief Replaces mul and add with FusedMulAdd node * @ingroup snippets */ -class MulAddToFMA : public ngraph::pass::MatcherPass { +class MulAddToFMA : public ov::pass::MatcherPass { public: MulAddToFMA(); }; diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/remove_converts.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/remove_converts.cpp index 238fadaa47e897..64885731c8ec7e 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/remove_converts.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/remove_converts.cpp @@ -5,19 +5,17 @@ #include "remove_converts.hpp" #include "snippets/itt.hpp" -#include "ngraph/opsets/opset1.hpp" -#include "ngraph/rt_info.hpp" -#include "ngraph/pattern/op/wrap_type.hpp" +#include "openvino/pass/pattern/op/wrap_type.hpp" #include "snippets/op/convert_saturation.hpp" ov::intel_cpu::pass::RemoveConverts::RemoveConverts() { 
MATCHER_SCOPE(RemoveConverts); - auto parent_convert_wrap = ngraph::pattern::wrap_type(); - auto child_convert_wrap = ngraph::pattern::wrap_type({ parent_convert_wrap }); + auto parent_convert_wrap = ov::pass::pattern::wrap_type(); + auto child_convert_wrap = ov::pass::pattern::wrap_type({ parent_convert_wrap }); - auto callback = [=](ngraph::pattern::Matcher& m) { - OV_ITT_SCOPED_TASK(ngraph::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::RemoveConverts") + auto callback = [=](ov::pass::pattern::Matcher& m) { + OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "ov::intel_cpu::pass::RemoveConverts") const auto& pm = m.get_pattern_value_map(); const auto parent_convert = pm.at(parent_convert_wrap).get_node_shared_ptr(); const auto child_convert = pm.at(child_convert_wrap).get_node_shared_ptr(); @@ -33,6 +31,6 @@ ov::intel_cpu::pass::RemoveConverts::RemoveConverts() { return true; }; - auto m = std::make_shared(child_convert_wrap, matcher_name); + auto m = std::make_shared(child_convert_wrap, matcher_name); register_matcher(m, callback); } diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/remove_converts.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/remove_converts.hpp index b1fc6d4503d606..040aa2ffd916c2 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/remove_converts.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/remove_converts.hpp @@ -4,8 +4,7 @@ #pragma once -#include "ngraph/pass/graph_rewrite.hpp" -#include "ngraph/pattern/matcher.hpp" +#include "openvino/pass/graph_rewrite.hpp" namespace ov { namespace intel_cpu { @@ -16,7 +15,7 @@ namespace pass { * @brief Remove sequence of two ConvertSaturation operations for specific precisions: FP32 => BF16 => FP32 * @ingroup snippets */ -class RemoveConverts : public ngraph::pass::MatcherPass { +class RemoveConverts : public ov::pass::MatcherPass { public: OPENVINO_RTTI("RemoveConverts", "0"); RemoveConverts(); diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/snippets_mark_skipped.cpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/snippets_mark_skipped.cpp index 221b0145e08f0d..f6e7d988f6d23f 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/snippets_mark_skipped.cpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/snippets_mark_skipped.cpp @@ -2,16 +2,16 @@ // SPDX-License-Identifier: Apache-2.0 // #include "snippets_mark_skipped.hpp" + #include "snippets/pass/tokenization.hpp" #include "snippets/op/subgraph.hpp" #include "snippets/utils.hpp" -#include + #include #include #include "itt.hpp" -using namespace ngraph; namespace ov { namespace intel_cpu { @@ -44,13 +44,13 @@ int getNumNonConstInputs(const std::shared_ptr &node) { int num_non_const_inputs = 0; for (const auto &parent_out : node->input_values()) { const auto parent = parent_out.get_node_shared_ptr(); - if (ov::is_type(parent)) { + if (ov::is_type(parent)) { for (const auto &grandparent_out : parent->input_values()) { const auto grandparent = grandparent_out.get_node_shared_ptr(); - if (!ngraph::op::is_constant(grandparent)) + if (!ov::is_type(grandparent)) num_non_const_inputs++; } - } else if (!ngraph::op::is_constant(parent)) { + } else if (!ov::is_type(parent)) { num_non_const_inputs++; } } @@ -59,16 +59,16 @@ int getNumNonConstInputs(const std::shared_ptr &node) { bool SupportsFusingWithConvolution_SumActivation(const std::shared_ptr &node) { // todo: Do all PReLUs are fused? 
Not sure about round and softRelu // EltwiseRoundHalfToEven, EltwiseRoundHalfAwayFromZero, EltwiseSoftRelu - return ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node); + return ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node); } bool canBePerformedAsScaleShift(const std::shared_ptr &node, const int channelAxis) { @@ -77,7 +77,7 @@ bool canBePerformedAsScaleShift(const std::shared_ptr &node, const i ov::PartialShape dataShape; for (size_t i = 0; i < node->get_input_size(); i++) { const auto parent = node->get_input_node_shared_ptr(i); - if (!ngraph::op::is_constant(parent)) { + if (!ov::is_type(parent)) { fusingPort = i; dataShape = node->get_input_partial_shape(i); // only one non-const parent is allowed @@ -105,10 +105,10 @@ bool canBePerformedAsScaleShift(const std::shared_ptr &node, const i // Prelu and MulAdd are still ignored // isConvertablePowerStatic() is ignored - return (ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node)) && + return (ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node)) && isBroadcastableToDataInput(); } @@ -118,25 +118,25 @@ inline bool canBeMatMulExecutedInInt8(const ov::element::Type& firstType, const bool SupportsFusingWithConvolution_Simple(const std::shared_ptr &node, const int channelAxis = DEFAULT_AXIS) { return SupportsFusingWithConvolution_SumActivation(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || canBePerformedAsScaleShift(node, channelAxis); } // Convolution is a special case, since it supports peculiar fusings bool isSuitableConvolutionParent(const std::shared_ptr &node) { - const bool is_suitable_node = ov::is_type(node) || - ov::is_type(node); + const bool is_suitable_node = ov::is_type(node) || + ov::is_type(node); // has a single output, connected to a single child const auto out = node->outputs(); const bool has_only_child = (out.size() == 1) && (out[0].get_target_inputs().size() == 1); return is_suitable_node && has_only_child; } bool isSuitableBinaryConvolutionParent(const std::shared_ptr &node) { - const bool is_suitable_node = ov::is_type(node); + const bool is_suitable_node = ov::is_type(node); // has a single output, connected to a single child const auto out = node->outputs(); const bool has_only_child = (out.size() == 1) && (out[0].get_target_inputs().size() == 1); @@ -158,17 +158,17 @@ int getChannelAxis(const ov::AxisSet &axes, bool keep_dims) { return channelAxis; } bool isSuitableMiscParent(const std::shared_ptr &node) { - const bool is_suitable_node = ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node) || - ov::is_type(node); + const bool is_suitable_node = ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + 
ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node) || + ov::is_type(node); // has a single output, connected to a single child const auto out = node->outputs(); const bool has_only_child = (out.size() == 1) && (out[0].get_target_inputs().size() == 1); @@ -176,7 +176,7 @@ bool isSuitableMiscParent(const std::shared_ptr &node) { } // Matmul is a special case, since it supports simple + bias fusings bool isSuitableMatMulParent(const std::shared_ptr &node) { - const bool is_suitable_node = ov::is_type(node); + const bool is_suitable_node = ov::is_type(node); // has a single output, connected to a single child const auto out = node->outputs(); const bool has_only_child = (out.size() == 1) && (out[0].get_target_inputs().size() == 1); @@ -191,7 +191,7 @@ inline bool isSuitableReduceParent(const std::shared_ptr &node) { } // Subtract as ZeroPoints for Convolution bool isSuitableSubtractAsZeroPointsParent(const std::shared_ptr &node) { - const bool is_suitable_node = ov::is_type(node); + const bool is_suitable_node = ov::is_type(node); const auto out = node->outputs(); const bool has_only_child = (out.size() == 1) && (out[0].get_target_inputs().size() == 1); const bool has_two_parents = node->get_input_size() == 2; @@ -230,12 +230,12 @@ bool isSuitableSubtractAsZeroPointsParent(const std::shared_ptr &nod zp_weights_is_suitable; const auto conv_weights = child->get_input_node_shared_ptr(1); - bool second_conv_input_is_suitable = ov::is_type(conv_weights) && + bool second_conv_input_is_suitable = ov::is_type(conv_weights) && conv_weights->get_output_element_type(0) == ov::element::i8; return first_conv_input_is_suitable && second_conv_input_is_suitable; } bool isSuitablePoolChild(const std::shared_ptr &node) { - const bool is_suitable_node = ov::is_type(node); + const bool is_suitable_node = ov::is_type(node); // has a single output, connected to a single child const auto out = node->outputs(); const bool has_only_child = (out.size() == 1) && (out[0].get_target_inputs().size() == 1); @@ -253,7 +253,7 @@ bool isSuitableChildForFusingMatMul(const std::shared_ptr &node, con ov::PartialShape matmul_shape; for (const auto &parent_out : node->input_values()) { const auto parent = parent_out.get_node_shared_ptr(); - if (ngraph::op::is_constant(parent)) { + if (ov::is_type(parent)) { bias_shape = parent_out.get_shape(); num_non_const_inputs++; } else { @@ -276,7 +276,7 @@ bool isSuitableChildForFusingMatMul(const std::shared_ptr &node, con return false; // Matmul / FC bias fusion - if (ov::is_type(node) && + if (ov::is_type(node) && bias_shape.is_static() && matmul_shape.rbegin()->is_static() && bias_shape.rbegin()->get_length() == matmul_shape.rbegin()->get_length() && bias_shape.rbegin()->get_length() == shape_size(bias_shape.get_shape())) { @@ -319,10 +319,10 @@ bool isSuitableChildForFusingMatMul(const std::shared_ptr &node, con } } } else if (ov::is_type(node)) { - const bool is_per_tensor_broadcasting = ngraph::snippets::utils::is_scalar_constant(node->get_input_node_shared_ptr(1)) && - ngraph::snippets::utils::is_scalar_constant(node->get_input_node_shared_ptr(2)) && - ngraph::snippets::utils::is_scalar_constant(node->get_input_node_shared_ptr(3)) && - ngraph::snippets::utils::is_scalar_constant(node->get_input_node_shared_ptr(4)); + const bool is_per_tensor_broadcasting = snippets::utils::is_scalar_constant(node->get_input_node_shared_ptr(1)) && + snippets::utils::is_scalar_constant(node->get_input_node_shared_ptr(2)) && + 
snippets::utils::is_scalar_constant(node->get_input_node_shared_ptr(3)) && + snippets::utils::is_scalar_constant(node->get_input_node_shared_ptr(4)); if (!is_per_tensor_broadcasting) { return false; } @@ -340,15 +340,15 @@ bool isSuitableChildForFusingMatMul(const std::shared_ptr &node, con return true; } bool isSuitableParentForFusingSumActivation(const std::shared_ptr &node) { - if (!ov::is_type(node)) + if (!ov::is_type(node)) return false; auto isFusedBiasNode = [](std::shared_ptr n){ - if (!(ov::is_type(n) && + if (!(ov::is_type(n) && GetNodeFusingType(n) == NodeFusingType::FusedWithConvolution)) return false; const auto conv = n->get_input_source_output(0); const auto bias = n->get_input_source_output(1); - if (!(ngraph::op::is_constant(bias.get_node_shared_ptr()) && isSuitableConvolutionParent(conv.get_node_shared_ptr()))) + if (!(ov::is_type(bias.get_node_shared_ptr()) && isSuitableConvolutionParent(conv.get_node_shared_ptr()))) return false; const auto conv_shape = conv.get_partial_shape(); const auto bias_shape = bias.get_partial_shape(); @@ -371,7 +371,7 @@ bool isSuitableParentForFusingSumActivation(const std::shared_ptr &n return true; }; auto isFusedFQNode = [&isFusedBiasNode](std::shared_ptr n) { - if (!(ov::is_type(n) && + if (!(ov::is_type(n) && GetNodeFusingType(n) == NodeFusingType::FusedWithConvolution)) return false; const auto& parent = n->get_input_node_shared_ptr(0); @@ -425,19 +425,19 @@ void MarkSubgraphOpAsSkipped(const std::shared_ptr &node) { } bool isSuitableConvert(const std::shared_ptr& node) { - if (!ov::is_type(node)) + if (!ov::is_type(node)) return false; auto hasResult = [](const std::shared_ptr& node){ auto consumers = node->output(0).get_target_inputs(); bool findResult = false; if (consumers.size() == 1) { - if (ov::is_type(consumers.begin()->get_node())) + if (ov::is_type(consumers.begin()->get_node())) findResult = true; } return findResult; }; // 1. check Parameter->Convert 2. 
check Convert->Result - if (ov::is_type(node->get_input_node_ptr(0))) { + if (ov::is_type(node->get_input_node_ptr(0))) { auto inPrc = node->get_input_element_type(0); auto outPrc = node->get_output_element_type(0); return inPrc == element::bf16 && outPrc == element::f32; @@ -455,7 +455,7 @@ bool SnippetsMarkSkipped::run_on_model(const std::shared_ptr &m) { RUN_ON_MODEL_SCOPE(SnippetsMarkSkipped); int channelAxis = DEFAULT_AXIS; for (auto &node : m->get_ordered_ops()) { - if (ngraph::op::is_constant(node) || ov::is_type(node)) + if (ov::is_type(node) || ov::is_type(node)) continue; if (isSuitableConvolutionParent(node)) { // Initiate fusing chain @@ -465,11 +465,11 @@ bool SnippetsMarkSkipped::run_on_model(const std::shared_ptr &m) { SetNodeFusingType(node, NodeFusingType::FusedWithBinaryConvolution); channelAxis = DEFAULT_AXIS; } else if (isSuitableReduceParent(node)) { - const auto reduce = std::dynamic_pointer_cast(node); + const auto reduce = std::dynamic_pointer_cast(node); channelAxis = getChannelAxis(reduce->get_reduction_axes(), reduce->get_keep_dims()); SetNodeFusingType(node, NodeFusingType::FusedWithReduce); } else if (isSuitableMiscParent(node)) { - if (const auto reduce = std::dynamic_pointer_cast(node)) { + if (const auto reduce = std::dynamic_pointer_cast(node)) { channelAxis = getChannelAxis(reduce->get_reduction_axes(), reduce->get_keep_dims()); } else { channelAxis = DEFAULT_AXIS; diff --git a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/snippets_mark_skipped.hpp b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/snippets_mark_skipped.hpp index 2ba3d484ac5838..5ce0b04c836255 100644 --- a/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/snippets_mark_skipped.hpp +++ b/src/plugins/intel_cpu/src/transformations/snippets/x64/pass/snippets_mark_skipped.hpp @@ -4,7 +4,7 @@ #pragma once -#include +#include "openvino/pass/graph_rewrite.hpp" namespace ov { namespace intel_cpu { diff --git a/src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp b/src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp index 0bdc2fb380390b..eb5b6d4fd3752e 100644 --- a/src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp +++ b/src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp @@ -620,13 +620,13 @@ void Transformations::MainSnippets(void) { snippetsManager.set_per_pass_validation(false); if (snippetsMode != Config::SnippetsMode::IgnoreCallback) CPU_REGISTER_PASS_X64(snippetsManager, SnippetsMarkSkipped, enableBF16); - CPU_REGISTER_PASS_X64(snippetsManager, ngraph::snippets::pass::SnippetsTokenization); + CPU_REGISTER_PASS_X64(snippetsManager, snippets::pass::SnippetsTokenization); const bool isMHASupported = !enableBF16 && // TODO: Need to add BF16 support for MHA in Snippets dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core); // MHA has BRGEMM that is supported only on AVX512 platforms if (!isMHASupported) { - CPU_DISABLE_PASS_X64(snippetsManager, ngraph::snippets::pass::TokenizeMHASnippets); + CPU_DISABLE_PASS_X64(snippetsManager, snippets::pass::TokenizeMHASnippets); } if (snippetsMode != Config::SnippetsMode::IgnoreCallback) { CPU_SET_CALLBACK_X64(snippetsManager, @@ -651,7 +651,7 @@ void Transformations::MainSnippets(void) { const auto is_unsupported_kernel_work_amount = kernel_buffer_size > l2_cache_size; return is_unsupported_parallel_work_amount || is_unsupported_kernel_work_amount; }, - ngraph::snippets::pass::TokenizeMHASnippets); + snippets::pass::TokenizeMHASnippets); 
CPU_SET_CALLBACK_X64(snippetsManager, [](const std::shared_ptr& n) -> bool { // CPU Plugin support Swish in Subgraph via conversion to SwichCPU which assumes second input to be constant @@ -690,7 +690,7 @@ void Transformations::MainSnippets(void) { return has_only_const_inputs || bad_input_rank || bad_output_rank || is_unsupported_swish || is_disabled_tokenization; }, - ngraph::snippets::pass::TokenizeSnippets); + snippets::pass::TokenizeSnippets); } snippetsManager.run_passes(model); } diff --git a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/add.cpp b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/add.cpp index 86b7d6b3b11f74..499ff42ebf63fe 100644 --- a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/add.cpp +++ b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/add.cpp @@ -67,7 +67,7 @@ INSTANTIATE_TEST_SUITE_P(smoke_Snippets_Eltwise, AddRollConst, INSTANTIATE_TEST_SUITE_P(smoke_Snippets_Eltwise_BF16, AddRollConst, ::testing::Combine( - ::testing::Values(ov::Shape {1, 42, 16, 64}), + ::testing::Values(ov::Shape {1, 2, 3, 32}), ::testing::Values(ov::element::bf16), ::testing::Values(3), // Add + reorder + roll after inputs ::testing::Values(1), // Subgraph is created, since the inputs are followed by converts diff --git a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/matmul.cpp b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/matmul.cpp index 9d792f35264066..59807c50c9df9b 100644 --- a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/matmul.cpp +++ b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/matmul.cpp @@ -36,6 +36,16 @@ static inline std::vector> precisions(bool only_fp32 } return prc; } +static inline std::vector> quantized_precisions() { + std::vector> prc = {}; + // In Snippets MatMul INT8 is supported only on VNNI/AMX platforms + if (InferenceEngine::with_cpu_x86_avx512_core_vnni() || InferenceEngine::with_cpu_x86_avx512_core_amx_int8()) { + prc.emplace_back(std::vector{element::i8, element::i8}); + prc.emplace_back(std::vector{element::u8, element::i8}); + } + return prc; +} + INSTANTIATE_TEST_SUITE_P(smoke_Snippets_MatMult, MatMul, ::testing::Combine( ::testing::ValuesIn(input_shapes), @@ -63,6 +73,35 @@ INSTANTIATE_TEST_SUITE_P(smoke_Snippets_MatMulBias, MatMulBias, ::testing::Values(CommonTestUtils::DEVICE_CPU)), MatMul::getTestCaseName); +INSTANTIATE_TEST_SUITE_P(smoke_Snippets_MatMulBiasQuantized, MatMulBiasQuantized, + ::testing::Combine( + ::testing::ValuesIn(std::vector>{ + std::vector{{1, 2, 69, 43}, {2, 1, 43, 49}, {1, 2, 1, 1}}, + std::vector{{1, 2, 69, 43}, {2, 1, 43, 49}, {1, 2, 69, 49}}}), + ::testing::ValuesIn(quantized_precisions()), + ::testing::Values(1), // Subgraph + ::testing::Values(1), // Tokenized MatMul+Bias + ::testing::Values(CommonTestUtils::DEVICE_CPU)), + MatMul::getTestCaseName); + +INSTANTIATE_TEST_SUITE_P(smoke_Snippets_MatMulsQuantized, MatMulsQuantized, + ::testing::Combine( + ::testing::Values(std::vector{{1, 16, 128, 64}, {1, 16, 64, 128}, {128, 64}}), + ::testing::ValuesIn(quantized_precisions()), + ::testing::Values(3), // Subgraph + Reshape + Subgraph + ::testing::Values(2), // Tokenized [MatMul+FQ+Matmul] and [FQ] + ::testing::Values(CommonTestUtils::DEVICE_CPU)), + MatMul::getTestCaseName); + +INSTANTIATE_TEST_SUITE_P(smoke_Snippets_MatMulsQuantizedSoftmax, MatMulsQuantizedSoftmax, + ::testing::Combine( + ::testing::Values(std::vector{{1, 16, 128, 
64}, {1, 16, 64, 128}, {128, 64}}), + ::testing::ValuesIn(quantized_precisions()), + ::testing::Values(3), // Subgraph + Reshape + Subgraph + ::testing::Values(2), // Tokenized [MatMul+FQ+Matmul] and [FQ] + ::testing::Values(CommonTestUtils::DEVICE_CPU)), + MatMul::getTestCaseName); + } // namespace } // namespace snippets } // namespace test diff --git a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/mha.cpp b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/mha.cpp index 13c3063d0d1225..8778c35ab1d7ed 100644 --- a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/mha.cpp +++ b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/mha.cpp @@ -53,7 +53,8 @@ INSTANTIATE_TEST_SUITE_P(smoke_Snippets_MHA, MHASelect, MHA::getTestCaseName); const std::vector> inputShapesWOTranspose = { - {{1, 12, 197, 64}, {1, 12, 64, 197}, {1, 12, 197, 64}} + {{1, 12, 197, 64}, {1, 12, 64, 197}, {1, 12, 197, 64}}, + {{1, 12, 12, 64}, {1, 12, 64, 48}, {1, 12, 48, 64}} }; INSTANTIATE_TEST_SUITE_P(smoke_Snippets_MHAWOTransposeOnInputs, MHAWOTransposeOnInputs, @@ -85,4 +86,4 @@ INSTANTIATE_TEST_SUITE_P(smoke_Snippets_MHABF16, MHAWOTranspose, } // namespace } // namespace snippets } // namespace test -} // namespace ov \ No newline at end of file +} // namespace ov diff --git a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/select.cpp b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/select.cpp index 677d7678af09e7..ae060e749ce0a8 100644 --- a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/select.cpp +++ b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/select.cpp @@ -14,9 +14,9 @@ namespace { INSTANTIATE_TEST_SUITE_P(smoke_Snippets_Select, Select, ::testing::Combine( - ::testing::ValuesIn({ov::Shape{1, 5, 5, 35}, ov::Shape{1}}), - ::testing::ValuesIn({ov::Shape{1, 5, 5, 35}, ov::Shape{1}}), - ::testing::ValuesIn({ov::Shape{1, 5, 5, 35}, ov::Shape{1}}), + ::testing::ValuesIn({ov::Shape{1, 5, 5, 35}, }), + ::testing::ValuesIn({ov::Shape{1, 5, 5, 35}, }), + ::testing::ValuesIn({ov::Shape{1}}), ::testing::ValuesIn({ov::element::f32, ov::element::i8}), ::testing::Values(1), ::testing::Values(1), diff --git a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/transpose.cpp b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/transpose.cpp index 0179adb0a7ae54..e3d25d611e629c 100644 --- a/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/transpose.cpp +++ b/src/plugins/intel_cpu/tests/functional/shared_tests_instances/snippets/transpose.cpp @@ -21,6 +21,16 @@ INSTANTIATE_TEST_SUITE_P(smoke_Snippets_Transpose, Transpose, ::testing::Values(CommonTestUtils::DEVICE_CPU)), Transpose::getTestCaseName); +INSTANTIATE_TEST_SUITE_P(smoke_Snippets_TransposeMul, TransposeMul, + ::testing::Combine( + ::testing::Values(ov::PartialShape {2, 31, 3, 5}), + ::testing::ValuesIn(std::vector{{2, 3, 5, 31}}), + ::testing::Values(std::vector {0, 2, 3, 1}), + ::testing::Values(1), // Transpose + ::testing::Values(1), // Tokenized Transpose + ::testing::Values(CommonTestUtils::DEVICE_CPU)), + TransposeMul::getTestCaseName); + } // namespace } // namespace snippets } // namespace test diff --git a/src/plugins/intel_cpu/tests/functional/subgraph_tests/src/subgraph_serialize.cpp b/src/plugins/intel_cpu/tests/functional/subgraph_tests/src/subgraph_serialize.cpp index 037d34eb030b6c..cb7f60458fa6b2 100644 --- 
a/src/plugins/intel_cpu/tests/functional/subgraph_tests/src/subgraph_serialize.cpp +++ b/src/plugins/intel_cpu/tests/functional/subgraph_tests/src/subgraph_serialize.cpp @@ -27,7 +27,7 @@ TEST_F(SubgraphSnippetSerializationTest, smoke_SerializeSubgraph) { auto ininput1 = std::make_shared(ov::element::f32, shape); auto add = std::make_shared(ininput0, ininput1); auto subgraph_body = std::make_shared(ov::NodeVector{add}, ov::ParameterVector{ininput0, ininput1}); - auto subgraph = std::make_shared(ov::NodeVector{input0, input1}, subgraph_body.get()->clone()); + auto subgraph = std::make_shared(ov::NodeVector{input0, input1}, subgraph_body.get()->clone()); return std::make_shared(ov::NodeVector{subgraph}, ov::ParameterVector{input0, input1}); })(); ov::Core core; @@ -75,7 +75,7 @@ TEST_F(SubgraphSnippetSerializationTest, smoke_SerializeSubgraphWithScalarConst) auto add = std::make_shared(input, constant); auto internal_add = std::make_shared(internal_input, internal_constant); auto subgraph_body = std::make_shared(ov::NodeVector{internal_add}, ov::ParameterVector{internal_input}); - auto subgraph = std::make_shared(ov::NodeVector{add}, subgraph_body.get()->clone()); + auto subgraph = std::make_shared(ov::NodeVector{add}, subgraph_body.get()->clone()); return std::make_shared(ov::NodeVector{subgraph}, ov::ParameterVector{input}); })(); ov::Core core; diff --git a/src/plugins/intel_cpu/tests/unit/generate_add.cpp b/src/plugins/intel_cpu/tests/unit/generate_add.cpp index 3693c4bd9ece89..c77e97023700d8 100644 --- a/src/plugins/intel_cpu/tests/unit/generate_add.cpp +++ b/src/plugins/intel_cpu/tests/unit/generate_add.cpp @@ -73,14 +73,14 @@ inline auto wrapAsSnippet(std::shared_ptr& f, const ngraph::Shape& shape1) -> std::shared_ptr { auto input0 = std::make_shared(ngraph::element::f32, shape0); auto input1 = std::make_shared(ngraph::element::f32, shape1); - auto snippet = std::make_shared(ngraph::OutputVector{input0, input1}, ngraph::clone_function(*f.get())); + auto snippet = std::make_shared(ngraph::OutputVector{input0, input1}, ngraph::clone_function(*f.get())); return std::make_shared(ngraph::NodeVector{snippet}, ngraph::ParameterVector{input0, input1}); } inline auto wrapAsSnippet(std::shared_ptr& f, const ngraph::Shape& shape0) -> std::shared_ptr { auto input0 = std::make_shared(ngraph::element::f32, shape0); - auto snippet = std::make_shared(ngraph::OutputVector{input0}, ngraph::clone_function(*f.get())); + auto snippet = std::make_shared(ngraph::OutputVector{input0}, ngraph::clone_function(*f.get())); return std::make_shared(ngraph::NodeVector{snippet}, ngraph::ParameterVector{input0}); } @@ -186,7 +186,7 @@ TEST(SnippetsTests, GenerateAddBroadcastX2Edges) { auto input2 = std::make_shared(ngraph::element::f32, shape0); auto input3 = std::make_shared(ngraph::element::f32, shape1); auto input4 = std::make_shared(ngraph::element::f32, shape1); - auto snippet = std::make_shared(ngraph::OutputVector{input2, input3, input4}, ngraph::clone_function(*f.get())); + auto snippet = std::make_shared(ngraph::OutputVector{input2, input3, input4}, ngraph::clone_function(*f.get())); return std::make_shared(ngraph::NodeVector{snippet}, ngraph::ParameterVector{input2, input3, input4}); })(shape0, shape1); @@ -283,7 +283,7 @@ TEST(SnippetsTests, GenerateAddNegate) { auto s = ([f] (const ngraph::Shape& shape) -> std::shared_ptr{ auto input2 = std::make_shared(ngraph::element::f32, shape); auto input3 = std::make_shared(ngraph::element::f32, shape); - auto snippet = std::make_shared(ngraph::OutputVector{input2, 
input3}, ngraph::clone_function(*f.get())); + auto snippet = std::make_shared(ngraph::OutputVector{input2, input3}, ngraph::clone_function(*f.get())); return std::make_shared(ngraph::NodeVector{snippet}, ngraph::ParameterVector{input2, input3}); })(shape); @@ -307,7 +307,7 @@ TEST(SnippetsTests, GenerateAddNegateAdd) { auto input11 = std::make_shared(ngraph::element::f32, shape); auto input21 = std::make_shared(ngraph::element::f32, shape); auto input31 = std::make_shared(ngraph::element::f32, shape); - auto snippet = std::make_shared(ngraph::OutputVector{input11, input21, input31}, ngraph::clone_function(*f.get())); + auto snippet = std::make_shared(ngraph::OutputVector{input11, input21, input31}, ngraph::clone_function(*f.get())); std::shared_ptr s = std::make_shared(ngraph::NodeVector{snippet}, ngraph::ParameterVector{input11, input21, input31}); auto referenceInputs = gen_inputs(shape, 3); @@ -344,7 +344,7 @@ TEST(SnippetsTests, GenerateAddNegateAddMultiEdgeConst) { std::shared_ptr f = std::make_shared(ngraph::NodeVector{add2}, ngraph::ParameterVector{input1}); auto input11 = std::make_shared(ngraph::element::f32, shape); - auto snippet = std::make_shared(ngraph::OutputVector{input11}, ngraph::clone_function(*f.get())); + auto snippet = std::make_shared(ngraph::OutputVector{input11}, ngraph::clone_function(*f.get())); std::shared_ptr s = std::make_shared(ngraph::NodeVector{snippet}, ngraph::ParameterVector{input11}); auto referenceInputs = gen_inputs(shape, 1); @@ -362,7 +362,7 @@ TEST(SnippetsTests, GenerateErf) { std::shared_ptr f = std::make_shared(ngraph::NodeVector{gelu}, ngraph::ParameterVector{input1}); auto input11 = std::make_shared(ngraph::element::f32, shape); - auto snippet = std::make_shared(ngraph::OutputVector{input11}, ngraph::clone_function(*f.get())); + auto snippet = std::make_shared(ngraph::OutputVector{input11}, ngraph::clone_function(*f.get())); std::shared_ptr s = std::make_shared(ngraph::NodeVector{snippet}, ngraph::ParameterVector{input11}); auto referenceInputs = gen_inputs(shape, 1); @@ -396,7 +396,7 @@ TEST(SnippetsTests, GenerateAddBroadcastAutomatic) { auto input0 = std::make_shared(ngraph::element::f32, shape0); auto input1 = std::make_shared(ngraph::element::f32, shape1); auto input2 = std::make_shared(ngraph::element::f32, shape2); - auto snippet = std::make_shared(ngraph::OutputVector{input0, input1, input2}, ngraph::clone_function(*f.get())); + auto snippet = std::make_shared(ngraph::OutputVector{input0, input1, input2}, ngraph::clone_function(*f.get())); return std::make_shared(ngraph::NodeVector{snippet}, ngraph::ParameterVector{input0, input1, input2}); })(shapes[0], shapes[1], shapes[2]); diff --git a/src/plugins/intel_cpu/tests/unit/snippets_transformations/mul_add_to_fma.cpp b/src/plugins/intel_cpu/tests/unit/snippets_transformations/mul_add_to_fma.cpp index e46bbd57a0c0b4..5184d9211f8105 100644 --- a/src/plugins/intel_cpu/tests/unit/snippets_transformations/mul_add_to_fma.cpp +++ b/src/plugins/intel_cpu/tests/unit/snippets_transformations/mul_add_to_fma.cpp @@ -7,7 +7,6 @@ #include #include -#include "snippets/pass/loop_helpers.hpp" #include "lowering_utils.hpp" namespace ov { @@ -68,30 +67,13 @@ class EltwiseWithMulAddFunction : public SnippetsFunctionBase { data2 = parameter; } - auto load0 = std::make_shared(data0); - auto load1 = std::make_shared(data1); - auto load2 = scalar_input ? data2 : std::make_shared(data2); - auto a = scalar_input || add_input_idx == 0 ? load0 : load1; - auto b = scalar_input || add_input_idx == 0 ? 
load1 : load2; - auto c = scalar_input || add_input_idx == 0 ? load2 : load0; + auto a = scalar_input || add_input_idx == 0 ? data0 : data1; + auto b = scalar_input || add_input_idx == 0 ? data1 : data2; + auto c = scalar_input || add_input_idx == 0 ? data2 : data0; auto fma = std::make_shared(a, b, c); - auto store = std::make_shared(fma); - auto model = std::make_shared(NodeVector{store}, parameters); - - ResultVector results({model->get_results()[0]}); - const auto& inner_loop_begin = ngraph::snippets::op::insertLoopBegin(parameters); - // control dependency is added only in the case when scalar are located before loopBegin in topological order - if (scalar_input && add_input_idx == 1) { - data2->add_control_dependency(inner_loop_begin); - } - std::vector apply_increments(parameters.size() + results.size(), true); - ngraph::snippets::op::insertLoopEnd(results, inner_loop_begin, 1, 1, apply_increments); - const auto& outer_loop_begin = ngraph::snippets::op::insertLoopBegin(parameters); - ngraph::snippets::op::insertLoopEnd(results, outer_loop_begin, 1, 1, apply_increments); - - return model; + return std::make_shared(NodeVector{fma}, parameters); } void validate_function(const std::shared_ptr &m) const override { diff --git a/src/tests/functional/plugin/shared/include/snippets/matmul.hpp b/src/tests/functional/plugin/shared/include/snippets/matmul.hpp index 3e2a0ab015e988..d4139b11de07a9 100644 --- a/src/tests/functional/plugin/shared/include/snippets/matmul.hpp +++ b/src/tests/functional/plugin/shared/include/snippets/matmul.hpp @@ -19,22 +19,39 @@ typedef std::tuple< > MatMulParams; class MatMul : public testing::WithParamInterface, - virtual public ov::test::SnippetsTestsCommon { + virtual public ov::test::SnippetsTestsCommon { public: static std::string getTestCaseName(testing::TestParamInfo obj); protected: void SetUp() override; + + virtual void init_subgraph(const std::vector& inputShapes, const std::vector& types); }; class MatMulFQ : public MatMul { protected: - void SetUp() override; + void init_subgraph(const std::vector& inputShapes, const std::vector& types) override; }; class MatMulBias : public MatMul { protected: - void SetUp() override; + void init_subgraph(const std::vector& inputShapes, const std::vector& types) override; +}; + +class MatMulBiasQuantized : public MatMul { +protected: + void init_subgraph(const std::vector& inputShapes, const std::vector& types) override; +}; + +class MatMulsQuantized : public MatMul { +protected: + void init_subgraph(const std::vector& inputShapes, const std::vector& types) override; +}; + +class MatMulsQuantizedSoftmax : public MatMul { +protected: + void init_subgraph(const std::vector& inputShapes, const std::vector& types) override; }; } // namespace snippets diff --git a/src/tests/functional/plugin/shared/include/snippets/mha.hpp b/src/tests/functional/plugin/shared/include/snippets/mha.hpp index 7794c9b286d312..8c15adbc8c3fc4 100644 --- a/src/tests/functional/plugin/shared/include/snippets/mha.hpp +++ b/src/tests/functional/plugin/shared/include/snippets/mha.hpp @@ -29,22 +29,24 @@ class MHA : public testing::WithParamInterface, void SetUp() override; void generate_inputs(const std::vector& targetInputStaticShapes) override; + virtual void init_subgraph(); + + bool m_with_mul = false; }; class MHASelect : public MHA { protected: - void SetUp() override; - void generate_inputs(const std::vector& targetInputStaticShapes) override; + void init_subgraph() override; }; class MHAWOTransposeOnInputs : public MHA { protected: - void 
SetUp() override; + void init_subgraph() override; }; class MHAWOTranspose : public MHA { - void SetUp() override; + void init_subgraph() override; }; } // namespace snippets diff --git a/src/tests/functional/plugin/shared/include/snippets/transpose.hpp b/src/tests/functional/plugin/shared/include/snippets/transpose.hpp index e1491ebe8b1333..37a374d4229f8d 100644 --- a/src/tests/functional/plugin/shared/include/snippets/transpose.hpp +++ b/src/tests/functional/plugin/shared/include/snippets/transpose.hpp @@ -18,6 +18,15 @@ typedef std::tuple< std::string // Target Device > TransposeParams; +typedef std::tuple< + ov::PartialShape, // Input 0 Shape + ov::PartialShape, // Input 1 Shape + std::vector, // Transpose order + size_t, // Expected num nodes + size_t, // Expected num subgraphs + std::string // Target Device +> TransposeMulParams; + class Transpose : public testing::WithParamInterface, virtual public ov::test::SnippetsTestsCommon { public: @@ -27,6 +36,15 @@ class Transpose : public testing::WithParamInterface, + virtual public ov::test::SnippetsTestsCommon { +public: + static std::string getTestCaseName(testing::TestParamInfo obj); + +protected: + void SetUp() override; +}; + } // namespace snippets } // namespace test } // namespace ov \ No newline at end of file diff --git a/src/tests/functional/plugin/shared/src/snippets/matmul.cpp b/src/tests/functional/plugin/shared/src/snippets/matmul.cpp index 06a37e2fd1ffed..6ef643e3efeee0 100644 --- a/src/tests/functional/plugin/shared/src/snippets/matmul.cpp +++ b/src/tests/functional/plugin/shared/src/snippets/matmul.cpp @@ -35,40 +35,41 @@ void MatMul::SetUp() { std::tie(input_shapes, elem_types, ref_num_nodes, ref_num_subgraphs, targetDevice) = this->GetParam(); init_input_shapes(static_partial_shapes_to_test_representation(input_shapes)); - auto f = ov::test::snippets::MatMulFunction(input_shapes, elem_types); - function = f.getOriginal(); + init_subgraph(input_shapes, elem_types); if (!configuration.count(InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE)) { configuration.insert({InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE, InferenceEngine::PluginConfigInternalParams::IGNORE_CALLBACK}); } } -void MatMulFQ::SetUp() { - std::vector input_shapes; - std::vector elem_types; - std::tie(input_shapes, elem_types, ref_num_nodes, ref_num_subgraphs, targetDevice) = this->GetParam(); - init_input_shapes(static_partial_shapes_to_test_representation(input_shapes)); +void MatMul::init_subgraph(const std::vector& inputShapes, const std::vector& types) { + auto f = ov::test::snippets::MatMulFunction(inputShapes, types); + function = f.getOriginal(); +} - auto f = ov::test::snippets::FQMatMulFunction(input_shapes); +void MatMulFQ::init_subgraph(const std::vector& inputShapes, const std::vector& types) { + auto f = ov::test::snippets::FQMatMulFunction(inputShapes); function = f.getOriginal(); - if (!configuration.count(InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE)) { - configuration.insert({InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE, - InferenceEngine::PluginConfigInternalParams::IGNORE_CALLBACK}); - } } -void MatMulBias::SetUp() { - std::vector input_shapes; - std::vector elem_types; - std::tie(input_shapes, elem_types, ref_num_nodes, ref_num_subgraphs, targetDevice) = this->GetParam(); - init_input_shapes(static_partial_shapes_to_test_representation(input_shapes)); +void MatMulBias::init_subgraph(const std::vector& inputShapes, const std::vector& types) { + auto f = 
ov::test::snippets::MatMulBiasFunction(inputShapes, types); + function = f.getOriginal(); +} - auto f = ov::test::snippets::MatMulBiasFunction(input_shapes, elem_types); +void MatMulBiasQuantized::init_subgraph(const std::vector& inputShapes, const std::vector& types) { + auto f = ov::test::snippets::MatMulBiasQuantizedFunction(inputShapes, types); + function = f.getOriginal(); +} + +void MatMulsQuantized::init_subgraph(const std::vector& inputShapes, const std::vector& types) { + auto f = ov::test::snippets::MatMulsQuantizedFunction(inputShapes, types); + function = f.getOriginal(); +} + +void MatMulsQuantizedSoftmax::init_subgraph(const std::vector& inputShapes, const std::vector& types) { + auto f = ov::test::snippets::MatMulsQuantizedSoftmaxFunction(inputShapes, types); function = f.getOriginal(); - if (!configuration.count(InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE)) { - configuration.insert({InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE, - InferenceEngine::PluginConfigInternalParams::IGNORE_CALLBACK}); - } } TEST_P(MatMul, CompareWithRefImpl) { @@ -89,6 +90,24 @@ TEST_P(MatMulBias, CompareWithRefImpl) { validateNumSubgraphs(); } +TEST_P(MatMulBiasQuantized, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() + run(); + validateNumSubgraphs(); +} + +TEST_P(MatMulsQuantized, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() + run(); + validateNumSubgraphs(); +} + +TEST_P(MatMulsQuantizedSoftmax, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() + run(); + validateNumSubgraphs(); +} + } // namespace snippets } // namespace test } // namespace ov diff --git a/src/tests/functional/plugin/shared/src/snippets/mha.cpp b/src/tests/functional/plugin/shared/src/snippets/mha.cpp index 714038ac726629..2f5d17dbd8159a 100644 --- a/src/tests/functional/plugin/shared/src/snippets/mha.cpp +++ b/src/tests/functional/plugin/shared/src/snippets/mha.cpp @@ -43,23 +43,23 @@ std::string MHA::getTestCaseName(testing::TestParamInfo inputShapes; - bool withMul; ov::element::Type prc; std::map additionalConfig; - std::tie(inputShapes, withMul, prc, ref_num_nodes, ref_num_subgraphs, targetDevice, additionalConfig) = this->GetParam(); + std::tie(inputShapes, m_with_mul, prc, ref_num_nodes, ref_num_subgraphs, targetDevice, additionalConfig) = this->GetParam(); init_input_shapes(static_partial_shapes_to_test_representation(inputShapes)); - auto f = ov::test::snippets::MHAFunction(inputDynamicShapes, withMul); - function = f.getOriginal(); + init_subgraph(); configuration.insert(additionalConfig.begin(), additionalConfig.end()); - if (!configuration.count(InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE)) { + if (additionalConfig.empty() && !configuration.count(InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE)) { configuration.insert({InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE, InferenceEngine::PluginConfigInternalParams::IGNORE_CALLBACK}); } setInferenceType(prc); inType = outType = prc; + if (prc == ov::element::bf16) + abs_threshold = 0.3; } void MHA::generate_inputs(const std::vector& targetInputStaticShapes) { @@ -73,25 +73,9 @@ void MHA::generate_inputs(const std::vector& targetInputStaticSha } } -void MHASelect::SetUp() { - std::vector inputShapes; - bool withMul; - ov::element::Type prc; - std::map additionalConfig; - std::tie(inputShapes, withMul, prc, ref_num_nodes, ref_num_subgraphs, targetDevice, additionalConfig) = this->GetParam(); - 
init_input_shapes(static_partial_shapes_to_test_representation(inputShapes)); - - auto f = ov::test::snippets::MHASelectFunction(inputDynamicShapes); +void MHA::init_subgraph() { + auto f = ov::test::snippets::MHAFunction(inputDynamicShapes, m_with_mul); function = f.getOriginal(); - - configuration.insert(additionalConfig.begin(), additionalConfig.end()); - if (!configuration.count(InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE)) { - configuration.insert({InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE, - InferenceEngine::PluginConfigInternalParams::IGNORE_CALLBACK}); - } - - setInferenceType(prc); - inType = outType = prc; } void MHASelect::generate_inputs(const std::vector& targetInputStaticShapes) { @@ -112,58 +96,35 @@ void MHASelect::generate_inputs(const std::vector& targetInputSta } } -void MHAWOTransposeOnInputs::SetUp() { - std::vector inputShapes; - bool withMul; - ov::element::Type prc; - std::map additionalConfig; - std::tie(inputShapes, withMul, prc, ref_num_nodes, ref_num_subgraphs, targetDevice, additionalConfig) = this->GetParam(); - init_input_shapes(static_partial_shapes_to_test_representation(inputShapes)); +void MHASelect::init_subgraph() { + auto f = ov::test::snippets::MHASelectFunction(inputDynamicShapes); + function = f.getOriginal(); +} +void MHAWOTransposeOnInputs::init_subgraph() { auto f = ov::test::snippets::MHAWOTransposeOnInputsFunction(inputDynamicShapes); function = f.getOriginal(); - - configuration.insert(additionalConfig.begin(), additionalConfig.end()); - if (!configuration.count(InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE)) { - configuration.insert({InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE, - InferenceEngine::PluginConfigInternalParams::IGNORE_CALLBACK}); - } - - setInferenceType(prc); - inType = outType = prc; } -void MHAWOTranspose::SetUp() { - std::vector inputShapes; - bool withMul; - ov::element::Type prc; - std::map additionalConfig; - std::tie(inputShapes, withMul, prc, ref_num_nodes, ref_num_subgraphs, targetDevice, additionalConfig) = this->GetParam(); - init_input_shapes(static_partial_shapes_to_test_representation(inputShapes)); - +void MHAWOTranspose::init_subgraph() { auto f = ov::test::snippets::MHAWOTransposeFunction(inputDynamicShapes); function = f.getOriginal(); - - configuration.insert(additionalConfig.begin(), additionalConfig.end()); - - setInferenceType(prc); - inType = outType = prc; - if (prc == ov::element::bf16) - abs_threshold = 0.3; } - TEST_P(MHA, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() run(); validateNumSubgraphs(); } TEST_P(MHASelect, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() run(); validateNumSubgraphs(); } TEST_P(MHAWOTransposeOnInputs, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() run(); validateNumSubgraphs(); } diff --git a/src/tests/functional/plugin/shared/src/snippets/softmax.cpp b/src/tests/functional/plugin/shared/src/snippets/softmax.cpp index 13b452832783bd..2ed21d027b746f 100644 --- a/src/tests/functional/plugin/shared/src/snippets/softmax.cpp +++ b/src/tests/functional/plugin/shared/src/snippets/softmax.cpp @@ -77,11 +77,13 @@ void AddSoftmax::SetUp() { } TEST_P(Softmax, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() run(); validateNumSubgraphs(); } TEST_P(AddSoftmax, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() run(); validateNumSubgraphs(); } diff --git a/src/tests/functional/plugin/shared/src/snippets/transpose.cpp 
b/src/tests/functional/plugin/shared/src/snippets/transpose.cpp index c5886fe74a867d..d14e0344ded61e 100644 --- a/src/tests/functional/plugin/shared/src/snippets/transpose.cpp +++ b/src/tests/functional/plugin/shared/src/snippets/transpose.cpp @@ -42,7 +42,44 @@ void Transpose::SetUp() { } } +std::string TransposeMul::getTestCaseName(testing::TestParamInfo obj) { + std::vector inputShapes(2); + std::vector order; + std::string targetDevice; + size_t num_nodes, num_subgraphs; + std::tie(inputShapes[0], inputShapes[1], order, num_nodes, num_subgraphs, targetDevice) = obj.param; + + std::ostringstream result; + for (int i = 0; i < inputShapes.size(); i++) + result << "IS[" << i << "]=" << CommonTestUtils::partialShape2str({inputShapes[i]}) << "_"; + result << "Order=" << CommonTestUtils::vec2str(order) << "_"; + result << "#N=" << num_nodes << "_"; + result << "#S=" << num_subgraphs << "_"; + result << "targetDevice=" << targetDevice; + return result.str(); +} + +void TransposeMul::SetUp() { + std::vector inputShapes(2); + std::vector order; + std::tie(inputShapes[0], inputShapes[1], order, ref_num_nodes, ref_num_subgraphs, targetDevice) = this->GetParam(); + init_input_shapes(static_partial_shapes_to_test_representation(inputShapes)); + auto f = ov::test::snippets::TransposeMulFunction(inputShapes, order); + function = f.getOriginal(); + if (!configuration.count(InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE)) { + configuration.insert({InferenceEngine::PluginConfigInternalParams::KEY_SNIPPETS_MODE, + InferenceEngine::PluginConfigInternalParams::IGNORE_CALLBACK}); + } +} + TEST_P(Transpose, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() + run(); + validateNumSubgraphs(); +} + +TEST_P(TransposeMul, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() run(); validateNumSubgraphs(); } diff --git a/src/tests/functional/plugin/shared/src/snippets/transpose_softmax.cpp b/src/tests/functional/plugin/shared/src/snippets/transpose_softmax.cpp index aecdd418f05ead..c60c52d6875ca8 100644 --- a/src/tests/functional/plugin/shared/src/snippets/transpose_softmax.cpp +++ b/src/tests/functional/plugin/shared/src/snippets/transpose_softmax.cpp @@ -67,11 +67,13 @@ void TransposeSoftmaxEltwise::SetUp() { } TEST_P(TransposeSoftmax, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() run(); validateNumSubgraphs(); } TEST_P(TransposeSoftmaxEltwise, CompareWithRefImpl) { + SKIP_IF_CURRENT_TEST_IS_DISABLED() run(); validateNumSubgraphs(); } diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_lowered.hpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_lowered.hpp index 40f8c20c9f3a65..4d147f3b7eb4be 100644 --- a/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_lowered.hpp +++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_lowered.hpp @@ -62,23 +62,6 @@ class Transpose0213MatMulLoweredFunction : public Transpose0213MatMulFunction { std::shared_ptr initLowered() const override; }; -class SoftmaxLoweredFunction : public SoftmaxFunction { -public: - explicit SoftmaxLoweredFunction(const std::vector& inputShapes, int axis) : SoftmaxFunction(inputShapes, axis) {} - -protected: - std::shared_ptr initLowered() const override; -}; - -// With LoopFusion pass -class AddSoftmaxLoweredFunction : public AddSoftmaxFunction { -public: - explicit AddSoftmaxLoweredFunction(const std::vector& inputShapes, int axis) : AddSoftmaxFunction(inputShapes, axis) {} - -protected: - std::shared_ptr initLowered() 
const override; -}; - class BroadcastAddLoweredFunction : public BroadcastAddFunction { public: explicit BroadcastAddLoweredFunction(const std::vector& inputShapes, const PartialShape& targetShape) : diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_matmul.hpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_matmul.hpp index 15954605e69fdd..55799aa0cecf84 100644 --- a/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_matmul.hpp +++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_matmul.hpp @@ -30,9 +30,9 @@ class MatMulFunction : public SnippetsFunctionBase { } static void verify_precisions(const std::vector& precisions) { NGRAPH_CHECK(precisions.size() == 2, "Got invalid number of input element types"); - const bool is_f32 = ngraph::snippets::utils::everyone_is(element::f32, precisions[0], precisions[1]); - const bool is_int8 = ngraph::snippets::utils::one_of(precisions[0], element::i8, element::u8) && precisions[1] == element::i8; - const bool is_bf16 = ngraph::snippets::utils::everyone_is(element::bf16, precisions[0], precisions[1]); + const bool is_f32 = ov::snippets::utils::everyone_is(element::f32, precisions[0], precisions[1]); + const bool is_int8 = ov::snippets::utils::one_of(precisions[0], element::i8, element::u8) && precisions[1] == element::i8; + const bool is_bf16 = ov::snippets::utils::everyone_is(element::bf16, precisions[0], precisions[1]); NGRAPH_CHECK(is_f32 || is_bf16 || is_int8, "Invalid precisions"); } protected: @@ -70,6 +70,41 @@ class MatMulBiasFunction : public SnippetsFunctionBase { std::vector precisions; }; + +// Quantized MatMul +// FQ[I8] +// Add +class MatMulBiasQuantizedFunction : public SnippetsFunctionBase { +public: + explicit MatMulBiasQuantizedFunction(const std::vector& inputShapes, const std::vector& precisions) + : SnippetsFunctionBase(inputShapes), precisions(precisions) { + NGRAPH_CHECK(input_shapes.size() == 3, "Got invalid number of input shapes"); + MatMulFunction::verify_precisions(precisions); + } +protected: + std::shared_ptr initOriginal() const override; + + std::vector precisions; +}; + +// Quantized MatMul FQ[I8] +// FQ[U8] Reshape <- To have only one sequence in Subgraph: MatMuL->FQ[U8]->MatMul->FQ[I8] +// \ / +// MatMul +// FQ[I8] +class MatMulsQuantizedFunction : public SnippetsFunctionBase { +public: + explicit MatMulsQuantizedFunction(const std::vector& inputShapes, const std::vector& precisions) + : SnippetsFunctionBase(inputShapes), precisions(precisions) { + NGRAPH_CHECK(input_shapes.size() == 3, "Got invalid number of input shapes"); + MatMulFunction::verify_precisions(precisions); + } +protected: + std::shared_ptr initOriginal() const override; + + std::vector precisions; +}; + /// Minimal graph to test MatMul+Transpose combinations. Transpose location is specified via the position argument: /// 0 - before the first MatMul input; 1 - before the second MatMul input; 2 - after the MatMul output. 
/// Tokenized simply by starting subgraph,
@@ -121,6 +156,24 @@ class TransposeMulMatMulBiasFunction : public SnippetsFunctionBase {
    std::shared_ptr initOriginal() const override;
};
+// Quantized MatMul FQ[I8]
+// Softmax Reshape <- To have only one sequence in Subgraph: MatMul->Softmax->FQ[U8]->MatMul->FQ[I8]
+// FQ[U8] /
+// MatMul
+// FQ[I8]
+class MatMulsQuantizedSoftmaxFunction : public SnippetsFunctionBase {
+public:
+    explicit MatMulsQuantizedSoftmaxFunction(const std::vector& inputShapes, const std::vector& precisions)
+        : SnippetsFunctionBase(inputShapes), precisions(precisions) {
+        NGRAPH_CHECK(input_shapes.size() == 3, "Got invalid number of input shapes");
+        MatMulFunction::verify_precisions(precisions);
+    }
+protected:
+    std::shared_ptr initOriginal() const override;
+
+    std::vector precisions;
+};
+
} // namespace snippets
} // namespace test
} // namespace ov
diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_transpose.hpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_transpose.hpp
index b77ea54e2575d9..27af52fdd84d32 100644
--- a/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_transpose.hpp
+++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/include/subgraph_transpose.hpp
@@ -15,7 +15,7 @@ namespace ov {
namespace test {
namespace snippets {
-/// Minimal graph to test Transpose support: Parameter->Sinh->Transpose->Result
+/// Minimal graph to test Transpose support: Parameter->Transpose->Result
/// Tokenized simply by starting subgraph, supported through TransposeDecomposition
// in1 Const(order)
// Transpose
@@ -31,6 +31,23 @@ class TransposeFunction : public SnippetsFunctionBase {
    std::shared_ptr initReference() const override;
    std::vector order;
};
+/// Testing Transpose + Eltwise support using the Mul op as an example
+/// Tokenized simply by starting subgraph, supported through TransposeDecomposition
+// in1 Const(order)
+// Transpose
+// in2 |
+// Multiply
+// Result
+class TransposeMulFunction : public SnippetsFunctionBase {
+public:
+    explicit TransposeMulFunction(const std::vector& inputShapes, std::vector order)
+        : SnippetsFunctionBase(inputShapes), order(std::move(order)) {
+        NGRAPH_CHECK(input_shapes.size() == 2, "Got invalid number of input shapes");
+    }
+protected:
+    std::shared_ptr initOriginal() const override;
+    std::vector order;
+};
} // namespace snippets
} // namespace test
} // namespace ov
diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/fake_quantize_function.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/fake_quantize_function.cpp
index f4100bbba38562..8717476ed86409 100644
--- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/fake_quantize_function.cpp
+++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/fake_quantize_function.cpp
@@ -177,7 +177,7 @@ std::shared_ptr FakeQuantizeFunction::getSubgraphWithFakeQuantize(
    auto parent = FunctionHelper::applyPrerequisites(parameter, prerequisites);
-    const auto subgraph = std::make_shared(
+    const auto subgraph = std::make_shared(
        ngraph::OutputVector{ parent },
        getSubgraphBody(inputShape, inputType, fakeQuantizeShapes, zeroPoint, beforeFakeQuantizeOperations));
    subgraph->set_friendly_name("subgraph");
@@ -248,7 +248,7 @@ std::shared_ptr FakeQuantizeFunction::getSubgraphWithDecomposedFakeQu
    const auto parameter = std::make_shared(inputType, inputShape);
    parameter->set_friendly_name("parameter");
-    const auto subgraph = std::make_shared(
+    const auto subgraph = std::make_shared(
ngraph::OutputVector {parameter}, getSubgraphBody(inputShape, inputType, fakeQuantizeShapes, zeroPoint)); subgraph->set_friendly_name("subgraph"); diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/function_helper.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/function_helper.cpp index 68b41831fe6b3d..5bbc14e959ad19 100644 --- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/function_helper.cpp +++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/function_helper.cpp @@ -50,9 +50,9 @@ std::shared_ptr FunctionHelper::applyPrerequisites(const std::shared_ptr FunctionHelper::getSubgraph(const std::shared_ptr& f, const int index) { int currentIndex = 0; - std::shared_ptr subgraph; + std::shared_ptr subgraph; for (const auto& op : f->get_ordered_ops()) { - auto tmp_subgraph = as_type_ptr(op); + auto tmp_subgraph = as_type_ptr(op); if (tmp_subgraph != nullptr) { if (index == currentIndex) { return tmp_subgraph; diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/precision_propagation_function.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/precision_propagation_function.cpp index c416573cf5cbdc..6e6b91c1778909 100644 --- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/precision_propagation_function.cpp +++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/precision_propagation_function.cpp @@ -24,7 +24,7 @@ std::shared_ptr PrecisionPropagationAddFunction::get( const auto create_convert = [](std::shared_ptr parent, const element::Type convertion_type) -> std::shared_ptr { return convertion_type == element::undefined ? std::dynamic_pointer_cast(parent) - : std::make_shared(parent, convertion_type); + : std::make_shared(parent, convertion_type); }; const auto make_branch = [&create_convert]( @@ -53,7 +53,7 @@ std::shared_ptr PrecisionPropagationAddFunction::get( convertion_before_op2_2.second; if ((convertion_before_op2_2.first == element::undefined) && (parent->get_output_element_type(0) != maximum_in2_type)) { - parent = std::make_shared(parent, maximum_in2_type); + parent = std::make_shared(parent, maximum_in2_type); } parent = std::make_shared( diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_convert.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_convert.cpp index 17f419757a0ac0..00143f9f623f4e 100644 --- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_convert.cpp +++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_convert.cpp @@ -25,8 +25,8 @@ std::shared_ptr ConvertFunction::initOriginal() const { std::shared_ptr ConvertFunction::initReference() const { auto data0 = std::make_shared(inType, input_shapes[0]); auto indata0 = std::make_shared(inType, data0->get_shape()); - auto subgraph = std::make_shared(NodeVector{data0}, - std::make_shared(NodeVector{std::make_shared(indata0, outType)}, + auto subgraph = std::make_shared(NodeVector{data0}, + std::make_shared(NodeVector{std::make_shared(indata0, outType)}, ParameterVector{indata0})); return std::make_shared(NodeVector{subgraph}, ParameterVector{data0}); } @@ -43,8 +43,8 @@ std::shared_ptr ConvertInputFunction::initReference() const { auto data1 = std::make_shared(outType, input_shapes[1]); auto indata0 = std::make_shared(inType, data0->get_shape()); auto indata1 = std::make_shared(outType, data1->get_shape()); - auto convert = std::make_shared(indata0, outType); - auto subgraph = std::make_shared(NodeVector{data0, data1}, + auto convert = std::make_shared(indata0, 
outType); + auto subgraph = std::make_shared(NodeVector{data0, data1}, std::make_shared( NodeVector{std::make_shared(convert, indata1)}, ParameterVector{indata0, indata1})); @@ -64,8 +64,8 @@ std::shared_ptr ConvertOutputFunction::initReference() const { auto indata0 = std::make_shared(inType, data0->get_shape()); auto indata1 = std::make_shared(inType, data1->get_shape()); auto add = std::make_shared(indata0, indata1); - auto convert = std::make_shared(add, outType); - auto subgraph = std::make_shared(NodeVector{data0, data1}, + auto convert = std::make_shared(add, outType); + auto subgraph = std::make_shared(NodeVector{data0, data1}, std::make_shared( NodeVector{convert}, ParameterVector{indata0, indata1})); @@ -86,12 +86,12 @@ std::shared_ptr ConvertStubFunction::initReference() const { auto indata0 = std::make_shared(inType, data0->get_shape()); auto indata1 = std::make_shared(inType, data1->get_shape()); auto add = std::make_shared(indata0, indata1); - auto convert = std::make_shared(add, outType); - auto subgraph0 = std::make_shared( + auto convert = std::make_shared(add, outType); + auto subgraph0 = std::make_shared( NodeVector{data0, data1}, std::make_shared(NodeVector{convert}, ParameterVector{indata0, indata1})); auto indata2 = std::make_shared(convert->get_destination_type(), convert->get_shape()); auto relu = std::make_shared(indata2); - auto subgraph1 = std::make_shared( + auto subgraph1 = std::make_shared( NodeVector{subgraph0}, std::make_shared(NodeVector{relu}, ParameterVector{indata2})); return std::make_shared(NodeVector{subgraph1}, ParameterVector{data0, data1}); } @@ -116,13 +116,13 @@ std::shared_ptr ConvertPartialInputsAndResultsFunction::initReference auto indata0 = std::make_shared(inTypes[0], data0->get_shape()); auto indata1 = std::make_shared(inTypes[1], data1->get_shape()); auto indata2 = std::make_shared(inTypes[2], data2->get_shape()); - auto convert0 = std::make_shared(indata0, outTypes[0]); - auto convert1 = std::make_shared(indata1, outTypes[0]); + auto convert0 = std::make_shared(indata0, outTypes[0]); + auto convert1 = std::make_shared(indata1, outTypes[0]); auto add = std::make_shared(convert0, convert1); auto relu = std::make_shared(add); auto sub = std::make_shared(relu, indata2); - auto convert2 = std::make_shared(relu, outTypes[1]); - auto subgraph = std::make_shared( + auto convert2 = std::make_shared(relu, outTypes[1]); + auto subgraph = std::make_shared( NodeVector{data0, data1, data2}, std::make_shared(NodeVector{sub, convert2}, ParameterVector{indata0, indata1, indata2})); auto stub3 = createRollAsStub(subgraph); return std::make_shared(OutputVector{subgraph->output(1), stub3->output(0)}, @@ -144,11 +144,11 @@ std::shared_ptr ConvertManyOnInputsFunction::initReference() const { auto indata0 = std::make_shared(types[0], data0->get_shape()); std::shared_ptr out = indata0; for (auto i = 1; i < types.size(); i++) { - auto convert = std::make_shared(out, types[i]); + auto convert = std::make_shared(out, types[i]); out = convert; } auto relu = std::make_shared(out); - auto subgraph = std::make_shared(NodeVector{data0}, + auto subgraph = std::make_shared(NodeVector{data0}, std::make_shared(NodeVector{relu}, ParameterVector{indata0})); return std::make_shared(NodeVector{subgraph}, ParameterVector{data0}); } @@ -169,10 +169,10 @@ std::shared_ptr ConvertManyOnOutputsFunction::initReference() const { auto relu = std::make_shared(indata0); std::shared_ptr out = relu; for (auto i = 1; i < types.size(); i++) { - auto convert = std::make_shared(out, types[i]); 
+ auto convert = std::make_shared(out, types[i]); out = convert; } - auto subgraph = std::make_shared(NodeVector{data0}, + auto subgraph = std::make_shared(NodeVector{data0}, std::make_shared(NodeVector{out}, ParameterVector{indata0})); return std::make_shared(NodeVector{subgraph}, ParameterVector{data0}); } @@ -206,7 +206,7 @@ std::shared_ptr ConvertManyOnInputOutputFunction::initReference() con auto convert = std::make_shared(out, outTypes[i]); out = convert; } - auto subgraph = std::make_shared(NodeVector{data0}, + auto subgraph = std::make_shared(NodeVector{data0}, std::make_shared(NodeVector{out}, ParameterVector{indata0})); return std::make_shared(NodeVector{subgraph}, ParameterVector{data0}); } diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_customizable.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_customizable.cpp index 9975f5185c1b61..f410eb87f5b65e 100644 --- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_customizable.cpp +++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_customizable.cpp @@ -53,7 +53,7 @@ std::shared_ptr ConvMulActivationFunction::initReference() const { auto ineltwise_unary_1 = custom_ops[1]->clone_with_new_inputs({ineltwise_binary->output(0)}); auto ineltwise_unary_2 = custom_ops[2]->clone_with_new_inputs({ineltwise_unary_1->output(0)}); - auto subgraph = std::make_shared(NodeVector{conv, eltwise_sinh}, + auto subgraph = std::make_shared(NodeVector{conv, eltwise_sinh}, std::make_shared(NodeVector{ineltwise_unary_2}, ParameterVector{indata0, indata1})); return std::make_shared(NodeVector{subgraph}, ParameterVector{conv_param, eltwise_param}); diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_fq.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_fq.cpp index 35bdbcacba04a5..13b3a3f3ef795a 100644 --- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_fq.cpp +++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_fq.cpp @@ -59,7 +59,7 @@ std::shared_ptr ThreeFQFunction::initReference() const { std::vector{127}, ov::element::i8); auto fq1 = ngraph::builder::subgraph::makeFakeQuantizeTypeRelaxed(fq0, ov::element::f32, fq1_data); - auto subgraph0 = std::make_shared(NodeVector{data0}, + auto subgraph0 = std::make_shared(NodeVector{data0}, std::make_shared(NodeVector{fq1}, ParameterVector{indata0})); auto indata1 = std::make_shared(precision, subgraph0->get_shape()); @@ -71,7 +71,7 @@ std::shared_ptr ThreeFQFunction::initReference() const { std::vector{255}, ov::element::u8); auto fq2 = ngraph::builder::subgraph::makeFakeQuantizeTypeRelaxed(indata1, ov::element::f32, fq2_data); - auto subgraph1 = std::make_shared(NodeVector{subgraph0}, + auto subgraph1 = std::make_shared(NodeVector{subgraph0}, std::make_shared(NodeVector{fq2}, ParameterVector{indata1})); return std::make_shared(NodeVector{subgraph1}, ParameterVector{data0}); diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_lowered.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_lowered.cpp index 6c818b6078cdc6..6d9bb3e93f1cb9 100644 --- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_lowered.cpp +++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_lowered.cpp @@ -6,7 +6,6 @@ #include "common_test_utils/data_utils.hpp" #include #include "ngraph_functions/builders.hpp" -#include "snippets/pass/loop_helpers.hpp" namespace ov { namespace test { @@ -16,38 +15,22 @@ 
std::shared_ptr AddFunctionLoweredBroadcast::initLowered() const {
    auto data0 = std::make_shared(precision, input_shapes[0]);
    std::shared_ptr add_input0 = nullptr;
    if (!broadcast_shapes[0].empty() && broadcast_shapes[0].back() != input_shapes[0].rbegin()->get_length()) {
-        add_input0 = std::make_shared(data0, broadcast_shapes[0]);
+        add_input0 = std::make_shared(data0, broadcast_shapes[0]);
    } else {
-        add_input0 = std::make_shared(data0);
+        add_input0 = std::make_shared(data0);
    }
    auto data1 = std::make_shared(precision, input_shapes[1]);
    std::shared_ptr add_input1 = nullptr;
    if (!broadcast_shapes[1].empty() && broadcast_shapes[1].back() != input_shapes[1].rbegin()->get_length()) {
-        add_input1 = std::make_shared(data1, broadcast_shapes[1]);
+        add_input1 = std::make_shared(data1, broadcast_shapes[1]);
    } else {
-        add_input1 = std::make_shared(data1);
+        add_input1 = std::make_shared(data1);
    }
    auto add = std::make_shared(add_input0, add_input1);
-    auto store = std::make_shared(add);
+    auto store = std::make_shared(add);
    ParameterVector input_params {data0, data1};
-    auto model = std::make_shared(NodeVector{store}, input_params);
-
-    // Create dummy scheduler to pass graph comparison tests
-    // Note that if there is more than one results, they should be reverted
-    ResultVector results({model->get_results()[0]});
-    const auto& inner_loop_begin = ngraph::snippets::op::insertLoopBegin(input_params);
-    std::vector apply_increments(input_params.size() + results.size(), true);
-    insertLoopEnd(results, inner_loop_begin, 1, 1, apply_increments);
-    auto outer_WA = std::accumulate(input_shapes.begin(), input_shapes.end(), 0,
-                                    [](int64_t max_val, const PartialShape& ps) {
-                                        return std::max(ps[ps.size() - 2].get_length(), max_val);
-                                    });
-    if (outer_WA > 1) {
-        const auto& outer_loop_begin = ngraph::snippets::op::insertLoopBegin(input_params);
-        insertLoopEnd(results, outer_loop_begin, 1, 1, apply_increments);
-    }
-    return model;
+    return std::make_shared(NodeVector{store}, input_params);
}
std::shared_ptr EltwiseThreeInputsLoweredFunction::initLowered() const {
    // todo: implement conversion between std::vector and std::vector
@@ -58,52 +41,36 @@ std::shared_ptr EltwiseThreeInputsLoweredFunction::initLowered() cons
    auto input_params = ngraph::builder::makeParams(precision, {input_shapes[0].get_shape(), input_shapes[1].get_shape(), input_shapes[2].get_shape()});
    auto load_or_broadcastload = [&](size_t i) -> std::shared_ptr {
        // user specified that no broadcasting is required
        if (broadcast_shapes[i].empty()) {
-            return std::make_shared(input_params[i]);
+            return std::make_shared(input_params[i]);
        // broadcasting is required: could be Load + BroadcastMove or BroadcastLoad
        } else {
            // The last dim is processed by vector Tile, so BroadcastLoad is required if the last dim is being broadcast
            if (input_shapes[i].rbegin()->get_length() == 1 && broadcast_shapes[i].back() != 1) {
-                return std::make_shared(input_params[i], broadcast_shapes[i]);
+                return std::make_shared(input_params[i], broadcast_shapes[i]);
            // Todo: Cover this logic with functional tests, Review FakeBroadcast Emitter
            // Broadcasting of other dims is handled by BroadcastMove. Strictly speaking, broadcasting is achieved via
            // appropriate pointer arithmetic in this case.
} else { - auto load = std::make_shared(input_params[i]); - return std::make_shared(load, broadcast_shapes[i]); + auto load = std::make_shared(input_params[i]); + return std::make_shared(load, broadcast_shapes[i]); } } }; auto add = std::make_shared(load_or_broadcastload(0), load_or_broadcastload(1)); const std::vector const_values = CommonTestUtils::generate_float_numbers(1, -10., 10.); - auto sub_scalar = std::make_shared(precision, Shape{1}, const_values[0]); + auto sub_scalar = std::make_shared(precision, Shape{1}, const_values[0]); std::shared_ptr sub_load; - sub_load = std::make_shared(input_params[2]); + sub_load = std::make_shared(input_params[2]); auto sub = std::make_shared(sub_load, sub_scalar); std::shared_ptr sub_out; if (broadcast_shapes[2].empty()) sub_out = sub; else - sub_out = std::make_shared(sub, broadcast_shapes[2]); + sub_out = std::make_shared(sub, broadcast_shapes[2]); auto mul = std::make_shared(add, sub_out); - auto store = std::make_shared(mul); - auto model = std::make_shared(NodeVector{store}, input_params); - - // Create dummy scheduler to pass graph comparison tests - // Note that if there is more than one results, they should be reverted - ResultVector results({model->get_results()[0]}); - const auto& inner_loop_begin = ngraph::snippets::op::insertLoopBegin(input_params); - std::vector apply_increments(input_params.size() + results.size(), true); - const auto& inner_loop_end = insertLoopEnd(results, inner_loop_begin, 1, 1, apply_increments); - auto outer_WA = std::accumulate(input_shapes.begin(), input_shapes.end(), 0, - [](int64_t max_val, const PartialShape& ps) { - return std::max(ps[ps.size() - 2].get_length(), max_val); - }); - if (outer_WA > 1) { - const auto& outer_loop_begin = ngraph::snippets::op::insertLoopBegin(input_params); - insertLoopEnd(results, outer_loop_begin, 1, 1, apply_increments); - } - return model; + auto store = std::make_shared(mul); + return std::make_shared(NodeVector{store}, input_params); } std::shared_ptr Transpose0213MatMulLoweredFunction::initLowered() const { @@ -111,279 +78,40 @@ std::shared_ptr Transpose0213MatMulLoweredFunction::initLowered() con std::make_shared(precisions[1], input_shapes[1])}; std::vector layout{0, 2, 1, 3}; // Note: validity of transpose_position values is checked in Transpose0213MatMulSinhFunction constructor - if (transpose_position <= 1) { - auto &rt_info = data[transpose_position]->get_rt_info(); - rt_info["Layout"] = layout; - } - auto matmul = std::make_shared(data[0], data[1]); + if (transpose_position < 2) { + const auto& anchor = data[transpose_position]->output(0); + const auto& td = ov::snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(anchor); + const auto& tensor = td->get_shape(); + const auto& subtensor = td->get_subtensor(); + } + auto matmul = std::make_shared(data[0], data[1], 0, 0, 0, transpose_position == 0 ? layout : std::vector{}, + transpose_position == 1 ? layout : std::vector{}, + transpose_position == 2 ? 
layout : std::vector{}); + auto result = std::make_shared(matmul); if (transpose_position == 2) { - auto &rt_info = matmul->get_rt_info(); - rt_info["Layout"] = layout; - matmul->validate_and_infer_types(); - } + const auto& anchor = matmul->output(0); + const auto& td = ov::snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(anchor); + const auto& tensor = td->get_shape(); + const auto& subtensor = td->get_subtensor(); + ov::snippets::lowered::PortDescriptorUtils::set_port_descriptor_ptr(anchor, + std::make_shared(tensor, + subtensor, + layout)); + } + if (transpose_position < 2) { + const auto& anchor = data[transpose_position]->output(0); + const auto& td = ov::snippets::lowered::PortDescriptorUtils::get_port_descriptor_ptr(anchor); + const auto& tensor = td->get_shape(); + const auto& subtensor = td->get_subtensor(); + ov::snippets::lowered::PortDescriptorUtils::set_port_descriptor_ptr(matmul->input(transpose_position), + std::make_shared(tensor, + subtensor, + layout)); + } + matmul->validate_and_infer_types(); return std::make_shared(NodeVector{matmul}, data); } -std::shared_ptr SoftmaxLoweredFunction::initLowered() const { - auto input_params = ngraph::builder::makeParams(precision, {input_shapes[0].get_shape()}); - - const auto data = input_params.front(); - - const auto master_shape = input_shapes[0].get_shape(); - const auto shape_rank = master_shape.size(); - const auto dimension = shape_rank - 1; - const auto work_amount = master_shape[dimension]; - const auto increment = 10; - const auto inner_dim = shape_rank - 1; - const auto inner_master_wa = static_cast(master_shape[inner_dim]); - const int outer_dim = shape_rank > 1 ? shape_rank - 2 : -1; - const auto has_outer_loop = outer_dim >= 0 && master_shape[outer_dim] > 1; - const bool is_scalar = work_amount == 1; - - /* ====== ReduceMax decomposition ====== */ - - const auto vector_buffer_max = std::make_shared(); - const auto loop_max_begin = ngraph::snippets::op::insertLoopBegin(ngraph::OutputVector{data, data}); - - // we don't insert Fill here after load_max to verify because in generate() call Fill op is inserted only on vector representation - const auto load_max = std::make_shared(loop_max_begin->output(0), increment); - const auto max = std::make_shared(load_max, vector_buffer_max); - - std::vector apply_increments_max(3, false); - std::vector finalization_offsets_max(3, 0); - apply_increments_max[0] = data->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - finalization_offsets_max[0] = data->get_shape()[inner_dim] != 1 ? 
-inner_master_wa : 0; - const auto loop_max_end = std::make_shared(ngraph::OutputVector{loop_max_begin->output(1), loop_max_begin->output(2)}, - work_amount, increment, apply_increments_max, finalization_offsets_max); - - std::shared_ptr horizon_max = std::make_shared(max); - horizon_max->add_control_dependency(loop_max_end); - const auto prev_horizon_max = horizon_max; - if (!is_scalar) { - horizon_max = std::make_shared(horizon_max, horizon_max->get_input_partial_shape(0)); - } - - loop_max_begin->add_control_dependency(vector_buffer_max); - loop_max_end->add_control_dependency(max); - - /* =========================================== */ - - /* === Sub + Exp + ReduceSum decomposition === */ - - const auto vector_buffer_sum = std::make_shared(); - const auto loop_sum_begin = ngraph::snippets::op::insertLoopBegin(ngraph::OutputVector{loop_max_end->output(0)}); - - const auto load_sub = std::make_shared(loop_sum_begin->output(0), increment); - const auto sub = std::make_shared(load_sub, horizon_max); - // we don't insert Fill here after Exp to verify because in generate() call Fill op is inserted only on vector representation - const auto exp = std::make_shared(sub); - const auto sum = std::make_shared(exp, vector_buffer_sum); - const auto store_exp = std::make_shared(exp, increment); - - std::vector apply_increments_sum(2, false); - std::vector finalization_offsets_sum(2, 0); - apply_increments_sum[0] = load_sub->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - apply_increments_sum[1] = store_exp->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - finalization_offsets_sum[0] = has_outer_loop && load_sub->get_shape()[inner_dim] != 1 ? -inner_master_wa : 0; - finalization_offsets_sum[1] = store_exp->get_shape()[inner_dim] != 1 ? -inner_master_wa : 0; - const auto loop_sum_end = std::make_shared( - ngraph::OutputVector{store_exp, loop_sum_begin->output(1)}, work_amount, increment, - apply_increments_sum, finalization_offsets_sum); - loop_sum_end->add_control_dependency(sum); - - const auto horizon_sum = std::make_shared(sum); - horizon_sum->add_control_dependency(loop_sum_end); - - const auto size_exp = std::make_shared(ov::element::i32, ov::Shape{2}); - const auto buffer_exp = std::make_shared(loop_sum_end->output(0)); - - loop_sum_begin->add_control_dependency(vector_buffer_sum); - loop_sum_begin->add_control_dependency(horizon_max); - loop_sum_begin->add_control_dependency(prev_horizon_max); - - /* =========================================== */ - - /* ================== Div ==================== */ - - std::shared_ptr pow = std::make_shared(horizon_sum, -1); - const auto prev_pow = pow; - if (!is_scalar) { - pow = std::make_shared(pow, horizon_sum->get_input_partial_shape(0)); - } - - const auto loop_div_begin = ngraph::snippets::op::insertLoopBegin(ngraph::OutputVector{buffer_exp}); - - const auto load_div = std::make_shared(loop_div_begin->output(0), increment); - const auto mul = std::make_shared(load_div, pow); - const auto store_div = std::make_shared(mul, increment); - - std::vector apply_increments_div(2, false); - std::vector finalization_offsets_div(2, 0); - apply_increments_div[0] = load_div->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - apply_increments_div[1] = store_div->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - finalization_offsets_div[0] = has_outer_loop && load_div->get_shape()[inner_dim] != 1 ? -inner_master_wa : 0; - finalization_offsets_div[1] = has_outer_loop && store_div->get_shape()[inner_dim] != 1 ? 
-inner_master_wa : 0; - const auto loop_div_end = std::make_shared( - ngraph::OutputVector{store_div, loop_div_begin->output(1)}, work_amount, increment, - apply_increments_div, finalization_offsets_div); - loop_div_begin->add_control_dependency(horizon_sum); - loop_div_begin->add_control_dependency(pow); - loop_div_begin->add_control_dependency(prev_pow); - - /* =========================================== */ - - const auto result = std::make_shared(loop_div_end); - if (has_outer_loop) { - const auto need_increment = input_shapes[0].get_shape()[outer_dim] != 1 && input_shapes[0].get_shape()[inner_dim] == 1; - const auto& outer_loop_begin = ngraph::snippets::op::insertLoopBegin(input_params); - const auto outer_loop_end = insertLoopEnd(NodeVector{result}, outer_loop_begin, 1, 1, std::vector{need_increment, need_increment}); - vector_buffer_max->add_control_dependency(outer_loop_begin); - } - - return std::make_shared(ResultVector{result}, input_params); -} -std::shared_ptr AddSoftmaxLoweredFunction::initLowered() const { - auto input_params = ngraph::builder::makeParams(precision, {input_shapes[0].get_shape(), input_shapes[1].get_shape()}); - - auto master_pshape = input_shapes[0]; - ov::PartialShape::broadcast_merge_into(master_pshape, input_shapes[1], op::AutoBroadcastType::NUMPY); - const auto master_shape = master_pshape.get_shape(); - const auto shape_rank = master_shape.size(); - const auto dimension = shape_rank - 1; - const auto work_amount = master_shape[dimension]; - const auto increment = 10; - const auto inner_dim = shape_rank - 1; - const auto inner_master_wa = static_cast(master_shape[inner_dim]); - const int outer_dim = shape_rank > 1 ? shape_rank - 2 : -1; - const auto has_outer_loop = outer_dim >= 0 && master_shape[outer_dim] > 1; - const bool is_scalar = work_amount == 1; - - /* ================== Add + ReduceMax ==================== */ - - const auto vector_buffer_max = std::make_shared(); - const auto loop_max_begin = ngraph::snippets::op::insertLoopBegin(input_params); - - std::shared_ptr load0 = std::make_shared(loop_max_begin->output(0), increment); - if (!is_scalar && input_shapes[0].get_shape().back() == 1) { - auto new_shape = input_shapes[0].get_shape(); - new_shape[new_shape.size() - 1] = static_cast(inner_master_wa); - load0 = std::make_shared(loop_max_begin->output(0), new_shape); - } - std::shared_ptr load1 = std::make_shared(loop_max_begin->output(1), increment); - if (!is_scalar && input_shapes[1].get_shape().back() == 1) { - auto new_shape = input_shapes[1].get_shape(); - new_shape[new_shape.size() - 1] = static_cast(inner_master_wa); - load1 = std::make_shared(loop_max_begin->output(1), new_shape); - } - const auto add = std::make_shared(load0, load1); - const auto store = std::make_shared(add, increment); - - // we don't insert Fill here after load_max to verify because in generate() call Fill op is inserted only on vector representation - const auto max = std::make_shared(add, vector_buffer_max); - - std::vector apply_increments_max(3, false); - std::vector finalization_offsets_max(3, 0); - apply_increments_max[0] = input_shapes[0].get_shape()[inner_dim] != 1 && inner_master_wa != 1; - apply_increments_max[1] = input_shapes[1].get_shape()[inner_dim] != 1 && inner_master_wa != 1; - apply_increments_max[2] = master_shape[inner_dim] != 1 && inner_master_wa != 1; - finalization_offsets_max[0] = input_shapes[0].get_shape()[inner_dim] != 1 ? -inner_master_wa : 0; - finalization_offsets_max[1] = input_shapes[1].get_shape()[inner_dim] != 1 ? 
-inner_master_wa : 0; - finalization_offsets_max[2] = master_shape[inner_dim] != 1 ? -inner_master_wa : 0; - const auto loop_max_end = std::make_shared(ngraph::OutputVector{store, loop_max_begin->output(2)}, - work_amount, increment, apply_increments_max, finalization_offsets_max); - - std::shared_ptr horizon_max = std::make_shared(max); - horizon_max->add_control_dependency(loop_max_end); - const auto prev_horizon_max = horizon_max; - if (!is_scalar) { - horizon_max = std::make_shared(horizon_max, horizon_max->get_input_partial_shape(0)); - } - - loop_max_begin->add_control_dependency(vector_buffer_max); - loop_max_end->add_control_dependency(max); - - /* =========================================== */ - - const auto size_add = std::make_shared(ov::element::i32, ov::Shape{2}); - const auto buffer_add = std::make_shared(loop_max_end->output(0)); - - /* === Sub + Exp + ReduceSum decomposition === */ - - const auto vector_buffer_sum = std::make_shared(); - const auto loop_sum_begin = ngraph::snippets::op::insertLoopBegin(ngraph::OutputVector{buffer_add->output(0)}); - - const auto load_sub = std::make_shared(loop_sum_begin->output(0), increment); - const auto sub = std::make_shared(load_sub, horizon_max); - // we don't insert Fill here after exp to verify because in generate() call Fill op is inserted only on vector representation - const auto exp = std::make_shared(sub); - const auto sum = std::make_shared(exp, vector_buffer_sum); - const auto store_exp = std::make_shared(exp, increment); - - std::vector apply_increments_sum(2, false); - std::vector finalization_offsets_sum(2, 0); - apply_increments_sum[0] = load_sub->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - apply_increments_sum[1] = store_exp->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - finalization_offsets_sum[0] = has_outer_loop && load_sub->get_shape()[inner_dim] != 1 ? -inner_master_wa : 0; - finalization_offsets_sum[1] = store_exp->get_shape()[inner_dim] != 1 ? 
-inner_master_wa : 0; - const auto loop_sum_end = std::make_shared( - ngraph::OutputVector{store_exp, loop_sum_begin->output(1)}, work_amount, increment, - apply_increments_sum, finalization_offsets_sum); - loop_sum_end->add_control_dependency(sum); - - const auto horizon_sum = std::make_shared(sum); - horizon_sum->add_control_dependency(loop_sum_end); - - const auto size_exp = std::make_shared(ov::element::i32, ov::Shape{2}); - const auto buffer_exp = std::make_shared(loop_sum_end->output(0)); - - loop_sum_begin->add_control_dependency(vector_buffer_sum); - loop_sum_begin->add_control_dependency(horizon_max); - loop_sum_begin->add_control_dependency(prev_horizon_max); - - /* =========================================== */ - - /* ================== Div ==================== */ - - std::shared_ptr pow = std::make_shared(horizon_sum, -1); - const auto prev_pow = pow; - if (!is_scalar) { - pow = std::make_shared(pow, horizon_sum->get_input_partial_shape(0)); - } - - const auto loop_div_begin = ngraph::snippets::op::insertLoopBegin(ngraph::OutputVector{buffer_exp}); - - const auto load_div = std::make_shared(loop_div_begin->output(0), increment); - const auto mul = std::make_shared(load_div, pow); - const auto store_div = std::make_shared(mul, increment); - - std::vector apply_increments_div(2, false); - std::vector finalization_offsets_div(2, 0); - apply_increments_div[0] = load_div->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - apply_increments_div[1] = store_div->get_shape()[inner_dim] != 1 && inner_master_wa != 1; - finalization_offsets_div[0] = has_outer_loop && load_div->get_shape()[inner_dim] != 1 ? -inner_master_wa : 0; - finalization_offsets_div[1] = has_outer_loop && store_div->get_shape()[inner_dim] != 1 ? -inner_master_wa : 0; - const auto loop_div_end = std::make_shared( - ngraph::OutputVector{store_div, loop_div_begin->output(1)}, work_amount, increment, - apply_increments_div, finalization_offsets_div); - loop_div_begin->add_control_dependency(horizon_sum); - loop_div_begin->add_control_dependency(pow); - loop_div_begin->add_control_dependency(prev_pow); - - /* =========================================== */ - - const auto result = std::make_shared(loop_div_end); - if (has_outer_loop) { - const auto need_increment0 = input_shapes[0].get_shape()[outer_dim] != 1 && input_shapes[0].get_shape()[inner_dim] == 1; - const auto need_increment1 = input_shapes[1].get_shape()[outer_dim] != 1 && input_shapes[1].get_shape()[inner_dim] == 1; - const auto need_increment2 = master_shape[outer_dim] != 1 && master_shape[inner_dim] == 1; - const auto outer_loop_begin = ngraph::snippets::op::insertLoopBegin(input_params); - const auto outer_loop_end = insertLoopEnd( - NodeVector{result}, outer_loop_begin, 1, 1, std::vector{need_increment0, need_increment1, need_increment2}); - vector_buffer_max->add_control_dependency(outer_loop_begin); - } - - return std::make_shared(ResultVector{result}, input_params); -} std::shared_ptr BroadcastAddLoweredFunction::initLowered() const { auto data0 = std::make_shared(precision, input_shapes[0]); auto data1 = std::make_shared(precision, input_shapes[1]); @@ -394,30 +122,14 @@ std::shared_ptr BroadcastAddLoweredFunction::initLowered() const { if (input_shapes[i].get_shape().back() != last_dim) { auto new_shape = input_shapes[i]; new_shape[new_shape.size() - 1] = last_dim; - loads[i] = std::make_shared(datas[i], new_shape); + loads[i] = std::make_shared(datas[i], new_shape); } else { - loads[i] = std::make_shared(datas[i]); + loads[i] = 
             std::make_shared(datas[i]);
         }
     }
     auto add = std::make_shared(loads[0], loads[1]);
-    auto store = std::make_shared(add);
-    auto model = std::make_shared(NodeVector{store}, ParameterVector{data0, data1});
-
-    // Create dummy scheduler to pass graph comparison tests
-    // Note that if there is more than one results, they should be reverted
-    ResultVector results({model->get_results()[0]});
-    const auto& inner_loop_begin = ngraph::snippets::op::insertLoopBegin(datas);
-    std::vector apply_increments(datas.size() + results.size(), true);
-    insertLoopEnd(results, inner_loop_begin, 1, 1, apply_increments);
-    auto outer_WA = std::accumulate(input_shapes.begin(), input_shapes.end(), 0,
-                                    [](int64_t max_val, const PartialShape& ps) {
-                                        return std::max(ps[ps.size() - 2].get_length(), max_val);
-                                    });
-    if (outer_WA > 1) {
-        const auto& outer_loop_begin = ngraph::snippets::op::insertLoopBegin(datas);
-        insertLoopEnd(results, outer_loop_begin, 1, 1, apply_increments);
-    }
-    return model;
+    auto store = std::make_shared(add);
+    return std::make_shared(NodeVector{store}, ParameterVector{data0, data1});
 }
 } // namespace snippets
 } // namespace test
diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_matmul.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_matmul.cpp
index b213c66eccacc6..c5086525ec1e52 100644
--- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_matmul.cpp
+++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_matmul.cpp
@@ -42,7 +42,7 @@ std::shared_ptr MatMulFunction::initReference() const {
     } else {
         matmul = std::make_shared(indata0, indata1);
     }
-    const auto subgraph = std::make_shared(NodeVector{data0, data1},
+    const auto subgraph = std::make_shared(NodeVector{data0, data1},
                                            std::make_shared(NodeVector{matmul}, ParameterVector{indata0, indata1}));
     return std::make_shared(NodeVector{subgraph}, ParameterVector{data0, data1});
@@ -91,6 +91,41 @@ std::shared_ptr MatMulBiasFunction::initOriginal() const {
     auto bias = std::make_shared(matmul, data2);
     return std::make_shared(NodeVector{bias}, ParameterVector{data0, data1, data2});
 }
+std::shared_ptr MatMulBiasQuantizedFunction::initOriginal() const {
+    auto data0 = std::make_shared(precisions[0], input_shapes[0]);
+    auto data1 = std::make_shared(precisions[1], input_shapes[1]);
+    auto data2 = std::make_shared(precision, input_shapes[2]);
+    auto matmul = std::make_shared>(
+        std::vector{element::f32, element::f32},
+        std::vector{ element::f32 },
+        ov::op::TemporaryReplaceOutputType(data0, element::f32).get(),
+        ov::op::TemporaryReplaceOutputType(data1, element::f32).get());
+    auto fq2 = ngraph::builder::makeFakeQuantize(matmul, ov::element::f32, 256, {1}, {-35.0172004}, {34.7436294}, {-35.0172004}, {34.7436294});
+    auto bias = std::make_shared(fq2, data2);
+    return std::make_shared(NodeVector{bias}, ParameterVector{data0, data1, data2});
+}
+std::shared_ptr MatMulsQuantizedFunction::initOriginal() const {
+    auto data0 = std::make_shared(precisions[0], input_shapes[0]);
+    auto data1 = std::make_shared(precisions[1], input_shapes[1]);
+    auto data2 = std::make_shared(precision, input_shapes[2]);
+    auto matmul0 = std::make_shared>(
+        std::vector{element::f32, element::f32},
+        std::vector{ element::f32 },
+        ov::op::TemporaryReplaceOutputType(data0, element::f32).get(),
+        ov::op::TemporaryReplaceOutputType(data1, element::f32).get());
+    auto fq0 = ngraph::builder::makeFakeQuantize(matmul0, ov::element::f32, 256, {1}, {0}, {0.820726}, {0}, {0.820726});
+    auto fq2 = ngraph::builder::makeFakeQuantize(data2, ov::element::f32, 256, {1}, {-35.0172004}, {34.7436294}, {-35.0172004}, {34.7436294});
+    auto new_shape = std::make_shared(ov::element::u64, ov::Shape{4},
+        std::vector{1, 1, input_shapes[2].get_shape()[0], input_shapes[2].get_shape()[1]});
+    auto reshape = std::make_shared(fq2, new_shape, false);
+    auto matmul1 = std::make_shared>(
+        std::vector{element::f32, element::f32},
+        std::vector{ element::f32 },
+        ov::op::TemporaryReplaceOutputType(fq0, element::f32).get(),
+        ov::op::TemporaryReplaceOutputType(reshape, element::f32).get());
+    auto fq3 = ngraph::builder::makeFakeQuantize(matmul1, ov::element::f32, 256, {1}, {-35.0172004}, {34.7436294}, {-35.0172004}, {34.7436294});
+    return std::make_shared(NodeVector{fq3}, ParameterVector{data0, data1, data2});
+}
 std::shared_ptr Transpose0213MatMulFunction::initOriginal() const {
     auto data0 = std::make_shared(precisions[0], input_shapes[0]);
     auto data1 = std::make_shared(precisions[1], input_shapes[1]);
@@ -169,6 +204,29 @@ std::shared_ptr TransposeMulMatMulBiasFunction::initOriginal() const
     auto bias = std::make_shared(matmul, data3);
     return std::make_shared(NodeVector{bias}, ParameterVector{data0, data1, data2, data3});
 }
+std::shared_ptr MatMulsQuantizedSoftmaxFunction::initOriginal() const {
+    auto data0 = std::make_shared(precisions[0], input_shapes[0]);
+    auto data1 = std::make_shared(precisions[1], input_shapes[1]);
+    auto data2 = std::make_shared(precision, input_shapes[2]);
+    auto matmul0 = std::make_shared>(
+        std::vector{element::f32, element::f32},
+        std::vector{ element::f32 },
+        ov::op::TemporaryReplaceOutputType(data0, element::f32).get(),
+        ov::op::TemporaryReplaceOutputType(data1, element::f32).get());
+    auto softmax = std::make_shared(matmul0, -1);
+    auto fq0 = ngraph::builder::makeFakeQuantize(softmax, ov::element::f32, 256, {1}, {0}, {0.820726}, {0}, {0.820726});
+    auto fq2 = ngraph::builder::makeFakeQuantize(data2, ov::element::f32, 256, {1}, {-35.0172004}, {34.7436294}, {-35.0172004}, {34.7436294});
+    auto new_shape = std::make_shared(ov::element::u64, ov::Shape{4},
+        std::vector{1, 1, input_shapes[2].get_shape()[0], input_shapes[2].get_shape()[1]});
+    auto reshape = std::make_shared(fq2, new_shape, false);
+    auto matmul1 = std::make_shared>(
+        std::vector{element::f32, element::f32},
+        std::vector{ element::f32 },
+        ov::op::TemporaryReplaceOutputType(fq0, element::f32).get(),
+        ov::op::TemporaryReplaceOutputType(reshape, element::f32).get());
+    auto fq3 = ngraph::builder::makeFakeQuantize(matmul1, ov::element::f32, 256, {1}, {-35.0172004}, {34.7436294}, {-35.0172004}, {34.7436294});
+    return std::make_shared(NodeVector{fq3}, ParameterVector{data0, data1, data2});
+}
 } // namespace snippets
 } // namespace test
diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_mha.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_mha.cpp
index b3b3c8d0f9b13e..d904790bce2902 100644
--- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_mha.cpp
+++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_mha.cpp
@@ -128,7 +128,7 @@ std::shared_ptr MHAFunction::initReference() const {
     const auto matMul1 = std::make_shared(reshape1, transpose2, transA, transB);
     const auto transpose3 = std::make_shared(matMul1, transpose3Const);
-    auto subgraph = std::make_shared(subgraph_inputs,
+    auto subgraph = std::make_shared(subgraph_inputs,
                                      std::make_shared(NodeVector{transpose3}, subgraph_params));
     return std::make_shared(NodeVector{subgraph},
                             ngraphParams);
@@ -242,7 +242,7 @@ std::shared_ptr MHAMatMul0TransposeFunction::initReference() const {
     const auto matMul1 = std::make_shared(reshape1, transpose2, transA, transB);
     const auto transpose3 = std::make_shared(matMul1, transpose3Const);
-    auto subgraph = std::make_shared(
+    auto subgraph = std::make_shared(
         NodeVector{data0, data1, data2, data3},
         std::make_shared(NodeVector{transpose3}, subgraphParams));
diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_simple.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_simple.cpp
index b2baf3e4bed25f..9211ed9b076492 100644
--- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_simple.cpp
+++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_simple.cpp
@@ -21,7 +21,7 @@ std::shared_ptr AddFunction::initReference() const {
     auto data1 = std::make_shared(precision, input_shapes[1]);
     auto indata0 = std::make_shared(precision, data0->get_shape());
     auto indata1 = std::make_shared(precision, data1->get_shape());
-    auto add = std::make_shared(NodeVector{data0, data1},
+    auto add = std::make_shared(NodeVector{data0, data1},
                                 std::make_shared(NodeVector{std::make_shared(indata0, indata1)}, ParameterVector{indata0, indata1}));
     return std::make_shared(NodeVector{add}, ParameterVector{data0, data1});
@@ -67,7 +67,7 @@ std::shared_ptr EltwiseFunction::initReference() const {
     auto indata2 = std::make_shared(precision, data1->get_shape());
     auto add = std::make_shared(indata0, indata1);
     auto sub = std::make_shared(add, const_data);
-    auto mul = std::make_shared(NodeVector{data0, data1, const_data},
+    auto mul = std::make_shared(NodeVector{data0, data1, const_data},
                                 std::make_shared(NodeVector{std::make_shared(add, sub)}, ParameterVector{indata0, indata1, indata2}));
     return std::make_shared(NodeVector{mul}, ParameterVector{data0, data1});
@@ -139,10 +139,10 @@ std::shared_ptr MatMulEltwiseBranchesFunction::initReference() const
     const std::vector const_values = CommonTestUtils::generate_float_numbers(4, -10., 10.);
     // snippet inputs
     auto non_snippet_op = std::make_shared(sinh_1, sinh_2);
-    auto mul_const_1 = std::make_shared(precision, Shape{1}, const_values[0]);
-    auto add_const_1 = std::make_shared(precision, Shape{1}, const_values[1]);
-    auto mul_const_2 = std::make_shared(precision, Shape{1}, const_values[2]);
-    auto sub_const_2 = std::make_shared(precision, Shape{1}, const_values[3]);
+    auto mul_const_1 = std::make_shared(precision, Shape{1}, const_values[0]);
+    auto add_const_1 = std::make_shared(precision, Shape{1}, const_values[1]);
+    auto mul_const_2 = std::make_shared(precision, Shape{1}, const_values[2]);
+    auto sub_const_2 = std::make_shared(precision, Shape{1}, const_values[3]);
     // snippet function
     Shape matMulOutShape = input_shapes[0].get_shape();
@@ -162,7 +162,7 @@ std::shared_ptr MatMulEltwiseBranchesFunction::initReference() const
     auto snippet_function = std::make_shared(NodeVector{ add }, subgraph_params);
     ngraph::NodeVector snippet_inputs{ non_snippet_op };
-    auto snippet = std::make_shared(snippet_inputs, snippet_function);
+    auto snippet = std::make_shared(snippet_inputs, snippet_function);
     auto result = std::make_shared(snippet);
     return std::make_shared(NodeVector{ result }, ParameterVector{ data_1, data_2 });
@@ -185,14 +185,14 @@ std::shared_ptr EltwiseLogLoopFunction::initReference() const {
     auto inAdd = std::make_shared(indata0, indata1);
     auto inHswish = std::make_shared(inAdd);
     auto body = std::make_shared(NodeVector{inAdd, inHswish},
                                  ParameterVector{indata0, indata1});
-    auto subgraph = std::make_shared(NodeVector{data0, data1}, body);
+    auto subgraph = std::make_shared(NodeVector{data0, data1}, body);
     auto log = std::make_shared(subgraph->output(0));
     //Note that log is not currently supported by snippets, so it won't be converted to subgraph.
     // Todo: Note that collapse_subgraph changes the output ports so that the input subgraph's outputs come
     // before the node outputs. So the Subgraph{Add}.output(1)->Log{} becomes Subgraph{Add+Hswish}.output(0)->Log{}
     auto subgraph_param = std::make_shared(precision, subgraph->get_output_shape(1));
     auto log_param = std::make_shared(precision, log->get_output_shape(0));
-    auto mul = std::make_shared(OutputVector{subgraph->output(1), log->output(0)},
+    auto mul = std::make_shared(OutputVector{subgraph->output(1), log->output(0)},
                                 std::make_shared(NodeVector{std::make_shared(subgraph_param, log_param)}, ParameterVector{subgraph_param, log_param}));
     return std::make_shared(NodeVector{mul}, ParameterVector{data0, data1});
@@ -238,14 +238,14 @@ std::shared_ptr EltwiseTwoResultsFunction::initReference() const {
     add->set_friendly_name("add");
     auto hswish = std::make_shared(add);
     hswish->set_friendly_name("hswish");
-    auto subgraph0 = std::make_shared(NodeVector{data0, data1},
+    auto subgraph0 = std::make_shared(NodeVector{data0, data1},
                                       std::make_shared(NodeVector{add, hswish}, ParameterVector{indata0, indata1}));
     subgraph0->set_friendly_name("add");
     auto indata2 = std::make_shared(precision, subgraph0->get_output_shape(1));
     auto relu = std::make_shared(indata2);
     relu->set_friendly_name("relu");
-    auto subgraph1 = std::make_shared(OutputVector{subgraph0->output(1)},
+    auto subgraph1 = std::make_shared(OutputVector{subgraph0->output(1)},
                                       std::make_shared(NodeVector{relu}, ParameterVector{indata2}));
     subgraph1->set_friendly_name("relu");
diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_transpose.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_transpose.cpp
index dcfb04a74d9fa3..cfafc18af7ee1f 100644
--- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_transpose.cpp
+++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/subgraph_transpose.cpp
@@ -21,11 +21,19 @@ std::shared_ptr TransposeFunction::initReference() const {
     auto indata0 = std::make_shared(precision, data->get_output_partial_shape(0));
     auto indata1 = std::make_shared(const_order->get_output_element_type(0), const_order->get_output_partial_shape(0));
-    auto transpose = std::make_shared(NodeVector{data, const_order},
+    auto transpose = std::make_shared(NodeVector{data, const_order},
                                       std::make_shared(NodeVector{std::make_shared(indata0, indata1)}, ParameterVector{indata0, indata1}));
     return std::make_shared(NodeVector{transpose}, ParameterVector{data});
 }
+std::shared_ptr TransposeMulFunction::initOriginal() const {
+    auto data0 = std::make_shared(precision, input_shapes[0]);
+    auto data1 = std::make_shared(precision, input_shapes[1]);
+    auto const_order = std::make_shared(ov::element::i32, Shape {order.size()}, order);
+    auto transpose = std::make_shared(data0, const_order);
+    auto multiply = std::make_shared(transpose, data1);
+    return std::make_shared(NodeVector{multiply}, ParameterVector{data0, data1});
+}
 } // namespace snippets
 } // namespace test
diff --git a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/two_binary_ops_function.cpp b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/two_binary_ops_function.cpp
index 03ec7514ced08e..24892edda3f920 100644
--- a/src/tests/ngraph_helpers/snippets_ngraph_functions/src/two_binary_ops_function.cpp
+++ b/src/tests/ngraph_helpers/snippets_ngraph_functions/src/two_binary_ops_function.cpp
@@ -24,7 +24,7 @@ std::shared_ptr TwoBinaryOpsFunction::get(
     const auto create_convert = [](std::shared_ptr parent, const element::Type convertion_type) -> std::shared_ptr {
         return convertion_type == element::undefined
             ? std::dynamic_pointer_cast(parent)
-            : std::make_shared(parent, convertion_type);
+            : std::make_shared(parent, convertion_type);
     };
     const auto make_branch = [&create_convert](