[IE][VPU]: Enables Extract Dynamic Batch Transformation (#3715)
* [IE][nGraph]: Enables begin/end iterators for PartialShape

It's convenient to be able to use STL algorithms on
PartialShape, since semantically a PartialShape is a
sequence of Dimensions.
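
The kind of code this enables can be sketched with stand-in types (a
simplified model for illustration; the real nGraph `PartialShape` and
`Dimension` API differs):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-ins for ngraph::Dimension and ngraph::PartialShape, illustration only.
struct Dimension {
    bool isStatic;
    std::int64_t value;  // meaningful only when isStatic == true
};

using PartialShape = std::vector<Dimension>;  // real PartialShape now exposes begin()/end()

// With begin()/end() available, STL algorithms apply directly to a shape.
inline bool isFullyStatic(const PartialShape& shape) {
    return std::all_of(shape.begin(), shape.end(),
                       [](const Dimension& d) { return d.isStatic; });
}

inline std::size_t countDynamicDims(const PartialShape& shape) {
    return std::count_if(shape.begin(), shape.end(),
                         [](const Dimension& d) { return !d.isStatic; });
}
```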

* [IE][VPU][nGraph]: Introduces tree utilities

Introduces Depth-First-Search and Breadth-First-Search
utilities for tree traversal. Templated arguments
make them extensible for different use-case scenarios.

BFS is designed in a way that makes it possible to guarantee
a node will be visited only after all of its predecessors
have been visited:

       a
      / \
     b   c
     |   |
     d   |
      \ /
       e

With appropriately provided functors (NumEntries), it is then
guaranteed that node "e" will be visited after "d" and "c".
Such a property is important for evaluating node depths.
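
The property can be sketched with a Kahn-style traversal over a toy
adjacency list modelling the diagram above (the actual utility operates
on ngraph::Node* with templated functors; names here are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Toy graph: node name -> successors. Models the diagram above.
using Graph = std::map<std::string, std::vector<std::string>>;

// Visits a node only once it has been reached through all of its predecessor
// edges, mirroring the NumEntries functor of the real bfs utility.
inline std::vector<std::string> bfsAfterAllPredecessors(const Graph& graph,
                                                        const std::string& root) {
    std::map<std::string, std::size_t> numEntries;
    for (const auto& entry : graph) {
        for (const auto& next : entry.second) {
            ++numEntries[next];
        }
    }

    std::vector<std::string> order;
    std::deque<std::string> deque{root};
    std::map<std::string, std::size_t> visits;
    while (!deque.empty()) {
        const auto current = deque.front();
        deque.pop_front();

        // Wait until the node has been reached through all its predecessors.
        if (++visits[current] < std::max<std::size_t>(numEntries[current], 1)) {
            continue;
        }

        order.push_back(current);
        const auto it = graph.find(current);
        if (it != graph.end()) {
            for (const auto& next : it->second) {
                deque.push_back(next);
            }
        }
    }
    return order;
}
```

On the diagram's graph this visits "e" only after both "d" and "c".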

* [IE][VPU][nGraph]: Fixes printTo for nGraph type

For some reason, if printTo for an nGraph type is a
regular function, it is not picked up by VPU_THROW_UNLESS
when triggered inside the DynamicToStaticShape transformations.

Making it a template specialization does the job.

* [IE][VPU]: Introduces SliceConfiguration class

SliceConfiguration is a class intended to express
the result of slicing an operation by batch. The
result of slicing is a configuration that specifies
what to do with each data object associated with
the operation. There are two options defined: Slice
and Unchanged. The typical scenario is Slice: the
operation has the same batch for all inputs and
outputs, so all corresponding data objects will be
"sliced" (replaced with a copy where the batch
equals 1).

In some cases, a data object should not be sliced
(e.g. if the operation has a constant input that is
the same for all input batches and therefore has no
batch dimension, such as Add of 2 tensors with
shapes [10, 1000] and [1000]). The option
"Unchanged" represents such cases.

In cases when the operation should not be sliced at
all (e.g. it has no batch, has different batches for
inputs and outputs, has a static batch, and so on),
the SliceConfiguration object will return false from
the "isSliceSupported" method call. In these cases,
calls to the inputs and outputs methods will throw
an exception.
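
A minimal usage sketch of the class (the interface mirrors the new
header below; exception behavior is simplified to assertions so the
sketch stays self-contained, and the example values are illustrative):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Mirror of the SliceConfiguration interface introduced by this commit.
enum class SliceMode { Slice, Unchanged };

class SliceConfiguration {
public:
    SliceConfiguration() = default;
    SliceConfiguration(std::vector<SliceMode> inputs, std::vector<SliceMode> outputs)
        : m_isSliceSupported(true), m_inputs(std::move(inputs)), m_outputs(std::move(outputs)) {}

    bool isSliceSupported() const { return m_isSliceSupported; }
    // Real implementation throws via VPU_THROW_UNLESS; simplified here.
    const std::vector<SliceMode>& inputs() const { assert(m_isSliceSupported); return m_inputs; }
    const std::vector<SliceMode>& outputs() const { assert(m_isSliceSupported); return m_outputs; }

private:
    bool m_isSliceSupported = false;
    std::vector<SliceMode> m_inputs;
    std::vector<SliceMode> m_outputs;
};

// Add([10, 1000], [1000]): the batched input is sliced, the constant-like
// second input is left unchanged, and the output is sliced.
inline SliceConfiguration addExampleConfiguration() {
    return SliceConfiguration({SliceMode::Slice, SliceMode::Unchanged}, {SliceMode::Slice});
}
```

A default-constructed SliceConfiguration represents the unsupported case.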

* [IE][VPU][nGraph]: Enables MatMul operation slice

In the case of a static batch, the operation is not going to be
sliced, since another transformation is used to handle such cases.
This approach allows both passes to co-exist while one is being
replaced with the other.

If the data input has a dynamic dimension other than batch, an
error will be thrown, since the Myriad-X plugin does not support
HW-accelerated operations with dynamism in spatial dimensions.
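
The decision rule described above can be sketched as follows (a
simplified model over a dynamic-dimension mask; the names and the
enum are hypothetical, not the actual plugin API):

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for a partial shape: true means the dimension is dynamic.
using DynamicMask = std::vector<bool>;

enum class SliceDecision { Slice, SkipStaticBatch, ErrorDynamicNonBatch };

// Slice only when the batch (dimension 0) is the sole dynamic dimension;
// a static batch is left to the other transformation; dynamism anywhere
// else is an error, as described above.
inline SliceDecision decideBatchSlice(const DynamicMask& shape) {
    for (std::size_t dim = 1; dim < shape.size(); ++dim) {
        if (shape[dim]) {
            return SliceDecision::ErrorDynamicNonBatch;
        }
    }
    return shape.empty() || !shape.front() ? SliceDecision::SkipStaticBatch
                                           : SliceDecision::Slice;
}
```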

* [IE][VPU][nGraph]: Enables Convolution operations slice

In the case of a static batch, the operation is not going to be
sliced, since another transformation is used to handle such cases.
This approach allows both passes to co-exist while one is being
replaced with the other.

If the data input has a dynamic dimension other than batch, an
error will be thrown, since the Myriad-X plugin does not support
convolutions (HW-accelerated operations) with dynamism in
spatial dimensions.

* [IE][VPU][nGraph]: Enables unary eltwise slice

Since the extract dynamic batch transformation handles
dynamism only by batch (and so requires the loop body to be
static), operations with dynamism in a dimension other than
batch should not be covered by the loop.

In the case of dynamism in a dimension other than batch, the
eltwise will be considered unsupported for sub-graph extraction.

* [IE][VPU][nGraph]: Enables binary eltwise slice

Since the extract dynamic batch transformation handles
dynamism only by batch (and so requires the loop body to be
static), operations with dynamism in a dimension other than
batch should not be covered by the loop.

In the case of dynamism in a dimension other than batch, the
eltwise will be considered unsupported for sub-graph extraction.

It's a template function, since different binary eltwise
operations share the same broadcasting rules.
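
One way the shared broadcasting rule could look, sketched over input
ranks only (a deliberately simplified model, not the actual
sliceBinaryEltwise implementation): with numpy-style broadcasting the
batch is the outermost axis, so an input of smaller rank than the
output has no batch dimension and is left Unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class SliceMode { Slice, Unchanged };

// Inputs whose rank matches the (broadcast) output rank carry a batch
// dimension and are sliced; lower-rank inputs are broadcast and unchanged.
inline std::vector<SliceMode> binaryEltwiseInputModes(const std::vector<std::size_t>& inputRanks) {
    std::size_t outputRank = 0;
    for (const auto rank : inputRanks) {
        outputRank = std::max(outputRank, rank);
    }
    std::vector<SliceMode> modes;
    for (const auto rank : inputRanks) {
        modes.push_back(rank == outputRank ? SliceMode::Slice : SliceMode::Unchanged);
    }
    return modes;
}
```

For Add of [10, 1000] and [1000] this yields {Slice, Unchanged}.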

* [IE][VPU][nGraph]: Enables extract dynamic batch transformation

The general approach is the following:

1. Extracted sub-graphs should have exactly one input and one output
   operation. Otherwise, the memory consumption of the model may
   increase, since the loop implementation on Myriad-X requires all
   inputs and outputs of a loop to be kept alive along with the
   memory used by the loop body. In the layout consolidation
   scenario, this reflects the intention to use a minimal number of
   permutations.

2. An extracted sub-graph should not have external connections (the
   only nodes allowed to have a predecessor or successor outside
   the sub-graph are its input and output). Otherwise, the memory
   consumption of the model may increase for the same reason as in
   the previous point.

   To make sure this restriction is met, the transformation looks
   for leaves in both directions, finds the corresponding LCA
   (Lowest Common Ancestor), and checks whether such a sub-graph has
   external connections. If it does, the transformation repeats the
   leaf search procedure, stopping if it approaches the leaves from
   the previous iteration, and finds the LCA again. This is repeated
   until a sub-graph without external connections is found (one
   always exists: at least the source itself forms one).

   A leaf in this context is a node that satisfies one of the
   following conditions (depending on the direction):
     Top:
       1. All of its predecessors are either Parameter or Constant
       2. It's unknown how to slice this operation
       3. It cannot be sliced (different batch for inputs and
          outputs)
     Bottom:
       1. All of its successors are Result
       2. It's unknown how to slice this operation
       3. It cannot be sliced (different batch for inputs and
          outputs)

Signed-off-by: Gladilov, Gleb <[email protected]>
ggladilo authored Jan 13, 2021
1 parent 9fa8ad5 commit 1601c7f
Showing 16 changed files with 992 additions and 6 deletions.
@@ -0,0 +1,31 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <vector>

namespace vpu {

enum class SliceMode {
    Slice,
    Unchanged
};

class SliceConfiguration {
public:
    SliceConfiguration() = default;
    SliceConfiguration(std::vector<SliceMode> inputs, std::vector<SliceMode> outputs);

    bool isSliceSupported() const;
    const std::vector<SliceMode>& inputs() const;
    const std::vector<SliceMode>& outputs() const;

private:
    bool m_isSliceSupported = false;
    std::vector<SliceMode> m_inputs;
    std::vector<SliceMode> m_outputs;
};

} // namespace vpu
@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "ngraph/pass/graph_rewrite.hpp"

#include <memory>
#include <unordered_set>

namespace vpu {

class ExtractBatch: public ngraph::pass::FunctionPass {
public:
    NGRAPH_RTTI_DECLARATION;

    explicit ExtractBatch(std::unordered_set<ngraph::Node::type_info_t> targets);
    bool run_on_function(std::shared_ptr<ngraph::Function> function) override;

private:
    std::unordered_set<ngraph::Node::type_info_t> targets;
};

} // namespace vpu
@@ -0,0 +1,14 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "ngraph/ngraph.hpp"
#include "batch_extraction_configuration.hpp"

namespace vpu {

SliceConfiguration sliceBinaryEltwise(const ngraph::Node& node);

} // namespace vpu
@@ -0,0 +1,14 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "ngraph/ngraph.hpp"
#include "batch_extraction_configuration.hpp"

namespace vpu {

SliceConfiguration sliceConvolution(const ngraph::Node& node);

} // namespace vpu
@@ -0,0 +1,14 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "ngraph/ngraph.hpp"
#include "batch_extraction_configuration.hpp"

namespace vpu {

SliceConfiguration sliceMatMul(const ngraph::Node& node);

} // namespace vpu
@@ -0,0 +1,14 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "ngraph/ngraph.hpp"
#include "batch_extraction_configuration.hpp"

namespace vpu {

SliceConfiguration sliceUnaryEltwise(const ngraph::Node& node);

} // namespace vpu
61 changes: 60 additions & 1 deletion inference-engine/src/vpu/common/include/vpu/ngraph/utilities.hpp
@@ -7,6 +7,11 @@
#include "ngraph/node.hpp"
#include "ngraph/type/element_type.hpp"

#include "vpu/utils/error.hpp"

#include <stack>
#include <deque>
#include <unordered_map>
#include <unordered_set>

namespace vpu {

std::vector<std::int64_t> evaluateTargetShape(const ngraph::Output<ngraph::Node>& value);
@@ -15,6 +20,60 @@ std::shared_ptr<ngraph::Node> shapeToConstant(const ngraph::element::Type& type,

std::shared_ptr<ngraph::Node> gatherShapeElements(const ngraph::Output<ngraph::Node>&, int startIndex, size_t elemCount);

template<>
inline void printTo(std::ostream& stream, const ngraph::NodeTypeInfo& object) {
    stream << object.name << " ver. " << object.version;
}

using Nodes = std::unordered_set<ngraph::Node*>;

template<class GetNext, class Visit>
Nodes dfs(ngraph::Node* root, GetNext&& getNext, Visit&& visit) {
    Nodes visited;
    std::stack<ngraph::Node*> stack{{root}};
    while (!stack.empty()) {
        const auto current = stack.top();
        stack.pop();

        if (!visited.emplace(current).second) {
            continue;
        }

        if (!visit(current)) {
            continue;
        }

        for (const auto& next : getNext(current)) {
            stack.push(next);
        }
    }
    return visited;
}

template<class NumEntries, class Visit, class MoveForward>
void bfs(ngraph::Node* root, NumEntries&& getNumEntries, Visit&& visit, MoveForward&& moveForward) {
    std::deque<ngraph::Node*> deque{root};
    std::unordered_map<ngraph::Node*, std::size_t> visits;
    while (!deque.empty()) {
        const auto current = deque.front();
        deque.pop_front();

        const auto numEntries = current == root ? 1 : getNumEntries(current);

        const auto visitsCount = ++visits[current];
        VPU_THROW_UNLESS(visitsCount <= numEntries, "Encountered loop at {}", current);

        if (visitsCount < numEntries) {
            VPU_THROW_UNLESS(!deque.empty(), "Node {} should be visited only after all predecessors, but it is not available through all of them", current);
            continue;
        }

        if (!visit(current)) {
            continue;
        }

        moveForward(deque, current);
    }
}

} // namespace vpu
@@ -0,0 +1,30 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "vpu/utils/error.hpp"
#include "vpu/ngraph/transformations/extract_dynamic_batch/batch_extraction_configuration.hpp"

namespace vpu {

SliceConfiguration::SliceConfiguration(std::vector<SliceMode> inputs, std::vector<SliceMode> outputs)
    : m_isSliceSupported(true)
    , m_inputs(std::move(inputs))
    , m_outputs(std::move(outputs)) {}

bool SliceConfiguration::isSliceSupported() const {
    return m_isSliceSupported;
}

const std::vector<SliceMode>& SliceConfiguration::inputs() const {
    VPU_THROW_UNLESS(m_isSliceSupported, "Encountered an attempt to access inputs slice configuration for a case when slice is unsupported");
    return m_inputs;
}

const std::vector<SliceMode>& SliceConfiguration::outputs() const {
    VPU_THROW_UNLESS(m_isSliceSupported, "Encountered an attempt to access outputs slice configuration for a case when slice is unsupported");
    return m_outputs;
}

} // namespace vpu

