Subgraph extraction in ONNX models #4107

Merged on Mar 4, 2021 (34 commits)

Commits
0446814
Subgraph extraction in ONNX models
Jan 19, 2021
0c8a4f7
Windows compilation error fix + docs update
Feb 1, 2021
4a512fc
Code cleanup after the first round of reviews
Feb 1, 2021
49d2908
Merge remote-tracking branch 'upstream/master' into onnx_model_cutting
Feb 1, 2021
15bcb02
CI compilation error fix
Feb 1, 2021
fe7b3b1
Even more CI compilation error fixes
Feb 1, 2021
7e70fda
Proper usage of ADL in generic code
Feb 2, 2021
53667fa
ONNX shape inference related code cleanup
Feb 2, 2021
ddcffa3
Disable the onnx test utils when pb-lite is used
Feb 2, 2021
0078926
PB dependency removal from UT, strong types for input and output edge…
Feb 4, 2021
b2373e4
Merge remote-tracking branch 'upstream/master' into onnx_model_cutting
Feb 4, 2021
b8fb3bd
Fix for the protobuf descriptor database corruption
Feb 8, 2021
45d11e1
testing visibility changes
Feb 8, 2021
2d3e298
Revert the changes that didn't work
Feb 8, 2021
f2fdf5e
Make tests green again?
Feb 9, 2021
48b3cd4
Merge remote-tracking branch 'upstream/master' into onnx_model_cutting
Feb 9, 2021
92addf2
Make the current tests pass
Feb 9, 2021
59d7ef4
Remove the ONNX header from editor's tests
Feb 9, 2021
6eaee42
Switch from stable_partition to remove_if because of compiler bugs
Feb 10, 2021
0e7822f
Obsolete test removal and cmakelists cleanup in tests
Feb 11, 2021
985ef97
Macos failed, reverting some changes
Feb 11, 2021
263cd4b
Handle the multiple output consumers UC
Feb 11, 2021
1f18183
Keep the tensor name when replacing an initializer
Feb 11, 2021
9972ac0
Cutting a graph with multiple consumers of inputs and initializers
Feb 11, 2021
a39cbd0
Subgraph extraction with multiple initializer consumers
Feb 11, 2021
4b86f01
Merge remote-tracking branch 'upstream/master' into onnx_model_cutting
Feb 12, 2021
ad7100c
Merge remote-tracking branch 'upstream/master' into onnx_model_cutting
Feb 15, 2021
2a3744b
Docs update
Feb 15, 2021
6980948
Merge remote-tracking branch 'upstream/master' into onnx_model_cutting
Feb 17, 2021
8d5f739
Get rid of test code from the onnx_importer
Feb 17, 2021
b337772
Producer name update in test onnx models
Feb 18, 2021
6c67ec8
Merge remote-tracking branch 'upstream/master' into onnx_model_cutting
Feb 18, 2021
0ae591d
More comments in the subgraph extraction code
Feb 19, 2021
ee6831c
Merge remote-tracking branch 'upstream/master' into onnx_model_cutting
Feb 19, 2021
@@ -0,0 +1,164 @@
//*****************************************************************************
// Copyright 2017-2021 Intel Corporation
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//*****************************************************************************

#pragma once

#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

namespace ONNX_NAMESPACE
{
class GraphProto;
class NodeProto;
class ValueInfoProto;
} // namespace ONNX_NAMESPACE

namespace ngraph
{
enum class EdgeType
{
INPUT,
OUTPUT
};

template <EdgeType>
struct Edge
{
Edge() = delete;
Edge(const int node_idx, std::string tensor_name)
: m_node_idx{node_idx}
, m_tensor_name{std::move(tensor_name)}
{
}

const int m_node_idx;
@slyalin (Contributor), Feb 18, 2021:

How can one obtain a valid value for m_node_idx? Looking briefly at the code, I see that a valid value is required, but there is no way for a user to deduce this index from a user-visible name without parsing the model proto file. Even Netron doesn't show this index.

Can we at least make it optional? In that case it would mean cutting by tensor name and providing a single input for all the consumers.

Contributor Author:

Similarly to the comment below, this is to be addressed in 47578.

const std::string m_tensor_name;
};
namespace onnx_import
{
/// \brief Defines an edge connected to an input of any node in the graph.
/// It consists of a node index in the processed ONNX model and the input name.
/// The index should point to a node in the topological sort of the underlying graph
/// which means it has to be in range: 0 <= node_idx < graph.node_size()
///
/// For a node number 5, with 3 inputs:
///
/// ----(in_A)----> +--------+
/// ----(in_B)----> | node 5 | ----(out)---->
/// ----(in_C)----> +--------+
///
/// there are 3 possible valid instances of this struct:
/// InputEdge(5, "in_A")
/// InputEdge(5, "in_B")
/// InputEdge(5, "in_C")
Contributor:

Assuming that node 5 is an operation and (in_A) etc. are tensors, this documentation doesn't explain why a node index is specified, because each tensor in the diagram has only one consumer. In that case the tensor name should be enough to specify the cutting point.

Contributor Author:

As discussed offline, this will be addressed in a separate ticket: 47578
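
For context on the question raised in this thread, one possible way for a caller to find valid node indices from a tensor name is to scan the node list for consumers of that tensor. The sketch below is not part of this PR: `SimpleNode`, `find_consumers` and `find_consumers_example` are hypothetical names, and `SimpleNode` is a plain stand-in for the protobuf `NodeProto`, assuming the nodes are stored in topological order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ONNX_NAMESPACE::NodeProto: only tensor names are kept.
struct SimpleNode
{
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// Returns the indices of all nodes that consume the given tensor.
// Each returned index, paired with tensor_name, identifies one input edge.
std::vector<int> find_consumers(const std::vector<SimpleNode>& nodes,
                                const std::string& tensor_name)
{
    std::vector<int> consumers;
    for (int idx = 0; idx < static_cast<int>(nodes.size()); ++idx)
    {
        for (const auto& input : nodes[idx].inputs)
        {
            if (input == tensor_name)
            {
                consumers.push_back(idx);
                break; // record a node once even if it reads the tensor twice
            }
        }
    }
    return consumers;
}

void find_consumers_example()
{
    // node 0: relu(in_A) -> t1; node 1: add(t1, in_B) -> t2; node 2: mul(t1, t2) -> out
    const std::vector<SimpleNode> nodes = {
        SimpleNode{{"in_A"}, {"t1"}},
        SimpleNode{{"t1", "in_B"}, {"t2"}},
        SimpleNode{{"t1", "t2"}, {"out"}},
    };

    // "t1" has two consumers, so the tensor name alone does not identify one edge.
    const std::vector<int> consumers = find_consumers(nodes, "t1");
    assert(consumers.size() == 2);
    assert(consumers[0] == 1 && consumers[1] == 2);
}
```

When a tensor has a single consumer, as in the diagram above, the lookup returns exactly one index, which is the case the reviewer describes.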

using InputEdge = Edge<EdgeType::INPUT>;

/// \brief Defines an edge connected to an output of any node in the graph.
/// It consists of a node index in the processed ONNX model and the output name.
///
/// For a node number 5, with 2 outputs:
///
///                 +--------+ ----(out1)---->
/// ----(in_A)----> | node 5 |
///                 +--------+ ----(out2)---->
///
/// there are 2 possible valid instances of this struct:
/// OutputEdge(5, "out1")
/// OutputEdge(5, "out2")
using OutputEdge = Edge<EdgeType::OUTPUT>;
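
Because `Edge` is parameterized only by `EdgeType`, the two aliases are distinct types with identical members, so passing an input edge where an output edge is expected fails to compile. A minimal sketch of that property follows; the `Edge` template is copied from the diff, while the `describe` overloads and `edge_example` are hypothetical helpers added only for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

enum class EdgeType
{
    INPUT,
    OUTPUT
};

template <EdgeType>
struct Edge
{
    Edge() = delete;
    Edge(const int node_idx, std::string tensor_name)
        : m_node_idx{node_idx}
        , m_tensor_name{std::move(tensor_name)}
    {
    }

    const int m_node_idx;
    const std::string m_tensor_name;
};

using InputEdge = Edge<EdgeType::INPUT>;
using OutputEdge = Edge<EdgeType::OUTPUT>;

// The aliases are unrelated types, so they cannot be mixed up silently.
static_assert(!std::is_same<InputEdge, OutputEdge>::value, "distinct edge types");
static_assert(!std::is_convertible<InputEdge, OutputEdge>::value, "no implicit conversion");

// Hypothetical helpers (not in the PR): overload resolution picks by edge kind.
std::string describe(const InputEdge& edge)
{
    return "input " + edge.m_tensor_name + " of node " + std::to_string(edge.m_node_idx);
}

std::string describe(const OutputEdge& edge)
{
    return "output " + edge.m_tensor_name + " of node " + std::to_string(edge.m_node_idx);
}

void edge_example()
{
    const InputEdge in_edge{5, "in_A"};
    const OutputEdge out_edge{5, "out1"};
    assert(describe(in_edge) == "input in_A of node 5");
    assert(describe(out_edge) == "output out1 of node 5");
}
```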

/// \brief Subgraph extraction helper structure
struct SubgraphExtractor
{
SubgraphExtractor(ONNX_NAMESPACE::GraphProto& graph);

/// \brief Adds new inputs to the graph and connects them to the nodes indicated by
/// the provided input edges.
void add_new_inputs(const std::vector<InputEdge>& new_inputs);

/// \brief Adds new outputs to the graph with the same name as the nodes pointed to
/// by the output edges "new_outputs".
void add_new_outputs(const std::vector<OutputEdge>& new_outputs);

/// \brief Extracts the final subgraph by traversing the original model bottom-up
/// starting at each of the provided output edges. The extracted subgraph
/// contains all previously added inputs and potentially a subset of original
/// model's inputs that contribute to the value calculated in the output tensors.
/// In the end the underlying GraphProto is modified and obsolete elements
/// are discarded after this method call has finished.
///
/// \param subgraph_outputs A list of expected outputs of the extracted subgraph.
void extract_subgraph(std::vector<OutputEdge> subgraph_outputs);

/// \brief Represents a subgraph of an ONNX model by holding a subset of nodes, inputs,
/// outputs and initializers of the original graph. Objects of this struct can be
/// merged into other instances using the += operator to build a subgraph from
/// smaller clusters.
struct SubgraphComponents
{
SubgraphComponents() = default;
SubgraphComponents(const SubgraphComponents&) = delete;
SubgraphComponents(SubgraphComponents&&) = default;
SubgraphComponents& operator=(const SubgraphComponents&) = delete;
SubgraphComponents& operator=(SubgraphComponents&&) = default;

std::set<int> nodes;
std::set<std::string> inputs;
std::set<std::string> initializers;
std::set<std::string> outputs;

SubgraphComponents& operator+=(SubgraphComponents&& other)
{
nodes.insert(other.nodes.begin(), other.nodes.end());
inputs.insert(other.inputs.begin(), other.inputs.end());
initializers.insert(other.initializers.begin(), other.initializers.end());
outputs.insert(other.outputs.begin(), other.outputs.end());
return *this;
}
};

private:
ONNX_NAMESPACE::GraphProto& m_onnx_graph;

// Graph traversal helper: node index -> node inputs (one-to-many)
std::unordered_multimap<int, std::string> m_node_inputs;
// Number of consumers of all tensors in the graph
std::map<std::string, int> m_tensor_consumers;

/// \brief Replaces the old input edge with a new one in the helper struct.
/// This is used by the output contributors discovery.
void replace_input_edge(const InputEdge& old_edge, const InputEdge& new_edge);

/// \brief Returns a list of edges of each output of the graph "m_onnx_graph"
std::vector<OutputEdge> all_output_edges() const;

/// \brief Traverses the graph bottom-up and collects all nodes, inputs and initializers
/// that contribute to an output designated by the provided output edge.
/// A sum of such SubgraphComponents objects forms a target extracted subgraph.
SubgraphComponents
discover_output_contributors(const OutputEdge& output_edge,
const SubgraphComponents& already_collected) const;

/// \brief Modifies the underlying GraphProto object and discards all obsolete elements.
///
/// \param subgraph An object describing the subgraph to be extracted (elems to be kept)
void extract_subgraph_from_onnx_model(const SubgraphComponents& subgraph);
};
} // namespace onnx_import
} // namespace ngraph
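
The bottom-up discovery described in the comments of this header can be sketched on a simplified graph model. This is not the PR's implementation: `SimpleNode`, `SimpleGraph`, `Components`, `find_producer`, `collect_contributors` and `traversal_example` are hypothetical names, the protobuf types are replaced by plain structs, and the walk assumes that any tensor listed as a graph input or initializer ends the traversal.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the protobuf graph; not the types used in the PR.
struct SimpleNode
{
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

struct SimpleGraph
{
    std::vector<SimpleNode> nodes;      // assumed to be topologically sorted
    std::set<std::string> inputs;       // graph inputs
    std::set<std::string> initializers; // constant tensors
};

// Mirrors the shape of SubgraphComponents from the header (outputs omitted).
struct Components
{
    std::set<int> nodes;
    std::set<std::string> inputs;
    std::set<std::string> initializers;
};

// Returns the index of the node producing the tensor, or -1 if no node produces it.
int find_producer(const SimpleGraph& graph, const std::string& tensor)
{
    for (int idx = 0; idx < static_cast<int>(graph.nodes.size()); ++idx)
    {
        for (const auto& out : graph.nodes[idx].outputs)
        {
            if (out == tensor)
                return idx;
        }
    }
    return -1;
}

// Walks upwards from output_tensor and records everything it depends on.
Components collect_contributors(const SimpleGraph& graph, const std::string& output_tensor)
{
    Components result;
    std::vector<std::string> to_visit{output_tensor};
    while (!to_visit.empty())
    {
        const std::string tensor = to_visit.back();
        to_visit.pop_back();

        if (graph.initializers.count(tensor) != 0)
        {
            result.initializers.insert(tensor);
            continue;
        }
        if (graph.inputs.count(tensor) != 0)
        {
            result.inputs.insert(tensor);
            continue;
        }
        const int producer = find_producer(graph, tensor);
        // insert().second is false when the node has already been visited
        if (producer >= 0 && result.nodes.insert(producer).second)
        {
            for (const auto& in : graph.nodes[producer].inputs)
                to_visit.push_back(in);
        }
    }
    return result;
}

void traversal_example()
{
    // node 0: conv(x, w) -> t0; node 1: relu(t0) -> y0; node 2: abs(z) -> y1
    SimpleGraph graph;
    graph.nodes.push_back(SimpleNode{{"x", "w"}, {"t0"}});
    graph.nodes.push_back(SimpleNode{{"t0"}, {"y0"}});
    graph.nodes.push_back(SimpleNode{{"z"}, {"y1"}});
    graph.inputs = {"x", "z"};
    graph.initializers = {"w"};

    // Only the branch feeding y0 is kept; node 2 and input z are dropped.
    const Components full = collect_contributors(graph, "y0");
    assert((full.nodes == std::set<int>{0, 1}));
    assert((full.inputs == std::set<std::string>{"x"}));
    assert((full.initializers == std::set<std::string>{"w"}));

    // Declaring t0 as a graph input (as cutting at that edge would) stops the walk early.
    graph.inputs.insert("t0");
    const Components cut = collect_contributors(graph, "y0");
    assert((cut.nodes == std::set<int>{1}));
    assert((cut.inputs == std::set<std::string>{"t0"}));
    assert(cut.initializers.empty());
}
```

Running the walk once per requested output and merging the results corresponds to the role the header assigns to `SubgraphComponents::operator+=`.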
@@ -23,6 +23,7 @@
#include "ngraph/op/constant.hpp"
#include "ngraph/partial_shape.hpp"
#include "ngraph/type/element_type.hpp"
#include "onnx_import/editor/detail/subgraph_extraction.hpp"
#include "onnx_import/utils/onnx_importer_visibility.hpp"

namespace ONNX_NAMESPACE
@@ -53,7 +54,7 @@ namespace ngraph
/// \param model_path Path to the file containing the model.
ONNXModelEditor(const std::string& model_path);

-/// \brief Modifies the in-memory representation of the model (m_model_proto) by setting
+/// \brief Modifies the in-memory representation of the model by setting
/// custom input types for all inputs specified in the provided map.
///
/// \param input_types A collection of pairs {input_name: new_input_type} that should be
@@ -62,7 +63,7 @@ namespace ngraph
/// the inputs specified in its parameter.
void set_input_types(const std::map<std::string, element::Type_t>& input_types);

-/// \brief Modifies the in-memory representation of the model (m_model_proto) by setting
+/// \brief Modifies the in-memory representation of the model by setting
/// custom input shapes for all inputs specified in the provided map.
///
/// \param input_shapes A collection of pairs {input_name: new_input_shape} that should
@@ -71,6 +72,18 @@ namespace ngraph
/// the inputs specified in its parameter.
void set_input_shapes(const std::map<std::string, ngraph::PartialShape>& input_shapes);

/// \brief Extracts a subgraph constrained by input edges and output edges. In the end
/// the underlying ModelProto is modified - obsolete inputs, initializers, nodes
/// and outputs are removed from the in-memory model.
///
/// \note Please look at the declaration of InputEdge and OutputEdge for an explanation of
/// how those objects can be created. If the outputs parameter is empty
/// this method keeps all of the original outputs of the model.
///
/// \param inputs A collection of input edges which become new inputs to the graph
/// \param outputs A collection of output edges which become new outputs of the graph
void cut_graph_fragment(const std::vector<InputEdge>& inputs,
const std::vector<OutputEdge>& outputs);
/// \brief Modifies the in-memory representation of the model by setting custom input
/// values for inputs specified in the provided map.
///
@@ -91,19 +104,28 @@ namespace ngraph
/// \return A reference to ONNX ModelProto object containing the in-memory model
ONNX_NAMESPACE::ModelProto& model() const;

/// \brief Returns a serialized ONNX model, possibly modified by the editor.
std::string model_string() const;

/// \brief Returns a list of all inputs of the in-memory model, including initializers.
/// The returned value might depend on the previous operations executed on an
/// instance of the model editor, in particular the subgraph extraction which
/// can discard some inputs and initializers from the original graph.
std::vector<std::string> model_inputs() const;

/// \brief Returns the path to the original model file
const std::string& model_path() const;

-/// \brief Saves the possibly model held by this class to a file. Serializes in binary
-/// mode.
+/// \brief Saves the possibly modified model held by this class to a file.
+/// Serializes in binary mode.
///
/// \param out_file_path A path to the file where the modified model should be dumped.
void serialize(const std::string& out_file_path) const;

private:
const std::string m_model_path;

-class Impl;
+struct Impl;
std::unique_ptr<Impl, void (*)(Impl*)> m_pimpl;
};
} // namespace onnx_import
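
The `std::unique_ptr<Impl, void (*)(Impl*)>` member above uses a function pointer as the deleter. With that deleter type the header never needs the complete definition of `Impl` to destroy the pointer, which is presumably why it was chosen here. The following is a minimal single-file sketch of the pattern; the `Editor` class, its `path` member and `pimpl_example` are hypothetical and much smaller than `ONNXModelEditor`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// --- what would live in the header ---
class Editor
{
public:
    explicit Editor(std::string path);
    const std::string& path() const;

private:
    struct Impl; // incomplete here; defined in the "source file" part below
    // The deleter is a plain function pointer, so destroying m_pimpl only calls
    // through that pointer and does not require Impl to be a complete type.
    std::unique_ptr<Impl, void (*)(Impl*)> m_pimpl;
};

// --- what would live in the .cpp file ---
struct Editor::Impl
{
    std::string path;
};

Editor::Editor(std::string path)
    // A captureless lambda converts implicitly to void (*)(Impl*).
    : m_pimpl{new Impl{std::move(path)}, [](Impl* impl) { delete impl; }}
{
}

const std::string& Editor::path() const
{
    return m_pimpl->path;
}

void pimpl_example()
{
    const Editor editor{"model.onnx"};
    assert(editor.path() == "model.onnx");
}
```

With the default deleter, the class would instead need a destructor declared in the header and defined in the source file, where `Impl` is complete.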