-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Fix for duplicate subgraph inputs/outputs #16131
Conversation
@zheng-da @reminisce could you please review? |
@mxnet-label-bot add [pr-awaiting-review] |
@ZhennanQin for review. Any suggestions on how to better fix the problem? |
@ZhennanQin ive made the changes, do they still look good? If so we can start going through and updating all other subgraph properties. |
LGTM. Please coordinate all backends with new ConnectSubgraphOutputs(). |
I've noticed that MXNet does not preserve what I'm calling 'output independence' during it's memory planning (issue forthcoming). While I'm talking here about the model as a whole, I've noticed many graph passes are applied to subgraphs as well, so perhaps the issue is relevent to this PR. Any update on status here? |
Hi @DickJC123 sorry I havent been able to prioritize finishing off this PR. IMO the problem is similar and endemic in mxnet but not related to this PR. Please do file an issue at the very least for posterity. |
1e4fdc5
to
7a210d6
Compare
@mseth10 looks like we made an aux an arg somehow, thats why the MKLDNN tests are failing:
So this is why the error says
Theres one extra name in |
@@ -537,10 +537,11 @@ void FindOutputEntries(nnvm::Graph* g, | |||
*/ | |||
void CutGraphInputs(const std::vector<nnvm::NodeEntry*> &input_entries, | |||
std::vector<nnvm::NodeEntry> *orig_entries, | |||
std::vector<nnvm::NodeEntry> *unique_inputs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an alternative implementation that does not need t track the actual unique inputs separately.
A counter would allow us to correctly modify orig_entries into unique_entries.
This keeps the function signature unchanged and minimizes changes elsewhere.
It is already working in a private build. I add it here for consideration if desired
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion @HahTK, but we cant modify orig_entries
since we need it unmodified for ReattachGraphInputs:
https://github.com/apache/incubator-mxnet/blob/e249d71ad4621afcba7f0f3af77095a3f9a4bc83/src/operator/subgraph/build_subgraph.cc#L578
this is used to reject a subgraph from the reviewSubgraph API:
https://github.com/apache/incubator-mxnet/blob/e249d71ad4621afcba7f0f3af77095a3f9a4bc83/src/operator/subgraph/build_subgraph.cc#L657
the example you provided doesnt work in the latest version of MXNet where you can reject a subgraph by returning nullptr from CreateSubgraphNode here: https://github.com/apache/incubator-mxnet/blob/e249d71ad4621afcba7f0f3af77095a3f9a4bc83/src/operator/subgraph/build_subgraph.cc#L630-L633
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed change was tested on 1.5.1 and it seems like newer versions of MXNet have new features that needs orig_entries to be preserved. So acknowledged !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the fix.
* fix for duplicate inputs * fixed error * fixed whitespace * Remove duplicate outputs from subgraphs * changed subgraph to create map of outputs * added static_cast * changed map<int,v> to vector * sanity fix * sanity2 * updated backends with new connectSubgraphOutputs API * fixed map creation logic * added updates for reattach function * creating node only if it is not an input to subgraph * creating object based on var_name only * updating ConnectSubgraphOutputs for mkldnn_elemwisemul_post_quantize_property.h * add debug prints to debug error in CI * remove prints * added prints to debug in the CI * revert changes * reverted changes * deduplicaated inputs to subgraph * deduplicated subgraph inputs * simplified inputs * cleaned up * deduplicate outputs * cleand up * added deduplication to subgraph node outputs * fixed prev compare * fixed issue with inputs and added test * fixd whitespace, removed prints Co-authored-by: Sam Skalicky <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Ubuntu <[email protected]>
For the record @mseth10 i finally found the issue that caused this "making an aux an arg" problem: #19112 (comment) |
…9112) * Fix for duplicate subgraph inputs/outputs (#16131) * fix for duplicate inputs * fixed error * fixed whitespace * Remove duplicate outputs from subgraphs * changed subgraph to create map of outputs * added static_cast * changed map<int,v> to vector * sanity fix * sanity2 * updated backends with new connectSubgraphOutputs API * fixed map creation logic * added updates for reattach function * creating node only if it is not an input to subgraph * creating object based on var_name only * updating ConnectSubgraphOutputs for mkldnn_elemwisemul_post_quantize_property.h * add debug prints to debug error in CI * remove prints * added prints to debug in the CI * revert changes * reverted changes * deduplicaated inputs to subgraph * deduplicated subgraph inputs * simplified inputs * cleaned up * deduplicate outputs * cleand up * added deduplication to subgraph node outputs * fixed prev compare * fixed issue with inputs and added test * fixd whitespace, removed prints Co-authored-by: Sam Skalicky <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Ubuntu <[email protected]> * added flag to enable dedupe ondemand * fixed dedup logic * improved dedup logic * fixed sanity * propogated option * check option in custom subgraph prop * fixed options map * fixed missing * added dedup to subgraph_prop base class for testing * added test for dedup * added comments Co-authored-by: Sam Skalicky <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Ubuntu <[email protected]>
Description
Fixes #16108
Subgraph API can create duplicated tensors for both inputs to a subgraph and outputs from a subgraph. This happens when there are multiple nodes in a subgraph that consume the same input, or when there are multiple nodes that consume a single subgraph output. This results in tensor duplication, and causes OOM due to excessive memory usage.
Fix for duplicate inputs
Currently, inputs to a subgraph are duplicated when there are multiple nodes in a subgraph consuming the same input (see diagram below). In this PR we change the behavior of CutGraphInputs function to store a NodeEntry for each input to the subgraph in a map, and re-use that node for each node in the subgraph that consumes that input (see diagram below). This prevents input nodes from being duplicated, and results in less copies of tensors at runtime and a much smaller memory footprint.
Fix for duplicate outputs
Similar to the inputs, when an output from a subgraph is consumed by multiple nodes outside the subgraph, the output is duplicated for each consumer (see diagram below). In this PR we change the behavior of CreateSubgraphNode to collapse duplicate output dependencies to a single output by checking if neighboring output entries are the same. If so we do not add another output, preventing duplicates.
Then in the ConnectSubgraphOutputs function we create dependencies from each node outside the subgraph that consumes a subgraph output. If neighboring nodes share the same subgraph output, they point to the same subgraph output.
Example
Added an example graph with duplicate inputs/outputs to the
test_subgraph_op.py
. Heres the original graph partitioned before this PR. Notice that it has two input nodes:data0
anddata1
that both come from the same top leveldata
. Similarly, notice that it has two outputs coming from the same node_plus0
.Heres the same graph partitioned after this PR. Notice that now there is only 1 input node
data0
and bothsin
ops use it. Also notice that now there is only 1 output from the subgraph, and bothcos
ops use it.