Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Multiple subgraph properties with single backend issue with optimize_for #19256

Closed
samskalicky opened this issue Sep 30, 2020 · 0 comments · Fixed by #19263
Closed

Multiple subgraph properties with single backend issue with optimize_for #19256

samskalicky opened this issue Sep 30, 2020 · 0 comments · Fixed by #19263

Comments

@samskalicky
Copy link
Contributor

samskalicky commented Sep 30, 2020

Description

Theres a difference in the flow between get_backend_symbol and optimize_for when a single backend has multiple subgraph properties that results in the indexed graph not being updated between subgraph properties.

Error Message

Traceback (most recent call last):
  File "test.py", line 22, in <module>
    sym_block.optimize_for(mx.nd.zeros((64, 4, 10, 10)), backend='MKLDNN_QUANTIZE')
  File "/home/ubuntu/v1.7.x/python/mxnet/gluon/block.py", line 1089, in optimize_for
    self._build_cache(x, *args)
  File "/home/ubuntu/v1.7.x/python/mxnet/gluon/block.py", line 979, in _build_cache
    out = out.optimize_for(self._backend, arg_array, aux_array, ctx, **self._backend_opts)
  File "/home/ubuntu/v1.7.x/python/mxnet/symbol/symbol.py", line 1531, in optimize_for
    ctypes.byref(new_aux_names)))
  File "/home/ubuntu/v1.7.x/python/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "src/operator/subgraph/build_subgraph.cc", line 80
MXNetError: Check failed: input_nid < simple_nodes->size() (6 vs. 6) :

To Reproduce

from mxnet.util import use_np
from mxnet.gluon import nn, HybridBlock
import mxnet as mx
import numpy as np
attr = {'sg_mkldnn_conv_bn_0' : {'with_bn': 'true'}}
data = mx.symbol.Variable('data', shape=(64, 4, 10, 10), dtype='float32')
data2 = mx.symbol.Variable('data2', shape=(64, 64, 10, 10), dtype='float32')
weight1 = mx.symbol.Variable('conv1_weight', dtype='float32')
weight2 = mx.symbol.Variable('conv2_weight', dtype='float32', shape=(64,64,1,1))
conv1 = mx.symbol.Convolution(data=data, weight=weight1, name='conv1', num_filter=64,
                           kernel=(1, 1), stride=(1, 1), no_bias=True)
bn1 = mx.symbol.BatchNorm(data=conv1, name="bn1")
conv2 = mx.symbol.Convolution(data=bn1, weight=weight2, name='conv2', num_filter=64,
                           kernel=(1, 1), stride=(1, 1), no_bias=True)
bn2 = mx.symbol.BatchNorm(data=conv2, name="bn2")
sum = bn2 + data2
inputs = mx.sym.var('data', dtype='float32')
sym_block = mx.gluon.SymbolBlock(sum, [inputs])
for k, v in sym_block.collect_params().items():
    v.initialize()
mm = sym_block(mx.nd.zeros((64, 4, 10, 10)))
sym_block.optimize_for(mx.nd.zeros((64, 4, 10, 10)), backend='MKLDNN_QUANTIZE')

Steps to reproduce

  1. clone v1.7.x branch and build from source
  2. run code above

Analysis

The error is occurring at the CHECK_LT on line 80 where theres a mismatch in the Graph definition (the input of a node is pointing to another node ID that hasnt been seen yet, so this is breaking topological ordering):
https://github.com/apache/incubator-mxnet/blob/93b7ff787f56ab0e386df0fc177ae1da26dcd47e/src/operator/subgraph/build_subgraph.cc#L70-L80
Some debugging finds that the indexed graph inside of an nnvm::Graph object is only built once:

const IndexedGraph& Graph::indexed_graph() const {
  if (indexed_graph_ == nullptr) {
    indexed_graph_.reset(new IndexedGraph(*this));
  }
  return *indexed_graph_;
}

So after subsequent subgraph properties the graph is not updated, causing the error in this issue. Since the indexed_graph_ is private, theres no way to reset it.

Comparing the get_backend_symbol flow the Graph is created every iteration for each subgraph property and stored in the Symbol between iterations:
https://github.com/apache/incubator-mxnet/blob/16280ad72aaa35f089ca4d4db680a5c5c7ff1676/src/c_api/c_api_symbolic.cc#L1302-L1308
Where as in the optimize_for flow the Graph is created once:
https://github.com/apache/incubator-mxnet/blob/16280ad72aaa35f089ca4d4db680a5c5c7ff1676/src/c_api/c_api_symbolic.cc#L1375
and then re-used for each subgraph propery:
https://github.com/apache/incubator-mxnet/blob/16280ad72aaa35f089ca4d4db680a5c5c7ff1676/src/c_api/c_api_symbolic.cc#L1498-L1502
So we need to create a new Graph object for each iteration in order to fix this.

FYI @mseth10 @Kh4L @ptrendx

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant