This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
GluonCV HRNet code raises an error in mxnet-1.6.0 GPU hybridize mode #18624
Comments
@JONGGON can you derive a minimal example to reproduce the issue?
@JONGGON It's caused by the _FusedOp in the cached graph. It's fixed in tvm by apache/tvm#5238, but that fix has not been merged into mxnet yet. I think you can apply the fix and build mxnet yourself.
oh, thank you!!!!!
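Since the comment above points at _FusedOp, one thing that might be worth trying before rebuilding mxnet is switching off pointwise fusion. This is only a sketch and not something confirmed in this thread: it assumes the `MXNET_USE_FUSION` environment variable (documented for MXNet 1.6+) controls the fusion pass that creates _FusedOp nodes, and it is not verified here that disabling it avoids this particular check failure.

```python
import os

# Assumption: MXNET_USE_FUSION=0 turns off GPU pointwise fusion in MXNet 1.6+.
# Set it before importing mxnet so the setting is in place when graphs are built.
os.environ["MXNET_USE_FUSION"] = "0"

try:
    import mxnet as mx  # only needed for the actual run
except ImportError:
    mx = None  # lets the snippet run on a machine without mxnet installed

print("MXNET_USE_FUSION =", os.environ["MXNET_USE_FUSION"])
```

If the error goes away with fusion disabled, that would at least be consistent with the _FusedOp explanation; expect some GPU performance loss while it is off.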
chinakook added a commit to chinakook/gluon-cv that referenced this issue, Jul 24, 2020:
change it to broadcast_add to solve the problem even. I think it's a bug with _FusedOp with GPU because CPU is OK. apache/mxnet#18624
chinakook added a commit to chinakook/gluon-cv that referenced this issue, Jul 25, 2020:
change it to broadcast_add to solve the problem even. I think it's a bug with _FusedOp with GPU because CPU is OK. apache/mxnet#18624
zhreshold pushed a commit to dmlc/gluon-cv that referenced this issue, Aug 14, 2020:
* Fix HRNet bottleneck stride schema
  After this fix, the result is always same to the torch original implementation in every forward pass. There were sometimes different before. The params has no need to update.
* CachedOp on GPU may be fail with add
  change it to broadcast_add to solve the problem even. I think it's a bug with _FusedOp with GPU because CPU is OK. apache/mxnet#18624
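The workaround in the commits above is to use broadcast_add where the network previously used a plain add. A minimal sketch of that pattern follows. It is not the actual gluon-cv change: `sum_branches` is a hypothetical helper, and `F` stands for the `mxnet.nd` or `mxnet.sym` namespace that Gluon passes to `hybrid_forward`.

```python
def sum_branches(F, branches):
    """Sum feature maps with F.broadcast_add instead of the `+` operator.

    F is expected to be mxnet.nd or mxnet.sym, as received in
    HybridBlock.hybrid_forward(self, F, x). `sum_branches` itself is a
    hypothetical helper used only for illustration.
    """
    out = branches[0]
    for branch in branches[1:]:
        # Workaround pattern from the referenced commits:
        # `out + branch` becomes F.broadcast_add(out, branch).
        out = F.broadcast_add(out, branch)
    return out
```

Inside a block this would be called as `sum_branches(F, [y0, y1])` from `hybrid_forward`. For inputs of identical shape, broadcast_add returns the same values as add, so the numerical result should not change.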
Description
In mxnet 1.6.0 GPU hybridize mode, the code below raises an error.
Results of my checks:
no problem occurs in 1.5.0 GPU mode or 1.5.0 CPU mode (hybridized or not)
no problem occurs in 1.6.0 CPU mode (hybridized or not)
The problem occurs only in 1.6.0 GPU hybridize mode.
I have confirmed that the same error also occurs in GPU hybridize mode on 1.7.0, which has not been released yet.
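The context / hybridize combinations listed above can be checked in one run with a small harness. This is only a sketch: `run_matrix` and `forward_once` are hypothetical names, and `forward_once(ctx_name, hybridize)` is assumed to be a user-supplied callable that builds the network on the given context, optionally hybridizes it, and runs one forward pass.

```python
import itertools


def run_matrix(forward_once, contexts=("cpu", "gpu"), hybridize_modes=(False, True)):
    """Call forward_once for every (context, hybridize) pair and record the outcome.

    forward_once is a hypothetical user-supplied callable. It should force
    execution (for example by calling .asnumpy() on the output) so that
    errors surface inside the call rather than later.
    """
    results = {}
    for ctx_name, hybridize in itertools.product(contexts, hybridize_modes):
        try:
            forward_once(ctx_name, hybridize)
            results[(ctx_name, hybridize)] = "ok"
        except Exception as exc:  # record the failure and keep going
            results[(ctx_name, hybridize)] = "{}: {}".format(type(exc).__name__, exc)
    return results
```

Running this once per installed mxnet version gives the same table as the checks above, with the error message attached to the failing combination.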
Error Message
Traceback (most recent call last):
  File "/home/jg/Desktop/mountain/2frameCenter/core/model/backbone/HrNet.py", line 664, in <module>
    output = net(mx.nd.random_uniform(low=0, high=1, shape=(1, 3, input_size[0], input_size[1]), ctx=mx.gpu(0)))
  File "/home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/gluon/block.py", line 693, in __call__
    out = self.forward(*args)
  File "/home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/gluon/block.py", line 1148, in forward
    return self._call_cached_op(x, *args)
  File "/home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/gluon/block.py", line 1020, in _call_cached_op
    out = self._cached_op(*cargs)
  File "/home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/_ctypes/ndarray.py", line 170, in __call__
    ctypes.byref(out_stypes)))
  File "/home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [02:01:14] src/core/graph.cc:102: Check failed: it != node2index.end() && it->first == e.node.get():
Stack trace:
[bt] (0) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6b8b5b) [0x7f83ce9d2b5b]
[bt] (1) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x91bdd08) [0x7f83d74d7d08]
[bt] (2) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x91bf0a8) [0x7f83d74d90a8]
[bt] (3) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x91bfc30) [0x7f83d74d9c30]
[bt] (4) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38500aa) [0x7f83d1b6a0aa]
[bt] (5) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3870b2a) [0x7f83d1b8ab2a]
[bt] (6) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38804c2) [0x7f83d1b9a4c2]
[bt] (7) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3881959) [0x7f83d1b9b959]
[bt] (8) /home/jg/anaconda3/envs/mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::Forward(std::shared_ptr<mxnet::CachedOp> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x181) [0x7f83d1b9f4c1]
Process finished with exit code 1
To Reproduce
code!!!
Steps to reproduce
What have you tried to solve it?
I stepped through the code line by line. The error occurs in the HighResolutionBaseNet class.
Environment
Ubuntu 16.04 / CUDA 10.1 / NVIDIA driver 418.56