rcnn example throws CUDNN_STATUS_BAD_PARAM when running under cudnn 6.0 #11240

ghost · 2018-06-12T03:24:06Z

After I updated MXNet from 1.1.0 to 1.2.0 and built the repository with CUDA 8.0.61 and cuDNN 6.0, the rcnn training throws the following error when evaluating the RPN accuracy.

check failed: e == cuDNN: CUDNN_STATUS_SUCCESS(3 vs. 0) cuDNN: CUDNN_STATUS_BAD_PARAM

The error occurred when executing the following code in example/rcnn/rcnn/core/metric.py:
pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
Any ideas on how to address this without disabling cuDNN or rolling back to an earlier version?


kalyc · 2018-06-12T18:17:33Z

Thanks for submitting this issue, @xioryu. You could also post this on discuss.mxnet.io for further details on usage of MXNet.

kalyc · 2018-06-15T18:57:12Z

@nswamy could you add label "Question", "CUDA" to this?

vrakesh · 2018-06-18T23:16:07Z

@nswamy requesting the labels "Question" and "CUDA" for this issue.

ijkguo · 2018-07-13T01:08:05Z

It was probably related to the old SoftmaxActivation layer, which has now been changed to mx.sym.softmax in #11373.

thomelane · 2018-08-17T21:01:51Z

@xioryu are you able to provide some sample code that reproduces this issue? Many thanks!

It's not possible to diagnose from just knowing the error occurred on line pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32'). I'd expect any fatal error in the network to appear when this line is run, just because .asnumpy() blocks and waits for all the async operations to complete (i.e. waits for the network computation to complete).
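To illustrate the point about deferred errors, here is a minimal sketch that uses Python's standard concurrent.futures as a stand-in for an asynchronous execution engine. It is not MXNet code: failing_op and run are hypothetical names, and the error text is simulated. It only shows that a failure in queued work is reported at the first blocking call, not at the line that queued it.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_op():
    # Hypothetical stand-in for an operator whose GPU call fails.
    raise RuntimeError("CUDNN_STATUS_BAD_PARAM (simulated)")


def run():
    events = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() only queues the work and returns immediately,
        # so no exception is raised on this line.
        future = pool.submit(failing_op)
        events.append("enqueued without error")
        try:
            # result() blocks until the queued work finishes; this is the
            # synchronization point, playing the role of .asnumpy() here.
            future.result()
        except RuntimeError as err:
            events.append(f"error surfaced at sync point: {err}")
    return events


if __name__ == "__main__":
    for line in run():
        print(line)
```

In this sketch the traceback points at the blocking call, even though the fault lies in the operation queued earlier, which is why the failing line alone does not identify the faulty operator.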

ghost · 2018-08-20T14:34:11Z

@thomelane The new simplified repo is OK.

ijkguo · 2018-08-20T18:16:24Z

The cause of this issue is the SoftmaxActivation operator, used in the old, more complex rcnn example. Two fixes were made, and either one resolves this issue:

#10918 (fix a bug in cudnn softmax activation) fixed the SoftmaxActivation cuDNN bug.
#11373 (update rcnn example) removed all usage of SoftmaxActivation.

thomelane · 2018-08-20T21:12:29Z

@xioryu @ijkguo great, and thanks for confirming!

@sandeep-krishnamurthy good to close this ticket now, cheers.

sandeep-krishnamurthy added Question Example CUDA labels Jun 26, 2018

sandeep-krishnamurthy closed this as completed Aug 20, 2018
