[MXNet 1.5.0.rc2] Issues with asnumpy() method #15431
Comments
Hey, this is the MXNet Label Bot. |
@Wallart Hi, thanks for this issue!
Running the following commands works fine for me on CPU with MXNet MKL-DNN:
|
I get the same results as you on both cpu and gpu(0) |
I just discovered that the crashes are not random. When I put a breakpoint on the faulty lines, no exception is raised when I resume execution. The same error occurs with this type of instruction. The line passes when a breakpoint is used.
|
Hi @Wallart, running the two faulty lines you provided on their own does not throw any errors. Maybe the problem only appears when running the above code with a dataloader and multiple workers? I'm still not able to reproduce your error with your two lines |
Hi @Wallart, are you able to come up with a reproducible example? I suspect it may only occur when using a dataloader with multiple workers, but I can't reproduce the error. You can also turn on the DEBUG flag to get more information on the error. |
@roywei I put the source code on Github. |
@mxnet-label-bot add [Bug] |
@Wallart your code is running fine on my machine. I'm testing on the LJ-Speech dataset and added a few lines in
|
@roywei I reproduced the exact same environment outside Docker, still using intelpython through miniconda with the same build flags, and the error is still there. EDIT: I uninstalled my custom build of MXNet and ran 'pip install mxnet==1.5.0', and now it works (I don't know which flags are used for the pip release). I will build the newer 1.5.0 to see if I can reproduce it. |
Same problem. Something in the way I am building MXNet makes it crash. EDIT: 'pip install mxnet-cu101-mkl==1.5.0' works. I will try to rebuild without the LAPACK flag. |
You can find the build flags used for pip packages here: https://github.com/apache/incubator-mxnet/tree/master/make/pip |
I finally found the issue. At runtime I was linking against the outdated MKL-DNN (v0.14) provided by Anaconda, whereas MXNet is probably using v1.0+ |
@Wallart, the master branch uses v0.20 of MKL-DNN while the 1.5.0 release uses v0.19. Please use the self-contained MKL-DNN in MXNet. The anaconda MKL-DNN distribution is not actively maintained. |
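To check which MKL-DNN shared library a process has actually loaded, something like the following may help. It is only a minimal sketch, assuming Linux (it reads /proc/self/maps). The helper name `loaded_libraries` and the "mkldnn" search string are illustrative assumptions and not part of MXNet.

```python
import os


def loaded_libraries(pattern, maps_text=None):
    """Return sorted paths of mapped files whose file name contains `pattern`.

    `maps_text` is optional text in /proc/<pid>/maps format; when omitted,
    the maps of the current process are read (Linux only).
    """
    if maps_text is None:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            path = fields[5].strip()
            if pattern.lower() in os.path.basename(path).lower():
                paths.add(path)
    return sorted(paths)


# Hypothetical usage, after `import mxnet` in the same interpreter:
#     for path in loaded_libraries("mkldnn"):
#         print(path)
```

If a printed path points into the Anaconda lib directory rather than the MXNet build tree, the runtime linker is probably picking up the Anaconda copy.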
This issue comes back when using MXNet 1.6 with softmax outputs:
|
Hi @aGiant, I think your issue is a different problem. Looking at the stack trace it says |
Hello,
I've decided to try MXNet 1.5.0.rc2.
I am getting a lot of crashes from asnumpy() calls like the following one:
phase = nd.array(np.arctan2(imag_part.asnumpy(), real_part.asnumpy()))
The same issue occurs if I store the asnumpy() result in a temporary variable. The crash seems random.
I didn't have any problems with MXNet 1.4.1.