inference results unstable in mxnet_mkl-1.2.0b20180416 #10580
Comments
Is it possible for you to give a repeatable model for this issue? Thanks.
try this. python3 inference.py
did you try the new mkldnn backend?
could you provide us a minimum script to reproduce the error? Thanks
@dwSun Thanks for the scripts. Here is my update:
Does it mean the bug is in the MKLDNN library?
I guess yes, but not sure. I enabled MXNET_MKLDNN_DEBUG but got no complaints. We still need a minimum case to reproduce it.
On latest master, this issue with @dwSun's script can be resolved by removing the line below from mkldnn_convolution.cc
However, this change only works for inference. We still need a more comprehensive solution for it.
The reason we push async here is to change the layout of the weight arrays once during inference, so that we don't need to change the layout every time. How is @dwSun's code different from the test here: https://github.com/apache/incubator-mxnet/blob/master/tests/python/gpu/test_gluon_model_zoo_gpu.py#L41
update: Thanks.
@TaoLv can you provide a pre-compiled pip package?
@dwSun the bug should be fixed now. The code has been merged into the master branch. Could you please try it and see if the fix has solved your problem? Thanks.
Is there any nightly build or something like this?
No idea what I should do next (。﹏。).
what is
@dwSun when I unzip the file, it's a single file of 13.4MB. It doesn't contain
@zheng-da try this command: tar -vxf issue_10580.gz
I tried your script on my own machine with PR #10731.
waiting for PR #10731 to be merged
just tested with mxnet-mkl-1.2.0b20180508, it works well.
sysinfo
Python 3.6.5
debian sid
desc
I am using mx.mod.Module to build my inference program.
When I run the same inference several times without restarting my program, the results are unstable: only the first one is correct, and the others differ from each other.
When I change back to mxnet-mkl 1.1.0, the results become consistent again.
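The symptom described above (same input, different outputs on repeated runs) can be checked with a small framework-agnostic helper. This is only a sketch, not the script from this issue: `max_abs_diff` and `repeat_diffs` are hypothetical names, and `predict` stands for any zero-argument callable that runs one inference and returns a flat sequence of floats.

```python
from typing import Callable, List, Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two outputs."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def repeat_diffs(predict: Callable[[], Sequence[float]], runs: int = 5) -> List[float]:
    """Run `predict` `runs` times; return each later run's max deviation
    from the first run. All zeros means the inference is repeatable."""
    reference = list(predict())  # first run is treated as the reference
    return [max_abs_diff(reference, list(predict())) for _ in range(runs - 1)]


if __name__ == "__main__":
    # Stand-in for a real model: a deterministic function (assumption).
    diffs = repeat_diffs(lambda: [0.1, 0.9], runs=4)
    print("max deviation per repeated run:", diffs)
```

With `mx.mod.Module`, `predict` could plausibly wrap `mod.forward(batch, is_train=False)` followed by `mod.get_outputs()[0].asnumpy().ravel().tolist()`, where `mod` and `batch` are an already-bound module and a fixed input batch. Any non-zero entry in the returned list would reproduce the instability reported here.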