Sockeye failure with MXNet #15297
Comments
Hey, this is the MXNet Label Bot. |
@anirudh2290 thanks for the issue. |
I'm still not able to reproduce the crash. My steps are below; could you help point out what's wrong? Machine: AWS Deeplearning Base AMI, P3.8xLarge, Ubuntu 16.05
Sockeye:
changed requirements
run:
result:
|
Did you modify the mxnet version in the requirements file? |
@pengzhao-intel @roywei I am currently building with @ZhennanQin's commit and will try it out. |
With PR #15298 it also segfaults and dumps core. |
Thanks, we are trying to reproduce the crash (we have not been able to reproduce it so far). |
@pengzhao-intel were you able to reproduce it? Did you make sure you modified the requirements file in sockeye? |
I was able to reproduce the failure at |
@anirudh2290 yes, we can reproduce the issue and are working on a fix :) |
Confirmed that this can be reproduced. Need more time to investigate. |
@mxnet-label-bot add [Build, Bug] |
#15298 fixes the sockeye failure on my machine. @anirudh2290 @roywei Please give it a try. |
Hi @ZhennanQin, could you give a short explanation of the issue and the fix? I see two changes in that PR: one related to converting older models and one that looks like a no-op (moving from arrays to arrays_with_in_out). Which one fixes the segfault? |
@ptrendx They worked together to fix the segfault. |
Description
Install sockeye and run python setup.py test.
In requirements.txt and requirements.gpu-cu100.txt, change the mxnet version to a nightly build made at or after commit 09202f7.
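For anyone scripting the requirements change, here is a minimal sketch of rewriting the mxnet pin in a requirements file. It is not part of sockeye or MXNet: the helper name `pin_mxnet` and the version string used in the demo are hypothetical placeholders, and the real nightly version must be taken from a build at or after commit 09202f7.

```python
import re


def pin_mxnet(requirements_text: str, new_spec: str) -> str:
    """Return requirements text with every mxnet* line pinned to new_spec.

    Hypothetical helper for illustration only. Lines for other packages
    are passed through unchanged.
    """
    out = []
    for line in requirements_text.splitlines():
        # Matches package names such as "mxnet", "mxnet-mkl", "mxnet-cu100mkl",
        # optionally followed by a version specifier like "==1.4.1".
        m = re.match(r"^(mxnet[\w-]*)\s*([=<>!~].*)?$", line.strip())
        out.append(f"{m.group(1)}{new_spec}" if m else line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Placeholder input and version; substitute the actual nightly version.
    sample = "numpy>=1.14\nmxnet-cu100mkl==1.4.1\n"
    print(pin_mxnet(sample, "==0.0.0.dev0"), end="")
```

To apply it to the two files named above, read each file's text, pass it through `pin_mxnet`, and write the result back before reinstalling the requirements.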
Run the following from inside the sockeye directory.