MNIST training example not exiting on CPU instances after training completion #5065

sampathchanda · 2017-02-20T00:18:06Z

Hi,

I am running MNIST training on 2 CPU instances using MXNet with 10 epochs (the default script provided). However, even after training has finished (I can see that execution reaches the end of the train_mnist.py script), the process does not exit.

Can anyone help me with this issue?

Environment info

Operating System: Amazon Linux
Package used (Python/R/Scala/Julia): Python
MXNet Installed from sources
MXNet commit hash (git rev-parse HEAD): 266e439
Python version and distribution: Python 2.7.12

Steps to reproduce

cd $HOME/mxnet/examples/image_classification/
../../tools/launch.py -n 1 -H hosts python train_mnist.py

The hosts file lists the worker hostnames (2 workers in my case):
deeplearning-worker1
deeplearning-worker2


Soonhwan-Kwon · 2017-04-16T10:24:44Z

If you launched it with ../../tools/launch.py, you can use ../../tools/kill-mxnet.py to kill the remaining processes.
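For reference, a rough sketch of what that kind of cleanup amounts to. This is not kill-mxnet.py itself (its arguments are not shown in this thread); it is a hypothetical Python 3 helper that assumes passwordless SSH to every host in the hosts file (presumably the same access launch.py needs to start the workers) and that the stuck processes have train_mnist.py on their command line. It runs `pkill -f` on each host over SSH:

```python
"""Hypothetical cleanup helper: kill leftover training processes on every host.

Assumes passwordless SSH to each host and that the stuck processes have
train_mnist.py in their command line. Written for Python 3.
"""
import shlex
import subprocess
import sys

# Regex for pkill -f. The brackets still match "train_mnist.py" but not the
# literal text "[t]rain_mnist.py", so the remote shell running pkill is spared.
DEFAULT_PATTERN = "[t]rain_mnist.py"


def parse_hosts(lines):
    """Return hostnames, one per line; blank lines and '#' comments are skipped."""
    hosts = []
    for line in lines:
        name = line.split("#", 1)[0].strip()
        if name:
            hosts.append(name)
    return hosts


def read_hosts(path):
    """Read a hosts file such as the one passed to launch.py with -H."""
    with open(path) as f:
        return parse_hosts(f)


def build_kill_commands(hosts, pattern=DEFAULT_PATTERN):
    """Build one ssh command per host; the remote part is quoted for the remote shell."""
    return [["ssh", host, "pkill -f " + shlex.quote(pattern)] for host in hosts]


def main(argv):
    if len(argv) != 2:
        print("usage: python3 kill_job.py HOSTS_FILE")
        return 2
    for cmd in build_kill_commands(read_hosts(argv[1])):
        print("running:", " ".join(cmd))
        # pkill exits with 1 when nothing matched; report it instead of failing.
        status = subprocess.call(cmd)
        print("  exit status:", status)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it as `python3 kill_job.py hosts` (the file name kill_job.py is arbitrary). The bracketed pattern is a common regular-expression trick: it matches `train_mnist.py` in a process command line, but not the remote shell whose own command line contains the pattern text, so pkill does not kill the SSH session that is running it.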

sampathchanda · 2017-04-17T18:22:17Z

@Soonhwan-Kwon Thanks for the comment. However, it seems the problem was in the way I set up my environment for running the example. I fixed it with help from @qiyuangong in issue #5094.

If you are facing the same issue, please follow the steps described by @qiyuangong in issue #5094 to get the scripts working as expected. I am now able to run the example successfully on a multi-CPU cluster, so I am closing this issue.

matt32106 mentioned this issue on Apr 8, 2017 in "[QA] why not all examples run out of the box? #5717" (Open)

sampathchanda closed this as completed Apr 17, 2017
