This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
I am running MNIST training on 2 CPU instances using MXNet for 10 epochs (the default script, as provided). However, even after training is complete (I can see that execution reaches the end of the train_mnist.py script), the process does not exit.
Can anyone help me with this issue?
Environment info
Operating System: Amazon Linux
Package used (Python/R/Scala/Julia): Python
MXNet Installed from sources
MXNet commit hash (git rev-parse HEAD): 266e439
Python version and distribution: Python 2.7.12
@Soonhwan-Kwon Thanks for the comment. It seems there was a problem with the way I set up my environment for running the example. I fixed it with help from @qiyuangong in issue #5094.
For those who are also facing this issue, please follow the steps described by @qiyuangong in issue #5094 to get the scripts working as expected. I am now able to run the example successfully on a multi-CPU cluster, so I am closing this issue.
Steps to reproduce
The hosts file lists the worker hostnames (2 workers in my case):
deeplearning-worker1
deeplearning-worker2
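For anyone setting up the same thing, here is a rough sketch of how the hosts file above could be read and turned into a launcher command. The helper names `read_hosts` and `build_launch_args` are made up for illustration, and the `tools/launch.py` path and flags (`-n`, `--launcher`, `-H`, `--kv-store`) are my assumption of a typical MXNet invocation, not the exact command used in this issue; please check them against your MXNet checkout.

```python
def read_hosts(text):
    """Return hostnames from hosts-file text, skipping blank lines and '#' comments."""
    hosts = []
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            hosts.append(name)
    return hosts


def build_launch_args(hosts, hostfile="hosts"):
    """Build an argument list for the launcher, one worker per host.

    Assumption: the script path and flag names below follow a typical
    MXNet tools/launch.py invocation and may differ between versions.
    """
    return [
        "python", "tools/launch.py",
        "-n", str(len(hosts)),       # number of workers
        "--launcher", "ssh",         # start the workers over ssh
        "-H", hostfile,              # path to the hosts file
        "python", "train_mnist.py", "--kv-store", "dist_sync",
    ]


# The two workers listed in the hosts file above.
hosts = read_hosts("deeplearning-worker1\ndeeplearning-worker2\n")
print(hosts)
print(" ".join(build_launch_args(hosts)))
```

This only builds the command as a list; it does not start anything, so it is safe to run on a single machine to check that the worker count matches the hosts file.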