This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[scala-package] Running example on spark has the error java bin #4967

Closed
yxzf opened this issue Feb 10, 2017 · 8 comments

Comments

@yxzf

yxzf commented Feb 10, 2017

@Javelinjs @CodingCat
I ran the MXNet Spark example on a Spark cluster. The error is raised when the scheduler starts.
The main error is:
java.io.IOException: Cannot run program "java": error=2, No such file or directory
I was confused and tried changing it to the Java binary '/usr/bin/java', but it still failed.

I compiled MXNet from the latest version, and the cluster environment is CentOS 6.5.

Here are the logs:

17/02/10 15:17:03 INFO FileInputFormat: Total input paths to process : 1
17/02/10 15:17:03 INFO MXNet: repartitioning training set to 10 partitions
17/02/10 15:17:03 INFO MXNet: Starting scheduler on 10.16.43.26:38348
17/02/10 15:17:03 INFO ParameterServer: Start process: java -cp /data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_25560/spark-4b7478a0-ca31-48ca-8e6d-62540f7935ae/userFiles-855124ea-e736-4690-9781-82b35e3e58bd/mxnet-core_2.11-0.1.2-SNAPSHOT.jar:/data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_25560/spark-4b7478a0-ca31-48ca-8e6d-62540f7935ae/userFiles-855124ea-e736-4690-9781-82b35e3e58bd/mxnet-full_2.11-linux-x86_64-cpu-0.1.2-SNAPSHOT.jar:/data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_25560/spark-4b7478a0-ca31-48ca-8e6d-62540f7935ae/userFiles-855124ea-e736-4690-9781-82b35e3e58bd/mxnet-spark_2.11-0.1.2-SNAPSHOT.jar ml.dmlc.mxnet.spark.ParameterServer --role=scheduler --root-uri=10.16.43.26 --root-port=38348 --num-server=1 --num-worker=10 --timeout=300
java.io.IOException: Cannot run program "java": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at java.lang.Runtime.exec(Runtime.java:617)
	at java.lang.Runtime.exec(Runtime.java:450)
	at java.lang.Runtime.exec(Runtime.java:347)
	at ml.dmlc.mxnet.spark.ParameterServer.startProcess(ParameterServer.scala:132)
	at ml.dmlc.mxnet.spark.MXNet.fit(MXNet.scala:130)
	at com.meituan.pay.ClassificationExample$.main(App.scala:51)
	at com.meituan.pay.ClassificationExample.main(App.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 12 more
17/02/10 15:17:03 ERROR ClassificationExample: requirement failed: Failed to start ps scheduler process
java.lang.IllegalArgumentException: requirement failed: Failed to start ps scheduler process

@CodingCat
Contributor

CodingCat commented Feb 10, 2017

In your terminal, what do you get after you type java?

@yzhliu
Member

yzhliu commented Feb 10, 2017

You can use ml.dmlc.mxnet.spark.MXNet.setJava to specify the java binary.

@yxzf
Author

yxzf commented Feb 11, 2017

@CodingCat In my terminal, when I type java, I get this:

Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)
where options include:
    -d32	  use a 32-bit data model if available
    -d64	  use a 64-bit data model if available
    -server	  to select the "server" VM
                  The default VM is server,
                  because you are running on a server-class machine.
...
...
...

@yxzf
Author

yxzf commented Feb 11, 2017

@Javelinjs I ran the example https://github.com/dmlc/mxnet/blob/master/scala-package/spark/src/main/scala/ml/dmlc/mxnet/spark/example/ClassificationExample.scala

It calls ml.dmlc.mxnet.spark.MXNet.setJava to specify the java binary.

Looking at the source code, the scheduler process is started in ParameterServer.scala, and the error is raised here:

val cp = if (classpath == null) "" else s"-cp $classpath"
val cmd = s"$java $jvmOpts $cp $runningClass " +
  s"--role=$role --root-uri=$rootUri --root-port=$rootPort " +
  s"--num-server=$numServer --num-worker=$numWorker --timeout=$timeout"
logger.info(s"Start process: $cmd")
try {
  val childProcess = Runtime.getRuntime.exec(cmd)

From my spark log, I got this:

Start process: java -cp /data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_285419/spark-c7a44e41-5826-425c-9754-30f54279e209/userFiles-d45b4f4c-4e49-4051-a5e9-1be5d2023b59/mxnet-core_2.11-0.1.2-SNAPSHOT.jar:/data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_285419/spark-c7a44e41-5826-425c-9754-30f54279e209/userFiles-d45b4f4c-4e49-4051-a5e9-1be5d2023b59/mxnet-full_2.11-linux-x86_64-cpu-0.1.2-SNAPSHOT.jar:/data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_285419/spark-c7a44e41-5826-425c-9754-30f54279e209/userFiles-d45b4f4c-4e49-4051-a5e9-1be5d2023b59/mxnet-spark_2.11-0.1.2-SNAPSHOT.jar ml.dmlc.mxnet.spark.ParameterServer --role=scheduler --root-uri=10.16.186.35 --root-port=54270 --num-server=1 --num-worker=10 --timeout=300
java.io.IOException: Cannot run program "java": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at java.lang.Runtime.exec(Runtime.java:617)
	at java.lang.Runtime.exec(Runtime.java:450)
	at java.lang.Runtime.exec(Runtime.java:347)
	at ml.dmlc.mxnet.spark.ParameterServer.startProcess(ParameterServer.scala:132)
	at ml.dmlc.mxnet.spark.MXNet.fit(MXNet.scala:130)
	at com.meituan.pay.ClassificationExample$.main(App.scala:51)
	at com.meituan.pay.ClassificationExample.main(App.scala)

I am confused about why the error is 'Cannot run program "java": error=2, No such file or directory' rather than 'Cannot run program java -cp ...'. Why does it name only 'java' and not the whole command?
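
On the question of why only "java" is named: Runtime.exec(String) does not hand the whole string to a shell. It splits the command on whitespace with a plain java.util.StringTokenizer, treats the first token as the program to run and the rest as its arguments, and the IOException message reports only that program. The following is a minimal sketch of that splitting; the object name ExecTokenDemo and the shortened command line are made up for illustration and are not part of MXNet.

```scala
import java.util.StringTokenizer

// Hypothetical demo object (not part of MXNet): mimics how
// Runtime.exec(String) splits a command line before starting a process.
object ExecTokenDemo {
  // Plain whitespace tokenization, with no shell-style quoting or escaping.
  def tokenize(cmd: String): Array[String] = {
    val st = new StringTokenizer(cmd)
    val out = Array.newBuilder[String]
    while (st.hasMoreTokens) out += st.nextToken()
    out.result()
  }

  def main(args: Array[String]): Unit = {
    // Shortened stand-in for the command logged by ParameterServer.
    val cmd = "java -cp a.jar:b.jar ml.dmlc.mxnet.spark.ParameterServer --role=scheduler"
    val tokens = tokenize(cmd)
    // The first token is the program; it is the only part named in
    // 'Cannot run program "..."' when the process cannot be started.
    println(s"program   = ${tokens.head}")
    println(s"arguments = ${tokens.tail.mkString(" ")}")
  }
}
```

So error=2 (no such file or directory) refers to the program token itself: the "java" executable could not be found in the environment of the process that called exec.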

@yzhliu
Member

yzhliu commented Feb 11, 2017

It seems that the PATH environment variable is not set properly for the subprocess to find java. Try setting --java to the path that whereis java shows.
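
One caveat: the ApplicationMaster frames in the stack trace suggest the driver runs inside a YARN container on a cluster node, so a path found with whereis on the submitting machine may not exist there. A possible alternative, sketched below under that assumption, is to derive the binary from the java.home system property of the JVM that is already running the driver, and to check that the path is an executable file before using it. The JavaBinary object and its method names are hypothetical helpers, not part of MXNet.

```scala
import java.io.File

// Hypothetical helper (not part of MXNet): locate the java launcher of the
// JVM that is currently running, instead of relying on PATH.
object JavaBinary {
  // <java.home>/bin/java; java.home is a standard JVM system property.
  def fromJavaHome(javaHome: String = System.getProperty("java.home")): String =
    new File(new File(javaHome, "bin"), "java").getPath

  // True only if the path is a regular file that this process may execute.
  def isRunnable(path: String): Boolean = {
    val f = new File(path)
    f.isFile && f.canExecute
  }

  def main(args: Array[String]): Unit = {
    val candidate = fromJavaHome()
    println(s"candidate java binary: $candidate")
    println(s"runnable on this node: ${isRunnable(candidate)}")
  }
}
```

Running the check on the node where the scheduler is started, rather than on the submitting machine, shows whether the value passed to --java is usable there.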

@yxzf
Author

yxzf commented Feb 11, 2017

@Javelinjs

whereis java
java: /usr/bin/java /usr/local/java /usr/share/java

I set --java to /usr/bin/java, but the error persists:

17/02/11 12:24:00 INFO ParameterServer: Start process: /usr/bin/java -cp /data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_293387/spark-751bc30c-469f-4515-af55-05f7b6f3eed9/userFiles-5f0436a5-9b42-463e-84d9-c9d4405b15dc/mxnet-core_2.11-0.1.2-SNAPSHOT.jar:/data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_293387/spark-751bc30c-469f-4515-af55-05f7b6f3eed9/userFiles-5f0436a5-9b42-463e-84d9-c9d4405b15dc/mxnet-full_2.11-linux-x86_64-cpu-0.1.2-SNAPSHOT.jar:/data2/hadoop/yarn/nm-local-dir/usercache/hadoop-pay-dev/appcache/application_1486701291637_293387/spark-751bc30c-469f-4515-af55-05f7b6f3eed9/userFiles-5f0436a5-9b42-463e-84d9-c9d4405b15dc/mxnet-spark_2.11-0.1.2-SNAPSHOT.jar ml.dmlc.mxnet.spark.ParameterServer --role=scheduler --root-uri=10.16.55.39 --root-port=21904 --num-server=1 --num-worker=10 --timeout=300
java.io.IOException: Cannot run program "/usr/bin/java": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at java.lang.Runtime.exec(Runtime.java:617)
	at java.lang.Runtime.exec(Runtime.java:450)
	at java.lang.Runtime.exec(Runtime.java:347)

@yxzf
Author

yxzf commented Feb 14, 2017

set --java to /usr/bin/java/bin/java

@szha
Member

szha commented Sep 28, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!

@szha szha closed this as completed Sep 28, 2017