DLModel's internalTransform will throw IllegalArgumentException on yarn #2268

Closed
qiuxin2012 opened this issue Feb 2, 2018 · 9 comments

qiuxin2012 (Contributor) wrote:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 12 in stage 0.0 failed 4 times, most recent failure: Lost task 12.3 in stage 0.0 (TID 42, Gondolin-Node-077): java.lang.IllegalArgumentException: requirement failed: Engine.init: Node number is not initialized. Do you call Engine.init? See more at https://bigdl-project.github.io/master/#APIGuide/Engine/
	at scala.Predef$.require(Predef.scala:233)
	at com.intel.analytics.bigdl.utils.Engine$.nodeNumber(Engine.scala:203)
	at com.intel.analytics.bigdl.dataset.Utils$.getBatchSize(Utils.scala:26)
	at com.intel.analytics.bigdl.dataset.SampleToMiniBatch.<init>(Transformer.scala:317)
	at com.intel.analytics.bigdl.dataset.SampleToMiniBatch$.apply(Transformer.scala:375)
	at org.apache.spark.ml.DLModel$$anonfun$16$$anonfun$apply$1.apply(DLEstimator.scala:301)
	at org.apache.spark.ml.DLModel$$anonfun$16$$anonfun$apply$1.apply(DLEstimator.scala:292)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
	at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
	at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$collect$1.apply(DataFrame.scala:1503)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$collect$1.apply(DataFrame.scala:1503)
	at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1503)
	at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1480)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$$anonfun$main$1.apply(ImagePredictor.scala:70)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$$anonfun$main$1.apply(ImagePredictor.scala:37)
	at scala.Option.map(Option.scala:145)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$.main(ImagePredictor.scala:37)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor.main(ImagePredictor.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by incorrect use of SampleToMiniBatch: it is constructed inside the executors, where Engine has not been initialized. SampleToMiniBatch should not be instantiated on the executor side.
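The point above can be sketched as follows (an illustrative Scala sketch only, not the actual patch; `toMiniBatches`, `sampleRDD`, and `batchSize` are hypothetical placeholders):

```scala
import com.intel.analytics.bigdl.dataset.{Sample, SampleToMiniBatch}
import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import org.apache.spark.rdd.RDD

// Wrong: constructing SampleToMiniBatch inside mapPartitions runs the
// constructor on the executors, where Engine.init has not set the node
// number, so the requirement check in Engine.nodeNumber fails.
//
// Right: construct the transformer once on the driver (where Engine is
// initialized) and let Spark serialize it into the task closure; the
// executors then only apply it.
def toMiniBatches(sampleRDD: RDD[Sample[Float]], batchSize: Int) = {
  val toBatch = SampleToMiniBatch[Float](batchSize) // driver side
  sampleRDD.mapPartitions { iter =>
    toBatch(iter) // executor side: apply the serialized transformer only
  }
}
```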

@qiuxin2012 qiuxin2012 changed the title DLModel's internalTransform will throw IllegalArgumentException: requirement failed: Engine.init: Node number is not initialized. DLModel's internalTransform will throw IllegalArgumentException Feb 2, 2018
qiuxin2012 commented Feb 2, 2018

@hhbyyh
I changed the code to create SampleToMiniBatch in the driver, but I get a NoSuchMethodError when running with Spark 1.6.

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.ml.util.SchemaUtils$.appendColumn(Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;)Lorg/apache/spark/sql/types/StructType;
	at org.apache.spark.ml.DLClassifierModel.transformSchema(DLClassifier.scala:81)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:68)
	at org.apache.spark.ml.DLTransformerBase.transform(DLTransformerBase.scala:32)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$$anonfun$main$1.apply(ImagePredictor.scala:68)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$$anonfun$main$1.apply(ImagePredictor.scala:37)
	at scala.Option.map(Option.scala:145)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$.main(ImagePredictor.scala:37)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor.main(ImagePredictor.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Spark 2.1 works fine.

hhbyyh commented Feb 2, 2018

Thanks for reporting it. Was this tested against master or 0.4?
On a YARN cluster?

yiheng commented Feb 3, 2018

@lopelopelope why is this not caught in our integration tests?

hhbyyh commented Feb 3, 2018

I did some tests with a standalone cluster and didn't hit the error with 0.4. Please also attach the code that produces the error. Thanks.

@qiuxin2012 qiuxin2012 changed the title DLModel's internalTransform will throw IllegalArgumentException DLModel's internalTransform will throw IllegalArgumentException on yarn Feb 5, 2018
qiuxin2012 commented:

@hhbyyh On YARN, with both 0.4.0 and master.
Standalone is OK because mapPartitions actually runs on the driver there.

hhbyyh commented Feb 5, 2018

I tested on YARN again and didn't get an error. We may need to sync again...

hhbyyh commented Feb 6, 2018

Tested on YARN with Spark 2.2 and Spark 1.6.3 and didn't hit an error.

On Spark 1.6.0, the second error, java.lang.NoSuchMethodError: org.apache.spark.ml.util.SchemaUtils$.appendColumn(Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;)Lorg/apache/spark/sql/types/StructType;, was thrown. That is because SchemaUtils.appendColumn with this signature was only defined as of Spark 1.6.2.
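This fails at runtime rather than at compile time because the call is resolved against the Spark jars deployed on the cluster, not the ones used at build time. A hedged sketch of a reflective probe (`hasAppendColumn` is a hypothetical helper, not part of BigDL) that checks whether the running Spark actually provides this method signature:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Returns true if the running Spark's SchemaUtils object defines
// appendColumn(StructType, String, DataType). SchemaUtils is
// private[spark], so it is probed reflectively here.
def hasAppendColumn: Boolean =
  try {
    Class.forName("org.apache.spark.ml.util.SchemaUtils$")
      .getMethod("appendColumn",
        classOf[StructType], classOf[String], classOf[DataType])
    true
  } catch {
    case _: ClassNotFoundException => false
    case _: NoSuchMethodException  => false
  }
```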

qiuxin2012 commented Feb 7, 2018

#2194 fixes the first error.

yiheng commented Feb 11, 2018

@hhbyyh I can find that API in Spark 1.5, but it was changed in 1.6.2. So if you want to run on Spark 1.6.0, you may need to build with the option -Dspark.version=1.6.0.

Our PR validation test builds the package against the test environment.
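A build invocation along those lines might look like the following (hedged: the exact Maven goals and profiles depend on the BigDL build setup; only the -Dspark.version property comes from the comment above, and -DskipTests is added merely to shorten the build):

```shell
# Build BigDL against the exact Spark version of the target cluster so
# that internal spark.ml APIs such as SchemaUtils.appendColumn resolve
# to the signatures actually present at runtime.
mvn clean package -DskipTests -Dspark.version=1.6.0
```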

@yiheng yiheng closed this as completed Mar 21, 2018