DLModel's internalTransform will throw IllegalArgumentException on yarn #2268

Closed
qiuxin2012 opened this issue Feb 2, 2018 · 9 comments

qiuxin2012 (Contributor) wrote:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 12 in stage 0.0 failed 4 times, most recent failure: Lost task 12.3 in stage 0.0 (TID 42, Gondolin-Node-077): java.lang.IllegalArgumentException: requirement failed: Engine.init: Node number is not initialized. Do you call Engine.init? See more at https://bigdl-project.github.io/master/#APIGuide/Engine/
	at scala.Predef$.require(Predef.scala:233)
	at com.intel.analytics.bigdl.utils.Engine$.nodeNumber(Engine.scala:203)
	at com.intel.analytics.bigdl.dataset.Utils$.getBatchSize(Utils.scala:26)
	at com.intel.analytics.bigdl.dataset.SampleToMiniBatch.<init>(Transformer.scala:317)
	at com.intel.analytics.bigdl.dataset.SampleToMiniBatch$.apply(Transformer.scala:375)
	at org.apache.spark.ml.DLModel$$anonfun$16$$anonfun$apply$1.apply(DLEstimator.scala:301)
	at org.apache.spark.ml.DLModel$$anonfun$16$$anonfun$apply$1.apply(DLEstimator.scala:292)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
	at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
	at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$collect$1.apply(DataFrame.scala:1503)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$collect$1.apply(DataFrame.scala:1503)
	at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1503)
	at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1480)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$$anonfun$main$1.apply(ImagePredictor.scala:70)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$$anonfun$main$1.apply(ImagePredictor.scala:37)
	at scala.Option.map(Option.scala:145)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$.main(ImagePredictor.scala:37)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor.main(ImagePredictor.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by incorrect use of SampleToMiniBatch: it is constructed inside the executors, where Engine has not been initialized. SampleToMiniBatch should not be instantiated on the executor side.
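The point above can be sketched as follows (an illustrative Scala sketch only, not the actual patch; `toMiniBatches`, `sampleRDD`, and `batchSize` are hypothetical placeholders):

```scala
import com.intel.analytics.bigdl.dataset.{Sample, SampleToMiniBatch}
import com.intel.analytics.bigdl.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import org.apache.spark.rdd.RDD

// Wrong: constructing SampleToMiniBatch inside mapPartitions runs the
// constructor on the executors, where Engine.init has not set the node
// number, so the requirement check in Engine.nodeNumber fails.
//
// Right: construct the transformer once on the driver (where Engine is
// initialized) and let Spark serialize it into the task closure; the
// executors then only apply it.
def toMiniBatches(sampleRDD: RDD[Sample[Float]], batchSize: Int) = {
  val toBatch = SampleToMiniBatch[Float](batchSize) // driver side
  sampleRDD.mapPartitions { iter =>
    toBatch(iter) // executor side: apply the serialized transformer only
  }
}
```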

@qiuxin2012 qiuxin2012 changed the title DLModel's internalTransform will throw IllegalArgumentException: requirement failed: Engine.init: Node number is not initialized. DLModel's internalTransform will throw IllegalArgumentException Feb 2, 2018
qiuxin2012 commented Feb 2, 2018

@hhbyyh
I changed the code to create SampleToMiniBatch in the driver, but I get a NoSuchMethodError when running with Spark 1.6.

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.ml.util.SchemaUtils$.appendColumn(Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;)Lorg/apache/spark/sql/types/StructType;
	at org.apache.spark.ml.DLClassifierModel.transformSchema(DLClassifier.scala:81)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:68)
	at org.apache.spark.ml.DLTransformerBase.transform(DLTransformerBase.scala:32)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$$anonfun$main$1.apply(ImagePredictor.scala:68)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$$anonfun$main$1.apply(ImagePredictor.scala:37)
	at scala.Option.map(Option.scala:145)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor$.main(ImagePredictor.scala:37)
	at com.intel.analytics.bigdl.example.imageclassification.ImagePredictor.main(ImagePredictor.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Spark 2.1 works fine.

hhbyyh commented Feb 2, 2018

Thanks for reporting it. Was this tested against master or 0.4?
On a YARN cluster?

yiheng commented Feb 3, 2018

@lopelopelope why is this not caught in our integration tests?

hhbyyh commented Feb 3, 2018

I did some tests with a standalone cluster and didn't hit the error with 0.4. Please also attach the code that produces the error. Thanks.

@qiuxin2012 qiuxin2012 changed the title DLModel's internalTransform will throw IllegalArgumentException DLModel's internalTransform will throw IllegalArgumentException on yarn Feb 5, 2018
qiuxin2012 commented:

@hhbyyh On YARN, with both 0.4.0 and master.
Standalone is OK because mapPartitions actually runs on the driver there.

hhbyyh commented Feb 5, 2018

I tested on YARN again and didn't get an error. We may need to sync again...

hhbyyh commented Feb 6, 2018

Tested on YARN with Spark 2.2 and Spark 1.6.3 and didn't hit an error.

On Spark 1.6.0, the second error, java.lang.NoSuchMethodError: org.apache.spark.ml.util.SchemaUtils$.appendColumn(Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;)Lorg/apache/spark/sql/types/StructType;, was thrown. That is because SchemaUtils.appendColumn with this signature was only defined as of Spark 1.6.2.
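This fails at runtime rather than at compile time because the call is resolved against the Spark jars deployed on the cluster, not the ones used at build time. A hedged sketch of a reflective probe (`hasAppendColumn` is a hypothetical helper, not part of BigDL) that checks whether the running Spark actually provides this method signature:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Returns true if the running Spark's SchemaUtils object defines
// appendColumn(StructType, String, DataType). SchemaUtils is
// private[spark], so it is probed reflectively here.
def hasAppendColumn: Boolean =
  try {
    Class.forName("org.apache.spark.ml.util.SchemaUtils$")
      .getMethod("appendColumn",
        classOf[StructType], classOf[String], classOf[DataType])
    true
  } catch {
    case _: ClassNotFoundException => false
    case _: NoSuchMethodException  => false
  }
```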

qiuxin2012 commented Feb 7, 2018

#2194 fixes the first error.

yiheng commented Feb 11, 2018

@hhbyyh I can find that API in Spark 1.5, but it was changed in 1.6.2. So if you want to run on Spark 1.6.0, you may need to build with the option -Dspark.version=1.6.0.

Our PR validation test builds the package against the test environment.
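A build invocation along those lines might look like the following (hedged: the exact Maven goals and profiles depend on the BigDL build setup; only the -Dspark.version property comes from the comment above, and -DskipTests is added merely to shorten the build):

```shell
# Build BigDL against the exact Spark version of the target cluster so
# that internal spark.ml APIs such as SchemaUtils.appendColumn resolve
# to the signatures actually present at runtime.
mvn clean package -DskipTests -Dspark.version=1.6.0
```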

@yiheng yiheng closed this as completed Mar 21, 2018