
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD #73

Open
aashishrtyagi opened this issue Feb 22, 2018 · 1 comment

@aashishrtyagi

Hi,
I used the following code, based on the project's example, to connect to HBase, but I am getting a ClassCastException.

package com.gs

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import it.nerdammer.spark.hbase._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.SparkSession

object Test extends App {

  // Schema definition (not used by the HBase write below)
  object empSchema {
    val stid    = StructField("stid", StringType)
    val name    = StructField("name", StringType)
    val subject = StructField("subject", StringType)
    val grade   = StructField("grade", StringType)
    val city    = StructField("city", StringType)
    val struct  = StructType(Array(stid, name, subject, grade, city))
  }

  val sparkConf = new SparkConf().setMaster("spark://myhostname:7077").setAppName("TestApp")
  sparkConf.set("spark.hbase.host", "myhostname")

  val sc = new SparkContext(sparkConf)

  // The first tuple element becomes the row key; the rest map to the listed columns
  val rdd = sc.parallelize(1 to 100)
    .map(i => (i.toString, i + 1, "Hello"))

  rdd.toHBaseTable("mytable")
    .toColumns("column1", "column2")
    .inColumnFamily("mycf")
    .save()
}
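For what it's worth, this kind of ClassCastException during task deserialization is commonly associated with the application or connector jar not being visible on the executors' classpath when running against a standalone master. Below is a minimal sketch, assuming a standalone cluster and illustrative jar paths (the paths are assumptions, not taken from the report), of shipping the jars explicitly via SparkConf.setJars:

    // Sketch only: ship the application and connector jars to the executors.
    // The jar paths are hypothetical and must be adapted to the actual build output.
    val confWithJars = new SparkConf()
      .setMaster("spark://myhostname:7077")
      .setAppName("TestApp")
      .setJars(Seq(
        "target/scala-2.11/testapp_2.11-0.1.jar",   // assumed application jar
        "lib/spark-hbase-connector_2.10-1.0.3.jar"  // connector jar named in the report
      ))
    confWithJars.set("spark.hbase.host", "myhostname")

Submitting with spark-submit --jars achieves the same effect without hard-coding paths in the application.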

============ Exception stack trace =====================
18/02/22 15:45:46 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on 192.168.224.116, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 1]
18/02/22 15:45:46 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, 192.168.224.116, executor 1, partition 0, PROCESS_LOCAL, 4829 bytes)
18/02/22 15:45:46 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, 192.168.224.116, executor 1, partition 1, PROCESS_LOCAL, 4886 bytes)
18/02/22 15:45:46 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on 192.168.224.116, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 2]
18/02/22 15:45:46 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, 192.168.224.116, executor 1, partition 0, PROCESS_LOCAL, 4829 bytes)
18/02/22 15:45:46 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on 192.168.224.116, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 3]
18/02/22 15:45:46 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, 192.168.224.116, executor 1, partition 1, PROCESS_LOCAL, 4886 bytes)
18/02/22 15:45:46 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4) on 192.168.224.116, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 4]
18/02/22 15:45:46 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 6, 192.168.224.116, executor 1, partition 0, PROCESS_LOCAL, 4829 bytes)
18/02/22 15:45:46 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on 192.168.224.116, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 5]
18/02/22 15:45:46 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 7, 192.168.224.116, executor 1, partition 1, PROCESS_LOCAL, 4886 bytes)
18/02/22 15:45:46 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 7) on 192.168.224.116, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 6]
18/02/22 15:45:46 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job
18/02/22 15:45:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/02/22 15:45:46 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on 192.168.224.116, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 7]
18/02/22 15:45:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/02/22 15:45:46 INFO TaskSchedulerImpl: Cancelling stage 0
18/02/22 15:45:46 INFO DAGScheduler: ResultStage 0 (runJob at SparkHadoopMapReduceWriter.scala:88) failed in 6.777 s due to Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, 192.168.224.116, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
18/02/22 15:45:46 INFO DAGScheduler: Job 0 failed: runJob at SparkHadoopMapReduceWriter.scala:88, took 7.208332 s
18/02/22 15:45:46 ERROR SparkHadoopMapReduceWriter: Aborting job job_20180222154539_0002.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, 192.168.224.116, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

========== Spark and HBase versions ===================
Spark: spark-2.2.1
HBase: hbase-1.2.6
spark-hbase-connector jar: spark-hbase-connector_2.10-1.0.3.jar
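For reference, the reported versions would correspond to an sbt build roughly like the sketch below (the connector's group id and the artifact split are assumptions based on common Maven coordinates). Note that the connector jar listed above is the Scala 2.10 build, while the prebuilt Spark 2.2.1 distribution targets Scala 2.11, so the Scala binary versions are worth double-checking.

    // Sketch only: coordinates are assumptions; adjust to the real build.
    scalaVersion := "2.11.8"  // Spark 2.2.x is built for Scala 2.11

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.2.1" % "provided",
      // the jar named in the report is the _2.10 build of the connector
      "it.nerdammer.bigdata" % "spark-hbase-connector_2.10" % "1.0.3"
    )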

@liijiankang

@aashishrtyagi Was this problem ever solved? I have also encountered it.
