Not able to write to an hbase table #38

surrey-kapkoti · 2016-08-18T12:00:18Z

I have an EMR cluster running Spark and another EMR cluster running HBase. I have created a table named 'TableForSpark' on the HBase cluster, and I'm trying to write data to it using the following code:

import it.nerdammer.spark.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
//import org.apache.spark.sql.execution.datasources.hbase._

object hbaseTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Hbase test")
    //conf.set("spark.hbase.host", "192.168.0.23")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 10).map(i => (i.toString, i+1, "Hello"))

    val rdd1 = rdd.toHBaseTable("TableForSpark").toColumns("column1", "column1").inColumnFamily("cf")
    rdd1.save()
  }
}
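
For comparison, here is a minimal sketch of the same job with the ZooKeeper quorum set explicitly through spark.hbase.host (the property from the commented-out line above) instead of relying on hbase-site.xml. The host name "zookeeper-host.example" and the object name are hypothetical placeholders. The sketch also uses two distinct column names, which is an assumption about the intent, since the original passes "column1" twice. It is not a confirmed fix for the hang.

```scala
import it.nerdammer.spark.hbase._
import org.apache.spark.{SparkConf, SparkContext}

object HBaseWriteSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Hbase test")
    // Hypothetical host: replace with the ZooKeeper quorum of the HBase cluster.
    conf.set("spark.hbase.host", "zookeeper-host.example")
    val sc = new SparkContext(conf)

    // First tuple element is the row key; the other two are column values.
    val rdd = sc.parallelize(1 to 10).map(i => (i.toString, i + 1, "Hello"))

    // Assumption: one distinct column per value ("column2" is not in the original).
    rdd.toHBaseTable("TableForSpark")
      .toColumns("column1", "column2")
      .inColumnFamily("cf")
      .save()

    sc.stop()
  }
}
```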

I have built 'spark-hbase-connector' with Scala 2.11.8 against Spark 2.0.0.

When I submit the job with the following command, it hangs in the last stage:
sudo spark-submit --deploy-mode client --jars $(echo lib/*.jar | tr ' ' ',') --class com.oreilly.learningsparkexamples.hbaseTest target/scala-2.11/hbase-test_2.11-0.0.1.jar

I have also placed the hbase-site.xml file in the resource folder, and the program correctly picks up the ZooKeeper IP from it.

I have checked the task logs: the task is able to connect to ZooKeeper but is not able to write to HBase. Could anyone shed some light on the problem?
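
Since an HBase client uses ZooKeeper only to locate the cluster and then connects to the HBase master and region servers directly, one possible (unconfirmed) explanation is that the Spark executors cannot reach those hosts. The sketch below is a plain TCP reachability check that could be run on an executor node. The host names are hypothetical placeholders, and the ports are assumed to be the usual defaults (2181 for ZooKeeper, 16000 for the HBase master, 16020 for region servers in HBase 1.x); the actual values depend on the cluster configuration.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object HBaseReachability {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers refused connections, timeouts and unresolvable host names.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical endpoints: replace with the real hosts of the HBase cluster.
    val endpoints = Seq(
      ("zookeeper-host.example", 2181),
      ("hbase-master.example", 16000),
      ("hbase-regionserver.example", 16020)
    )
    endpoints.foreach { case (host, port) =>
      println(s"$host:$port reachable = ${canConnect(host, port)}")
    }
  }
}
```

If ZooKeeper is reachable but the master or region server ports are not, the security groups or host name resolution between the two clusters would be worth checking.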

The last part of the log looks like this:

16/08/18 11:48:35 INFO YarnClientSchedulerBackend: Application application_1470825934412_0088 has started running.
16/08/18 11:48:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46496.
16/08/18 11:48:35 INFO NettyBlockTransferService: Server created on 10.60.0.xxx:46496
16/08/18 11:48:35 INFO BlockManager: external shuffle service port = 7337
16/08/18 11:48:35 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.60.0.13, 46496)
16/08/18 11:48:35 INFO BlockManagerMasterEndpoint: Registering block manager 10.60.0.xxx:46496 with 414.4 MB RAM, BlockManagerId(driver, 10.60.0.xxx, 46496)
16/08/18 11:48:35 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.60.0.13, 46496)
16/08/18 11:48:36 INFO EventLoggingListener: Logging events to hdfs:///var/log/spark/apps/application_1470825934412_0088
16/08/18 11:48:36 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
16/08/18 11:48:36 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/08/18 11:48:36 INFO SparkContext: Starting job: saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102
16/08/18 11:48:36 INFO DAGScheduler: Got job 0 (saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102) with 2 output partitions
16/08/18 11:48:36 INFO DAGScheduler: Final stage: ResultStage 0 (saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102)
16/08/18 11:48:36 INFO DAGScheduler: Parents of final stage: List()
16/08/18 11:48:36 INFO DAGScheduler: Missing parents: List()
16/08/18 11:48:36 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at map at HBaseWriterBuilder.scala:66), which has no missing parents
16/08/18 11:48:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 89.1 KB, free 414.4 MB)
16/08/18 11:48:37 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.2 KB, free 414.3 MB)
16/08/18 11:48:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.60.0.13:46496 (size: 33.2 KB, free: 414.4 MB)
16/08/18 11:48:37 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/18 11:48:37 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at map at HBaseWriterBuilder.scala:66)
16/08/18 11:48:37 INFO YarnScheduler: Adding task set 0.0 with 2 tasks
16/08/18 11:48:37 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.60.0.134:53842) with ID 1
16/08/18 11:48:42 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 1)
16/08/18 11:48:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-60-0-xxx.ec2.internal, partition 0, PROCESS_LOCAL, 5427 bytes)
16/08/18 11:48:42 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-10-60-0-xxx.ec2.internal, partition 1, PROCESS_LOCAL, 5484 bytes)
16/08/18 11:48:42 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-60-0-xxx.ec2.internal:34581 with 2.8 GB RAM, BlockManagerId(1, ip-10-60-0-134.ec2.internal, 34581)
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 0 on executor id: 1 hostname: ip-10-60-0-xxx.ec2.internal.
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 1 on executor id: 1 hostname: ip-10-60-0-xxx.ec2.internal.
16/08/18 11:48:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-60-0-xxx.ec2.internal:34581 (size: 33.2 KB, free: 2.8 GB)

It hangs at this point.

Thanks & Regards,
Surender.


fbbergamo · 2017-11-23T18:58:17Z

@surrey-kapkoti did you find a solution for this?
