I have an EMR cluster on which Spark is running and another EMR cluster on which HBase is running. I have created a table named 'TableForSpark' on the HBase cluster, and I'm trying to write data to it using the following code:
import it.nerdammer.spark.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
//import org.apache.spark.sql.execution.datasources.hbase._

object hbaseTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Hbase test")
    //conf.set("spark.hbase.host", "192.168.0.23")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 10).map(i => (i.toString, i+1, "Hello"))
    val rdd1 = rdd.toHBaseTable("TableForSpark").toColumns("column1", "column1").inColumnFamily("cf")
    rdd1.save()
  }
}
I have built 'spark-hbase-connector' using Scala 2.11.8 on Spark 2.0.0.
When I submit the job using the following command, it gets stuck in the last stage:
sudo spark-submit --deploy-mode client --jars $(echo lib/*.jar | tr ' ' ',') --class com.oreilly.learningsparkexamples.hbaseTest target/scala-2.11/hbase-test_2.11-0.0.1.jar
I have also placed the hbase-site.xml file in the resources folder, and the program correctly picks up the ZooKeeper IP from it.
I have checked the task logs: the job is able to connect to ZooKeeper but is not able to write to HBase. Could anyone shed some light on the problem?
The last part of the log looks like this:
16/08/18 11:48:35 INFO YarnClientSchedulerBackend: Application application_1470825934412_0088 has started running.
16/08/18 11:48:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46496.
16/08/18 11:48:35 INFO NettyBlockTransferService: Server created on 10.60.0.xxx:46496
16/08/18 11:48:35 INFO BlockManager: external shuffle service port = 7337
16/08/18 11:48:35 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.60.0.13, 46496)
16/08/18 11:48:35 INFO BlockManagerMasterEndpoint: Registering block manager 10.60.0.xxx:46496 with 414.4 MB RAM, BlockManagerId(driver, 10.60.0.xxx, 46496)
16/08/18 11:48:35 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.60.0.13, 46496)
16/08/18 11:48:36 INFO EventLoggingListener: Logging events to hdfs:///var/log/spark/apps/application_1470825934412_0088
16/08/18 11:48:36 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
16/08/18 11:48:36 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/08/18 11:48:36 INFO SparkContext: Starting job: saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102
16/08/18 11:48:36 INFO DAGScheduler: Got job 0 (saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102) with 2 output partitions
16/08/18 11:48:36 INFO DAGScheduler: Final stage: ResultStage 0 (saveAsNewAPIHadoopDataset at HBaseWriterBuilder.scala:102)
16/08/18 11:48:36 INFO DAGScheduler: Parents of final stage: List()
16/08/18 11:48:36 INFO DAGScheduler: Missing parents: List()
16/08/18 11:48:36 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at map at HBaseWriterBuilder.scala:66), which has no missing parents
16/08/18 11:48:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 89.1 KB, free 414.4 MB)
16/08/18 11:48:37 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.2 KB, free 414.3 MB)
16/08/18 11:48:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.60.0.13:46496 (size: 33.2 KB, free: 414.4 MB)
16/08/18 11:48:37 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/18 11:48:37 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at map at HBaseWriterBuilder.scala:66)
16/08/18 11:48:37 INFO YarnScheduler: Adding task set 0.0 with 2 tasks
16/08/18 11:48:37 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.60.0.134:53842) with ID 1
16/08/18 11:48:42 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 1)
16/08/18 11:48:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-60-0-xxx.ec2.internal, partition 0, PROCESS_LOCAL, 5427 bytes)
16/08/18 11:48:42 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-10-60-0-xxx.ec2.internal, partition 1, PROCESS_LOCAL, 5484 bytes)
16/08/18 11:48:42 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-60-0-xxx.ec2.internal:34581 with 2.8 GB RAM, BlockManagerId(1, ip-10-60-0-134.ec2.internal, 34581)
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 0 on executor id: 1 hostname: ip-10-60-0-xxx.ec2.internal.
16/08/18 11:48:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 1 on executor id: 1 hostname: ip-10-60-0-xxx.ec2.internal.
16/08/18 11:48:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-60-0-xxx.ec2.internal:34581 (size: 33.2 KB, free: 2.8 GB)
It gets stuck at this point.
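Since ZooKeeper is reachable but the write hangs, one check that might help narrow this down is whether the Spark nodes can open TCP connections to the HBase master and region servers. The following is only a sketch: `PortCheck` and `canConnect` are hypothetical names, the host is passed as an argument, and the ports assume the HBase 1.x defaults (2181 for ZooKeeper, 16000 for the master, 16020 for region servers), which may differ on a given cluster.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper, not part of spark-hbase-connector.
object PortCheck {

  /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers connection refused, timeouts and unresolvable host names.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the first argument is an HBase node's host name or IP.
    val host = args.headOption.getOrElse("localhost")
    // Assumption: default ports for ZooKeeper and HBase 1.x.
    val ports = Seq("ZooKeeper" -> 2181, "HBase master" -> 16000, "Region server" -> 16020)
    ports.foreach { case (name, port) =>
      println(s"$name ($host:$port) reachable: ${canConnect(host, port)}")
    }
  }
}
```

Running this on a Spark worker node against each HBase host would show whether only the ZooKeeper port is open between the two clusters. It checks network reachability only, so a successful connection does not prove that the HBase write path itself is healthy.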
Thanks & Regards,
Surender.