Can't run SparkBWA on Amazon EMR Yarn cluster #55

Open
Maryom opened this issue Jan 20, 2018 · 1 comment
Maryom commented Jan 20, 2018

Hi,

Thanks for this repo.

I’m trying to run SparkBWA on an Amazon EMR YARN cluster, but I’m getting errors.

I passed --master yarn instead of yarn-cluster, and I also added --deploy-mode cluster.

Then, I got the following error:

[hadoop@ip-172-31-14-100 ~]$ spark-submit --class com.github.sparkbwa.SparkBWA --master yarn --deploy-mode cluster --driver-memory 1500m --executor-memory 10g --executor-cores 1 --verbose --num-executors 16 sparkbwa-1.0.jar -m -r -p --index /Data/HumanBase/hg38 -n 16 -w "-R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589" ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: spark.sql.warehouse.dir=*********(redacted)
Adding default property: spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
Adding default property: spark.history.fs.logDirectory=hdfs:///var/log/spark/apps
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.yarn.historyServer.address=ip-172-31-14-100.eu-west-2.compute.internal:18080
Adding default property: spark.stage.attempt.ignoreOnDecommissionFetchFailure=true
Adding default property: spark.driver.memory=11171M
Adding default property: spark.executor.instances=16
Adding default property: spark.default.parallelism=256
Adding default property: spark.resourceManager.cleanupExpiredHost=true
Adding default property: spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS=$(hostname -f)
Adding default property: spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
Adding default property: spark.master=yarn
Adding default property: spark.blacklist.decommissioning.timeout=1h
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.sql.hive.metastore.sharedPrefixes=com.amazonaws.services.dynamodbv2
Adding default property: spark.executor.memory=10356M
Adding default property: spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
Adding default property: spark.eventLog.dir=hdfs:///var/log/spark/apps
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
Adding default property: spark.executor.cores=8
Adding default property: spark.history.ui.port=18080
Adding default property: spark.blacklist.decommissioning.enabled=true
Adding default property: spark.decommissioning.timeout.threshold=20
Adding default property: spark.hadoop.yarn.timeline-service.enabled=false
Parsed arguments:
  master                  yarn
  deployMode              cluster
  executorMemory          10g
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            1500m
  driverCores             null
  driverExtraClassPath    /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
  driverExtraLibraryPath  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
  driverExtraJavaOptions  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
  supervise               false
  queue                   null
  numExecutors            16
  files                   null
  pyFiles                 null
  archives                null
  mainClass               com.github.sparkbwa.SparkBWA
  primaryResource         file:/home/hadoop/sparkbwa-1.0.jar
  name                    com.github.sparkbwa.SparkBWA
  childArgs               [-m -r -p --index /Data/HumanBase/hg38 -n 16 -w -R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589 ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf:
  (spark.blacklist.decommissioning.timeout,1h)
  (spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
  (spark.default.parallelism,256)
  (spark.blacklist.decommissioning.enabled,true)
  (spark.hadoop.yarn.timeline-service.enabled,false)
  (spark.driver.memory,1500m)
  (spark.executor.memory,10356M)
  (spark.executor.instances,16)
  (spark.sql.warehouse.dir,*********(redacted))
  (spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
  (spark.yarn.historyServer.address,ip-172-31-14-100.eu-west-2.compute.internal:18080)
  (spark.eventLog.enabled,true)
  (spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
  (spark.history.ui.port,18080)
  (spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
  (spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
  (spark.resourceManager.cleanupExpiredHost,true)
  (spark.shuffle.service.enabled,true)
  (spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
  (spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
  (spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
  (spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
  (spark.eventLog.dir,hdfs:///var/log/spark/apps)
  (spark.master,yarn)
  (spark.dynamicAllocation.enabled,true)
  (spark.executor.cores,8)
  (spark.decommissioning.timeout.threshold,20)
  (spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)

    
Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--jar
file:/home/hadoop/sparkbwa-1.0.jar
--class
com.github.sparkbwa.SparkBWA
--arg
-m
--arg
-r
--arg
-p
--arg
--index
--arg
/Data/HumanBase/hg38
--arg
-n
--arg
16
--arg
-w
--arg
-R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589
--arg
ERR000589_1.filt.fastq
--arg
ERR000589_2.filt.fastq
--arg
Output_ERR000589
System properties:
(spark.blacklist.decommissioning.timeout,1h)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.default.parallelism,256)
(spark.blacklist.decommissioning.enabled,true)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.driver.memory,1500m)
(spark.executor.memory,10g)
(spark.executor.instances,16)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.sql.warehouse.dir,*********(redacted))
(spark.yarn.historyServer.address,ip-172-31-14-100.eu-west-2.compute.internal:18080)
(spark.eventLog.enabled,true)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.history.ui.port,18080)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(SPARK_SUBMIT,true)
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.app.name,com.github.sparkbwa.SparkBWA)
(spark.resourceManager.cleanupExpiredHost,true)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.shuffle.service.enabled,true)
(spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.submit.deployMode,cluster)
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.master,yarn)
(spark.dynamicAllocation.enabled,true)
(spark.decommissioning.timeout.threshold,20)
(spark.executor.cores,1)
(spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
Classpath elements:
file:/home/hadoop/sparkbwa-1.0.jar


18/01/20 15:53:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/01/20 15:53:13 INFO RMProxy: Connecting to ResourceManager at ip-172-31-14-100.eu-west-2.compute.internal/172.31.14.100:8032
18/01/20 15:53:13 INFO Client: Requesting a new application from cluster with 16 NodeManagers
18/01/20 15:53:13 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
18/01/20 15:53:13 INFO Client: Will allocate AM container, with 1884 MB memory including 384 MB overhead
18/01/20 15:53:13 INFO Client: Setting up container launch context for our AM
18/01/20 15:53:13 INFO Client: Setting up the launch environment for our AM container
18/01/20 15:53:13 INFO Client: Preparing resources for our AM container
18/01/20 15:53:14 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/01/20 15:53:16 INFO Client: Uploading resource file:/mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a/__spark_libs__3181673287761365885.zip -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/__spark_libs__3181673287761365885.zip
18/01/20 15:53:17 INFO Client: Uploading resource file:/home/hadoop/sparkbwa-1.0.jar -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/sparkbwa-1.0.jar
18/01/20 15:53:17 INFO Client: Uploading resource file:/mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a/__spark_conf__4991143839440201874.zip -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/__spark_conf__.zip
18/01/20 15:53:17 INFO SecurityManager: Changing view acls to: hadoop
18/01/20 15:53:17 INFO SecurityManager: Changing modify acls to: hadoop
18/01/20 15:53:17 INFO SecurityManager: Changing view acls groups to: 
18/01/20 15:53:17 INFO SecurityManager: Changing modify acls groups to: 
18/01/20 15:53:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
18/01/20 15:53:17 INFO Client: Submitting application application_1516463115359_0001 to ResourceManager
18/01/20 15:53:18 INFO YarnClientImpl: Submitted application application_1516463115359_0001
18/01/20 15:53:19 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:19 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1516463597765
	 final status: UNDEFINED
	 tracking URL: http://ip-172-31-14-100.eu-west-2.compute.internal:20888/proxy/application_1516463115359_0001/
	 user: hadoop
18/01/20 15:53:20 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:21 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:22 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:23 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:24 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:25 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:26 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:27 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:28 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:29 INFO Client: Application report for application_1516463115359_0001 (state: FAILED)
18/01/20 15:53:29 INFO Client: 
	 client token: N/A
	 diagnostics: Application application_1516463115359_0001 failed 2 times due to AM Container for appattempt_1516463115359_0001_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://ip-172-31-14-100.eu-west-2.compute.internal:8088/cluster/app/application_1516463115359_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1516463115359_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
	at org.apache.hadoop.util.Shell.run(Shell.java:479)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1516463597765
	 final status: FAILED
	 tracking URL: http://ip-172-31-14-100.eu-west-2.compute.internal:8088/cluster/app/application_1516463115359_0001
	 user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1516463115359_0001 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/01/20 15:53:29 INFO ShutdownHookManager: Shutdown hook called
18/01/20 15:53:29 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a
[hadoop@ip-172-31-14-100 ~]$ 
Broadcast message from root@ip-172-31-14-100
	(unknown) at 15:54 ...

The system is going down for power off NOW!
Connection to ec2-35-177-163-135.eu-west-2.compute.amazonaws.com closed by remote host.
Connection to ec2-35-177-163-135.eu-west-2.compute.amazonaws.com closed.
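(Note: the generic "exited with exitCode: 1" above does not show the underlying exception; that normally only appears in the container logs of the failed ApplicationMaster attempt. Assuming YARN log aggregation is enabled on the EMR cluster, those logs can usually be pulled from the master node with the standard YARN CLI, for example:

yarn logs -applicationId application_1516463115359_0001 | less

The stderr of container_1516463115359_0001_02_000001 in that output should contain the actual Java exception behind the failure.)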

Any help would be appreciated.

Thank you 🙏

@malimohub

Any word on this?
