
run spark-submit failed on Kubernetes cluster #2168

Closed
henryon opened this issue Apr 27, 2020 · 15 comments

@henryon

henryon commented Apr 27, 2020

Environment
MacBook Pro
jdk1.8
spark 2.4.5
minikube status
m01
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

##############error logs###############

sudo ./spark-submit --verbose \
--master k8s://https://192.168.99.103:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=spark:latest \
--conf spark.kubernetes.container.image.pullPolicy=Never \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark-2.4/dist/examples/jars/spark-examples_2.11-2.4.5.jar 10000000
Using properties file: null
20/04/27 20:26:30 WARN Utils: Your hostname, wenhongqiangs-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 172.16.15.219 instead (on interface en0)
20/04/27 20:26:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Parsed arguments:
  master                  k8s://https://192.168.99.103:8443
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            1
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         local:///opt/spark-2.4/dist/examples/jars/spark-examples_2.11-2.4.5.jar
  name                    spark-pi
  childArgs               [10000000]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:
  (spark.executor.instances,1)
  (spark.kubernetes.authenticate.driver.serviceAccountName,spark)
  (spark.kubernetes.container.image,spark:latest)
  (spark.kubernetes.container.image.pullPolicy,Never)

    
Main class:
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication
Arguments:
--primary-java-resource
local:///opt/spark-2.4/dist/examples/jars/spark-examples_2.11-2.4.5.jar
--main-class
org.apache.spark.examples.SparkPi
--arg
10000000
Spark config:
(spark.jars,local:///opt/spark-2.4/dist/examples/jars/spark-examples_2.11-2.4.5.jar)
(spark.app.name,spark-pi)
(spark.executor.instances,1)
(spark.kubernetes.container.image.pullPolicy,Never)
(spark.kubernetes.container.image,spark:latest)
(spark.submit.deployMode,cluster)
(spark.master,k8s://https://192.168.99.103:8443)
(spark.kubernetes.authenticate.driver.serviceAccountName,spark)
Classpath elements:



log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
	at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
	at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketException: Broken pipe (Write failed)
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
	at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
	at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
	at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
	at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
	at okio.Okio$1.write(Okio.java:79)
	at okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
	at okio.RealBufferedSink.flush(RealBufferedSink.java:224)
	at okhttp3.internal.http2.Http2Writer.windowUpdate(Http2Writer.java:262)
	at okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:518)
	at okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505)
	at okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:298)
	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:287)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:168)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:112)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
	at okhttp3.RealCall.execute(RealCall.java:92)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:819)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:334)
	... 17 more
20/04/27 20:26:31 INFO ShutdownHookManager: Shutdown hook called
20/04/27 20:26:31 INFO ShutdownHookManager: Deleting directory /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/spark-76c1b047-4e22-417d-a787-cf298a8c7303
@henryon
Author

henryon commented Apr 27, 2020

I got the errors above after running spark-submit. Can anyone help me with this? Thanks a lot.

@rohanKanojia
Member

@henryon : Could you please share code which is causing this?

Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed.

It looks like you're not specifying a name for the Pod.

@henryon
Author

henryon commented Apr 27, 2020

@rohanKanojia I couldn't find where to specify that. Could you provide some guidance?

I ran the commands below after minikube was up and the Spark 2.4.5 Docker image was built.

sudo ./spark-submit --verbose \
--master k8s://https://192.168.99.103:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=spark:latest \
--conf spark.kubernetes.container.image.pullPolicy=Never \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark-2.4/dist/examples/jars/spark-examples_2.11-2.4.5.jar 10000000

@tunguyen9889

@henryon this one:

--conf spark.kubernetes.driver.pod.name=spark-pi-driver

You can refer to my blog for more details: https://medium.com/@tunguyen9889/how-to-run-spark-job-on-eks-cluster-54f73f90d0bc
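For reference, here is a sketch of the same spark-pi submission with the driver pod name set explicitly (the name `spark-pi-driver` is just an example). The command is assembled into a variable and echoed so it can be inspected before running; paste it (or `eval "$CMD"`) once it looks right:

```shell
# Same spark-pi submission as above, plus an explicit driver pod name via
# spark.kubernetes.driver.pod.name ("spark-pi-driver" is an example name).
CMD="./spark-submit --verbose \
  --master k8s://https://192.168.99.103:8443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.container.image.pullPolicy=Never \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark-2.4/dist/examples/jars/spark-examples_2.11-2.4.5.jar 10000000"
# Print the assembled command for review before actually submitting.
echo "$CMD"
```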

@wangyang0918
Contributor

We hit the same issue when running Flink on K8s. My current (very rough) conclusion: because of a compatibility problem, the fabric8 kubernetes-client has the following known limitation.

  • On JDK 8u252, it only works with Kubernetes v1.16 and lower.
  • On other JDK versions (e.g. 8u242, JDK 11), I am not aware of the same issue; it always works well.
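If changing the JDK or the cluster version isn't an option, one possible workaround is to disable HTTP/2 in the client, since the 8u252 failure occurs during the ALPN/HTTP2 handshake. This is a sketch under the assumption that the fabric8 kubernetes-client bundled with your Spark distribution is recent enough to honour the flag (older bundled clients may ignore it):

```shell
# Possible workaround (assumption: the bundled fabric8 kubernetes-client
# supports it): HTTP2_DISABLE=true forces the client onto HTTP/1.1, which
# avoids the JDK 8u252 ALPN/HTTP2 broken-pipe failure. There is also an
# http2.disable system property for the same switch.
export HTTP2_DISABLE=true
echo "HTTP2_DISABLE=$HTTP2_DISABLE"
# ...then run spark-submit from this same shell so it inherits the variable.
```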

@FPiriz

FPiriz commented May 11, 2020

On Kubernetes 1.18 and Spark 2.4.5,

  • On other JDK versions (e.g. 8u242, JDK 11), I am not aware of the same issue; it always works.

this worked for me. Thank you.

EDIT: I needed JDK 8u242 on both the submitter machine and in the Spark container image.

@TheR3d1

TheR3d1 commented May 20, 2020

I am facing the same error.

  • kubectl Client Version: 1.18.2
  • kubectl Server Version: 1.18.2
  • kubeadm version : 1.18.2
  • spark : 2.4.7
  • java : 8u241 (oracle release)

Is this error related to the Spark, k8s, or Java versions?

EDIT: Downgrading kubernetes to 1.15 worked for me.

@martijndwars

I had the exact same issue when using Java 8u251. Downgrading to Java 8u241 solved this.

@rohanKanojia
Member

@MartinPodval: Could you please try with Kubernetes Client version v4.10.2?

@rohanKanojia
Member

Duplicate of #2145

@rohanKanojia rohanKanojia marked this as a duplicate of #2145 Jun 3, 2020
@rohanKanojia
Member

I think this issue can be closed. Please feel free to reopen this in case you are able to reproduce this on latest versions.

@manusa manusa closed this as completed Jun 23, 2020
@pingsutw

pingsutw commented Jun 6, 2021

I got the same error. Steps to reproduce:

git clone git@github.com:apache/spark.git
cd spark
# build docker images
./bin/docker-image-tool.sh -r pingsutw -t v3.1.2 build
# I've tried `1.15.11` and got the same error
minikube start --vm-driver=none --cpus 64 --memory 12288 --disk-size=128g --kubernetes-version v1.14.2
# Submit a Spark application
./bin/spark-submit \
  --master k8s://https://192.168.103.20.8443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=pingsutw/spark:v3.1.2 \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0-SNAPSHOT.jar
  • Spark: Clone from GitHub
  • Kubernetes: 1.14.2, 1.15.11
  • Ubuntu: 18.04

@manusa manusa reopened this Jun 7, 2021
@pingsutw

It seems like Spark can't connect to the master URL. I think we should check the client-to-K8s connection when initializing the kubernetesClient. If we can successfully create the kubernetesClient, it means we can connect to the API server.

@stale

stale bot commented Sep 8, 2021

This issue has been automatically marked as stale because it has not had any activity in 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

@stale stale bot added the status/stale label Sep 8, 2021
@stale stale bot closed this as completed Sep 15, 2021
@gasabr

gasabr commented Mar 30, 2022

There is a typo in your master URL: I think `--master k8s://https://192.168.103.20.8443` should be `--master k8s://https://192.168.103.20:8443`.
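A quick way to catch this class of typo before submitting (a hypothetical helper, not part of Spark): strip the `k8s://https://` prefix and check that what remains still contains a `:port`:

```shell
# Hypothetical sanity check for a Spark-on-K8s master URL: after removing
# the k8s://https:// prefix, the remainder must still contain ':<port>'
# (e.g. '192.168.103.20:8443', not '192.168.103.20.8443').
MASTER="k8s://https://192.168.103.20:8443"
APISERVER="${MASTER#k8s://https://}"
case "$APISERVER" in
  *:*) echo "master URL looks OK: $MASTER" ;;
  *)   echo "missing ':port' in master URL: $MASTER" >&2; exit 1 ;;
esac
```

With the typo'd URL (`.8443`) the check fails immediately, which is easier to spot than a connection error buried in spark-submit output. You can also confirm the API server address with `kubectl cluster-info`.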
