
sparksubmit operator fails in ipv6+istio environment #1344

Closed
kd-nuance opened this issue Sep 7, 2021 · 5 comments · Fixed by #1825

Comments

@kd-nuance

Running Spark 3.1.1 on spark-operator with Istio 1.5.7 in an IPv6 environment. After submitting a job I get the exception below:

Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Invalid proxy server configuration
at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:201)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:67)
at org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:100)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2610)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.MalformedURLException: For input string: "105::1:443"
at java.net.URL.<init>(URL.java:645)
at java.net.URL.<init>(URL.java:508)
at java.net.URL.<init>(URL.java:457)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.getProxyUrl(HttpClientUtils.java:244)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:187)
... 13 more
Caused by: java.lang.NumberFormatException: For input string: "105::1:443"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:222)
at java.net.URL.<init>(URL.java:640)
... 17 more

I believe the IPv6 bracket formatting is missing; the URL should be "https://[105::1]:443".
But how do I pass this in the "kind: SparkApplication" YAML? Is there a specific property that will override the proxy IP dynamically?

I tried the HTTPS_PROXY, HTTP_PROXY, and HTTP2_DISABLE environment variables for the driver and executor. I have also disabled Istio sidecar injection, as I understand jobs don't work well with Istio.

@pm-nuance

We need to add square brackets ([]) when creating the master URL in submission.go. Update the getMasterURL() method in pkg/controller/sparkapplication/submission.go as below:

```go
return fmt.Sprintf("k8s://https://[%s]:%s", kubernetesServiceHost, kubernetesServicePort), nil
```

We also need to add square brackets ([]) in entrypoint.sh where we pass $SPARK_EXECUTOR_POD_IP.
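As a sketch of the same idea: Go's standard-library net.JoinHostPort already adds the brackets for IPv6 literals and leaves IPv4 hosts untouched, so a variant of the fix could avoid hard-coding the brackets (the function and variable names here are illustrative, not the operator's actual code):

```go
package main

import (
	"fmt"
	"net"
)

// getMasterURL builds the k8s master URL from the in-cluster API server
// host and port. net.JoinHostPort wraps IPv6 literals in brackets, so
// both address families yield a valid URL.
func getMasterURL(host, port string) string {
	return fmt.Sprintf("k8s://https://%s", net.JoinHostPort(host, port))
}

func main() {
	fmt.Println(getMasterURL("105::1", "443"))   // IPv6 literal gets brackets
	fmt.Println(getMasterURL("10.0.0.1", "443")) // IPv4 host is unchanged
}
```

Using JoinHostPort rather than an unconditional "[%s]" also keeps IPv4-only clusters working, since brackets around an IPv4 address would produce an invalid URL.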


@valorl

valorl commented Jun 23, 2022

We're seeing the same issue on EKS with IPv6. Has there been any progress on this?

@valorl

valorl commented Jul 1, 2022

I've tried the fix suggested by @pm-nuance and it does get rid of the original error. (Available in this image: ghcr.io/valorl/spark-on-k8s-operator:upstream-ipv6)

However, now it throws some (seemingly) TLS-related error. Any suggestions?

Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  for kind: [Pod]  with name: [null]  in namespace: [spark]  failed.
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:349)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:84)
        at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:139)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2611)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname fd00:10:96::1 not verified:
    certificate: sha256/LPsv1hc3g6MBZxaVQ8orX1AMm6FAYpBEpvqCtftRzAY=
    DN: CN=kube-apiserver
    subjectAltNames: [fd00:10:96:0:0:0:0:1, fc00:f853:ccd:e793:0:0:0:2, 0:0:0:0:0:0:0:1, kind-control-plane, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, localhost]
        at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:334)
        at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:284)
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:169)
        at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
        at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
        at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:135)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.OIDCTokenRefreshInterceptor.intercept(OIDCTokenRefreshInterceptor.java:41)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:151)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
        at okhttp3.RealCall.execute(RealCall.java:93)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:490)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:252)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:879)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:341)
        ... 14 more

@LittleWat

Hello! We are in the same situation as @valorl

Regarding @pm-nuance's comment:

> We need to also add square brackets ([]) in entrypoint.sh where we pass the $SPARK_EXECUTOR_POD_IP.

We didn't set SPARK_APPLICATION_ID or SPARK_EXECUTOR_POD_IP.
Do we need to set SPARK_APPLICATION_ID or SPARK_EXECUTOR_POD_IP? If so, how?

Thank you!

LittleWat added a commit to LittleWat/spark-on-k8s-operator that referenced this issue Aug 30, 2023
Resolves kubeflow#1344

Spark 3.4 supports IPv6:
- apache/spark#36868

So I want to make the operator support IPv6.

I can confirm that this can submit the Spark job in an IPv6-only environment, though it is necessary to add the following environment variables to the operator:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-on-k8s-spark-operator
spec:
  template:
    spec:
      containers:
      - name: spark-operator
        env:
        - name: _JAVA_OPTIONS
          value: "-Djava.net.preferIPv6Addresses=true"
        - name: KUBERNETES_DISABLE_HOSTNAME_VERIFICATION
          value: "true"

```
@LittleWat

Created a PR to fix this (though I disabled istio-injection) and confirmed that it works when Spark 3.4 is used.

liyinan926 pushed a commit that referenced this issue Oct 26, 2023
peter-mcclonski pushed a commit to TechnologyBrewery/spark-on-k8s-operator that referenced this issue Apr 16, 2024
sigmarkarl pushed a commit to spotinst/spark-on-k8s-operator that referenced this issue Aug 7, 2024
jbhalodia-slack pushed a commit to jbhalodia-slack/spark-operator that referenced this issue Oct 4, 2024