[Bug] spark engine status retrieval fails if master is set #6647

Open
3 of 4 tasks
moelhoussein opened this issue Aug 29, 2024 · 1 comment
Labels
kind:bug This is a clearly a bug priority:major


moelhoussein commented Aug 29, 2024

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

Current setup

I am running Kyuubi 1.9.1 on AKS, and clients submit batch jobs through the Kyuubi REST API. I have set the KUBECONFIG environment variable to the path of a kubeconfig file that contains the contexts for my Spark worker clusters (I have dedicated Kubernetes clusters for Spark jobs).

Server config:

The Kyuubi server configuration is as follows:

    kyuubi.kubernetes.isost10.spark.authenticate.oauthTokenFile=/etc/isost10/token
    kyuubi.kubernetes.isost9.spark.authenticate.oauthTokenFile=/etc/isost9/token

Client sample request

Users are able to POST the following:

{
    "resource": "local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar",
    "name": "sample-job",
    "batchType": "SPARK",
    "className": "org.apache.spark.examples.SparkPi",
    "conf": {
        "spark.kubernetes.context": "isost9",
        "spark.master": "k8s://cluster:443",
        "spark.kubernetes.container.image": "acr.io/spark:3.4.1-5250722",
        "spark.kubernetes.namespace": "spark",
        "spark.kubernetes.serviceAccountName": "spark",
        "spark.kubernetes.driver.node.selector.label": "nodepool1",
        "spark.kubernetes.executor.node.selector.label": "nodepool1",
        "spark.executor.memory": "4G",
        "spark.executor.cores": "2",
        "spark.driver.memory": "4G",
        "spark.driver.cores": "2"
    }
}
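For reference, here is a minimal Python sketch of how a client might submit the request above to Kyuubi's batch REST endpoint (`POST /api/v1/batches`). The sketch makes several assumptions:

- The host `kyuubi.example.com` is a placeholder.
- Port 10099 is taken to be the REST frontend port.
- The `conf` map is trimmed for brevity.
- Authentication is omitted.

The request is only built, not sent.

```python
import json
import urllib.request

# Placeholder host; 10099 is assumed to be the Kyuubi REST frontend port.
KYUUBI_URL = "http://kyuubi.example.com:10099"


def build_batch_request(base_url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the batch endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


batch = {
    "resource": "local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar",
    "name": "sample-job",
    "batchType": "SPARK",
    "className": "org.apache.spark.examples.SparkPi",
    "conf": {  # trimmed version of the conf shown above
        "spark.kubernetes.context": "isost9",
        "spark.master": "k8s://cluster:443",
        "spark.kubernetes.namespace": "spark",
        "spark.kubernetes.serviceAccountName": "spark",
    },
}

req = build_batch_request(KYUUBI_URL, batch)
# Sending it would be urllib.request.urlopen(req); auth headers are omitted here.
```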

The problem

The issue occurs when we set the master per context on the Kyuubi side:

    kyuubi.kubernetes.<context>.master.address=k8s://cluster:443
    kyuubi.kubernetes.<context>.<namespace>.authenticate.oauthTokenFile

The batch job gets scheduled in the remote cluster, but Kyuubi is unable to retrieve its status. The logs contain:

    ERROR OkHttp http://k8s/... io.fabric8.kubernetes.client.informers.impl.cache.Reflector: listSyncAndWatch failed for v1/namespaces/spark/pods, will stop
    java.util.concurrent.CompletionException: java.net.UnknownHostException: k8s
            at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
            at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
            at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957)
            at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
            at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
            at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$1.onFailure(OkHttpClientImpl.java:330)
            at okhttp3.RealCall$AsyncCall.execute(RealCall.java:211)
            at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:750)
    Caused by: java.net.UnknownHostException: k8s
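The `http://k8s/...` URL in the log suggests that the Spark-style `k8s://` prefix may be reaching the Kubernetes client verbatim. The minimal sketch below assumes that the client prepends `http://` (or `https://`) to any master URL that does not already start with one of them. Under that assumption, it reproduces the host `k8s` using Python's `urllib.parse`. The helper `ensure_http_scheme` is hypothetical and only models this assumption; it is not the client's actual code. If the assumption holds, a plain API server URL such as `https://cluster:443` would parse to the expected host.

```python
from urllib.parse import urlsplit


def ensure_http_scheme(master_url: str, https_available: bool = False) -> str:
    """Hypothetical model: prepend a scheme only if http(s):// is missing."""
    if not master_url.startswith(("http://", "https://")):
        master_url = ("https://" if https_available else "http://") + master_url
    return master_url


spark_style = ensure_http_scheme("k8s://cluster:443")
print(spark_style)                     # http://k8s://cluster:443
print(urlsplit(spark_style).hostname)  # k8s  (matches "UnknownHostException: k8s")

plain = ensure_http_scheme("https://cluster:443")
print(urlsplit(plain).hostname)        # cluster
```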

Additionally, clients are REQUIRED to set the Spark master.

Is setting the Spark master per context supported, so that clients don't have to know the API server IP of the backend cluster?
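To make the question concrete, here is a hypothetical sketch of the requested behavior. When a batch request omits `spark.master`, the server would look up `kyuubi.kubernetes.<context>.master.address` for the requested `spark.kubernetes.context`, falling back to the global `kyuubi.kubernetes.master.address`. It would then derive `spark.master` from that address using Spark's `k8s://` form.

The helpers `resolve_master` and `fill_spark_master` and the `*.example.com` host names are illustrative only. This is not how Kyuubi 1.9.1 is known to behave.

```python
from typing import Dict, Mapping, Optional

GLOBAL_MASTER_KEY = "kyuubi.kubernetes.master.address"


def resolve_master(server_conf: Mapping[str, str], context: str) -> Optional[str]:
    """Hypothetical lookup: per-context address first, then the global one."""
    per_context_key = f"kyuubi.kubernetes.{context}.master.address"
    return server_conf.get(per_context_key, server_conf.get(GLOBAL_MASTER_KEY))


def fill_spark_master(batch_conf: Dict[str, str],
                      server_conf: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical: derive spark.master when the client did not set it."""
    if "spark.master" in batch_conf:
        return batch_conf  # an explicit client setting wins
    context = batch_conf.get("spark.kubernetes.context")
    address = resolve_master(server_conf, context) if context else None
    if address is None:
        return batch_conf  # nothing configured server-side; leave unchanged
    # Spark on Kubernetes expects the API server URL behind a k8s:// prefix.
    master = address if address.startswith("k8s://") else "k8s://" + address
    return {**batch_conf, "spark.master": master}


server_conf = {
    "kyuubi.kubernetes.isost9.master.address": "https://isost9-api.example.com:443",
    "kyuubi.kubernetes.isost10.master.address": "https://isost10-api.example.com:443",
}
client_conf = {"spark.kubernetes.context": "isost9", "spark.kubernetes.namespace": "spark"}
print(fill_spark_master(client_conf, server_conf)["spark.master"])
# k8s://https://isost9-api.example.com:443
```

Letting an explicit client value win keeps existing requests that already set `spark.master` working unchanged.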

Affects Version(s)

1.9.1

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
moelhoussein added kind:bug priority:major labels Aug 29, 2024

Hello @moelhoussein,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.

Development

No branches or pull requests

1 participant