Trouble running SparkPi example on Kubernetes 1.8.x #617
Comments
As suspected, it was due to the driver container not exposing its ports on creation. While the service selected the driver pod as expected, the driver had no ports declared for the service to proxy requests to. Filed PR: #618
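For anyone wanting to confirm the same symptom on their own cluster, a quick check is to compare the ports the service expects against what the driver container actually declares. A sketch; the service name is the one from this issue, and the pod name is a placeholder:

```sh
# Ports the headless driver service expects to route to:
kubectl get svc org-apache-spark-examples-sparkpi-1519271450264-driver-svc \
  -o jsonpath='{.spec.ports}'

# Ports the driver container actually declares; empty output here
# reproduces the bug described above:
kubectl get pod <driver-pod-name> -o jsonpath='{.spec.containers[0].ports}'
```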
We've seen a similar issue under certain network overlay configurations. In our case the problem seemed to be that, because only a headless service with no cluster IP gets created, the fully qualified cluster hostname for the service would not resolve, while the bare service name did. This only happened under Romana; it worked fine under other network overlays such as Flannel, so this issue may be specific to certain overlays.
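If you suspect the same DNS behaviour, resolution can be tested from inside one of the pods. A sketch, assuming the service name from this issue, the `default` namespace, and a container image that ships `nslookup`:

```sh
# Bare service name (resolved fine in our Romana setup):
kubectl exec -it <executor-pod> -- nslookup \
  org-apache-spark-examples-sparkpi-1519271450264-driver-svc

# Fully qualified cluster hostname (did not resolve for us):
kubectl exec -it <executor-pod> -- nslookup \
  org-apache-spark-examples-sparkpi-1519271450264-driver-svc.default.svc.cluster.local
```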
Here's our stack trace for comparison:
I'm seeing the exact same thing on Kubernetes 1.9 with Flannel, using Spark v2.2.0-kubernetes-0.5.0. Command run:
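(The exact command wasn't preserved here; for context, a representative SparkPi submission against the v2.2.0-kubernetes-0.5.0 fork looks roughly like the following, with every value in angle brackets a placeholder:)

```sh
# Representative only, not the reporter's exact command; the images
# follow the kubespark naming used by the apache-spark-on-k8s releases.
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<apiserver-host>:<apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
  local:///opt/spark/examples/jars/<spark-examples jar>
```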
My understanding is that declaring pod ports is only a formality and that these ports are not actually necessary to set. But this might vary from cluster to cluster, and might be specific to later versions of Kubernetes. @foxish for thoughts.
I believe it can vary from cluster to cluster, and that explicitly declaring the ports is the right way to move forward. Some deployments have pod ports open by default, which is likely why this has been working for many folks; but for those with a network policy that requires ports to be explicitly opened, the project as-is does not work in those clusters.
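A quick way to check whether such a policy is in play in a given cluster (a sketch; the namespace is a placeholder):

```sh
# List any NetworkPolicy objects that could be restricting pod traffic;
# an empty list means no policies apply.
kubectl get networkpolicy --all-namespaces
kubectl describe networkpolicy -n <spark-namespace>
```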
I've noticed a similar error when setting a driver label, which then gets added to the service and can't resolve.
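If it helps anyone debugging this, a selector/label mismatch between the service and the driver pod can be ruled out directly (a sketch; resource names are placeholders):

```sh
# The label selector the driver service routes by...
kubectl get svc <driver-svc-name> -o jsonpath='{.spec.selector}'

# ...versus the labels actually present on the driver pod.
kubectl get pod <driver-pod-name> --show-labels
```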
Trying to run the SparkPi example per the instructions on the website, but hitting an error with the executor. When I run the `spark-submit` command I see Kubernetes create the driver and executor pods as expected, but the executor pod then dies with an error (in the logs). In the driver pod logs I see:
repeated several times (modulo the timestamp); I assume this is because the executor went down, so there are no workers to accept resources.
Going back to the executor pod error, I see there is an `org-apache-spark-examples-sparkpi-1519271450264-driver-svc` service with the expected label selectors and ports:

Querying for the pods with the corresponding labels, I see the driver that was mentioned earlier. Looking at the driver pod's description, however, I see that no ports are exposed:
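One more thing worth inspecting here is the endpoints object behind the headless service, since that is what executor lookups ultimately resolve through. A sketch using the service name above:

```sh
# Shows which pod IPs and ports the service actually resolved to; empty
# or port-less endpoints would support the routing theory below.
kubectl describe endpoints org-apache-spark-examples-sparkpi-1519271450264-driver-svc
```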
As far as I can tell this is the corresponding code that creates the driver pod, and it also doesn't seem to expose ports. So my theory is that the executor is calling the service, and the service is trying to route the request but can't, because the pod it selects has no ports exposed. However, I'm also doubtful of this theory, since this seems like exactly the kind of thing that would be caught in integration tests, right?
Any help would be appreciated.
Some more info below:
Command run
The command is run by exec-ing into the pod specified below with `k exec -it <pod name> /bin/bash` - I did this since it seemed to be the easiest way to get the `spark-submit` script and the examples JAR.
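(Assuming `k` is an alias for `kubectl`, the unaliased form would be:)

```sh
# Open an interactive shell in the pod; `--` separates kubectl's own
# flags from the command to run inside the container.
kubectl exec -it <pod name> -- /bin/bash
```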
Logs

Here are more logs from the driver pod before it starts repeating the last log line:
Here are logs from the executor pod:
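(For reference, a sketch of how logs like these are collected; pod names are placeholders:)

```sh
# Dump logs from the driver and executor pods.
kubectl logs <driver-pod-name>
kubectl logs <executor-pod-name>

# If the executor container already crashed and restarted,
# --previous shows the logs from the prior run.
kubectl logs --previous <executor-pod-name>
```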