diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 0048bd90b48ae..e491329136a3c 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -69,17 +69,17 @@ building using the supplied script, or manually.
 
 To launch Spark Pi in cluster mode,
 
-{% highlight bash %}
+```bash
 $ bin/spark-submit \
     --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
     --deploy-mode cluster \
     --name spark-pi \
     --class org.apache.spark.examples.SparkPi \
     --conf spark.executor.instances=5 \
-    --conf spark.kubernetes.driver.docker.image=<driver-image> \
-    --conf spark.kubernetes.executor.docker.image=<executor-image> \
+    --conf spark.kubernetes.driver.container.image=<driver-image> \
+    --conf spark.kubernetes.executor.container.image=<executor-image> \
     local:///path/to/examples.jar
-{% endhighlight %}
+```
 
 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
 `spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
@@ -120,6 +120,54 @@ by their appropriate remote URIs. Also, application dependencies can be pre-moun
 Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
 
+### Using Remote Dependencies
+When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods
+need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading
+the dependencies so the driver and executor containers can use them locally. This requires users to specify the container
+image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. For example, users
+can add the following option to the `spark-submit` command to specify the init-container image:
+
+```
+--conf spark.kubernetes.initContainer.image=<init-container image>
+```
+
+The init-container handles remote dependencies specified in `spark.jars` (or the `--jars` option of `spark-submit`) and
+`spark.files` (or the `--files` option of `spark-submit`). It also handles remotely hosted main application resources, e.g.,
+the main application jar. The following shows an example of using remote dependencies with the `spark-submit` command:
+
+```bash
+$ bin/spark-submit \
+    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+    --deploy-mode cluster \
+    --name spark-pi \
+    --class org.apache.spark.examples.SparkPi \
+    --jars https://path/to/dependency1.jar,https://path/to/dependency2.jar \
+    --files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2 \
+    --conf spark.executor.instances=5 \
+    --conf spark.kubernetes.driver.container.image=<driver-image> \
+    --conf spark.kubernetes.executor.container.image=<executor-image> \
+    --conf spark.kubernetes.initContainer.image=<init-container image> \
+    https://path/to/examples.jar
+```
+
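+If a dependency download fails, the init-container's log is the first place to look. The following is a minimal sketch,
+assuming the driver pod is named `spark-pi-1-driver` and the init-container is named `spark-init`; both names may differ
+in your deployment, so check the output of `kubectl describe pod` for the actual container name:
+
+```bash
+# List pods to find the driver pod, then read the log of the init-container that downloads the dependencies.
+kubectl get pods
+kubectl logs spark-pi-1-driver -c spark-init
+```
+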
+## Secret Management
+Kubernetes [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) can be used to provide credentials for a
+Spark application to access secured services. To mount a user-specified secret into the driver container, users can use
+the configuration property of the form `spark.kubernetes.driver.secrets.[SecretName]=<mount path>`. Similarly, the
+configuration property of the form `spark.kubernetes.executor.secrets.[SecretName]=<mount path>` can be used to mount a
+user-specified secret into the executor containers. Note that the secret to be mounted is assumed to be in the same
+namespace as the driver and executor pods. For example, to mount a secret named `spark-secret` onto the path
+`/etc/secrets` in both the driver and executor containers, add the following options to the `spark-submit` command:
+
+```
+--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
+--conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets
+```
+
+Note that if an init-container is used, any secret mounted into the driver container will also be mounted into the
+init-container of the driver. Similarly, any secret mounted into an executor container will also be mounted into the
+init-container of the executor.
+
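+The secret itself must already exist in the cluster. As an illustrative sketch (the key name, file path, and namespace
+below are example values, not requirements), such a secret could be created with `kubectl` before submitting the
+application:
+
+```bash
+# Create a generic secret named spark-secret in the namespace the driver and executor pods will run in.
+kubectl create secret generic spark-secret \
+  --namespace=default \
+  --from-file=credentials=/path/to/credentials.json
+```
+
+With the options above, the stored file would then be readable by the driver and executors at `/etc/secrets/credentials`.
+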
 ## Introspection and Debugging
 
 These are the different ways in which you can investigate a running/completed Spark application, monitor progress, and
@@ -275,7 +323,7 @@ specific to Spark on Kubernetes.
   <td>(none)</td>
   <td>
     Container image to use for the driver.
-    This is usually of the form `example.com/repo/spark-driver:v1.0.0`.
+    This is usually of the form <code>example.com/repo/spark-driver:v1.0.0</code>.
     This configuration is required and must be provided by the user.
   </td>
 </tr>
@@ -284,7 +332,7 @@ specific to Spark on Kubernetes.
   <td>(none)</td>
   <td>
     Container image to use for the executors.
-    This is usually of the form `example.com/repo/spark-executor:v1.0.0`.
+    This is usually of the form <code>example.com/repo/spark-executor:v1.0.0</code>.
     This configuration is required and must be provided by the user.
   </td>
 </tr>
@@ -528,51 +576,91 @@ specific to Spark on Kubernetes.
-<tr>
-  <td><code>spark.kubernetes.driver.limit.cores</code></td>
-  <td>(none)</td>
-  <td>
-    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.executor.limit.cores</code></td>
-  <td>(none)</td>
-  <td>
-    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
-  <td>(none)</td>
-  <td>
-    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
-    configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
-    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
-    <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
-  <td>(none)</td>
-  <td>
-    Add the environment variable specified by <code>EnvironmentVariableName</code> to
-    the Driver process. The user can specify multiple of these to set multiple environment variables.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.mountDependencies.jarsDownloadDir</code></td>
-  <td>/var/spark-data/spark-jars</td>
-  <td>
-    Location to download jars to in the driver and executors.
-    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.mountDependencies.filesDownloadDir</code></td>
-  <td>/var/spark-data/spark-files</td>
-  <td>
-    Location to download jars to in the driver and executors.
-    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
-  </td>
-</tr>
+<tr>
+  <td><code>spark.kubernetes.driver.limit.cores</code></td>
+  <td>(none)</td>
+  <td>
+    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.limit.cores</code></td>
+  <td>(none)</td>
+  <td>
+    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
+  <td>(none)</td>
+  <td>
+    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
+    configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
+    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
+    <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the environment variable specified by <code>EnvironmentVariableName</code> to
+    the Driver process. The user can specify multiple of these to set multiple environment variables.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.jarsDownloadDir</code></td>
+  <td>/var/spark-data/spark-jars</td>
+  <td>
+    Location to download jars to in the driver and executors.
+    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.filesDownloadDir</code></td>
+  <td>/var/spark-data/spark-files</td>
+  <td>
+    Location to download files to in the driver and executors.
+    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.timeout</code></td>
+  <td>300s</td>
+  <td>
+    Timeout in seconds before aborting the attempt to download and unpack dependencies from remote locations into
+    the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.maxSimultaneousDownloads</code></td>
+  <td>5</td>
+  <td>
+    Maximum number of remote dependencies to download simultaneously in a driver or executor pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.initContainer.image</code></td>
+  <td>(none)</td>
+  <td>
+    Container image for the init-container of the driver and executors for downloading dependencies. This is usually of
+    the form <code>example.com/repo/spark-init:v1.0.0</code>. This configuration is optional; it must be provided by the
+    user if any dependency is not local to the container and must be downloaded remotely.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the Kubernetes Secret named <code>SecretName</code> to the driver pod on the path specified in the value. For example,
+    <code>spark.kubernetes.driver.secrets.spark-secret=/etc/secrets</code>. Note that if an init-container is used,
+    the secret will also be added to the init-container in the driver pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.secrets.[SecretName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the Kubernetes Secret named <code>SecretName</code> to the executor pod on the path specified in the value. For example,
+    <code>spark.kubernetes.executor.secrets.spark-secret=/etc/secrets</code>. Note that if an init-container is used,
+    the secret will also be added to the init-container in the executor pod.
+  </td>
+</tr>
+</table>
\ No newline at end of file
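
As a brief illustration of the node-selector, driver-environment, and dependency-download properties documented in the
table above, the options below show one possible combination on a `spark-submit` command (all keys and values here are
examples, not defaults or requirements):

```
--conf spark.kubernetes.node.selector.disktype=ssd
--conf spark.kubernetes.driverEnv.MY_APP_MODE=batch
--conf spark.kubernetes.mountDependencies.jarsDownloadDir=/var/spark-data/custom-jars
--conf spark.kubernetes.mountDependencies.timeout=600s
```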
diff --git a/sbin/build-push-docker-images.sh b/sbin/build-push-docker-images.sh
index 4546e98dc2074..b3137598692d8 100755
--- a/sbin/build-push-docker-images.sh
+++ b/sbin/build-push-docker-images.sh
@@ -20,7 +20,8 @@
 # with Kubernetes support.
 
 declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
-                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile )
+                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
+                  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )
 
 function build {
   docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
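
With the new `spark-init` entry, the script builds and pushes the init-container image alongside the driver and executor
images. A usage sketch, assuming the script's existing `-r` (repository) and `-t` (tag) options and its `build` and
`push` commands, run from a Spark distribution that contains the `kubernetes/dockerfiles` directory (the repository and
tag below are example values):

```bash
# Build spark-base, spark-driver, spark-executor, and the new spark-init images, then push them to the repository.
./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 build
./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push
```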