[SPARK-22648][K8S] Spark on Kubernetes - Documentation
What changes were proposed in this pull request?

This PR adds documentation on using the Kubernetes scheduler in Spark 2.3, along with a shell script that makes it easier to build the Docker images required to use the integration. The changes detailed here are covered by #19717 and #19468, which have already been merged.

How was this patch tested?
The script has been in use for releases on our fork. The rest is documentation.

cc rxin mateiz (shepherd)
k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko
reviewers: vanzin felixcheung jiangxb1987 mridulm

TODO:
- [x] Add dockerfiles directory to built distribution. (#20007)
- [x] Change references to docker to instead say "container" (#19995)
- [x] Update configuration table.
- [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int (#20032)

Author: foxish <[email protected]>

Closes #19946 from foxish/update-k8s-docs.
foxish authored and rxin committed Dec 22, 2017
1 parent 7beb375 commit 7ab165b
Showing 10 changed files with 677 additions and 8 deletions.
1 change: 1 addition & 0 deletions docs/_layouts/global.html
@@ -99,6 +99,7 @@
 <li><a href="spark-standalone.html">Spark Standalone</a></li>
 <li><a href="running-on-mesos.html">Mesos</a></li>
 <li><a href="running-on-yarn.html">YARN</a></li>
+<li><a href="running-on-kubernetes.html">Kubernetes</a></li>
 </ul>
 </li>

6 changes: 5 additions & 1 deletion docs/building-spark.md
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by the
 to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:

-    ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+    ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes

 This will build Spark distribution along with Python pip and R packages. For more information on usage, run `./dev/make-distribution.sh --help`

@@ -90,6 +90,10 @@ like ZooKeeper and Hadoop itself.
 ## Building with Mesos support

     ./build/mvn -Pmesos -DskipTests clean package

+## Building with Kubernetes support
+
+    ./build/mvn -Pkubernetes -DskipTests clean package
+
 ## Building with Kafka 0.8 support
7 changes: 2 additions & 5 deletions docs/cluster-overview.md
@@ -52,11 +52,8 @@ The system currently supports three cluster managers:
 * [Apache Mesos](running-on-mesos.html) -- a general cluster manager that can also run Hadoop MapReduce
   and service applications.
 * [Hadoop YARN](running-on-yarn.html) -- the resource manager in Hadoop 2.
-* [Kubernetes (experimental)](https://github.com/apache-spark-on-k8s/spark) -- In addition to the above,
-there is experimental support for Kubernetes. Kubernetes is an open-source platform
-for providing container-centric infrastructure. Kubernetes support is being actively
-developed in an [apache-spark-on-k8s](https://github.com/apache-spark-on-k8s/) Github organization.
-For documentation, refer to that project's README.
+* [Kubernetes](running-on-kubernetes.html) -- [Kubernetes](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/)
+is an open-source platform that provides container-centric infrastructure.

 A third-party project (not supported by the Spark project) exists to add support for
 [Nomad](https://github.com/hashicorp/nomad-spark) as a cluster manager.
2 changes: 2 additions & 0 deletions docs/configuration.md
@@ -2376,6 +2376,8 @@ can be found on the pages for each mode:

 #### [Mesos](running-on-mesos.html#configuration)

+#### [Kubernetes](running-on-kubernetes.html#configuration)
+
 #### [Standalone Mode](spark-standalone.html#cluster-launch-scripts)

 # Environment Variables
Binary file added docs/img/k8s-cluster-mode.png
3 changes: 2 additions & 1 deletion docs/index.md
@@ -81,6 +81,7 @@ options for deployment:
 * [Standalone Deploy Mode](spark-standalone.html): simplest way to deploy Spark on a private cluster
 * [Apache Mesos](running-on-mesos.html)
 * [Hadoop YARN](running-on-yarn.html)
+* [Kubernetes](running-on-kubernetes.html)

 # Where to Go from Here

@@ -112,7 +113,7 @@ options for deployment:
 * [Mesos](running-on-mesos.html): deploy a private cluster using
   [Apache Mesos](http://mesos.apache.org)
 * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
-* [Kubernetes (experimental)](https://github.com/apache-spark-on-k8s/spark): deploy Spark on top of Kubernetes
+* [Kubernetes](running-on-kubernetes.html): deploy Spark on top of Kubernetes

 **Other Documents:**
578 changes: 578 additions & 0 deletions docs/running-on-kubernetes.md

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion docs/running-on-yarn.md
@@ -18,7 +18,9 @@ Spark application's configuration (driver, executors, and the AM when running in

 There are two deploy modes that can be used to launch Spark applications on YARN. In `cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

-Unlike [Spark standalone](spark-standalone.html) and [Mesos](running-on-mesos.html) modes, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn`.
+Unlike other cluster managers supported by Spark in which the master's address is specified in the `--master`
+parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration.
+Thus, the `--master` parameter is `yarn`.

 To launch a Spark application in `cluster` mode:
16 changes: 16 additions & 0 deletions docs/submitting-applications.md
@@ -127,6 +127,16 @@ export HADOOP_CONF_DIR=XXX
   http://path/to/examples.jar \
   1000

+# Run on a Kubernetes cluster in cluster deploy mode
+./bin/spark-submit \
+  --class org.apache.spark.examples.SparkPi \
+  --master k8s://xx.yy.zz.ww:443 \
+  --deploy-mode cluster \
+  --executor-memory 20G \
+  --num-executors 50 \
+  http://path/to/examples.jar \
+  1000
+
 {% endhighlight %}

 # Master URLs
@@ -155,6 +165,12 @@ The master URL passed to Spark can be in one of the following formats:
 <code>client</code> or <code>cluster</code> mode depending on the value of <code>--deploy-mode</code>.
 The cluster location will be found based on the <code>HADOOP_CONF_DIR</code> or <code>YARN_CONF_DIR</code> variable.
 </td></tr>
+<tr><td> <code>k8s://HOST:PORT</code> </td><td> Connect to a <a href="running-on-kubernetes.html">Kubernetes</a> cluster in
+<code>cluster</code> mode. Client mode is currently unsupported and will be supported in future releases.
+The <code>HOST</code> and <code>PORT</code> refer to the [Kubernetes API Server](https://kubernetes.io/docs/reference/generated/kube-apiserver/).
+It connects using TLS by default. In order to force it to use an unsecured connection, you can use
+<code>k8s://http://HOST:PORT</code>.
+</td></tr>
 </table>
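For illustration, with a hypothetical API server at 192.168.99.100, the two forms of the master URL described above would look like:

    # TLS connection (the default)
    --master k8s://192.168.99.100:8443

    # Unsecured connection, forced via the http:// prefix
    --master k8s://http://192.168.99.100:8080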
68 changes: 68 additions & 0 deletions sbin/build-push-docker-images.sh
@@ -0,0 +1,68 @@
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script builds and pushes docker images when run from a release of Spark
# with Kubernetes support.

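# Map of image name to its Dockerfile in the distribution; spark-base is built
# first, and the driver and executor images are built from these Dockerfiles.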
declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile )

function build {
  docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
  for image in "${!path[@]}"; do
    docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
  done
}


function push {
  for image in "${!path[@]}"; do
    docker push ${REPO}/$image:${TAG}
  done
}

function usage {
  echo "This script must be run from a runnable distribution of Apache Spark."
  echo "Usage: ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build"
  echo "       ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push"
  echo "for example: ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push"
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
usage
exit 0
fi

while getopts r:t: option
do
  case "${option}"
  in
    r) REPO=${OPTARG};;
    t) TAG=${OPTARG};;
  esac
done

if [ -z "$REPO" ] || [ -z "$TAG" ]; then
usage
else
case "${@: -1}" in
build) build;;
push) push;;
*) usage;;
esac
fi
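For reference, a typical invocation from the root of an unpacked Spark distribution might look like the following (the repository and tag are illustrative values taken from the script's usage text):

    ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 build
    ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push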
