[SPARK-22960][k8s] Make build-push-docker-images.sh more dev-friendly. #20154

Closed
wants to merge 5 commits
9 changes: 6 additions & 3 deletions docs/running-on-kubernetes.md
@@ -16,6 +16,9 @@ Kubernetes scheduler that has been added to Spark.
you may setup a test cluster on your local machine using
[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
* We recommend using the latest release of minikube with the DNS addon enabled.
* Be aware that the default minikube configuration is not enough for running Spark applications.
We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single
executor.
* You must have appropriate permissions to list, create, edit and delete
[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
by running `kubectl auth can-i <list|create|edit|delete> pods`.
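The resource recommendation above can be applied when starting minikube. A hedged example (flag names per the minikube CLI; exact syntax and defaults vary by minikube version):

```
minikube start --cpus 3 --memory 4096
```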
@@ -197,7 +200,7 @@ kubectl port-forward <driver-pod-name> 4040:4040

Then, the Spark driver UI can be accessed on `http://localhost:4040`.

### Debugging

There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
@@ -215,8 +218,8 @@ If the pod has encountered a runtime error, the status can be probed further using
kubectl logs <spark-driver-pod>
```

-Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
-application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
+Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
+application, including all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
the Spark application.

## Kubernetes Features
@@ -15,7 +15,8 @@
# limitations under the License.
#

-FROM spark-base
+ARG base_image
+FROM ${base_image}

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
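The recurring `ARG base_image` / `FROM ${base_image}` change lets the per-image Dockerfiles above be built against a configurable base image. A minimal sketch of the pattern (hypothetical file and image names, not part of this PR):

```dockerfile
# Hypothetical child image; the base is supplied at build time, e.g.:
#   docker build --build-arg base_image=spark-base:v2.3.0 -t my-image .
ARG base_image
FROM ${base_image}

# Note: an ARG declared before FROM is scoped to the FROM line only; it must
# be re-declared after FROM to be used in later instructions.
COPY app.jar /opt/app.jar
```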
@@ -15,7 +15,8 @@
# limitations under the License.
#

-FROM spark-base
+ARG base_image
+FROM ${base_image}

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
@@ -15,7 +15,8 @@
# limitations under the License.
#

-FROM spark-base
+ARG base_image
+FROM ${base_image}

# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
@@ -17,6 +17,9 @@

FROM openjdk:8-alpine

ARG spark_jars
ARG img_path

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
@@ -34,11 +37,11 @@ RUN set -ex && \
ln -sv /bin/bash /bin/sh && \
chgrp root /etc/passwd && chmod ug+rw /etc/passwd

-COPY jars /opt/spark/jars
+COPY ${spark_jars} /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY conf /opt/spark/conf
-COPY kubernetes/dockerfiles/spark-base/entrypoint.sh /opt/
+COPY ${img_path}/spark-base/entrypoint.sh /opt/

ENV SPARK_HOME /opt/spark

120 changes: 100 additions & 20 deletions sbin/build-push-docker-images.sh
@@ -19,51 +19,131 @@
# This script builds and pushes docker images when run from a release of Spark
# with Kubernetes support.

-declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
-                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
-                  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )

function error {
  echo "$@" 1>&2
  exit 1
}

# Detect whether this is a git clone or a Spark distribution and adjust paths
# accordingly.
if [ -z "${SPARK_HOME}" ]; then
  SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
. "${SPARK_HOME}/bin/load-spark-env.sh"

if [ -f "$SPARK_HOME/RELEASE" ]; then
  IMG_PATH="kubernetes/dockerfiles"
  SPARK_JARS="jars"
else
  IMG_PATH="resource-managers/kubernetes/docker/src/main/dockerfiles"
  SPARK_JARS="assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

if [ ! -d "$IMG_PATH" ]; then
  error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
[Review thread]
Contributor: Update this comment? I presume now it should say runnable distribution, or from source.

Contributor (author): The source directory is sort of a "runnable distribution" if Spark is built. I'd rather keep the message simple since it's mostly targeted at end users (not devs).

Contributor: SGTM
fi

declare -A path=( [spark-driver]="$IMG_PATH/driver/Dockerfile" \
                  [spark-executor]="$IMG_PATH/executor/Dockerfile" \
                  [spark-init]="$IMG_PATH/init-container/Dockerfile" )

function image_ref {
  local image="$1"
  local add_repo="${2:-1}"
  if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
    image="$REPO/$image"
  fi
  if [ -n "$TAG" ]; then
    image="$image:$TAG"
  fi
  echo "$image"
}

function build {
-  docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
+  local base_image="$(image_ref spark-base 0)"
+  docker build --build-arg "spark_jars=$SPARK_JARS" \
+    --build-arg "img_path=$IMG_PATH" \
+    -t "$base_image" \
+    -f "$IMG_PATH/spark-base/Dockerfile" .
  for image in "${!path[@]}"; do
-    docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+    docker build --build-arg "base_image=$base_image" -t "$(image_ref $image)" -f ${path[$image]} .
  done
}


function push {
  for image in "${!path[@]}"; do
-    docker push ${REPO}/$image:${TAG}
+    docker push "$(image_ref $image)"
  done
}

function usage {
-  echo "This script must be run from a runnable distribution of Apache Spark."
-  echo "Usage: ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build"
-  echo "       ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push"
-  echo "for example: ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push"
+  cat <<EOF
+Usage: $0 [options] [command]
+Builds or pushes the built-in Spark Docker images.
+
+Commands:
+  build       Build images.
+  push        Push images to a registry. Requires a repository address to be provided, both
+              when building and when pushing the images.
+
+Options:
+  -r repo     Repository address.
+  -t tag      Tag to apply to built images, or to identify images to be pushed.
+  -m          Use minikube's Docker daemon.
+
+Using minikube when building images will do so directly into minikube's Docker daemon.
+There is no need to push the images into minikube in that case, they'll be automatically
+available when running applications inside the minikube cluster.
+
+Check the following documentation for more information on using the minikube Docker daemon:
+
+  https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon
+
+Examples:
+  - Build images in minikube with tag "testing"
+    $0 -m -t testing build
+
+  - Build and push images with tag "v2.3.0" to docker.io/myrepo
+    $0 -r docker.io/myrepo -t v2.3.0 build
+    $0 -r docker.io/myrepo -t v2.3.0 push
+EOF
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  usage
  exit 0
fi

-while getopts r:t: option
+REPO=
+TAG=
+while getopts mr:t: option
do
  case "${option}"
  in
    r) REPO=${OPTARG};;
    t) TAG=${OPTARG};;
    m)
      if ! which minikube 1>/dev/null; then
        error "Cannot find minikube."
      fi
      eval $(minikube docker-env)
[Review thread on `eval $(minikube docker-env)`]
@foxish (Jan 4, 2018): I think building docker images right into the minikube VM's docker daemon is uncommon and not something we'd want to recommend. Users on minikube should also use a proper registry (for example, there is a registry addon that could be used). While this might be good to document as a local developer workflow, I'm apprehensive about adding a new flag just for this particular mode. Also one could invoke eval $(minikube docker-env) and then use the build command to get the same effect.

@vanzin (author, Jan 4, 2018): I started calling that command separately, but it's really annoying. This option is useful not just for Spark devs, but for people who want to try their own apps on minikube before trying them on a larger cluster, for example.

> building docker images right into the minikube VM's docker daemon is uncommon

What's the alternative? Deploying your own registry? I struggled with that for hours and it's nearly impossible to get docker to talk to an insecure registry (or one with a self-signed cert like minikube's). This approach just worked (tm).

@foxish (Jan 4, 2018): I see your point - this is considerably easier. I spoke with a minikube maintainer and it seems this is not as uncommon as I initially thought. So, this change looks good, but I'd prefer that we add some more explanation to the usage section, that this will build an image within the minikube environment - and also link to https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon.

cc/ @aaron-prindle
      ;;
  esac
done
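A self-contained sketch of this parsing loop (hypothetical argument values; the real script runs `eval $(minikube docker-env)` where the sketch merely sets a flag):

```shell
#!/bin/bash
# getopts spec "mr:t:": -m is a bare flag, -r and -t each take an argument.
set -- -m -r docker.io/myrepo -t v2.3.0 build   # simulated command line

MINIKUBE=0
REPO=
TAG=
while getopts mr:t: option; do
  case "${option}" in
    m) MINIKUBE=1;;
    r) REPO=${OPTARG};;
    t) TAG=${OPTARG};;
  esac
done

# getopts stops at the first non-option word, so "${@: -1}" still holds the
# trailing command (build/push) that the script dispatches on later.
echo "minikube=$MINIKUBE repo=$REPO tag=$TAG command=${@: -1}"
```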

-if [ -z "$REPO" ] || [ -z "$TAG" ]; then
-  usage
-else
-  case "${@: -1}" in
-    build) build;;
-    push) push;;
-    *) usage;;
-  esac
-fi
+case "${@: -1}" in
+  build)
+    build
+    ;;
+  push)
+    if [ -z "$REPO" ]; then
+      usage
+      exit 1
+    fi
+    push
+    ;;
+  *)
+    usage
+    exit 1
+    ;;
+esac