# [SPARK-24547][K8S] Allow for building spark on k8s docker images without cache and don't forget to push spark-py container.

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-24547

TL;DR from the JIRA issue:

- The first time I generated images for 2.4.0, Docker was using its cache, so when running jobs, old jars were still in the Docker image (see the sketch after this list). This produced errors like the following in the executors:

`java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 6155820641931972169, local class serialVersionUID = -3720498261147521051`

- The second problem was that the spark container was pushed, but the spark-py container wasn't. This was simply forgotten in the initial PR.

- A third problem I ran into, because I had an older Docker version, was apache#21551, so I have not included a fix for that in this ticket.
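To make the first problem concrete, here is a minimal sketch of the cache behaviour, assuming a Spark checkout with a freshly built distribution; the image tag and Dockerfile path are illustrative, not taken from this commit:

```sh
# Default build: Docker may reuse cached layers, so jars copied into an
# earlier layer can survive into the "new" image.
docker build -t spark:2.4.0 -f kubernetes/dockerfiles/spark/Dockerfile .

# With --no-cache every layer is rebuilt, so the image is guaranteed to
# contain the jars from the current build.
docker build --no-cache -t spark:2.4.0 -f kubernetes/dockerfiles/spark/Dockerfile .
```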

## How was this patch tested?

I've tested it on my own Spark on k8s deployment.

Author: Ray Burgemeestre <[email protected]>

Closes apache#21555 from rayburgemeestre/SPARK-24547.
Ray Burgemeestre authored and Anirudh Ramanathan committed Jun 21, 2018
1 parent 3f4bda7 commit 15747cf
Showing 1 changed file with 7 additions and 3 deletions: bin/docker-image-tool.sh
```diff
@@ -70,17 +70,18 @@ function build {
   local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
   local PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/spark/bindings/python/Dockerfile"}

-  docker build "${BUILD_ARGS[@]}" \
+  docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
     -t $(image_ref spark) \
     -f "$BASEDOCKERFILE" .

-  docker build "${BINDING_BUILD_ARGS[@]}" \
+  docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
     -t $(image_ref spark-py) \
     -f "$PYDOCKERFILE" .
 }

 function push {
   docker push "$(image_ref spark)"
+  docker push "$(image_ref spark-py)"
 }

 function usage {
@@ -99,6 +100,7 @@ Options:
   -r repo Repository address.
   -t tag  Tag to apply to the built image, or to identify the image to be pushed.
   -m      Use minikube's Docker daemon.
+  -n      Build docker image with --no-cache

 Using minikube when building images will do so directly into minikube's Docker daemon.
 There is no need to push the images into minikube in that case, they'll be automatically
@@ -127,14 +129,16 @@ REPO=
 TAG=
 BASEDOCKERFILE=
 PYDOCKERFILE=
+NOCACHEARG=
-while getopts f:mr:t: option
+while getopts f:mr:t:n option
 do
  case "${option}"
  in
  f) BASEDOCKERFILE=${OPTARG};;
  p) PYDOCKERFILE=${OPTARG};;
  r) REPO=${OPTARG};;
  t) TAG=${OPTARG};;
+ n) NOCACHEARG="--no-cache";;
  m)
    if ! which minikube 1>/dev/null; then
      error "Cannot find minikube."
```
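After this change, a from-scratch build-and-push cycle might look like the following; the repository and tag are illustrative, and `-n` expands to `--no-cache` as shown in the diff above:

```sh
# Rebuild the spark and spark-py images, bypassing Docker's layer cache
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.4.0 -n build

# Push both images; spark-py is now pushed alongside spark
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.4.0 push
```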
