Use cloud build to build images instead #1923

Merged: 3 commits, Aug 23, 2019
41 changes: 16 additions & 25 deletions test/build-images.sh
@@ -16,15 +16,7 @@

set -ex

IMAGE_BUILDER_ARG=""
if [ "$PROJECT" != "ml-pipeline-test" ]; then
COPIED_IMAGE_BUILDER_IMAGE=${GCR_IMAGE_BASE_DIR}/image-builder
echo "Copy image builder image to ${COPIED_IMAGE_BUILDER_IMAGE}"
yes | gcloud container images add-tag \
gcr.io/ml-pipeline-test/image-builder:v20181128-0.1.3-rc.1-109-ga5a14dc-e3b0c4 \
${COPIED_IMAGE_BUILDER_IMAGE}:latest
IMAGE_BUILDER_ARG="-p image-builder-image=${COPIED_IMAGE_BUILDER_IMAGE}"
fi
IMAGES_BUILDING=false

# Image caching can be turned off by setting $DISABLE_IMAGE_CACHING env flag.
# Note that GCR_IMAGE_BASE_DIR contains commit hash, so whenever there's a code
@@ -40,20 +32,19 @@ then
echo "docker images for api-server, frontend, scheduledworkflow and \
persistenceagent are already built in ${GCR_IMAGE_BASE_DIR}."
else
echo "submitting argo workflow to build docker images for commit ${PULL_PULL_SHA}..."
# Build Images
ARGO_WORKFLOW=`argo submit ${DIR}/build_image.yaml \
-p image-build-context-gcs-uri="$remote_code_archive_uri" \
${IMAGE_BUILDER_ARG} \
-p api-image="${GCR_IMAGE_BASE_DIR}/api-server" \
-p frontend-image="${GCR_IMAGE_BASE_DIR}/frontend" \
-p scheduledworkflow-image="${GCR_IMAGE_BASE_DIR}/scheduledworkflow" \
-p persistenceagent-image="${GCR_IMAGE_BASE_DIR}/persistenceagent" \
-n ${NAMESPACE} \
--serviceaccount test-runner \
-o name
`
echo "build docker images workflow submitted successfully"
source "${DIR}/check-argo-status.sh"
echo "build docker images workflow completed"
echo "submitting cloud build to build docker images for commit ${PULL_PULL_SHA}..."
IMAGES_BUILDING=true
CLOUD_BUILD_COMMON_ARGS=(. --async --format='value(id)' --substitutions=_GCR_BASE=${GCR_IMAGE_BASE_DIR})
# Split into two builds because api_server builds slowly; giving it a
# separate build makes the overall process faster.
BUILD_ID_API_SERVER=$(gcloud builds submit ${CLOUD_BUILD_COMMON_ARGS[@]} \
Member:
Is it possible to combine these steps into a single file? Cloud Build seems to support parallel execution via:
waitFor: ['-']

Contributor Author:
Yes, I also tried that. There are some tradeoffs with waitFor: ['-']:

  • [major] It is slower, because Cloud Build runs all the image builds on one machine and they compete for resources. Time taken goes from ~9min to ~12min, and we may need to build more images later. I think the current approach is better (or I can batch 3 image builds and let only api-server use a separate job).
  • [minor] Logs for the 4 builds interleave, which is a little harder to read.

Contributor Author:
I prefer to:

  • batch the 3 fast image builds
  • keep the api-server image build standalone

Contributor Author:
Made the corresponding changes.

Member:
SG. BTW, with change #1904 I'm not sure building the api-server image still consumes a lot of resources; that change removed most of the dependencies, such as tensorflow.

Contributor Author:
Thanks for the heads up. I will take a look and change it if building api-server is faster now.

gaoning777 marked this conversation as resolved.
--config ${DIR}/cloudbuild/api_server.yaml)
BUILD_ID_BATCH=$(gcloud builds submit ${CLOUD_BUILD_COMMON_ARGS[@]} \
--config ${DIR}/cloudbuild/batch_build.yaml)

BUILD_IDS=(
"${BUILD_ID_API_SERVER}"
"${BUILD_ID_BATCH}"
)
echo "Submitted the following cloud build jobs: ${BUILD_IDS[@]}"
fi
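As discussed in the review thread, a single-file alternative would let Cloud Build run the image builds in parallel by giving every step `waitFor: ['-']`. A hypothetical sketch of that rejected approach (file layout and step set are assumptions, shown here only to illustrate the tradeoff):

```yaml
# Hypothetical single-file alternative (NOT the approach merged in this PR):
# every step declares waitFor: ['-'], so each starts immediately instead of
# waiting for the previous step to finish.
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', '$_GCR_BASE/api-server', '-f', 'backend/Dockerfile', '.']
  waitFor: ['-']
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', '$_GCR_BASE/frontend', '-f', 'frontend/Dockerfile', '.']
  waitFor: ['-']
options:
  machineType: N1_HIGHCPU_8
images:
- '$_GCR_BASE/api-server'
- '$_GCR_BASE/frontend'
```

As the author notes, all parallel steps then share a single worker VM and compete for resources, which is why the merged change keeps the slow api-server build in its own Cloud Build job.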
53 changes: 53 additions & 0 deletions test/check-build-image-status.sh
@@ -0,0 +1,53 @@
#!/bin/bash
#
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -ex

if [ "$IMAGES_BUILDING" == true ]; then
MAX_ATTEMPT=$(expr $TIMEOUT_SECONDS / 20)
PENDING_BUILD_IDS=("${BUILD_IDS[@]}") # copy pending build ids
for i in $(seq 1 ${MAX_ATTEMPT})
do
NEW_PENDING_BUILD_IDS=()
for id in "${PENDING_BUILD_IDS[@]}"
do
status=$(gcloud builds describe $id --format='value(status)') || status="FETCH_ERROR"
case "$status" in
"SUCCESS")
echo "Build with id ${id} has succeeded."
;;
"WORKING")
NEW_PENDING_BUILD_IDS+=( "$id" )
;;
"FETCH_ERROR")
echo "Fetching cloud build status failed, retrying..."
NEW_PENDING_BUILD_IDS+=( "$id" )
;;
*)
echo "Cloud build with build id ${id} failed with status ${status}"
exit 1
;;
esac
done
PENDING_BUILD_IDS=("${NEW_PENDING_BUILD_IDS[@]}")
if [ 0 == "${#PENDING_BUILD_IDS[@]}" ]; then
echo "All cloud builds succeeded."
break
fi
echo "Cloud build in progress, waiting for 20 seconds..."
sleep 20
done
fi
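The poll-until-done pattern above can be exercised without gcloud by stubbing out the status call. A minimal sketch, where `fake_build_status` is a hypothetical stand-in for `gcloud builds describe $id --format='value(status)'` that reports WORKING on the first two polls of each build id and SUCCESS afterwards:

```shell
#!/bin/bash
# Sketch of the polling loop in check-build-image-status.sh, with the
# gcloud call replaced by a local stub so the control flow can be tested.
POLL_DIR=$(mktemp -d)

# Stub: returns WORKING for the first two polls of an id, then SUCCESS.
# Poll counts are kept in files because command substitution runs the
# function in a subshell, where plain variable updates would be lost.
fake_build_status() {
  local id=$1 count
  count=$(cat "${POLL_DIR}/${id}" 2>/dev/null || echo 0)
  echo $((count + 1)) > "${POLL_DIR}/${id}"
  if [ "$count" -ge 2 ]; then echo "SUCCESS"; else echo "WORKING"; fi
}

PENDING_BUILD_IDS=(build-a build-b)
for attempt in $(seq 1 10); do
  NEW_PENDING=()
  for id in "${PENDING_BUILD_IDS[@]}"; do
    status=$(fake_build_status "$id")
    if [ "$status" != "SUCCESS" ]; then
      NEW_PENDING+=("$id")
    fi
  done
  PENDING_BUILD_IDS=("${NEW_PENDING[@]}")
  if [ "${#PENDING_BUILD_IDS[@]}" -eq 0 ]; then
    break  # all builds finished
  fi
  # the real script sleeps 20 seconds between polls
done
rm -rf "${POLL_DIR}"
echo "finished after ${attempt} polls, ${#PENDING_BUILD_IDS[@]} pending"
```

With this stub, both builds report SUCCESS on their third poll, so the loop exits after three iterations with nothing pending; the real script additionally treats a failed `describe` call as a retry and any terminal non-SUCCESS status as fatal.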
8 changes: 8 additions & 0 deletions test/cloudbuild/api_server.yaml
@@ -0,0 +1,8 @@
steps:
- name: 'gcr.io/cloud-builders/docker'
args: [ 'build', '-t', '$_GCR_BASE/api-server', '-f', 'backend/Dockerfile', '.' ]
timeout: 1800s # 30min
options:
machineType: N1_HIGHCPU_8 # This is cpu intensive, use a better machine.
images:
- '$_GCR_BASE/api-server'
20 changes: 20 additions & 0 deletions test/cloudbuild/batch_build.yaml
@@ -0,0 +1,20 @@
steps:
- name: "gcr.io/cloud-builders/docker"
args:
["build", "-t", "$_GCR_BASE/persistenceagent", "-f", "backend/Dockerfile.persistenceagent", "."]
waitFor: ["-"]
- name: "gcr.io/cloud-builders/docker"
args:
["build", "-t", "$_GCR_BASE/scheduledworkflow", "-f", "backend/Dockerfile.scheduledworkflow", "."]
waitFor: ["-"]
- name: "gcr.io/cloud-builders/docker"
args:
["build", "-t", "$_GCR_BASE/frontend", "-f", "frontend/Dockerfile", "."]
waitFor: ["-"]
options:
machineType: N1_HIGHCPU_8 # use a fast machine to build because there is a lot of work
images:
- "$_GCR_BASE/frontend"
- "$_GCR_BASE/scheduledworkflow"
- "$_GCR_BASE/persistenceagent"
timeout: 1800s # 30min
5 changes: 3 additions & 2 deletions test/install-argo.sh
@@ -34,8 +34,9 @@ if ! which argo; then
chmod +x ~/bin/argo
fi

kubectl create ns argo --dry-run -o yaml | kubectl apply -f -
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/$ARGO_VERSION/manifests/install.yaml
# No need to install here, it comes with kfp lite deployment
# kubectl create ns argo --dry-run -o yaml | kubectl apply -f -
# kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/$ARGO_VERSION/manifests/install.yaml

# Some workflows are deployed to the non-default namespace where the GCP credential secret is stored
# In this case, the default service account in that namespace doesn't have enough permission
9 changes: 6 additions & 3 deletions test/presubmit-tests-with-pipeline-deployment.sh
@@ -75,16 +75,19 @@ echo "presubmit test starts"
time source "${DIR}/test-prep.sh"
echo "test env prepared"

time source "${DIR}/build-images.sh"
echo "KFP images cloudbuild jobs submitted"

time source "${DIR}/deploy-cluster.sh"
echo "cluster deployed"

time source "${DIR}/check-build-image-status.sh"
echo "KFP images built"

# Install Argo CLI and test-runner service account
time source "${DIR}/install-argo.sh"
echo "argo installed"

time source "${DIR}/build-images.sh"
echo "KFP images built"

time source "${DIR}/deploy-pipeline-lite.sh"
echo "KFP lite deployed"
