From 722efc5dad83e6f1312f372e20a65254a64c6d5b Mon Sep 17 00:00:00 2001
From: Lunderberg
Date: Wed, 11 Aug 2021 18:31:21 -0500
Subject: [PATCH] [Docker] Refactor/clean-up of docker/bash.sh (#8670)

* [Docker] Refactor/clean-up of docker/bash.sh

  - Added a detailed help message, displayed using `-h` or `--help`.

  - Optional flags are handled by a getopt-style parsing loop and can
    now occur in any order.

  - The `--mount` flag may occur more than once.

  - Switched the arguments passed to docker-run from short to long
    form (e.g. `--volume` instead of `-v`).  Short arguments are good
    shortcuts for interactive work, but can be more difficult to read
    in longer scripts.

  - Mount the `.tvm_test_data` folder, to avoid re-downloading test
    data already available in the host environment.

* [Docker] docker/bash.sh CI fix

  Dash-prefixed arguments that are part of the command now require a
  preceding -- to separate them from arguments intended for
  docker/bash.sh.

* [Docker] docker/bash.sh, consistent quoting

* [Docker] Added --repo-mount-point for docker/bash.sh

* [Docker] Updated command-line parsing of docker/bash.sh

  - Maintained previous behavior: any unrecognized arguments after
    the image name are part of the command, so no -- is needed
    (e.g. docker/bash.sh ci_gpu make -j2).

  - Reverted the changes to the Jenkinsfile that added a --, as it is
    no longer needed.

* [Docker] Fixed multi-argument commands

* [Docker] docker/bash.sh check permissions before mounting
  ~/.tvm_test_data

* [Docker] Consistent workspace directory in docker/bash.sh for Jenkins

  Some locations in the CI run build commands outside of the build
  steps (e.g. tests/scripts/task_ci_setup.sh#L38), and cmake doesn't
  like it if the build directory changes.  These should probably be
  moved into the build steps of the CI, and be packed into
  tvm_multilib in the Jenkinsfile, but in the meantime maintaining a
  consistent /workspace directory on all CI nodes allows cmake to run.

* [Docker] Updated bash.sh for macOS compatibility

  macOS ships an older version of bash that handles arrays slightly
  differently.  All instances of array expansion `"${ARRAY[@]}"`
  should instead be written as `${ARRAY[@]+"${ARRAY[@]}"}`.
  Otherwise, `set -u` will erroneously complain about an undefined
  variable when the array is empty.  See
  https://stackoverflow.com/a/61551944 for details.  Even though this
  is an older version of bash (observed in version 3.2.57), it is the
  last major version available under GPLv2 and is therefore the
  default version on macOS.

  At some point, `docker/bash.sh` could be migrated to python for
  ease of maintenance/testing.
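  For illustration, a minimal standalone sketch (not part of this
  patch) of the bash 3.2 behavior described above:

      #!/usr/bin/env bash
      set -u
      EMPTY_ARRAY=( )
      # echo "${EMPTY_ARRAY[@]}"                  # bash 3.2: "unbound variable" error
      echo ${EMPTY_ARRAY[@]+"${EMPTY_ARRAY[@]}"}  # safe on bash 3.2 and later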
---
 Jenkinsfile      |   0
 docker/README.md |   9 +-
 docker/bash.sh   | 409 +++++++++++++++++++++++++++++++++++------------
 3 files changed, 313 insertions(+), 105 deletions(-)
 mode change 100644 => 100755 Jenkinsfile

diff --git a/Jenkinsfile b/Jenkinsfile
old mode 100644
new mode 100755
diff --git a/docker/README.md b/docker/README.md
index 5e470350749d..a05079d30881 100644
--- a/docker/README.md
+++ b/docker/README.md
@@ -33,7 +33,7 @@ interactive bash session with a given image_name.
 
 The script does the following things:
 
-- Mount current directory to /workspace and set it as home
+- Mount current directory to the same location in the docker container, and set it as home
 - Switch user to be the same user that calls the bash.sh
 - Use the host-side network
 
@@ -102,9 +102,10 @@
 The command ``./docker/build.sh image_name COMMANDS`` is almost equivelant
 to ``./docker/bash.sh image_name COMMANDS`` but in the case of ``bash.sh``
 a build attempt is not done.
 
-The build command will map the tvm root to /workspace/ inside the container
-with the same user as the user invoking the docker command.
-Here are some common use examples to perform CI tasks.
+The build command will map the tvm root to the corresponding location
+inside the container with the same user as the user invoking the
+docker command.  Here are some common use examples to perform CI
+tasks.
 
 - lint the python codes
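For reference, a few invocations consistent with the reworked
interface described above (the image shortcut, paths, and make target
are illustrative):

    docker/bash.sh -i ci_gpu                    # interactive session
    docker/bash.sh ci_gpu make -j2              # args after the image become the command
    docker/bash.sh ci_gpu -- make -j2           # -- separates dash-prefixed command
                                                # arguments from bash.sh's own flags
    docker/bash.sh --mount /data --mount /models ci_gpu   # repeated --mount
    docker/bash.sh --dry-run ci_gpu             # print the docker command, don't run it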
+if [[ -n "${JENKINS_HOME:-}" ]]; then + REPO_MOUNT_POINT=/workspace +else + REPO_MOUNT_POINT="${REPO_DIR}" fi -CI_DOCKER_EXTRA_PARAMS=( ) -if [[ "$1" == "--net=host" ]]; then - CI_DOCKER_EXTRA_PARAMS+=('--net=host') - shift 1 + +function parse_error() { + echo "$@" >&2 + show_usage >&2 + exit 1 +} + +# Handle joined flags, such as interpreting -ih as -i -h. Either rewrites +# the current argument if it is a joined argument, or shifts all arguments +# otherwise. Should be called as "eval $break_joined_flag" where joined +# flags are possible. Can't use a function definition, because it needs +# to overwrite the parent scope's behavior. +break_joined_flag='if (( ${#1} == 2 )); then shift; else set -- -"${1#-i}" "${@:2}"; fi' + + +while (( $# )); do + case "$1" in + -h|--help) + show_usage + exit 0 + ;; + + -i*|--interactive) + INTERACTIVE=true + eval $break_joined_flag + ;; + + --net=host) + USE_NET_HOST=true + shift + ;; + + --mount) + if [[ -n "$2" ]]; then + MOUNT_DIRS+=("$2") + shift 2 + else + parse_error 'ERROR: --mount requires a non-empty argument' + fi + ;; + + --mount=?*) + MOUNT_DIRS+=("${1#*=}") + shift + ;; + + --dry-run) + DRY_RUN=true + shift + ;; + + --repo-mount-point) + if [[ -n "$2" ]]; then + REPO_MOUNT_POINT="$2" + shift 2 + else + parse_error 'ERROR: --repo-mount-point requires a non-empty argument' + fi + ;; + + --repo-mount-point=?*) + REPO_MOUNT_POINT="${1#*=}" + shift + ;; + + --) + shift + COMMAND=( "$@" ) + break + ;; + + -*|--*) + echo "Error: Unknown flag: $1" >&2 + echo " If this flag is intended to be passed to the" >&2 + echo " docker command, please add -- before the docker" >&2 + echo " command (e.g. docker/bash.sh ci_gpu -- build -j2)" >&2 + show_usage >&2 + exit 1 + ;; + + *) + # First positional argument is the image name, all + # remaining below to the COMMAND. + if [[ -z "${DOCKER_IMAGE_NAME}" ]]; then + DOCKER_IMAGE_NAME=$1 + shift + else + COMMAND=( "$@" ) + break + fi + ;; + esac +done + +if [[ -z "${DOCKER_IMAGE_NAME}" ]]; then + echo "Error: Missing DOCKER_IMAGE_NAME" >&2 + show_usage >&2 fi -# Mount external directory to the docker -CI_DOCKER_MOUNT_CMD=( ) -if [ "$1" == "--mount" ]; then - shift 1 - CI_DOCKER_MOUNT_CMD=( -v "$1:$1" ) - shift 1 +if [[ ${COMMAND[@]+"${COMMAND[@]}"} = bash ]]; then + INTERACTIVE=true + USE_NET_HOST=true fi -if [ "$#" -lt 1 ]; then - echo "Usage: docker/bash.sh [-i] [--net=host] [COMMAND]" - exit -1 + + +############################### +### End of argument parsing ### +############################### + +source "$(dirname $0)/dev_common.sh" || exit 2 + +DOCKER_FLAGS=( ) +DOCKER_ENV=( ) +DOCKER_MOUNT=( ) +DOCKER_DEVICES=( ) + + +# If the user gave a shortcut defined in the Jenkinsfile, use it. 
+
+if [[ -z "${DOCKER_IMAGE_NAME}" ]]; then
+    echo "Error: Missing DOCKER_IMAGE_NAME" >&2
+    show_usage >&2
+    exit 1
 fi
 
-# Mount external directory to the docker
-CI_DOCKER_MOUNT_CMD=( )
-if [ "$1" == "--mount" ]; then
-    shift 1
-    CI_DOCKER_MOUNT_CMD=( -v "$1:$1" )
-    shift 1
+if [[ ${COMMAND[@]+"${COMMAND[@]}"} = bash ]]; then
+    INTERACTIVE=true
+    USE_NET_HOST=true
 fi
 
-if [ "$#" -lt 1 ]; then
-    echo "Usage: docker/bash.sh [-i] [--net=host] [COMMAND]"
-    exit -1
+
+###############################
+### End of argument parsing ###
+###############################
+
+source "$(dirname $0)/dev_common.sh" || exit 2
+
+DOCKER_FLAGS=( )
+DOCKER_ENV=( )
+DOCKER_MOUNT=( )
+DOCKER_DEVICES=( )
+
+
+# If the user gave a shortcut defined in the Jenkinsfile, use it.
+EXPANDED_SHORTCUT=$(lookup_image_spec "${DOCKER_IMAGE_NAME}")
+if [ -n "${EXPANDED_SHORTCUT}" ]; then
+    DOCKER_IMAGE_NAME="${EXPANDED_SHORTCUT}"
 fi
 
-DOCKER_IMAGE_NAME=$(lookup_image_spec "$1")
-if [ -z "${DOCKER_IMAGE_NAME}" ]; then
-    DOCKER_IMAGE_NAME=("$1")
+# Set up working directories
+
+DOCKER_FLAGS+=( --workdir "${REPO_MOUNT_POINT}" )
+DOCKER_MOUNT+=( --volume "${REPO_DIR}":"${REPO_MOUNT_POINT}"
+                --volume "${SCRIPT_DIR}":/docker
+              )
+
+# Set up CI-specific environment variables
+DOCKER_ENV+=( --env CI_BUILD_HOME="${REPO_MOUNT_POINT}"
+              --env CI_BUILD_USER="$(id -u -n)"
+              --env CI_BUILD_UID="$(id -u)"
+              --env CI_BUILD_GROUP="$(id -g -n)"
+              --env CI_BUILD_GID="$(id -g)"
+              --env CI_PYTEST_ADD_OPTIONS="${CI_PYTEST_ADD_OPTIONS:-}"
+              --env CI_IMAGE_NAME="${DOCKER_IMAGE_NAME}"
+            )
+
+
+# Pass the tvm test data folder through to the docker container, to
+# avoid repeated downloads.  Check that we have permission to write
+# to the directory first, since the CI may not.
+TEST_DATA_PATH="${TVM_DATA_ROOT_PATH:-${HOME}/.tvm_test_data}"
+if [[ -d "${TEST_DATA_PATH}" && -w "${TEST_DATA_PATH}" ]]; then
+    DOCKER_MOUNT+=( --volume "${TEST_DATA_PATH}":"${REPO_MOUNT_POINT}"/.tvm_test_data )
 fi
 
-if [ "$#" -eq 1 ]; then
-    COMMAND="bash"
-    interactive=1
+
+# Remove the container once it finishes running (--rm) and share the
+# PID namespace (--pid=host).  The process inside does not have pid 1
+# and SIGKILL is propagated to the process inside, allowing jenkins
+# to kill it if needed.
+DOCKER_FLAGS+=( --rm --pid=host )
+
+# Expose services running in the container to the host.
+if ${USE_NET_HOST}; then
     if [[ $(uname) == "Darwin" ]]; then
         # Docker's host networking driver isn't supported on macOS.
         # Use default bridge network and expose port for jupyter notebook.
-        CI_DOCKER_EXTRA_PARAMS+=( "${CI_DOCKER_EXTRA_PARAMS[@]}" "-p 8888:8888" )
+        DOCKER_FLAGS+=( -p 8888:8888 )
     else
-        CI_DOCKER_EXTRA_PARAMS+=( "${CI_DOCKER_EXTRA_PARAMS[@]}" "--net=host" )
+        DOCKER_FLAGS+=( --net=host )
     fi
-else
-    shift 1
-    COMMAND=("$@")
 fi
 
-if [ $interactive -eq 1 ]; then
-    CI_DOCKER_EXTRA_PARAMS=( "${CI_DOCKER_EXTRA_PARAMS[@]}" -it )
+# Set up interactive sessions
+if ${INTERACTIVE}; then
+    DOCKER_FLAGS+=( --interactive --tty )
 fi
 
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-WORKSPACE="$(pwd)"
-
-# Use nvidia-docker if the container is GPU.
-if [[ ! -z $CUDA_VISIBLE_DEVICES ]]; then
-    CUDA_ENV="-e CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
-else
-    CUDA_ENV=""
-fi
+# Expose external directories to the docker container
+for MOUNT_DIR in ${MOUNT_DIRS[@]+"${MOUNT_DIRS[@]}"}; do
+    DOCKER_MOUNT+=( --volume "${MOUNT_DIR}:${MOUNT_DIR}" )
+done
 
+# Use nvidia-docker for the GPU container.  If nvidia-docker is not
+# available, fall back to the "--gpus all" flag, which requires
+# docker version 19.03 or higher.
 if [[ "${DOCKER_IMAGE_NAME}" == *"gpu"* || "${DOCKER_IMAGE_NAME}" == *"cuda"* ]]; then
-    if ! type "nvidia-docker" 1> /dev/null 2> /dev/null
-    then
-        DOCKER_BINARY="docker"
-        CUDA_ENV=" --gpus all "${CUDA_ENV}
+    if type nvidia-docker 1> /dev/null 2> /dev/null; then
+        DOCKER_BINARY=nvidia-docker
     else
-        DOCKER_BINARY="nvidia-docker"
+        DOCKER_BINARY=docker
+        DOCKER_FLAGS+=( --gpus all )
     fi
 else
-    DOCKER_BINARY="docker"
+    DOCKER_BINARY=docker
+fi
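+
+# Illustration (comment only): on a CUDA host without nvidia-docker
+# installed, the flags assembled above produce an invocation of the
+# form "docker run ... --gpus all ... <image>", whereas with
+# nvidia-docker installed the binary itself changes:
+# "nvidia-docker run ... <image>".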
+
+# Pass any restrictions of allowed CUDA devices from the host to the
+# docker container.
+if [[ -n ${CUDA_VISIBLE_DEVICES:-} ]]; then
+    DOCKER_ENV+=( --env CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES}" )
+fi
+
+
+# Set TVM import path inside the docker image
 if [[ "${DOCKER_IMAGE_NAME}" == *"ci"* ]]; then
-    CI_ADDON_ENV="-e PYTHONPATH=/workspace/python"
-else
-    CI_ADDON_ENV=""
+    DOCKER_ENV+=( --env PYTHONPATH="${REPO_MOUNT_POINT}"/python )
 fi
 
-DOCKER_ENVS=""
-DOCKER_DEVICES=""
-WORKSPACE_VOLUMES=""
-# If the Vitis-AI docker image is selected, expose the Xilinx FPGA devices and required volumes containing e.g. DSA's and overlays
+
+
+# If the Vitis-AI docker image is selected, expose the Xilinx FPGA
+# devices and required volumes containing e.g. DSAs and overlays
 if [[ "${DOCKER_IMAGE_NAME}" == *"demo_vitis_ai"* && -d "/dev/shm" && -d "/opt/xilinx/dsa" && -d "/opt/xilinx/overlaybins" ]]; then
-    WORKSPACE_VOLUMES="-v /dev/shm:/dev/shm -v /opt/xilinx/dsa:/opt/xilinx/dsa -v /opt/xilinx/overlaybins:/opt/xilinx/overlaybins"
+    DOCKER_MOUNT+=( --volume /dev/shm:/dev/shm
+                    --volume /opt/xilinx/dsa:/opt/xilinx/dsa
+                    --volume /opt/xilinx/overlaybins:/opt/xilinx/overlaybins
+                  )
+
+    # The expansions of XCLMGMT_DRIVER and RENDER_DRIVER are
+    # intentionally left unquoted, so that each device found by find
+    # becomes its own --device argument.
     XCLMGMT_DRIVER="$(find /dev -name xclmgmt\*)"
-    DOCKER_DEVICES=""
-    for i in ${XCLMGMT_DRIVER} ;
-    do
-        DOCKER_DEVICES+="--device=$i "
+    for DRIVER in ${XCLMGMT_DRIVER}; do
+        DOCKER_DEVICES+=( --device="${DRIVER}" )
     done
 
     RENDER_DRIVER="$(find /dev/dri -name renderD\*)"
-    for i in ${RENDER_DRIVER} ;
-    do
-        DOCKER_DEVICES+="--device=$i "
+    for DRIVER in ${RENDER_DRIVER}; do
+        DOCKER_DEVICES+=( --device="${DRIVER}" )
     done
 fi
 
 # Add ROCm devices and set ROCM_ENABLED=1 which is used in the with_the_same_user script
 # to add the user to the video group
 if [[ "${DOCKER_IMAGE_NAME}" == *"rocm"* && -d "/dev/dri" ]]; then
-    DOCKER_DEVICES+="--device=/dev/kfd --device=/dev/dri "
-    DOCKER_ENVS+="-e ROCM_ENABLED=1 "
+    DOCKER_DEVICES+=( --device=/dev/kfd --device=/dev/dri )
+    DOCKER_ENV+=( --env ROCM_ENABLED=1 )
+fi
+
+# When running from a git worktree, also mount the original git dir.
+if [ -f "${REPO_DIR}/.git" ]; then
+    git_dir=$(cd "${REPO_DIR}" && git rev-parse --git-common-dir)
+    if [ "${git_dir}" != "${REPO_DIR}/.git" ]; then
+        DOCKER_MOUNT+=( --volume "${git_dir}:${git_dir}" )
+    fi
 fi
 
 # Print arguments.
-echo "WORKSPACE: ${WORKSPACE}"
+echo "REPO_DIR: ${REPO_DIR}"
 echo "DOCKER CONTAINER NAME: ${DOCKER_IMAGE_NAME}"
 echo ""
-echo "Running '${COMMAND[@]}' inside ${DOCKER_IMAGE_NAME}..."
+echo Running \'${COMMAND[@]+"${COMMAND[@]}"}\' inside ${DOCKER_IMAGE_NAME}...
 
-# When running from a git worktree, also mount the original git dir.
-EXTRA_MOUNTS=( )
-if [ -f "${WORKSPACE}/.git" ]; then
-    git_dir=$(cd ${WORKSPACE} && git rev-parse --git-common-dir)
-    if [ "${git_dir}" != "${WORKSPACE}/.git" ]; then
-        EXTRA_MOUNTS=( "${EXTRA_MOUNTS[@]}" -v "${git_dir}:${git_dir}" )
-    fi
-fi
+DOCKER_CMD=(${DOCKER_BINARY} run
+            ${DOCKER_FLAGS[@]+"${DOCKER_FLAGS[@]}"}
+            ${DOCKER_ENV[@]+"${DOCKER_ENV[@]}"}
+            ${DOCKER_MOUNT[@]+"${DOCKER_MOUNT[@]}"}
+            ${DOCKER_DEVICES[@]+"${DOCKER_DEVICES[@]}"}
+            "${DOCKER_IMAGE_NAME}"
+            bash --login /docker/with_the_same_user
+            ${COMMAND[@]+"${COMMAND[@]}"}
+           )
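+
+# For context: the /docker/with_the_same_user helper (mounted from
+# the docker/ directory above) re-creates the calling user inside the
+# container from the CI_BUILD_* variables, then runs COMMAND as that
+# user, so files written into the mounted repository keep the
+# caller's ownership.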
 
-# By default we cleanup - remove the container once it finish running (--rm)
-# and share the PID namespace (--pid=host) so the process inside does not have
-# pid 1 and SIGKILL is propagated to the process inside (jenkins can kill it).
-${DOCKER_BINARY} run --rm --pid=host\
-    ${DOCKER_DEVICES}\
-    ${WORKSPACE_VOLUMES}\
-    -v ${WORKSPACE}:/workspace \
-    -v ${SCRIPT_DIR}:/docker \
-    "${CI_DOCKER_MOUNT_CMD[@]}" \
-    "${EXTRA_MOUNTS[@]}" \
-    -w /workspace \
-    -e "CI_BUILD_HOME=/workspace" \
-    -e "CI_BUILD_USER=$(id -u -n)" \
-    -e "CI_BUILD_UID=$(id -u)" \
-    -e "CI_BUILD_GROUP=$(id -g -n)" \
-    -e "CI_BUILD_GID=$(id -g)" \
-    -e "CI_PYTEST_ADD_OPTIONS=$CI_PYTEST_ADD_OPTIONS" \
-    -e "CI_IMAGE_NAME=${DOCKER_IMAGE_NAME}" \
-    ${DOCKER_ENVS} \
-    ${CI_ADDON_ENV} \
-    ${CUDA_ENV} \
-    "${CI_DOCKER_EXTRA_PARAMS[@]}" \
-    ${DOCKER_IMAGE_NAME} \
-    bash --login /docker/with_the_same_user \
-    "${COMMAND[@]}"
+if ${DRY_RUN}; then
+    echo ${DOCKER_CMD[@]+"${DOCKER_CMD[@]}"}
+else
+    ${DOCKER_CMD[@]+"${DOCKER_CMD[@]}"}
+fi
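
With --dry-run, the assembled command is printed instead of executed.
As a rough illustration (assuming a checkout at /home/user/tvm, no
Jenkins environment, and no nvidia-docker installed, so the --gpus
fallback applies; the resolved image tag is hypothetical), the
printed command would resemble:

    $ docker/bash.sh --dry-run ci_gpu make -j2
    docker run --workdir /home/user/tvm --rm --pid=host --gpus all
        --env CI_BUILD_HOME=/home/user/tvm [...further --env flags...]
        --volume /home/user/tvm:/home/user/tvm
        --volume /home/user/tvm/docker:/docker
        tlcpack/ci-gpu:v0.76 bash --login /docker/with_the_same_user make -j2

(The real output is a single line; it is wrapped here for
readability.)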