[#614] feat(docker): Gravitino Trino Docker image (#702)
### What changes were proposed in this pull request?

- Build a minimal Trino Docker image.
- Add a choice of Docker image type (`gravitino-ci-hive` or `gravitino-ci-trino`) when building Docker images in the GitHub Action.

![image](https://github.com/datastrato/gravitino/assets/3677382/d8acf95d-f61a-4274-8fef-ebcb2e8399d5)

### Why are the changes needed?

Gravitino uses Trino for integration tests.
Trino ships with more than 40 plugins, which makes the Trino server start up
very slowly, usually taking 2 to 3 minutes.
We need to rebuild the Gravitino Trino Docker image with the unused
plugins removed.

Fix: #614 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

I built the Trino Docker image successfully; it is available at:
https://hub.docker.com/repository/docker/datastrato/gravitino-ci-trino/general
xunliu authored Nov 8, 2023
1 parent 06e3223 commit 2cc5545
Showing 8 changed files with 230 additions and 130 deletions.
25 changes: 20 additions & 5 deletions .github/workflows/docker-image.yml
@@ -3,6 +3,14 @@ name: Publish Docker Image
 on:
   workflow_dispatch:
     inputs:
+      image:
+        type: choice
+        description: 'Choose the image to build'
+        required: true
+        default: 'gravitino-ci-hive'
+        options:
+          - 'gravitino-ci-hive'
+          - 'gravitino-ci-trino'
       tag:
         description: 'Docker tag to apply to this image'
         required: true
@@ -11,8 +19,6 @@ on:
         description: 'Publish Docker token'
         required: true
         type: string
-env:
-  HIVE_IMAGE_NAME: datastrato/gravitino-ci-hive

 jobs:
   publish-docker-image:
@@ -22,6 +28,16 @@ jobs:
       input_token: ${{ github.event.inputs.token }}
       secrets_token: ${{ secrets.PUBLISH_DOCKER_TOKEN }}
     steps:
+      - name: Set environment variables
+        run: |
+          if [ "${{ github.event.inputs.image }}" == "gravitino-ci-hive" ]; then
+            echo "image_type=hive" >> $GITHUB_ENV
+            echo "image_name=datastrato/gravitino-ci-hive" >> $GITHUB_ENV
+          elif [ "${{ github.event.inputs.image }}" == "gravitino-ci-trino" ]; then
+            echo "image_type=trino" >> $GITHUB_ENV
+            echo "image_name=datastrato/gravitino-ci-trino" >> $GITHUB_ENV
+          fi
       - uses: actions/checkout@v3

       - name: Check publish Docker token
@@ -45,8 +61,7 @@ jobs:

       - name: Build and Push the main branch Docker image
         if: ${{ github.ref_name == 'main' }}
-        run: ./dev/docker/hive/build-docker.sh --platform all --image ${HIVE_IMAGE_NAME} --tag ${{ github.event.inputs.tag }} --latest
-
+        run: ./dev/docker/build-docker.sh --platform all --type ${image_type} --image ${image_name} --tag ${{ github.event.inputs.tag }} --latest
       - name: Build and Push the other branch Docker image
         if: ${{ github.ref_name != 'main' }}
-        run: ./dev/docker/hive/build-docker.sh --platform all --image ${HIVE_IMAGE_NAME} --tag ${{ github.event.inputs.tag }}
+        run: ./dev/docker/build-docker.sh --platform all --type ${image_type} --image ${image_name} --tag ${{ github.event.inputs.tag }}
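The `Set environment variables` step maps the `image` input to an `image_type` and an `image_name` through an `if`/`elif` chain, so a new entry added to `options` without a matching branch would silently leave both variables unset. A minimal sketch of the same mapping as a standalone shell function, assuming a hypothetical helper named `resolve_image` (it is not part of this commit), uses `case` and fails loudly on unknown input:

```shell
#!/bin/bash
# Hypothetical helper mirroring the workflow's "Set environment variables" step.
# Prints "<image_type> <image_name>" for a supported image choice.
resolve_image() {
  case "$1" in
    gravitino-ci-hive)  echo "hive datastrato/gravitino-ci-hive" ;;
    gravitino-ci-trino) echo "trino datastrato/gravitino-ci-trino" ;;
    *)
      # Fail explicitly instead of leaving image_type/image_name unset.
      echo "ERROR : unsupported image '$1'" >&2
      return 1
      ;;
  esac
}

# Split the two fields. A workflow step would append them to "$GITHUB_ENV" instead,
# e.g. echo "image_type=${image_type}" >> "$GITHUB_ENV"
result=$(resolve_image gravitino-ci-trino)
image_type=${result%% *}   # text before the first space
image_name=${result#* }    # text after the first space
echo "image_type=${image_type} image_name=${image_name}"
```

The `case` form keeps the input-to-image table in one place, which makes a missing branch an explicit error rather than an empty variable.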
40 changes: 25 additions & 15 deletions dev/docker/hive/README.md → dev/docker/README.md
@@ -2,29 +2,26 @@
 Copyright 2023 Datastrato.
 This software is licensed under the Apache License version 2.
 -->
-# Hadoop and Hive Docker image
-This Docker image is used to support Gravitino integration testing.
-It includes Hadoop-2.x and Hive-2.x, you can use this Docker image to test the Gravitino catalog-hive module.
+# Gravitino Docker images
+These Docker images are designed to support Gravitino integration testing.
+They can be used to test all catalog and connector modules within Gravitino.

-## Build Docker image
-```
-./build-docker.sh --platform [all|linux/amd64|linux/arm64] --image {image_name} --tag {tag_name} --latest
-```
+# Datastrato Docker Hub repository
+- [Datastrato Docker Hub repository address](https://hub.docker.com/r/datastrato)

-## Run container
+## How to build a Docker image
 ```
-docker run --rm -d -p 8022:22 -p 8088:8088 -p 9000:9000 -p 9083:9083 -p 10000:10000 -p 10002:10002 -p 50070:50070 -p 50075:50075 -p 50010:50010 datastrato/gravitino-ci-hive
+./build-docker.sh --platform [all|linux/amd64|linux/arm64] --type [hive|trino] --image {image_name} --tag {tag_name} --latest
 ```

-## Login Docker container
+# Version change history
+## Gravitino CI Hive
+
+### Container startup commands
 ```
-ssh -p 8022 datastrato@localhost (password: ds123, this is a sudo user)
+docker run --rm -d -p 8022:22 -p 8088:8088 -p 9000:9000 -p 9083:9083 -p 10000:10000 -p 10002:10002 -p 50070:50070 -p 50075:50075 -p 50010:50010 datastrato/gravitino-ci-hive
 ```

-# Docker hub repository
-- [datastrato/gravitino-ci-hive](https://hub.docker.com/r/datastrato/gravitino-ci-hive)
-
-## Version change history
 ### 0.1.0
 - Docker image `datastrato/gravitino-ci-hive:0.1.0`
 - `hadoop-2.7.3`
@@ -62,3 +59,16 @@ ssh -p 8022 datastrato@localhost (password: ds123, this is a sudo user)

 ### 0.1.5
 - Rollback `Map container hostname to 127.0.0.1 before starting Hadoop` of `datastrato/gravitino-ci-hive:0.1.4`
+
+## Gravitino CI Trino
+
+### Container startup commands
+```
+docker run --rm -it -p 8080:8080 datastrato/gravitino-ci-trino
+```
+
+### 0.1.0
+- Docker image `datastrato/gravitino-ci-trino:0.1.0`
+- Based on `trinodb/trino:426`, with unused plugins removed.
+- Exposed ports:
+  - `8080` Trino HTTP port (also used by JDBC connections)
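Since the point of this image is a faster Trino startup, an integration test still has to wait until the server is ready before connecting. A minimal sketch, assuming the container was started with the command above on `localhost:8080` and that the coordinator's `/v1/info` endpoint reports `"starting":false` once startup completes; `trino_info_is_ready` and `wait_for_trino` are hypothetical names, and the retry budget is an arbitrary choice:

```shell
#!/bin/bash
# Returns 0 if a /v1/info JSON body reports that Trino has finished starting.
trino_info_is_ready() {
  # Tolerate optional whitespace around the colon in the JSON body.
  printf '%s' "$1" | grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Polls the (assumed) endpoint until the server is ready or the retries run out.
wait_for_trino() {
  local url="${1:-http://localhost:8080/v1/info}"
  local retries="${2:-60}"   # hypothetical default: 60 attempts, 2 seconds apart
  local body
  while [ "${retries}" -gt 0 ]; do
    # -f: treat HTTP errors as failures; --max-time: never hang on one attempt.
    if body=$(curl -fsS --max-time 5 "${url}" 2>/dev/null) && trino_info_is_ready "${body}"; then
      return 0
    fi
    retries=$((retries - 1))
    sleep 2
  done
  echo "ERROR : Trino at ${url} did not become ready" >&2
  return 1
}
```

For example, running `docker run --rm -d -p 8080:8080 datastrato/gravitino-ci-trino` followed by `wait_for_trino || exit 1` would block until the server reports it has started, or give up after roughly two minutes.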
99 changes: 99 additions & 0 deletions dev/docker/build-docker.sh
@@ -0,0 +1,99 @@
#!/bin/bash
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
set -ex
script_dir="$(dirname "${BASH_SOURCE-$0}")"
script_dir="$(cd "${script_dir}">/dev/null; pwd)"

# Build docker image for multi-arch
USAGE="-e Usage: ./build-docker.sh --platform [all|linux/amd64|linux/arm64] --type [hive|trino] --image {image_name} --tag {tag_name} --latest"

# Get platform type
if [[ "$1" == "--platform" ]]; then
  shift
  platform_type="$1"
  if [[ "${platform_type}" == "linux/amd64" || "${platform_type}" == "linux/arm64" || "${platform_type}" == "all" ]]; then
    echo "INFO : platform type is ${platform_type}"
  else
    echo "ERROR : ${platform_type} is not a valid platform type"
    echo ${USAGE}
    exit 1
  fi
  shift
else
  platform_type="all"
fi

# Get component type
if [[ "$1" == "--type" ]]; then
  shift
  component_type="$1"
  shift
else
  echo "ERROR : must specify component type"
  echo ${USAGE}
  exit 1
fi

# Get docker image name
if [[ "$1" == "--image" ]]; then
  shift
  image_name="$1"
  shift
else
  echo "ERROR : must specify image name"
  echo ${USAGE}
  exit 1
fi

# Get docker image tag
if [[ "$1" == "--tag" ]]; then
  shift
  tag_name="$1"
  shift
fi

# Get latest flag
build_latest=0
if [[ "$1" == "--latest" ]]; then
  shift
  build_latest=1
fi

if [[ "${component_type}" == "hive" ]]; then
  . ${script_dir}/hive/hive-dependency.sh
  build_args="--build-arg HADOOP_PACKAGE_NAME=${HADOOP_PACKAGE_NAME} --build-arg HIVE_PACKAGE_NAME=${HIVE_PACKAGE_NAME}"
elif [ "${component_type}" == "trino" ]; then
  true # Placeholder, do nothing
else
  echo "ERROR : ${component_type} is not a valid component type"
  echo ${USAGE}
  exit 1
fi

# Create multi-arch builder
BUILDER_NAME="gravitino-builder"
builders=$(docker buildx ls)
if echo "${builders}" | grep -q "${BUILDER_NAME}"; then
  echo "BuildKit builder '${BUILDER_NAME}' already exists."
else
  echo "BuildKit builder '${BUILDER_NAME}' does not exist."
  docker buildx create --platform linux/amd64,linux/arm64 --use --name ${BUILDER_NAME}
fi

cd ${script_dir}/${component_type}
if [[ "${platform_type}" == "all" ]]; then
  if [ ${build_latest} -eq 1 ]; then
    docker buildx build --platform=linux/amd64,linux/arm64 ${build_args} --push --progress plain -f Dockerfile -t ${image_name}:latest -t ${image_name}:${tag_name} .
  else
    docker buildx build --platform=linux/amd64,linux/arm64 ${build_args} --push --progress plain -f Dockerfile -t ${image_name}:${tag_name} .
  fi
else
  if [ ${build_latest} -eq 1 ]; then
    docker buildx build --platform=${platform_type} ${build_args} --output type=docker --progress plain -f Dockerfile -t ${image_name}:latest -t ${image_name}:${tag_name} .
  else
    docker buildx build --platform=${platform_type} ${build_args} --output type=docker --progress plain -f Dockerfile -t ${image_name}:${tag_name} .
  fi
fi
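`build-docker.sh` reads its flags positionally, so they must appear in exactly the order shown in `USAGE`, and `--tag` is optional even though `tag_name` is used in every `-t` argument (omitting it would produce an image reference ending in `:`). A hypothetical order-independent variant of the same parsing, sketched under the assumption that both `--image` and `--tag` should be mandatory, uses a `while`/`case` loop and factors out the `-t` arguments shared by the four `docker buildx build` calls; `parse_build_args` and `tag_args` are illustrative names, not part of this commit:

```shell
#!/bin/bash
# Hypothetical order-independent parser for the flags accepted by build-docker.sh.
# On success it sets platform_type, component_type, image_name, tag_name and build_latest.
parse_build_args() {
  platform_type="all"; component_type=""; image_name=""; tag_name=""; build_latest=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --platform|--type|--image|--tag)
        # Every value-taking flag must be followed by a value.
        if [ $# -lt 2 ]; then echo "ERROR : $1 requires a value" >&2; return 1; fi
        case "$1" in
          --platform) platform_type="$2" ;;
          --type)     component_type="$2" ;;
          --image)    image_name="$2" ;;
          --tag)      tag_name="$2" ;;
        esac
        shift 2 ;;
      --latest) build_latest=1; shift ;;
      *) echo "ERROR : unknown option $1" >&2; return 1 ;;
    esac
  done
  case "${platform_type}" in
    all|linux/amd64|linux/arm64) ;;
    *) echo "ERROR : ${platform_type} is not a valid platform type" >&2; return 1 ;;
  esac
  case "${component_type}" in
    hive|trino) ;;
    *) echo "ERROR : --type must be hive or trino" >&2; return 1 ;;
  esac
  # Both are required to form the "${image_name}:${tag_name}" reference.
  if [ -z "${image_name}" ] || [ -z "${tag_name}" ]; then
    echo "ERROR : must specify --image and --tag" >&2
    return 1
  fi
}

# Prints the -t arguments that build-docker.sh passes to `docker buildx build`.
tag_args() {
  if [ "${build_latest}" -eq 1 ]; then
    echo "-t ${image_name}:latest -t ${image_name}:${tag_name}"
  else
    echo "-t ${image_name}:${tag_name}"
  fi
}
```

With `tag_args` factored out, the four `docker buildx build` branches could collapse into one per platform mode; this is a possible simplification rather than the committed behavior.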
100 changes: 0 additions & 100 deletions dev/docker/hive/build-docker.sh

This file was deleted.

31 changes: 31 additions & 0 deletions dev/docker/hive/hive-dependency.sh
@@ -0,0 +1,31 @@
#!/bin/bash
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
set -ex
hive_dir="$(dirname "${BASH_SOURCE-$0}")"
hive_dir="$(cd "${hive_dir}">/dev/null; pwd)"

# Environment variables definition
HADOOP_VERSION="2.7.3"
HIVE_VERSION="2.3.9"

HADOOP_PACKAGE_NAME="hadoop-${HADOOP_VERSION}.tar.gz" # Passed to the Dockerfile as a --build-arg by build-docker.sh
HADOOP_DOWNLOAD_URL="http://archive.apache.org/dist/hadoop/core/hadoop-${HADOOP_VERSION}/${HADOOP_PACKAGE_NAME}"

HIVE_PACKAGE_NAME="apache-hive-${HIVE_VERSION}-bin.tar.gz" # Passed to the Dockerfile as a --build-arg by build-docker.sh
HIVE_DOWNLOAD_URL="https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/${HIVE_PACKAGE_NAME}"

# Prepare download packages
if [[ ! -d "${hive_dir}/packages" ]]; then
mkdir -p "${hive_dir}/packages"
fi

if [ ! -f "${hive_dir}/packages/${HADOOP_PACKAGE_NAME}" ]; then
curl -s -o "${hive_dir}/packages/${HADOOP_PACKAGE_NAME}" ${HADOOP_DOWNLOAD_URL}
fi

if [ ! -f "${hive_dir}/packages/${HIVE_PACKAGE_NAME}" ]; then
curl -s -o "${hive_dir}/packages/${HIVE_PACKAGE_NAME}" ${HIVE_DOWNLOAD_URL}
fi
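The download step in `hive-dependency.sh` caches packages by file existence, but `curl -s -o` without `-f` also writes the response body when the server returns an HTTP error, so a failed download could be cached as if it were a valid package. A minimal sketch of a safer variant, assuming a hypothetical helper `fetch_if_missing` (not part of this commit), downloads to a temporary `.part` file with `curl -fsSL` and renames it into place only on success:

```shell
#!/bin/bash
# Hypothetical helper: download <url> to <dir>/<name> unless it is already cached.
fetch_if_missing() {
  local dir="$1" name="$2" url="$3"
  mkdir -p "${dir}" || return 1
  if [ -f "${dir}/${name}" ]; then
    return 0  # already downloaded by an earlier build
  fi
  # -f: fail on HTTP errors instead of saving the error page; -L: follow redirects.
  # Writing to a .part file means an interrupted download is never mistaken for a cached one.
  if curl -fsSL -o "${dir}/${name}.part" "${url}"; then
    mv "${dir}/${name}.part" "${dir}/${name}"
  else
    rm -f "${dir}/${name}.part"
    echo "ERROR : failed to download ${url}" >&2
    return 1
  fi
}
```

Each `if`/`curl` pair could then become a single call such as `fetch_if_missing "${hive_dir}/packages" "${HADOOP_PACKAGE_NAME}" "${HADOOP_DOWNLOAD_URL}"`. Because the script runs under `set -e`, a failed download would stop the build instead of continuing with a corrupt archive.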
47 changes: 47 additions & 0 deletions dev/docker/trino/Dockerfile
@@ -0,0 +1,47 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
FROM trinodb/trino:426
LABEL maintainer="[email protected]"

# Only mysql, iceberg, mariadb, jmx, memory, tpch, tpcds, hive plugins are kept
RUN rm -rf /usr/lib/trino/plugin/accumulo \
&& rm -rf /usr/lib/trino/plugin/blackhole \
&& rm -rf /usr/lib/trino/plugin/delta-lake \
&& rm -rf /usr/lib/trino/plugin/example-http \
&& rm -rf /usr/lib/trino/plugin/geospatial \
&& rm -rf /usr/lib/trino/plugin/kafka \
&& rm -rf /usr/lib/trino/plugin/local-file \
&& rm -rf /usr/lib/trino/plugin/ml \
&& rm -rf /usr/lib/trino/plugin/mysql-event-listener \
&& rm -rf /usr/lib/trino/plugin/phoenix5 \
&& rm -rf /usr/lib/trino/plugin/prometheus \
&& rm -rf /usr/lib/trino/plugin/redshift \
&& rm -rf /usr/lib/trino/plugin/singlestore \
&& rm -rf /usr/lib/trino/plugin/thrift \
&& rm -rf /usr/lib/trino/plugin/atop \
&& rm -rf /usr/lib/trino/plugin/cassandra \
&& rm -rf /usr/lib/trino/plugin/druid \
&& rm -rf /usr/lib/trino/plugin/exchange-filesystem \
&& rm -rf /usr/lib/trino/plugin/google-sheets \
&& rm -rf /usr/lib/trino/plugin/http-event-listener \
&& rm -rf /usr/lib/trino/plugin/ignite \
&& rm -rf /usr/lib/trino/plugin/kinesis \
&& rm -rf /usr/lib/trino/plugin/mongodb \
&& rm -rf /usr/lib/trino/plugin/oracle \
&& rm -rf /usr/lib/trino/plugin/pinot \
&& rm -rf /usr/lib/trino/plugin/raptor-legacy \
&& rm -rf /usr/lib/trino/plugin/resource-group-managers \
&& rm -rf /usr/lib/trino/plugin/sqlserver \
&& rm -rf /usr/lib/trino/plugin/bigquery \
&& rm -rf /usr/lib/trino/plugin/clickhouse \
&& rm -rf /usr/lib/trino/plugin/elasticsearch \
&& rm -rf /usr/lib/trino/plugin/exchange-hdfs \
&& rm -rf /usr/lib/trino/plugin/hudi \
&& rm -rf /usr/lib/trino/plugin/kudu \
&& rm -rf /usr/lib/trino/plugin/password-authenticators \
&& rm -rf /usr/lib/trino/plugin/postgresql \
&& rm -rf /usr/lib/trino/plugin/redis \
&& rm -rf /usr/lib/trino/plugin/session-property-managers \
&& rm -rf /usr/lib/trino/plugin/teradata-functions
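The `RUN` step above removes plugins by name (a denylist), so any plugin added in a newer Trino base image would be kept by default and slow startup again. An alternative, sketched here as a hypothetical script that this commit does not use, keeps an explicit allowlist and deletes every other directory under the plugin root; the keep list in the usage comment is assumed to be the plugins named as kept in the Dockerfile comment, minus `hudi`, which the `RUN` step actually removes:

```shell
#!/bin/bash
# Hypothetical allowlist-based alternative to the explicit rm list:
# delete every plugin directory under <plugin_root> not named in the remaining arguments.
prune_plugins() {
  local plugin_root="$1" plugin_dir plugin wanted keep
  shift
  for plugin_dir in "${plugin_root}"/*/; do
    [ -d "${plugin_dir}" ] || continue  # glob matched nothing
    plugin=$(basename "${plugin_dir}")
    keep=0
    for wanted in "$@"; do
      if [ "${plugin}" = "${wanted}" ]; then keep=1; break; fi
    done
    if [ "${keep}" -eq 0 ]; then
      rm -rf "${plugin_dir}"
    fi
  done
  return 0
}

# In the image this might run as, e.g.:
#   prune_plugins /usr/lib/trino/plugin hive iceberg jmx mariadb memory mysql tpcds tpch
```

The trade-off is that a denylist keeps unknown new plugins while an allowlist drops them; for a CI image whose goal is fast startup, the allowlist arguably errs in the cheaper direction.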
Binary file modified docs/assets/publish-docker-image.png