[#198] test: Add Catalog-hive e2e integration test #308

Merged: 10 commits, Sep 5, 2023

Changes from 6 commits
90 changes: 90 additions & 0 deletions .github/workflows/integration-test.yml
@@ -0,0 +1,90 @@
name: Integration Test

# Controls when the workflow will run
on:
# Triggers the workflow on push or pull request events, but only for the "main" and "branch-*" branches
push:
branches: [ "main", "branch-*" ]
pull_request:
branches: [ "main", "branch-*" ]

env:
HIVE2_IMAGE_NAME: datastrato/hive2
HIVE2_IMAGE_VERSION: 0.1.0
HIVE2_IMAGE_LATEST: latest
Contributor:

What's the usage of "HIVE2_IMAGE_VERSION" and "HIVE2_IMAGE_LATEST"?


Contributor:

My question is that "HIVE2_IMAGE_VERSION" doesn't seem to be used in this script. Also, "HIVE2_IMAGE_VERSION" and "HIVE2_IMAGE_LATEST" both point to version information; do we need to unify them?

Member:

Just a reminder that with ASF projects you can only publish releases to Docker and can't refer to "latest". If we go with publishing "latest", we'll need to change that when we enter the ASF incubator.

Member Author:

> Just a reminder that with ASF projects you can only publish releases to Docker and can't refer to "latest". If we go with publishing "latest", we'll need to change that when we enter the ASF incubator.

@justinmclean Thank you for the reminder. I've modified the code accordingly.
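A rough sketch of the reviewer's suggestion (illustrative only, not the workflow that was finally merged): reference a released image tag instead of a mutable "latest" reference. The tag value and step below are examples.

    # Hypothetical workflow fragment: pin the Hive image to a released tag.
    env:
      HIVE2_IMAGE_NAME: datastrato/hive2
      HIVE2_IMAGE_VERSION: 0.1.0   # example release tag

    jobs:
      test-amd64-arch:
        runs-on: ubuntu-latest
        steps:
          - name: Build the hive2 Docker image for AMD64
            run: ./dev/docker/hive2/build-docker.sh --platform linux/amd64 --image ${HIVE2_IMAGE_NAME}:${HIVE2_IMAGE_VERSION}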


concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
# Integration test for AMD64 architecture
test-amd64-arch:
runs-on: ubuntu-latest
timeout-minutes: 60
strategy:
matrix:
architecture: [linux/amd64]
env:
DOCKER_RUN_NAME: hive2-amd64
PLATFORM: ${{ matrix.architecture }}
steps:
- uses: actions/checkout@v3

- uses: actions/setup-java@v3
with:
java-version: '8'
distribution: 'temurin'

- name: Set up QEMU
uses: docker/setup-qemu-action@v1

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1

- name: Build the hive2 Docker image for AMD64
Contributor:

AMD64?

Member Author:

Most of the Docker Official Images on Docker Hub provide a variety of architectures. For example, the busybox image supports amd64, arm32v5, arm32v6, arm32v7, arm64v8, i386, ppc64le, and s390x. When running this image on an x86_64 / amd64 machine, the amd64 variant is pulled and run.

https://docs.docker.com/build/building/multi-platform/#:~:text=Most%20of%20the%20Docker%20Official,variant%20is%20pulled%20and%20run.
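As a quick, hedged illustration of that behaviour (not part of this PR): you can list the per-architecture variants sitting behind a single multi-platform tag. busybox is used here only as a well-known example image.

    # Show the architecture entries in a multi-platform image's manifest list.
    docker manifest inspect busybox | grep architecture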

if: ${{ contains(github.event.pull_request.labels.*.name, 'build docker image') }}
run: ./dev/docker/hive2/build-docker.sh --platform ${PLATFORM} --image ${HIVE2_IMAGE_NAME}:${HIVE2_IMAGE_LATEST}

- name: Run AMD64 container
run: |
docker run --rm --name ${DOCKER_RUN_NAME} --platform ${PLATFORM} -d -p 8088:8088 -p 50070:50070 -p 50075:50075 -p 10000:10000 -p 10002:10002 -p 8888:8888 -p 9083:9083 -p 8022:22 ${HIVE2_IMAGE_NAME}:${HIVE2_IMAGE_LATEST}
docker ps -a

- name: Setup Gradle
uses: gradle/gradle-build-action@v2
with:
gradle-version: '8.1.1'

- name: Show gradle version
run: gradle --version

- name: Package Graviton
run: |
gradle build
gradle compileDistribution

- name: Setup Debug Action
if: ${{ contains(github.event.pull_request.labels.*.name, 'debug action') }}
uses: csexton/debugger-action@master

- name: Integration Test
run: |
gradle integrationTest

- name: Print logs when Graviton integration tests fail
if: ${{ failure() }}
run: |
if [ -f "distribution/package/logs/graviton-server.out" ]; then
cat distribution/package/logs/graviton-server.out
fi
if [ -f "distribution/package/logs/graviton-server.log" ]; then
cat distribution/package/logs/graviton-server.log
fi

- name: Stop and remove container
run: |
docker stop ${DOCKER_RUN_NAME}
sleep 3
docker ps -a
docker rmi ${HIVE2_IMAGE_NAME}:${HIVE2_IMAGE_LATEST}
47 changes: 0 additions & 47 deletions .github/workflows/integration.yml

This file was deleted.

11 changes: 10 additions & 1 deletion README.md
@@ -2,4 +2,13 @@
Copyright 2023 Datastrato.
This software is licensed under the Apache License version 2.
-->
# Graviton
# Graviton
## Introduction

Graviton is a high-performance, geo-distributed and federated metadata lake.

## Development Guide

1. [How to build Graviton](docs/how-to-build.md)
2. [How to Run Integration Test](docs/integration-test.md)
3. [How to publish Docker images](docs/publish-docker-images.md)
4 changes: 2 additions & 2 deletions bin/graviton.sh
@@ -124,14 +124,14 @@ function stop() {
}

HOSTNAME=$(hostname)
GRAVITON_OUTFILE="${GRAVITON_LOG_DIR}/graviton-${HOSTNAME}.out"
GRAVITON_OUTFILE="${GRAVITON_LOG_DIR}/graviton-server.out"
GRAVITON_SERVER_NAME=com.datastrato.graviton.server.GravitonServer

JAVA_OPTS+=" -Dfile.encoding=UTF-8"
JAVA_OPTS+=" -Dlog4j2.configurationFile=file://${GRAVITON_CONF_DIR}/log4j2.properties"
JAVA_OPTS+=" -Dgraviton.log.path=${GRAVITON_LOG_DIR} ${GRAVITON_MEM}"

addJarInDir "${GRAVITON_HOME}/lib"
addJarInDir "${GRAVITON_HOME}/libs"

case "${1}" in
start)
27 changes: 17 additions & 10 deletions build.gradle.kts
@@ -149,7 +149,7 @@ tasks {
val outputDir = projectDir.dir("distribution")

val compileDistribution by registering {
dependsOn("copyRuntimeClass", "copyCatalogRuntimeClass", "copySubmoduleClass")
dependsOn("copyRuntimeClass", "copyCatalogRuntimeClass", "copySubmoduleClass", "copyCatalogModuleClass")

group = "graviton distribution"
outputs.dir(projectDir.dir("distribution/package"))
@@ -172,8 +172,7 @@
group = "graviton distribution"
finalizedBy("checksumDistribution")
from(compileDistribution.map { it.outputs.files.single() })
archiveBaseName.set("datastrato")
archiveAppendix.set(rootProject.name.lowercase())
archiveBaseName.set(rootProject.name.lowercase())
archiveVersion.set("${version}")
archiveClassifier.set("bin")
destinationDirectory.set(outputDir)
@@ -204,10 +203,10 @@

val copyRuntimeClass by registering(Copy::class) {
subprojects.forEach() {
if (it.name != "catalog-hive" && it.name != "client-java") {
// println("copyRuntimeClass: ${it.name}")
if (it.name != "catalog-hive" && it.name != "client-java" && it.name != "integration-test") {
println("copyRuntimeClass: ${it.name}")
from(it.configurations.runtimeClasspath)
into("distribution/package/lib")
into("distribution/package/libs")
}
}
}
@@ -217,24 +216,32 @@
if (it.name == "catalog-hive") {
// println("copyCatalogRuntimeClass: ${it.name}")
from(it.configurations.runtimeClasspath)
into("distribution/package/catalogs/catalog-hive/lib")
into("distribution/package/catalogs/hive/libs")
}
}
}

val copySubmoduleClass by registering(Copy::class) {
dependsOn("copyRuntimeClass", "copyCatalogRuntimeClass")
subprojects.forEach() {
// println("copySubmoduleClass: ${it.name}")
if (it.name != "client-java") {
if (it.name != "client-java" && it.name != "integration-test" && it.name != "catalog-hive") {
from("${it.name}/build/libs")
into("distribution/package/lib")
into("distribution/package/libs")
include("*.jar")
setDuplicatesStrategy(DuplicatesStrategy.INCLUDE)
}
}
}

val copyCatalogModuleClass by registering(Copy::class) {
subprojects.forEach() {
if (it.name == "catalog-hive") {
from("${it.name}/build/libs")
into("distribution/package/catalogs/hive/libs")
}
}
}

task("integrationTest") {
dependsOn(":integration-test:integrationTest")
}
@@ -0,0 +1,19 @@
/*
* Copyright 2023 Datastrato.
* This software is licensed under the Apache License version 2.
*/
package com.datastrato.graviton.catalog.hive;

import com.datastrato.graviton.Config;
import com.datastrato.graviton.config.ConfigBuilder;
import com.datastrato.graviton.config.ConfigEntry;

public class HiveCatalogConfig extends Config {
public static final ConfigEntry<String> HADOOP_USER_NAME =
new ConfigBuilder("graviton.hadoop.user.name")
.doc(
"The specify Hadoop user name that will be used when accessing Hadoop Distributed File System (HDFS).")
.version("0.1.0")
.stringConf()
.createWithDefault("hive");
}
@@ -88,6 +88,13 @@ public void initialize(Map<String, String> conf) throws RuntimeException {
conf.forEach(hadoopConf::set);
hiveConf = new HiveConf(hadoopConf, HiveCatalogOperations.class);

// TODO(xun): Wait until the Graviton User Account System is added to manage users and groups.
// The Hadoop user name to use when accessing the Hadoop
// Distributed File System (HDFS).
if (conf.containsKey(HiveCatalogConfig.HADOOP_USER_NAME.getKey())) {
System.setProperty("HADOOP_USER_NAME", conf.get(HiveCatalogConfig.HADOOP_USER_NAME.getKey()));
Contributor:

Why do we need this?

Member Author:

This is the Hadoop user name used when accessing the Hadoop Distributed File System (HDFS). I don't think relying on the HADOOP_USER_NAME environment variable is a good fit, because users may need HiveCatalogOperations.java to work against many different Hive clusters. We plan to add a Graviton user account system to manage users and groups in the future.

Contributor:

Is it possible to remove this configuration and bypass the issue in some other way? I don't think using a configuration entry here is a good idea.

Contributor:

Is it possible to set this property in the test?

Member Author:

Modified to set the HADOOP_USER_NAME environment variable in the integration-test module.
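A minimal, hypothetical sketch of what that test-side setup could look like (the class name is illustrative, not the module's actual code). Under Hadoop's simple authentication, UserGroupInformation falls back to this system property after the HADOOP_USER_NAME environment variable:

    import org.junit.jupiter.api.BeforeAll;

    // Hypothetical integration-test fixture, not the real module code.
    public class HiveCatalogHadoopUserIT {

      @BeforeAll
      static void setHadoopUser() {
        // Make HDFS operations in the test run as the "hive" user.
        System.setProperty("HADOOP_USER_NAME", "hive");
      }
    }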

}

// todo(xun): add hive client pool size in config
this.clientPool = new HiveClientPool(1, hiveConf);
}
@@ -30,12 +30,15 @@
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.thrift.TException;
import org.apache.thrift.transport.TTransportException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java

/** Represents a client pool for managing connections to the Hive Metastore service. */
public class HiveClientPool extends ClientPoolImpl<IMetaStoreClient, TException> {

private static final Logger LOG = LoggerFactory.getLogger(HiveClientPool.class);
private static final DynMethods.StaticMethod GET_CLIENT =
DynMethods.builder("getProxy")
.impl(
@@ -96,6 +99,7 @@ protected IMetaStoreClient newClient() {

@Override
protected IMetaStoreClient reconnect(IMetaStoreClient client) {
LOG.warn("Reconnecting to Hive Metastore");
try {
client.close();
client.reconnect();
@@ -116,6 +120,7 @@ protected boolean isConnectionException(Exception e) {

@Override
protected void close(IMetaStoreClient client) {
LOG.info("Closing Hive Metastore client");
client.close();
}

@@ -71,19 +71,23 @@ public CatalogWrapper(BaseCatalog catalog, IsolatedClassLoader classLoader) {
}

public <R> R doWithSchemaOps(ThrowableFunction<SupportsSchemas, R> fn) throws Exception {
if (asSchemas() == null) {
throw new UnsupportedOperationException("Catalog does not support schema operations");
}

return classLoader.withClassLoader(cl -> fn.apply(asSchemas()));
return classLoader.withClassLoader(
cl -> {
if (asSchemas() == null) {
throw new UnsupportedOperationException("Catalog does not support schema operations");
}
return fn.apply(asSchemas());
});
}

public <R> R doWithTableOps(ThrowableFunction<TableCatalog, R> fn) throws Exception {
if (asTables() == null) {
throw new UnsupportedOperationException("Catalog does not support table operations");
}

return classLoader.withClassLoader(cl -> fn.apply(asTables()));
return classLoader.withClassLoader(
cl -> {
if (asTables() == null) {
throw new UnsupportedOperationException("Catalog does not support table operations");
}
return fn.apply(asTables());
});
}

public void close() {
@@ -447,7 +451,14 @@ private String buildPkgPath(Map<String, String> conf, String provider) {
if (pkg != null) {
pkgPath = pkg;
} else if (!testEnv) {
pkgPath = gravitonHome + File.separator + "catalogs" + File.separator + provider;
pkgPath =
gravitonHome
+ File.separator
+ "catalogs"
+ File.separator
+ provider
+ File.separator
+ "libs";
} else {
pkgPath =
new StringBuilder()
@@ -240,7 +240,8 @@ private <R, E extends Throwable> R doWithCatalog(
NameIdentifier ident, ThrowableFunction<CatalogManager.CatalogWrapper, R> fn, Class<E> ex)
throws E {
try {
CatalogManager.CatalogWrapper c = catalogManager.loadCatalogAndWrap(ident);
NameIdentifier catalogIdent = getCatalogIdentifier(ident);
CatalogManager.CatalogWrapper c = catalogManager.loadCatalogAndWrap(catalogIdent);
return fn.apply(c);
} catch (Throwable throwable) {
if (ex.isInstance(throwable)) {
@@ -4,8 +4,6 @@
*/
package com.datastrato.graviton.utils;

import com.datastrato.graviton.meta.AuditInfo;
import com.datastrato.graviton.meta.rel.BaseSchema;
import java.io.Closeable;
import java.io.InputStream;
import java.net.URL;
@@ -151,9 +149,7 @@ private boolean isSharedClass(String name) {
*/
private boolean isBarrierClass(String name) {
// We need to add more later on when we have more catalog implementations.
return name.startsWith(BaseSchema.class.getName())
|| name.startsWith(AuditInfo.class.getName())
|| barrierClasses.stream().anyMatch(name::startsWith);
return barrierClasses.stream().anyMatch(name::startsWith);
}

private ClassLoader getRootClassLoader() throws Exception {
2 changes: 1 addition & 1 deletion dev/docker/hive2/README.md
@@ -11,7 +11,7 @@ Build Image

Run container
=============
docker run --rm -m -p 8088:8088 -p 50070:50070 -p 50075:50075 -p 10000:10000 -p 10002:10002 -p 8888:8888 -p 9083:9083 -p 8022:22 datastrato/hive2:0.1.0
docker run --rm -d -p 8088:8088 -p 50070:50070 -p 50075:50075 -p 10000:10000 -p 10002:10002 -p 8888:8888 -p 9083:9083 -p 8022:22 datastrato/hive2:0.1.0

Login to the server
=============