
[#198] test: Add Catalog-hive e2e integration test #308

Merged: 10 commits from issue-198 into apache:main on Sep 5, 2023
Conversation

@xunliu (Member) commented Sep 1, 2023

What changes were proposed in this pull request?

  1. Add a Catalog-hive integration test in the `integration-test` module.
  2. Run the integration test in GitHub Actions on the AMD64 architecture (ARM64 support will follow in the next PR).
  3. Add some documentation.
  4. Add a `build docker image` label for PRs; it enables or disables rebuilding the Docker image in the pull request CI step (see the sketch below).
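
As a hedged sketch of how item 4 can work (this is not the exact workflow added by this PR): a workflow step is gated on the pull request's labels. Only the `build docker image` label name comes from this PR; the job layout and script path are assumptions.

```yaml
# A minimal sketch, not the exact workflow in this PR: run the image build
# only when the pull request carries the `build docker image` label.
jobs:
  integration-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build the hive2 Docker image
        if: contains(github.event.pull_request.labels.*.name, 'build docker image')
        run: ./dev/docker/hive2/build-image.sh  # hypothetical script path
```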

Why are the changes needed?

The test launches the Graviton server process on the local host or in the GitHub CI environment.
The Graviton client then connects to the Graviton server and the Hive Docker container to test Catalog-hive.

Fix: #198

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Tested in GitHub Actions.

@github-actions (bot) commented Sep 1, 2023

Code Coverage Report

| Scope | Coverage | Change |
| --- | --- | --- |
| Overall Project | 62.99% | -0.25% 🟢 |
| Files changed | 18.18% | 🔴 |

Module coverage:

| Module | Coverage | Change |
| --- | --- | --- |
| core | 71.35% | -0.74% 🔴 |
| catalog-hive | 58.45% | -0.07% 🟢 |

Files:

| Module | File | Coverage | Change |
| --- | --- | --- | --- |
| core | CatalogManager.java | 62.26% | -6.1% 🔴 |
| core | IsolatedClassLoader.java | 54.04% | 🟢 |
| core | CatalogOperationDispatcher.java | 17.71% | -2.24% 🔴 |
| catalog-hive | HiveClientPool.java | 53.46% | -1.89% 🟢 |

@jerryshao added the `need backport` (Issues that need to be backported to another branch) and `branch-0.1` labels Sep 1, 2023
```yaml
env:
  HIVE2_IMAGE_NAME: datastrato/hive2
  HIVE2_IMAGE_VERSION: 0.1.0
  HIVE2_IMAGE_LATEST: latest
```
Contributor:
What's the usage of `HIVE2_IMAGE_VERSION` and `HIVE2_IMAGE_LATEST`?


Contributor:

My question is that `HIVE2_IMAGE_VERSION` seems not to be used in this script. Also, `HIVE2_IMAGE_VERSION` and `HIVE2_IMAGE_LATEST` both point to version information; do we need to unify them?

@justinmclean (Member):

Just a reminder that with ASF projects you can only publish releases to Docker and can't refer to `latest`. If we go with publishing `latest`, we'll need to change that when we enter the ASF incubator.

@xunliu (Member, Author):

> Just a reminder that with ASF projects you can only publish releases to Docker and can't refer to `latest`. If we go with publishing `latest`, we'll need to change that when we enter the ASF incubator.

@justinmclean Thank you for the reminder. I've modified this code.
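
A minimal sketch of the kind of unification discussed here, assuming the `env` block quoted above; the Dockerfile path is hypothetical and this is not necessarily the PR's final step:

```yaml
# Tag the image only with the pinned release version from `env`,
# avoiding a floating `latest` tag (per the ASF reminder above).
- name: Build the hive2 Docker image for AMD64
  run: |
    docker build \
      -t "${HIVE2_IMAGE_NAME}:${HIVE2_IMAGE_VERSION}" \
      -f dev/docker/hive2/Dockerfile .  # hypothetical Dockerfile location
```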

```java
// The specific Hadoop user name that will be used when accessing the Hadoop
// Distributed File System (HDFS).
if (conf.containsKey(HiveCatalogConfig.HADOOP_USER_NAME.getKey())) {
  System.setProperty("HADOOP_USER_NAME", conf.get(HiveCatalogConfig.HADOOP_USER_NAME.getKey()));
}
```
Contributor:

Why do we need this?

@xunliu (Member, Author):

This sets the specific Hadoop user name that will be used when accessing the Hadoop Distributed File System (HDFS).
I think relying on the `HADOOP_USER_NAME` environment variable is not good, because users need `HiveCatalogOperations.java` to operate against many different Hive clusters.
We can wait for the Graviton user account system to manage users and groups in the future.

Contributor:

Is it possible to remove this configuration and bypass this issue some other way? I don't think using a configuration here is a good idea.

Contributor:

Is it possible to set this property in the test instead?

@xunliu (Member, Author):

Modified to set the `HADOOP_USER_NAME` environment variable in the `integration-test` module.
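
As a hedged illustration of that resolution, expressed as a CI step (the PR sets this inside the `integration-test` module itself; the user name and Gradle task path below are assumptions):

```yaml
# Provide HADOOP_USER_NAME to the integration-test run so the catalog
# code no longer needs to read it from a catalog configuration key.
- name: Run integration tests
  env:
    HADOOP_USER_NAME: datastrato  # hypothetical HDFS user for the Hive container
  run: ./gradlew :integration-test:test
```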

Comment on lines +211 to +241
```java
// TODO (xun) enable this test once #316 ([Bug report] alterSchema throws
// NoSuchSchemaException) is fixed
// @Order(3)
// @Test
public void testAlterSchema() throws TException, InterruptedException {
  NameIdentifier ident = NameIdentifier.of(metalakeName, catalogName, schemaName);
  Map<String, String> properties = Maps.newHashMap();
  properties.put("key1", "val1");
  properties.put("key2", "val2");
  String comment = "comment";

  GravitonMetaLake metalake = client.loadMetalake(NameIdentifier.of(metalakeName));
  Catalog catalog = metalake.loadCatalog(NameIdentifier.of(metalakeName, catalogName));
  catalog
      .asSchemas()
      .alterSchema(
          ident,
          SchemaChange.removeProperty("key1"),
          SchemaChange.setProperty("key2", "val2-alter"));

  NameIdentifier[] nameIdentifiers = catalog.asSchemas().listSchemas(ident.namespace());

  Map<String, String> properties2 = catalog.asSchemas().loadSchema(ident).properties();
  Assertions.assertFalse(properties2.containsKey("key1"));
  Assertions.assertEquals("val2-alter", properties2.get("key2"));

  Database database = hiveClientPool.run(client -> client.getDatabase(schemaName));
  Map<String, String> properties3 = database.getParameters();
  Assertions.assertFalse(properties3.containsKey("key1"));
  Assertions.assertEquals("val2-alter", properties3.get("key2"));
}
```
@xunliu (Member, Author):

`testAlterSchema()` throws an exception.
I created #316 ([Bug report] alterSchema throws NoSuchSchemaException) to track this bug.
This test case is temporarily disabled.

Contributor:

We can debug it together tomorrow.

```yaml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v1

- name: Build the hive2 Docker image for AMD64
```
Contributor:

AMD64?

@xunliu (Member, Author):

Most of the Docker Official Images on Docker Hub provide a variety of architectures. For example, the busybox image supports amd64, arm32v5, arm32v6, arm32v7, arm64v8, i386, ppc64le, and s390x. When running this image on an x86_64 / amd64 machine, the amd64 variant is pulled and run.

https://docs.docker.com/build/building/multi-platform/#:~:text=Most%20of%20the%20Docker%20Official,variant%20is%20pulled%20and%20run.
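
To make that implicit architecture selection explicit in CI, the build platform can be pinned. A sketch assuming `docker/build-push-action` (the PR's workflow may build the image differently):

```yaml
# Pin the build to linux/amd64 so the resulting image does not depend
# on the runner's native architecture.
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v1
- name: Build the hive2 image for AMD64 only
  uses: docker/build-push-action@v4
  with:
    context: .
    platforms: linux/amd64
    tags: datastrato/hive2:0.1.0
    push: false
```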

@xunliu (Member, Author) commented Sep 5, 2023

@jerryshao I fixed the problems raised in the comments. Please help review the PR, thanks.

@jerryshao jerryshao closed this Sep 5, 2023
@jerryshao jerryshao reopened this Sep 5, 2023
@jerryshao jerryshao merged commit 5b26878 into apache:main Sep 5, 2023
github-actions bot pushed a commit that referenced this pull request Sep 5, 2023
@xunliu xunliu deleted the issue-198 branch October 9, 2023 02:28
Labels
`build docker image`, `need backport` (Issues that need to be backported to another branch)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Catalog-hive e2e integration test
4 participants