Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#483] test(lakehouse-iceberg): add graviton IT test for iceberg #501

Closed
wants to merge 181 commits into from

Conversation

Clearvive
Copy link
Contributor

@Clearvive Clearvive commented Oct 12, 2023

What changes were proposed in this pull request?
Integrate testing using Graviton Server, used jdbcCatalog for end-to-end validation.

Why are the changes needed?

Issues: #483

Does this PR introduce any user-facing change?
No

How was this patch tested?
CatalogIcebergIT

jerryshao and others added 30 commits April 23, 2023 10:09
### What changes were proposed in this pull request?

This PR propose the schema and type spec for Unified Catalog. This spec
is used to describe how metadata is organized in the system.

### Why are the changes needed?

This PR defines the basic metadata schema model, which will be used in
the system for memory structure, on-wire protocol and serialization
protocol.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

N/A
…ndecy and plugin version (#18)

### What changes were proposed in this pull request?

This PR improves the gradle build file to use gradle catalog mechanism
to centralize the dependency and plugin versions. Also removes some
redundant codes.

### Why are the changes needed?

Currently there's no centralized version control place for project, this
will easily lead to dependency chaos. After some investigation, we
choose to use gradle's default catalog mechanism to manage all the
versions in a central place.

Fix: #17

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

Local manual test.
### What changes were proposed in this pull request?

This PR proposes to implement the schema metadata and type system in
Java and Protobuf.

### Why are the changes needed?

This PR defines a core metadata system for unified catalog.

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

This PR adds UT to cover the schema definitions.
…ema (#20)

### What changes were proposed in this pull request?

This PR adds JSON serde support for schema system.

### Why are the changes needed?

The adds of JSON serde support will help to support REST API for Unified
Catalog.

Fix: #16 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

And new UTs.
…pport for schema system (#26)

### What changes were proposed in this pull request?

This PR proposes to add protobuf SerDe support for schema support. The
protobuf SerDe support will be mainly used in schema persistence and RPC
communication.

### Why are the changes needed?

The adds of protobuf SerDe support will be used in schema persistence
and RPC communication.

Fix: #21

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Adds new UT to cover the codes.
…#30)

### What changes were proposed in this pull request?

As discussed in #22 , we planned to rename the project from Unified
Catalog to Graviton. This PR aims to change all the affected items from
Unified Catalog to Graviton.


Fix: #22 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Existing UTs.
### What changes were proposed in this pull request?

This PR introduces a Spark-style like strong typed config system for the
project.

### Why are the changes needed?

The config system is a cornerstone of the project. By comparing
different config system implementations, Spark's one is strong type
verified, and also can extend to other functions, so introducing a
Spark-style like config system for the project.

Fix: #27 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Add new UTs to cover the test
…tifier (#33)

### What changes were proposed in this pull request?

This PR is a preconditional PR to support REST API for Graviton. This PR
defines:

1. Entity's name identifier to distinguish between entities.
2. Entity operation interfaces. We will later on implement this
interface to manipulate the entities.

### Why are the changes needed?

This PR is a preconditional PR to support REST API for Graviton.

Fix: #32 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

New UTs to cover the test.
### What changes were proposed in this pull request?

This PR proposes to add Jetty server support for Graviton.

### Why are the changes needed?

The purpose of introducing Jetty as embedded web server is that:

1. Jetty is a light-weighted web server that can be easily embedded into
our project compared to Tomcat and other services.
2. We basically don't want to introduce a bunch of Springboot related
code to build our REST API. In that case, Jersey + Jetty would be the
best choice.
3. If later on the performance of Jetty cannot meet our requirements, we
can shift to use other web servers instead.

Fix: #5

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Local manual test
…APIs (#38)

### What changes were proposed in this pull request?

This PR adds Jersey support with Jetty for REST APIs, also adding tenant
operation REST APIs.

### Why are the changes needed?

The REST API is mainly exposed to users and compute engines to
manipulate the metadata. Adding a basic Jersey support as well as
referenced tenant operation implementations.

Fix: #34 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Manual e2e tests. Jersey test framework will be added later.
…(#40)

### What changes were proposed in this pull request?

This PR proposes to introduce Jersey test framework and mock tool, so we
could add jersey UTs later on. Also complement the left tenant operation
UTs for last PR.

### Why are the changes needed?

This PR introduces jersey test framework, which could be used later on
when we add more rest interfaces.

Fix: #39

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

UTs
… (#45)

### What changes were proposed in this pull request?

This PR changes include:

1. refactors the whole metadata system, which simplifies the current
tenant/lakehouse/zone/table schema structure to
lakehouse/catalog/<meta-structure> (refer to what metacat did).
2. Add catalog interfaces and define a series of catalog behaviors. the
specific catalog implementation could inherit these interfaces to
achieve its own one (refer to what Spark connector/catalog did).
3. rearchitect the code structure.

### Why are the changes needed?

As #43 described, the current schema system is a bit difficult to manage
(to pursue virtual semantics), we needs to simplify the current design.

Fix: #44

### Does this PR introduce _any_ user-facing change?

We're still in the early stage of the project, the change is unavoidable
and acceptable.

### How was this patch tested?

UTs to cover the codes.
### What changes were proposed in this pull request?

This PR proposes to add the entity store interface for graviton. The
implementation of this interface will store the entities to the
underlying storage system.

### Why are the changes needed?

This is the basic interface for entity store, which defines the
supported behavior of underlying storage.

Fix: #48 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

Add UTs to cover the code.
…stem (#47)

### What changes were proposed in this pull request?

This PR tracks the work of #46 to update the rfc-1 to match the
refactoring work in #45 .

### Why are the changes needed?

This is the subtask for the schema system refactoring work.

Fix: #46 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

NA
### What changes were proposed in this pull request?

This PR adds the `EntitySerDe` interface and Protobuf implementation for
Graviton.

### Why are the changes needed?

The SerDe interface will be used for storage system to serialize and
deserialize entity objects when interacting with underlying storage.

Fix: #51 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Existing UTs to cover the code.
### What changes were proposed in this pull request?

1. This PR adds the HiveCatalog implementation for Graviton.
2. This HiveCatalog includes a hive client. 
3. This hive client is created by reflection and can support hive1,
hive2, and hive3.

### Why are the changes needed?

With this change, we could add maintain namespace for Graviton as the
next step.

Fix: #60 

### Does this PR introduce _any_ user-facing change?
As the early stage of the project, the change is unavoidable.

### How was this patch tested?

UTs to cover the test.
### What changes were proposed in this pull request?

This PR propose to do several reafactoring works:

1. Introduce a API module and extract all the interfaces to this module.
This module is mainly for users and graviton internally to manipulate
metadata.
2. Introduce a comment module and implement all the DTOs for graviton.
These DTOs will be mainly used to transmit objects between client and
server.
3. Add Lakehouse update request to the service.

### Why are the changes needed?

With this change, we could add a common client for Graviton as the next
step, and implement a Spark/trino connector later on.

Fix: #56 

### Does this PR introduce _any_ user-facing change?

As the early stage of the project, the change is unavoidable.

### How was this patch tested?

UTs to cover the test.
…raviton (#68)

### What changes were proposed in this pull request?

This PR propose to change the Graviton terminology `Lakehouse` to
`Metalake`.

### Why are the changes needed?

Compared to `Lakehouse`, `Metalake` is more closer to the goal of
Graviton as a unified metadata repository (lake), so proposing to change
the name.

Fix: #67 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

Existing tests.
### What changes were proposed in this pull request?

As a cornerstone work of building Graviton client, this PR proposes to
add REST client support for Graviton Client. This work is mainly
referred from Apache Iceberg.

### Why are the changes needed?

This is the cornerstone work of building Graviton Client.

Fix: #64 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

Add UTs to cover the code.
…rt (#70)

### What changes were proposed in this pull request?

This PR proposes to add Graviton client metalake manipulation support.

### Why are the changes needed?

This PR is a part of the work to build a Graviton client.

Fix: #66 

### Does this PR introduce _any_ user-facing change?

This PR introduces new `GravitonClient` and `GravitonMetalake`
interfaces.

### How was this patch tested?

Add new UTs to cover the code.
Add a minimal NOTICE file so any content from 3rd party ALvL NOTICE
files or required 3rd party notices can be placed here.
### What changes were proposed in this pull request?

This PR adds the support of catalog rest implementation for Graviton.

### Why are the changes needed?

With this, users could issue REST requests to manipulate catalogs.

Fix: #72 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Add UTs to test.
Set contributing expectations for external contributors.
…talog (#74)

### What changes were proposed in this pull request?

This HiveCatalog will be used for `SupportsNamespaces` interface to
implement create/drop/alter namespaces in the hive.

### Why are the changes needed?

1. Creating a graviton namespace equivalent to creating a hive database.
2. Dropping a graviton namespace equivalent to dropping a hive database.
3. Altering a graviton namespace properties equivalent to Altering a
hive database properties.


### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Added `HiveNamespaceTest` unit test.
@Clearvive Clearvive force-pushed the 400/iceberg-graviton-it branch from 1359824 to 2faeed5 Compare October 16, 2023 06:41
### What changes were proposed in this pull request?
1. MySQL bind 0.0.0.0
2. create iceberg user and grant all priviges

### Why are the changes needed?
Iceberg JDBC IT need access mysql

Fix: #508

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. MySQL client out of docker could connect MySQL server using `iceberg`
user
yuqi1129 and others added 3 commits October 16, 2023 21:04
…nfigurations that will be passed by to specific engines (#510)

### What changes were proposed in this pull request?

Make sure Gravition configuration like `a.b` will overwrite
configurations(`gravition.passby.a.b`) that will be passby to `Hive`

### Why are the changes needed?

The Graviton configuration has a higher priority, so it should overwrite
the pass-by configurations.

Fix: #509 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

UT
…ata (#486)

### What changes were proposed in this pull request?
Implement catalogPropertiesMetadata、tablePropertiesMetadata for
IcebergCatalogOperations

### Why are the changes needed?
Currently catalogPropertiesMetadatatablePropertiesMetadata will return
an empty map, and we need to implement it to return a real value for the
iceberg property metadata.

Fix: #446 

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
Add test testCatalogProperty in TestIcebergCatalog
Add test testTableProperty in TestIcebergTable
… (#515)

### What changes were proposed in this pull request?

Add UT to test restart and reopen for `RocksDBBackend`

### Why are the changes needed?

Fully test `RocksDBBackend` to verify if it works well.

Fix: #514 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

UT
@jerryshao jerryshao changed the title [400] test(lakehouse-iceberg): add graviton IT test for iceberg [#400] test(lakehouse-iceberg): add graviton IT test for iceberg Oct 16, 2023
@jerryshao
Copy link
Contributor

@sandflee Can you please help to review this, thanks.

@Clearvive Clearvive force-pushed the 400/iceberg-graviton-it branch from 2faeed5 to 28c148a Compare October 17, 2023 01:50
…eIT (#493)

### What changes were proposed in this pull request?

add hive catalog to IcebergRESTServiceIT
1. custom graviton config file with different Iceberg catalog types
2. The hive catalog warehouse location is using localfs to bypass HDFS
3. unify test namespace to `iceberg_rest_` prefix, to drop all test
namespace and tables before each test.

### Why are the changes needed?


Part of: #480

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. existing UTs
4. HiveCatalog UTs
@Clearvive Clearvive force-pushed the 400/iceberg-graviton-it branch from 28c148a to 32bb821 Compare October 17, 2023 02:17
@jerryshao jerryshao changed the title [#400] test(lakehouse-iceberg): add graviton IT test for iceberg [#483] test(lakehouse-iceberg): add graviton IT test for iceberg Oct 17, 2023
@jerryshao jerryshao removed this from the Graviton v0.2.0 milestone Oct 17, 2023
bin/common.sh Outdated
export DEFAULT_MYSQL_VERSION=8.0.15
export GET_MYSQL_JAR=true

function download_mysql_jdbc_jar(){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we only support mysql, what if user want to use PostgreSQL, how to enable it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will mention in the documentation that PostgreSQL users need to place the JAR file in the classpath themselves. Additionally, it seems that the PostgreSQL library can be directly downloaded and used. Does it have any copyright issues?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PG's license is compatible with Apache v2.

Basically, I'm thinking of the necessity to download MySQL jdbc driver automatically. If we're downloading MySQL jar automatically, why don't we also support PG, mssql, oracle, etc, right?

So I'm suggesting that we don't do this automatically in script, instead we write this in detail in the document to let user do it manually.

What do you think @FANNG1 @Clearvive ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok to not download MYSQL jdbc drivers automatically. by the way , Iceberg using sqllite to test JdbcCatalog, no need to download Mysql or PG drivers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use SQLite instead, and write clearly in the document to tell user how to install different jdbc drivers.

@Clearvive Clearvive requested a review from FANNG1 October 17, 2023 06:55
@Clearvive Clearvive closed this Oct 17, 2023
@Clearvive Clearvive force-pushed the 400/iceberg-graviton-it branch from 2c0bc34 to a79aa71 Compare October 17, 2023 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants