[#483] test(lakehouse-iceberg): add graviton IT test for iceberg #501
### What changes were proposed in this pull request? This PR proposes the schema and type spec for Unified Catalog. This spec describes how metadata is organized in the system. ### Why are the changes needed? This PR defines the basic metadata schema model, which will be used in the system for the in-memory structure, the on-wire protocol and the serialization protocol. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? N/A
…ndecy and plugin version (#18) ### What changes were proposed in this pull request? This PR improves the gradle build file to use gradle catalog mechanism to centralize the dependency and plugin versions. Also removes some redundant codes. ### Why are the changes needed? Currently there's no centralized version control place for project, this will easily lead to dependency chaos. After some investigation, we choose to use gradle's default catalog mechanism to manage all the versions in a central place. Fix: #17 ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? Local manual test.
### What changes were proposed in this pull request? This PR proposes to implement the schema metadata and type system in Java and Protobuf. ### Why are the changes needed? This PR defines a core metadata system for unified catalog. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? This PR adds UT to cover the schema definitions.
…ema (#20) ### What changes were proposed in this pull request? This PR adds JSON serde support for the schema system. ### Why are the changes needed? Adding JSON serde support will help to support the REST API for Unified Catalog. Fix: #16 ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? Add new UTs.
…pport for schema system (#26) ### What changes were proposed in this pull request? This PR proposes to add protobuf SerDe support for the schema system. ### Why are the changes needed? The protobuf SerDe support will mainly be used in schema persistence and RPC communication. Fix: #21 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Adds new UTs to cover the code.
…#30) ### What changes were proposed in this pull request? As discussed in #22 , we planned to rename the project from Unified Catalog to Graviton. This PR aims to change all the affected items from Unified Catalog to Graviton. Fix: #22 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Existing UTs.
### What changes were proposed in this pull request? This PR introduces a Spark-style, strongly typed config system for the project. ### Why are the changes needed? The config system is a cornerstone of the project. After comparing different config system implementations, Spark's stands out because it is strongly typed and can be extended with other functions, so this PR introduces a Spark-style config system for the project. Fix: #27 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Add new UTs to cover the code.
…tifier (#33) ### What changes were proposed in this pull request? This PR is a prerequisite for supporting the REST API for Graviton. It defines: 1. The entity's name identifier, used to distinguish between entities. 2. The entity operation interfaces. We will implement these interfaces later to manipulate the entities. ### Why are the changes needed? This PR is a prerequisite for supporting the REST API for Graviton. Fix: #32 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? New UTs to cover the code.
### What changes were proposed in this pull request? This PR proposes to add Jetty server support for Graviton. ### Why are the changes needed? The reasons for introducing Jetty as the embedded web server are: 1. Jetty is a lightweight web server that can be embedded into our project more easily than Tomcat and other servers. 2. We don't want to introduce a lot of Spring Boot related code to build our REST API. In that case, Jersey + Jetty would be the best choice. 3. If the performance of Jetty later fails to meet our requirements, we can switch to another web server. Fix: #5 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Local manual test
…APIs (#38) ### What changes were proposed in this pull request? This PR adds Jersey support with Jetty for REST APIs, also adding tenant operation REST APIs. ### Why are the changes needed? The REST API is mainly exposed to users and compute engines to manipulate the metadata. Adding a basic Jersey support as well as referenced tenant operation implementations. Fix: #34 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Manual e2e tests. Jersey test framework will be added later.
…(#40) ### What changes were proposed in this pull request? This PR proposes to introduce the Jersey test framework and a mock tool, so we can add Jersey UTs later on. It also adds the tenant operation UTs that were left out of the last PR. ### Why are the changes needed? This PR introduces the Jersey test framework, which can be used later on when we add more REST interfaces. Fix: #39 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? UTs
… (#45) ### What changes were proposed in this pull request? The changes in this PR include: 1. Refactor the whole metadata system, simplifying the current tenant/lakehouse/zone/table schema structure to lakehouse/catalog/<meta-structure> (following what metacat did). 2. Add catalog interfaces and define a series of catalog behaviors. A specific catalog implementation can inherit these interfaces to provide its own behavior (following what the Spark connector/catalog did). 3. Rearchitect the code structure. ### Why are the changes needed? As #43 described, the current schema system is a bit difficult to manage (in pursuit of virtual semantics), so we need to simplify the current design. Fix: #44 ### Does this PR introduce _any_ user-facing change? We're still in the early stage of the project, so the change is unavoidable and acceptable. ### How was this patch tested? UTs to cover the code.
### What changes were proposed in this pull request? This PR proposes to add the entity store interface for graviton. The implementation of this interface will store the entities to the underlying storage system. ### Why are the changes needed? This is the basic interface for entity store, which defines the supported behavior of underlying storage. Fix: #48 ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? Add UTs to cover the code.
…stem (#47) ### What changes were proposed in this pull request? This PR tracks the work of #46 to update the rfc-1 to match the refactoring work in #45 . ### Why are the changes needed? This is the subtask for the schema system refactoring work. Fix: #46 ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? NA
### What changes were proposed in this pull request? This PR adds the `EntitySerDe` interface and Protobuf implementation for Graviton. ### Why are the changes needed? The SerDe interface will be used for storage system to serialize and deserialize entity objects when interacting with underlying storage. Fix: #51 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Existing UTs to cover the code.
### What changes were proposed in this pull request? 1. This PR adds the HiveCatalog implementation for Graviton. 2. This HiveCatalog includes a Hive client. 3. This Hive client is created by reflection and can support hive1, hive2, and hive3. ### Why are the changes needed? With this change, we can add namespace maintenance for Graviton as the next step. Fix: #60 ### Does this PR introduce _any_ user-facing change? At this early stage of the project, the change is unavoidable. ### How was this patch tested? UTs to cover the code.
### What changes were proposed in this pull request? This PR proposes several refactorings: 1. Introduce an API module and extract all the interfaces into it. This module is mainly for users and Graviton internals to manipulate metadata. 2. Introduce a common module and implement all the DTOs for Graviton. These DTOs will mainly be used to transmit objects between client and server. 3. Add the Lakehouse update request to the service. ### Why are the changes needed? With this change, we can add a common client for Graviton as the next step, and implement a Spark/Trino connector later on. Fix: #56 ### Does this PR introduce _any_ user-facing change? At this early stage of the project, the change is unavoidable. ### How was this patch tested? UTs to cover the code.
…raviton (#68) ### What changes were proposed in this pull request? This PR proposes to change the Graviton terminology `Lakehouse` to `Metalake`. ### Why are the changes needed? Compared to `Lakehouse`, `Metalake` is closer to the goal of Graviton as a unified metadata repository (lake), hence the proposed rename. Fix: #67 ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? Existing tests.
### What changes were proposed in this pull request? As a cornerstone work of building Graviton client, this PR proposes to add REST client support for Graviton Client. This work is mainly referred from Apache Iceberg. ### Why are the changes needed? This is the cornerstone work of building Graviton Client. Fix: #64 ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? Add UTs to cover the code.
…rt (#70) ### What changes were proposed in this pull request? This PR proposes to add Graviton client metalake manipulation support. ### Why are the changes needed? This PR is a part of the work to build a Graviton client. Fix: #66 ### Does this PR introduce _any_ user-facing change? This PR introduces new `GravitonClient` and `GravitonMetalake` interfaces. ### How was this patch tested? Add new UTs to cover the code.
Add a minimal NOTICE file so any content from 3rd party ALvL NOTICE files or required 3rd party notices can be placed here.
### What changes were proposed in this pull request? This PR adds the support of catalog rest implementation for Graviton. ### Why are the changes needed? With this, users could issue REST requests to manipulate catalogs. Fix: #72 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Add UTs to test.
Set contributing expectations for external contributors.
…talog (#74) ### What changes were proposed in this pull request? This HiveCatalog implements the `SupportsNamespaces` interface to create, drop and alter namespaces in Hive. ### Why are the changes needed? 1. Creating a Graviton namespace is equivalent to creating a Hive database. 2. Dropping a Graviton namespace is equivalent to dropping a Hive database. 3. Altering the properties of a Graviton namespace is equivalent to altering the properties of a Hive database. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Added `HiveNamespaceTest` unit test.
### What changes were proposed in this pull request? 1. Bind MySQL to 0.0.0.0. 2. Create an `iceberg` user and grant it all privileges. ### Why are the changes needed? The Iceberg JDBC IT needs access to MySQL. Fix: #508 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. A MySQL client outside of Docker can connect to the MySQL server as the `iceberg` user.
…nfigurations that will be passed by to specific engines (#510) ### What changes were proposed in this pull request? Make sure that a Graviton configuration such as `a.b` overwrites the corresponding pass-by configuration (`gravition.passby.a.b`) that is passed through to `Hive`. ### Why are the changes needed? The Graviton configuration has a higher priority, so it should overwrite the pass-by configurations. Fix: #509 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? UT
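A minimal sketch of the precedence rule described in #510, assuming that pass-by keys are recognised by a fixed prefix which is stripped before the values reach the engine. The class name, the method name and the prefix constant are hypothetical (the prefix is copied from the spelling in the commit message above); this is not the project's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

class BypassConfigSketch {
  // Hypothetical prefix, spelled as in the commit message above.
  static final String BYPASS_PREFIX = "gravition.passby.";

  // Builds the configuration handed to the engine: pass-by keys are applied
  // first (with the prefix stripped), then plain keys overwrite them.
  static Map<String, String> resolve(Map<String, String> raw) {
    Map<String, String> resolved = new HashMap<>();
    for (Map.Entry<String, String> e : raw.entrySet()) {
      if (e.getKey().startsWith(BYPASS_PREFIX)) {
        resolved.put(e.getKey().substring(BYPASS_PREFIX.length()), e.getValue());
      }
    }
    for (Map.Entry<String, String> e : raw.entrySet()) {
      if (!e.getKey().startsWith(BYPASS_PREFIX)) {
        // Plain keys have higher priority, so they replace pass-by values.
        resolved.put(e.getKey(), e.getValue());
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> raw = new HashMap<>();
    raw.put("gravition.passby.a.b", "from-pass-by");
    raw.put("a.b", "from-graviton");
    raw.put("gravition.passby.c.d", "only-pass-by");
    System.out.println(resolve(raw));
  }
}
```

Applying the pass-by keys in a first pass and the plain keys in a second makes the result independent of map iteration order.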
…ata (#486) ### What changes were proposed in this pull request? Implement catalogPropertiesMetadata and tablePropertiesMetadata for IcebergCatalogOperations. ### Why are the changes needed? Currently catalogPropertiesMetadata and tablePropertiesMetadata return an empty map, and we need to implement them to return the real Iceberg property metadata. Fix: #446 ### Does this PR introduce any user-facing change? N/A ### How was this patch tested? Add test testCatalogProperty in TestIcebergCatalog. Add test testTableProperty in TestIcebergTable.
… (#515) ### What changes were proposed in this pull request? Add UT to test restart and reopen for `RocksDBBackend` ### Why are the changes needed? Fully test `RocksDBBackend` to verify if it works well. Fix: #514 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? UT
@sandflee Can you please help to review this? Thanks.
…eIT (#493) ### What changes were proposed in this pull request? Add the Hive catalog to IcebergRESTServiceIT: 1. Customize the Graviton config file for different Iceberg catalog types. 2. The Hive catalog warehouse location uses the local filesystem to bypass HDFS. 3. Unify test namespaces under the `iceberg_rest_` prefix, so that all test namespaces and tables can be dropped before each test. ### Why are the changes needed? Part of: #480 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. Existing UTs 2. HiveCatalog UTs
bin/common.sh
export DEFAULT_MYSQL_VERSION=8.0.15
export GET_MYSQL_JAR=true

function download_mysql_jdbc_jar(){
Do we only support MySQL? What if a user wants to use PostgreSQL; how would they enable it?
I will mention in the documentation that PostgreSQL users need to place the JAR file in the classpath themselves. Additionally, it seems that the PostgreSQL library can be directly downloaded and used. Does it have any copyright issues?
I think PG's license is compatible with Apache v2.
Basically, I'm questioning whether we need to download the MySQL JDBC driver automatically at all. If we download the MySQL JAR automatically, why don't we also support PG, MSSQL, Oracle, etc.?
So I suggest that we don't do this automatically in the script; instead we describe it in detail in the documentation and let users do it manually.
What do you think @FANNG1 @Clearvive ?
I'm OK with not downloading the MySQL JDBC driver automatically. By the way, Iceberg uses SQLite to test JdbcCatalog, so there is no need to download MySQL or PG drivers.
Let's use SQLite instead, and state clearly in the documentation how users can install different JDBC drivers.
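For illustration, the SQLite-backed setup suggested above might look like the following sketch. It assumes Iceberg's JDBC catalog property keys `uri` and `warehouse` and the `jdbc:sqlite:` URL scheme of the sqlite-jdbc driver; the class and method names are hypothetical, and the property keys that Graviton itself exposes may differ:

```java
import java.util.HashMap;
import java.util.Map;

class SqliteJdbcCatalogProps {
  // Builds catalog properties that point a JDBC-backed Iceberg catalog at a
  // local SQLite database file, so no MySQL or PostgreSQL server is needed.
  static Map<String, String> forSqlite(String dbFile, String warehouseDir) {
    Map<String, String> props = new HashMap<>();
    props.put("uri", "jdbc:sqlite:" + dbFile); // JDBC URL of the metadata store
    props.put("warehouse", warehouseDir);      // root location for table data
    return props;
  }

  public static void main(String[] args) {
    // Both paths are placeholders chosen for this example.
    Map<String, String> props =
        forSqlite("/tmp/iceberg-it.db", "file:///tmp/iceberg-warehouse");
    System.out.println(props.get("uri"));
  }
}
```

For another database, the URL would change and the matching driver JAR would have to be placed on the classpath manually, as discussed in this thread.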
What changes were proposed in this pull request?
Add integration tests that run against the Graviton server, using the JDBC catalog for end-to-end validation.
Why are the changes needed?
Issues: #483
Does this PR introduce any user-facing change?
No
How was this patch tested?
CatalogIcebergIT