[#4954] docs(hudi-catalog): Add docs for Hudi catalog (#4976)
### What changes were proposed in this pull request?

Add docs for Hudi catalog

### Why are the changes needed?

Fix: #4954

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

no need
mchades authored Oct 14, 2024
1 parent 2b8687a commit 67b7be0
Showing 2 changed files with 116 additions and 0 deletions.
110 changes: 110 additions & 0 deletions docs/lakehouse-hudi-catalog.md
@@ -0,0 +1,110 @@
---
title: "Hudi catalog"
slug: /lakehouse-hudi-catalog
keywords:
- lakehouse
- hudi
- metadata
license: "This software is licensed under the Apache License version 2."
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Introduction

Apache Gravitino provides the ability to manage Apache Hudi metadata.

### Requirements and limitations

:::info
Tested and verified with Apache Hudi `0.15.0`.
:::

## Catalog

### Catalog capabilities

- Works as a catalog proxy and supports Hive Metastore (`HMS`) as the catalog backend.
- Only supports read operations (list and load) for Hudi schemas and tables.
- Does not yet support timeline management operations.

### Catalog properties

| Property name                            | Description                                                                                                                                                                                                                                  | Default value | Required | Since Version    |
|------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|------------------|
| `catalog-backend`                        | The catalog backend of the Gravitino Hudi catalog. Only `hms` is supported for now.                                                                                                                                                          | (none)        | Yes      | 0.7.0-incubating |
| `uri`                                    | The URI of the catalog backend, for example `thrift://127.0.0.1:9083` for the HMS backend.                                                                                                                                                   | (none)        | Yes      | 0.7.0-incubating |
| `client.pool-size`                       | For the HMS backend. The maximum number of Hive metastore clients in the pool for Gravitino.                                                                                                                                                 | 1             | No       | 0.7.0-incubating |
| `client.pool-cache.eviction-interval-ms` | For the HMS backend. The eviction interval of the client pool cache, in milliseconds.                                                                                                                                                        | 300000        | No       | 0.7.0-incubating |
| `gravitino.bypass.`                      | Properties whose names start with this prefix are passed down to the underlying backend client. For example, `gravitino.bypass.hive.metastore.failure.retries = 3` sets 3 retries upon failure of Thrift metastore calls for the HMS backend. | (none)        | No       | 0.7.0-incubating |
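
As a rough illustration of how these properties fit together, the sketch below assembles a catalog creation request body in Python. The helper name `build_hudi_catalog_request`, the catalog name, the comment text, and the body layout (`name`, `type`, `provider`, `comment`, `properties`) are assumptions based on the general Gravitino catalog REST API (typically a `POST` to `/api/metalakes/{metalake}/catalogs`), not something this page defines. Only the property names, defaults, and the `lakehouse-hudi` provider come from the docs.

```python
import json

# Required keys and defaults, taken from the property table above.
REQUIRED_PROPERTIES = ("catalog-backend", "uri")
DEFAULT_PROPERTIES = {
    "client.pool-size": "1",
    "client.pool-cache.eviction-interval-ms": "300000",
}


def build_hudi_catalog_request(name, uri, extra_properties=None):
    """Build a JSON-serializable request body (hypothetical helper)."""
    properties = dict(DEFAULT_PROPERTIES)
    # `hms` is the only backend the docs list as supported.
    properties.update({"catalog-backend": "hms", "uri": uri})
    properties.update(extra_properties or {})

    missing = [key for key in REQUIRED_PROPERTIES if not properties.get(key)]
    if missing:
        raise ValueError(f"missing required properties: {missing}")

    return {
        "name": name,
        "type": "RELATIONAL",          # assumed catalog type for table metadata
        "provider": "lakehouse-hudi",
        "comment": "Hudi catalog backed by HMS",
        "properties": properties,
    }


request_body = build_hudi_catalog_request(
    "hudi_catalog",                    # hypothetical catalog name
    "thrift://127.0.0.1:9083",
    {"gravitino.bypass.hive.metastore.failure.retries": "3"},
)
print(json.dumps(request_body, indent=2))
```

All property values are kept as strings here, on the assumption that catalog properties are a string-to-string map.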

### Catalog operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.

## Schema

### Schema capabilities

- Only read operations are supported: `listSchema`, `loadSchema`, and `schemaExists`.

### Schema properties

- `Location` is an optional property that holds the storage path of the Hudi database.

### Schema operations

Only read operations are supported: `listSchema`, `loadSchema`, and `schemaExists`.
Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.
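
Because only read operations are available, a client needs just two kinds of `GET` requests for schemas. The sketch below builds the corresponding URLs. The server address `http://localhost:8090`, the path layout, and the metalake, catalog, and schema names are assumptions modeled on the general Gravitino REST API; check the linked guide for the authoritative form.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8090"  # assumed Gravitino server address


def schemas_url(metalake, catalog, schema=None):
    """Return the URL for listing schemas, or for loading one schema."""
    parts = ["api", "metalakes", metalake, "catalogs", catalog, "schemas"]
    if schema is not None:
        parts.append(schema)
    # Percent-encode each path segment so unusual names stay valid.
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


# Hypothetical names, for illustration only.
list_url = schemas_url("my_metalake", "hudi_catalog")
load_url = schemas_url("my_metalake", "hudi_catalog", "sales_db")

print("GET", list_url)   # list schemas
print("GET", load_url)   # load a single schema
```

Requests that would create, alter, or drop a schema are expected to be rejected by this catalog, so the sketch deliberately builds no URLs for them.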

## Table

### Table capabilities

- Only read operations are supported: `listTable`, `loadTable`, and `tableExists`.

### Table partitions

- Supports loading partitioned Hudi tables (Hudi only supports identity partitioning).

### Table sort orders

- Doesn't support table sort orders.

### Table distributions

- Doesn't support table distributions.

### Table indexes

- Doesn't support table indexes.

### Table properties

- For the HMS backend, all table parameters stored in HMS are exposed as table properties.

### Table column types

The following table shows the mapping between Gravitino and [Apache Hudi column types](https://hudi.apache.org/docs/sql_ddl#supported-types):

| Gravitino Type | Apache Hudi Type |
|----------------|------------------|
| `boolean` | `boolean` |
| `integer` | `int` |
| `long` | `long` |
| `date` | `date` |
| `timestamp` | `timestamp` |
| `float` | `float` |
| `double` | `double` |
| `string` | `string` |
| `decimal` | `decimal` |
| `binary` | `bytes` |
| `array` | `array` |
| `map` | `map` |
| `struct` | `struct` |
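
The mapping above can be expressed as a simple lookup table, which may be handy when comparing column metadata from both systems. This is a minimal sketch, not Gravitino's actual converter: the function name `to_gravitino_type` is hypothetical, and it only handles bare type names, not parameterized forms such as a decimal with precision and scale or an array with an element type.

```python
# Gravitino type name -> Apache Hudi type name, copied from the table above.
GRAVITINO_TO_HUDI = {
    "boolean": "boolean",
    "integer": "int",
    "long": "long",
    "date": "date",
    "timestamp": "timestamp",
    "float": "float",
    "double": "double",
    "string": "string",
    "decimal": "decimal",
    "binary": "bytes",
    "array": "array",
    "map": "map",
    "struct": "struct",
}

# Reverse lookup; safe because every Hudi name in the table is unique.
HUDI_TO_GRAVITINO = {hudi: grav for grav, hudi in GRAVITINO_TO_HUDI.items()}


def to_gravitino_type(hudi_type):
    """Map a bare Hudi type name to its Gravitino name (hypothetical helper)."""
    key = hudi_type.strip().lower()
    if key not in HUDI_TO_GRAVITINO:
        raise ValueError(f"no Gravitino mapping for Hudi type: {hudi_type!r}")
    return HUDI_TO_GRAVITINO[key]


print(to_gravitino_type("int"))
print(to_gravitino_type("bytes"))
```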

### Table operations

Only read operations are supported: `listTable`, `loadTable`, and `tableExists`.
Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.
6 changes: 6 additions & 0 deletions docs/manage-relational-metadata-using-gravitino.md
@@ -24,6 +24,7 @@ For more details, please refer to the related doc.
- [**Apache Doris**](./jdbc-doris-catalog.md)
- [**Apache Iceberg**](./lakehouse-iceberg-catalog.md)
- [**Apache Paimon**](./lakehouse-paimon-catalog.md)
- [**Apache Hudi**](./lakehouse-hudi-catalog.md)

Assuming:

@@ -93,6 +94,7 @@ Currently, Gravitino supports the following catalog providers:
| `hive` | [Hive catalog property](./apache-hive-catalog.md#catalog-properties) |
| `lakehouse-iceberg` | [Iceberg catalog property](./lakehouse-iceberg-catalog.md#catalog-properties) |
| `lakehouse-paimon` | [Paimon catalog property](./lakehouse-paimon-catalog.md#catalog-properties) |
| `lakehouse-hudi` | [Hudi catalog property](./lakehouse-hudi-catalog.md#catalog-properties) |
| `jdbc-mysql` | [MySQL catalog property](./jdbc-mysql-catalog.md#catalog-properties) |
| `jdbc-postgresql` | [PostgreSQL catalog property](./jdbc-postgresql-catalog.md#catalog-properties) |
| `jdbc-doris` | [Doris catalog property](./jdbc-doris-catalog.md#catalog-properties) |
@@ -326,6 +328,7 @@ Currently, Gravitino supports the following schema property:
| `hive` | [Hive schema property](./apache-hive-catalog.md#schema-properties) |
| `lakehouse-iceberg` | [Iceberg schema property](./lakehouse-iceberg-catalog.md#schema-properties) |
| `lakehouse-paimon` | [Paimon schema property](./lakehouse-paimon-catalog.md#schema-properties) |
| `lakehouse-hudi` | [Hudi schema property](./lakehouse-hudi-catalog.md#schema-properties) |
| `jdbc-mysql` | [MySQL schema property](./jdbc-mysql-catalog.md#schema-properties) |
| `jdbc-postgresql` | [PostgreSQL schema property](./jdbc-postgresql-catalog.md#schema-properties) |
| `jdbc-doris` | [Doris schema property](./jdbc-doris-catalog.md#schema-properties) |
@@ -807,6 +810,7 @@ The following is a table of the column default value that Gravitino supports for
| `hive` | ✘ |
| `lakehouse-iceberg` | ✘ |
| `lakehouse-paimon` | ✘ |
| `lakehouse-hudi` | ✘ |
| `jdbc-mysql` | ✔ |
| `jdbc-postgresql` | ✔ |

@@ -820,6 +824,7 @@ The following table shows the column auto-increment that Gravitino supports for
| `hive` | ✘ |
| `lakehouse-iceberg` | ✘ |
| `lakehouse-paimon` | ✘ |
| `lakehouse-hudi` | ✘ |
| `jdbc-mysql` | ✔([limitations](./jdbc-mysql-catalog.md#table-column-auto-increment)) |
| `jdbc-postgresql` | ✔ |

@@ -832,6 +837,7 @@ The following is the table property that Gravitino supports:
| `hive` | [Hive table property](./apache-hive-catalog.md#table-properties) | [Hive type mapping](./apache-hive-catalog.md#table-column-types) |
| `lakehouse-iceberg` | [Iceberg table property](./lakehouse-iceberg-catalog.md#table-properties) | [Iceberg type mapping](./lakehouse-iceberg-catalog.md#table-column-types) |
| `lakehouse-paimon` | [Paimon table property](./lakehouse-paimon-catalog.md#table-properties) | [Paimon type mapping](./lakehouse-paimon-catalog.md#table-column-types) |
| `lakehouse-hudi` | [Hudi table property](./lakehouse-hudi-catalog.md#table-properties) | [Hudi type mapping](./lakehouse-hudi-catalog.md#table-column-types) |
| `jdbc-mysql` | [MySQL table property](./jdbc-mysql-catalog.md#table-properties) | [MySQL type mapping](./jdbc-mysql-catalog.md#table-column-types) |
| `jdbc-postgresql` | [PostgreSQL table property](./jdbc-postgresql-catalog.md#table-properties) | [PostgreSQL type mapping](./jdbc-postgresql-catalog.md#table-column-types) |
| `doris` | [Doris table property](./jdbc-doris-catalog.md#table-properties) | [Doris type mapping](./jdbc-doris-catalog.md#table-column-types) |