From 4477757a8ba3a43ac7d002c3a4021d19ec9d69ff Mon Sep 17 00:00:00 2001 From: Qi Yu Date: Tue, 2 Jan 2024 23:09:27 +0800 Subject: [PATCH] [#1135] improvement(docs): Add docs about tables advanced feature like partitioning (#1203) ### What changes were proposed in this pull request? Add docs about the details of table partitioning, bucketing, and sorting order. ### Why are the changes needed? The document is mandatory for users. Fix: #1135 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? N/A --------- Co-authored-by: Jerry Shao --- docs/manage-metadata-using-gravitino.md | 220 +++----------- ...table-partitioning-bucketing-sort-order.md | 287 ++++++++++++++++++ 2 files changed, 333 insertions(+), 174 deletions(-) create mode 100644 docs/table-partitioning-bucketing-sort-order.md diff --git a/docs/manage-metadata-using-gravitino.md b/docs/manage-metadata-using-gravitino.md index d968f2d622a..0dd7e38158d 100644 --- a/docs/manage-metadata-using-gravitino.md +++ b/docs/manage-metadata-using-gravitino.md @@ -31,9 +31,9 @@ You can create a metalake by sending a `POST` request to the `/api/metalakes` en The following is an example of creating a metalake: - + -```bash +```shell curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{"name":"metalake","comment":"comment","properties":{}}' \ http://localhost:8090/api/metalakes @@ -61,9 +61,9 @@ GravitinoMetaLake newMetalake = gravitinoClient.createMetalake( You can create a metalake by sending a `GET` request to the `/api/metalakes/{metalake_name}` endpoint or just use the Gravitino Java client. 
The following is an example of loading a metalake: - + -```bash +```shell curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" http://localhost:8090/api/metalakes/metalake ``` @@ -86,9 +86,9 @@ GravitinoMetaLake loaded = gravitinoClient.loadMetalake( You can modify a metalake by sending a `PUT` request to the `/api/metalakes/{metalake_name}` endpoint or just use the Gravitino Java client. The following is an example of altering a metalake: - + -```bash +```shell curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "updates": [ @@ -136,9 +136,9 @@ Currently, Gravitino supports the following changes to a metalake: You can remove a metalake by sending a `DELETE` request to the `/api/metalakes/{metalake_name}` endpoint or just use the Gravitino Java client. The following is an example of dropping a metalake: - + -```bash +```shell curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" http://localhost:8090/api/metalakes/metalake ``` @@ -166,9 +166,9 @@ Drop a metalake only removes metadata about the metalake and catalogs, schemas, You can list metalakes by sending a `GET` request to the `/api/metalakes` endpoint or just use the Gravitino Java client. The following is an example of listing all metalake name: - + -```bash +```shell curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" http://localhost:8090/api/metalakes ``` @@ -198,9 +198,9 @@ The code below is an example of creating a Hive catalog. For other catalogs, the You can create a catalog by sending a `POST` request to the `/api/metalakes/{metalake_name}/catalogs` endpoint or just use the Gravitino Java client. 
The following is an example of creating a catalog: - + -```bash +```shell curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "name": "catalog", @@ -256,9 +256,9 @@ Currently, Gravitino supports the following catalog providers: You can load a catalog by sending a `GET` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}` endpoint or just use the Gravitino Java client. The following is an example of loading a catalog: - + -```bash +```shell curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" http://localhost:8090/api/metalakes/metalake/catalogs/catalog ``` @@ -284,9 +284,9 @@ Catalog catalog = gravitinoMetaLake.loadCatalog(NameIdentifier.of("metalake", "c You can modify a catalog by sending a `PUT` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}` endpoint or just use the Gravitino Java client. The following is an example of altering a catalog: - + -```bash +```shell curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "updates": [ @@ -334,9 +334,9 @@ Currently, Gravitino supports the following changes to a catalog: You can remove a catalog by sending a `DELETE` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}` endpoint or just use the Gravitino Java client. 
The following is an example of dropping a catalog: - + -```bash +```shell curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" \ http://localhost:8090/api/metalakes/metalake/catalogs/catalog @@ -368,9 +368,9 @@ You can list all catalogs under a metalake by sending a `GET` request to the `/a a metalake: - + -```bash +```shell curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" \ http://localhost:8090/api/metalakes/metalake/catalogs @@ -403,9 +403,9 @@ Users should create a metalake and a catalog before creating a schema. You can create a schema by sending a `POST` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas` endpoint or just use the Gravitino Java client. The following is an example of creating a schema: - + -```bash +```shell curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "name": "schema", @@ -460,9 +460,9 @@ Currently, Gravitino supports the following schema property: You can create a schema by sending a `GET` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}` endpoint or just use the Gravitino Java client. The following is an example of loading a schema: - + -```bash +```shell curl -X GET \-H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" \ http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema @@ -488,9 +488,9 @@ Schema schema = supportsSchemas.loadSchema(NameIdentifier.of("metalake", "catalo You can change a schema by sending a `PUT` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}` endpoint or just use the Gravitino Java client. 
The following is an example of modifying a schema: - + -```bash +```shell curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "updates": [ @@ -536,9 +536,9 @@ Currently, Gravitino supports the following changes to a schema: You can remove a schema by sending a `DELETE` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}` endpoint or just use the Gravitino Java client. The following is an example of dropping a schema: - + -```bash +```shell // cascade can be true or false curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" \ @@ -571,9 +571,9 @@ You can alter all schemas under a catalog by sending a `GET` request to the `/ap - + -```bash +```shell curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas ``` @@ -604,9 +604,9 @@ Users should create a metalake, a catalog and a schema before creating a table. You can create a table by sending a `POST` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables` endpoint or just use the Gravitino Java client. The following is an example of creating a table: - + -```bash +```shell curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "name": "table", @@ -730,142 +730,14 @@ The following is the table property that Gravitino supports: In addition to the basic settings, Gravitino supports the following features: -| Feature | Description | Java doc | -|---------------------|----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------| -| Partitioned table | Equal to `PARTITION BY` in Apache Hive and other engine that support partitioning. 
| [Partition](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/dto/rel/partitions/Partitioning.html) | -| Bucketed table | Equal to `CLUSTERED BY` in Apache Hive, some engine may use different words to describe it. | [Distribution](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/distributions/Distribution.html) | -| Sorted order table | Equal to `SORTED BY` in Apache Hive, some engine may use different words to describe it. | [SortOrder](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/sorts/SortOrder.html) | +| Feature | Description | Java doc | +|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------| +| Table partitioning | Equal to `PARTITION BY` in Apache Hive, It is a partitioning strategy that is used to split a table into parts based on partition keys. Some table engine may not support this feature | [Partition](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/dto/rel/partitions/Partitioning.html) | +| Table bucketing | Equal to `CLUSTERED BY` in Apache Hive, Bucketing a.k.a (Clustering) is a technique to split the data into more manageable files/parts, (By specifying the number of buckets to create). The value of the bucketing column will be hashed by a user-defined number into buckets. | [Distribution](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/distributions/Distribution.html) | +| Table sort ordering | Equal to `SORTED BY` in Apache Hive, sort ordering is a method to sort the data by specific ways such as by a column or a function and then store table data. 
it will highly improve the query performance under certain scenarios. | [SortOrder](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/sorts/SortOrder.html) | -:::tip -**Not all catalogs may support those features.**. Please refer to the related document for more details. -::: - -The following is an example of creating a partitioned, bucketed table and sorted order table: - - - - -```bash -curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ --H "Content-Type: application/json" -d '{ - "name": "table", - "columns": [ - { - "name": "id", - "type": "integer", - "nullable": true, - "comment": "Id of the user" - }, - { - "name": "name", - "type": "varchar(2000)", - "nullable": true, - "comment": "Name of the user" - }, - { - "name": "age", - "type": "short", - "nullable": true, - "comment": "Age of the user" - }, - { - "name": "score", - "type": "double", - "nullable": true, - "comment": "Score of the user" - } - ], - "comment": "Create a new Table", - "properties": { - "format": "ORC" - }, - "partitioning": [ - { - "strategy": "identity", - "fieldName": ["score"] - } - ], - "distribution": { - "strategy": "hash", - "number": 4, - "funcArgs": [ - { - "type": "field", - "fieldName": ["score"] - } - ] - }, - "sortOrders": [ - { - "direction": "asc", - "nullOrder": "NULLS_LAST", - "sortTerm": { - "type": "field", - "fieldName": ["name"] - } - } - ] -}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables -``` - - - -```java -tableCatalog.createTable( - NameIdentifier.of("metalake", "hive_catalog", "schema", "table"), - new ColumnDTO[] { - ColumnDTO.builder() - .withComment("Id of the user") - .withName("id") - .withDataType(Types.IntegerType.get()) - .withNullable(true) - .build(), - ColumnDTO.builder() - .withComment("Name of the user") - .withName("name") - .withDataType(Types.VarCharType.of(1000)) - .withNullable(true) - .build(), - ColumnDTO.builder() - .withComment("Age of the user") - .withName("age") - 
.withDataType(Types.ShortType.get()) - .withNullable(true) - .build(), - - ColumnDTO.builder() - .withComment("Score of the user") - .withName("score") - .withDataType(Types.DoubleType.get()) - .withNullable(true) - .build(), - }, - "Create a new Table", - tablePropertiesMap, - new Transform[] { - // Partition by id - Transforms.identity("score") - }, - // CLUSTERED BY id - new DistributionDTO.Builder() - .withStrategy(Strategy.HASH) - .withNumber(4) - .withArgs(FieldReferenceDTO.of("id")) - .build(), - // SORTED BY name asc - new SortOrderDTO[] { - new SortOrderDTO.Builder() - .withDirection(SortDirection.ASCENDING) - .withNullOrder(NullOrdering.NULLS_LAST) - .withSortTerm(FieldReferenceDTO.of("name")) - .build() - } - ); -``` - - - +For more information, please see the related document on [partitioning, bucketing, and sorting](table-partitioning-bucketing-sort-order.md). :::note The code above is an example of creating a Hive table. For other catalogs, the code is similar, but the supported column type, table properties may be different. For more details, please refer to the related doc. @@ -876,9 +748,9 @@ The code above is an example of creating a Hive table. For other catalogs, the c You can load a table by sending a `GET` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{table_name}` endpoint or just use the Gravitino Java client. The following is an example of loading a table: - + -```bash +```shell curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" \ http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table @@ -905,9 +777,9 @@ tableCatalog.loadTable(NameIdentifier.of("metalake", "hive_catalog", "schema", " You can modify a table by sending a `PUT` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{table_name}` endpoint or just use the Gravitino Java client. 
The following is an example of modifying a table: - + -```bash +```shell curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "updates": [ @@ -962,9 +834,9 @@ Currently, Gravitino supports the following changes to a table: You can remove a table by sending a `DELETE` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{table_name}` endpoint or just use the Gravitino Java client. The following is an example of dropping a table: - + -```bash +```shell ## purge can be true or false, if purge is true, Gravitino will remove the data of the table. curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \ @@ -1001,9 +873,9 @@ Apache Hive support both, `dropTable` will only remove the metadata of a table a You can list all tables in a schema by sending a `GET` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables` endpoint or just use the Gravitino Java client. The following is an example of list all tables in a schema: - + -```bash +```shell curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" \ http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables diff --git a/docs/table-partitioning-bucketing-sort-order.md b/docs/table-partitioning-bucketing-sort-order.md new file mode 100644 index 00000000000..638e32f9a73 --- /dev/null +++ b/docs/table-partitioning-bucketing-sort-order.md @@ -0,0 +1,287 @@ +--- +title: "Table partitioning, bucketing and sort ordering" +slug: /table-partitioning-bucketing-sort-order +date: 2023-12-25 +keyword: Table Partition Bucket Distribute Sort By +license: Copyright 2023 Datastrato Pvt Ltd. This software is licensed under the Apache License version 2. 
+--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +## Table partitioning + +To create a partitioned table, you should provide the following components to construct a valid partitioned table. + +- Partitioning strategy. It defines how Gravitino will distribute table data across partitions. Currently, Gravitino supports the following partitioning strategies. + +:::note +The `score`, `createTime`, and `city` appearing in the table below refer to the field names in a table. +::: + +| Partitioning strategy | Description | JSON example | Java example | Equivalent SQL semantics | +|-----------------------|----------------------------------------------------------------|------------------------------------------------------------------|--------------------------------------------------------|---------------------------------------| +| `identity` | Source value, unmodified. | `{"strategy":"identity","fieldName":["score"]}` | `Transforms.identity("score")` | `PARTITION BY score` | +| `hour` | Extract a timestamp hour, as hours from '1970-01-01 00:00:00'. | `{"strategy":"hour","fieldName":["createTime"]}` | `Transforms.hour("createTime")` | `PARTITION BY hour(createTime)` | +| `day` | Extract a date or timestamp day, as days from '1970-01-01'. | `{"strategy":"day","fieldName":["createTime"]}` | `Transforms.day("createTime")` | `PARTITION BY day(createTime)` | +| `month` | Extract a date or timestamp month, as months from '1970-01-01'. | `{"strategy":"month","fieldName":["createTime"]}` | `Transforms.month("createTime")` | `PARTITION BY month(createTime)` | +| `year` | Extract a date or timestamp year, as years from 1970. | `{"strategy":"year","fieldName":["createTime"]}` | `Transforms.year("createTime")` | `PARTITION BY year(createTime)` | +| `bucket[N]` | Hash of value, mod N.
| `{"strategy":"bucket","numBuckets":10,"fieldNames":[["score"]]}` | `Transforms.bucket(10, "score")` | `PARTITION BY bucket(10, score)` | +| `truncate[W]` | Value truncated to width W. | `{"strategy":"truncate","width":20,"fieldName":["score"]}` | `Transforms.truncate(20, "score")` | `PARTITION BY truncate(20, score)` | +| `list` | Partition the table by a list value. | `{"strategy":"list","fieldNames":[["createTime"],["city"]]}` | `Transforms.list(new String[] {"createTime", "city"})` | `PARTITION BY list(createTime, city)` | +| `range` | Partition the table by a range value. | `{"strategy":"range","fieldName":["createTime"]}` | `Transforms.range("createTime")` | `PARTITION BY range(createTime)` | + +In addition to the strategies listed above, you can partition the table with other function-based strategies, for example `{"strategy":"functionName","fieldName":["score"]}`. The `functionName` can be any function name that you can use in SQL; for example, `{"strategy":"toDate","fieldName":["createTime"]}` is equivalent to `PARTITION BY toDate(createTime)` in SQL. +For complex functions, please refer to [FunctionPartitioningDTO](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/partitions/FunctionPartitioningDTO.java). + +- Field names. It defines which fields Gravitino uses to partition the table. + +- Other information may also be needed. For example, if the partitioning strategy is `bucket`, you should provide the number of buckets; if the partitioning strategy is `truncate`, you should provide the truncation width.
+ +The following is an example of creating a partitioned table: + + + + +```json +[ + { + "strategy": "identity", + "fieldName": [ + "score" + ] + } +] +``` + + + + +```java +new Transform[] { + // Partition by score + Transforms.identity("score") + } +``` + + + + + +## Table bucketing + +To create a bucketed table, you should use the following three components to construct a valid bucketed table. + +- Strategy. It defines how Gravitino will distribute table data into buckets. + +| Bucket strategy | Description | JSON | Java | +|-----------------|-------------------------------------------------------------------------------------------------------------------------------|----------|------------------| +| hash | Bucket table using hash. Gravitino will distribute table data into buckets based on the hash value of the key. | `hash` | `Strategy.HASH` | +| range | Bucket table using range. Gravitino will distribute table data into buckets based on a specified range or interval of values. | `range` | `Strategy.RANGE` | +| even | Bucket table using even. Gravitino will distribute table data, ensuring an equal distribution of data. | `even` | `Strategy.EVEN` | + +- Number. It defines how many buckets you use to bucket the table. +- Function arguments. It defines the arguments of the strategy above. Gravitino supports the following three kinds of arguments; for more complex function arguments, refer to the Java classes [FunctionArg](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/expressions/FunctionArg.java) and [DistributionDTO](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/DistributionDTO.java).
+ +| Expression type | JSON example | Java example | Equivalent SQL semantics | Description | +|-----------------|----------------------------------------------------------------|-------------------------------------------------------------------------------------------|--------------------------|-----------------------------------| +| field | `{"type":"field","fieldName":["score"]}` | `FieldReferenceDTO.of("score")` | `score` | The field reference value `score` | +| function | `{"type":"function","functionName":"hour","fieldName":["dt"]}` | `new FuncExpressionDTO.Builder().withFunctionName("hour").withFunctionArgs("dt").build()` | `hour(dt)` | The function value `hour(dt)` | +| constant | `{"type":"literal","value":10, "dataType": "integer"}` | `new LiteralDTO.Builder().withValue("10").withDataType(Types.IntegerType.get()).build()` | `10` | The integer literal `10` | + + + + + +```json +{ + "strategy": "hash", + "number": 4, + "funcArgs": [ + { + "type": "field", + "fieldName": ["score"] + } + ] +} +``` + + + + +```java +Distributions.of(Strategy.HASH, 4, NamedReference.field("score")); +``` + + + + + +## Sort ordering + +To define a sorted order table, you should use the following three components to construct a valid sorted order table. + +- Direction. It defines in which direction Gravitino sorts the table. The default value is `ascending`. + +| Direction | Description | JSON | Java | +|------------|---------------------------------------------| ------ | -------------------------- | +| ascending | Sorted by a field or a function ascending. | `asc` | `SortDirection.ASCENDING` | +| descending | Sorted by a field or a function descending. | `desc` | `SortDirection.DESCENDING` | + +- Null ordering. 
It describes how to handle null values when ordering. + +| Null ordering Type | Description | JSON | Java | +|--------------------|-----------------------------------------| ------------- | -------------------------- | +| null_first | Puts the null value in the first place. | `nulls_first` | `NullOrdering.NULLS_FIRST` | +| null_last | Puts the null value in the last place. | `nulls_last` | `NullOrdering.NULLS_LAST` | + +Note: If the direction value is `ascending`, the default ordering value is `nulls_first`; if the direction value is `descending`, the default ordering value is `nulls_last`. + +- Sort term. It shows which field or function Gravitino uses to sort the table. Please refer to the `Function arguments` in the table bucketing section. + + + + +```json + { + "direction": "asc", + "nullOrder": "NULLS_LAST", + "sortTerm": { + "type": "field", + "fieldName": ["score"] + } +} +``` + + + + +```java +SortOrders.of(FieldReferenceDTO.of("score"), SortDirection.ASCENDING, NullOrdering.NULLS_LAST); +``` + + + + + +:::tip +**Not all catalogs may support those features.** Please refer to the related document for more details.
+::: + +The following is an example of creating a partitioned, bucketed table, and sorted order table: + + + + +```shell +curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ +-H "Content-Type: application/json" -d '{ + "name": "table", + "columns": [ + { + "name": "id", + "type": "integer", + "nullable": true, + "comment": "Id of the user" + }, + { + "name": "name", + "type": "varchar(2000)", + "nullable": true, + "comment": "Name of the user" + }, + { + "name": "age", + "type": "short", + "nullable": true, + "comment": "Age of the user" + }, + { + "name": "score", + "type": "double", + "nullable": true, + "comment": "Score of the user" + } + ], + "comment": "Create a new Table", + "properties": { + "format": "ORC" + }, + "partitioning": [ + { + "strategy": "identity", + "fieldName": ["score"] + } + ], + "distribution": { + "strategy": "hash", + "number": 4, + "funcArgs": [ + { + "type": "field", + "fieldName": ["score"] + } + ] + }, + "sortOrders": [ + { + "direction": "asc", + "nullOrder": "NULLS_LAST", + "sortTerm": { + "type": "field", + "fieldName": ["name"] + } + } + ] +}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables +``` + + + + +```java +tableCatalog.createTable( + NameIdentifier.of("metalake", "hive_catalog", "schema", "table"), + new ColumnDTO[] { + ColumnDTO.builder() + .withComment("Id of the user") + .withName("id") + .withDataType(Types.IntegerType.get()) + .withNullable(true) + .build(), + ColumnDTO.builder() + .withComment("Name of the user") + .withName("name") + .withDataType(Types.VarCharType.of(1000)) + .withNullable(true) + .build(), + ColumnDTO.builder() + .withComment("Age of the user") + .withName("age") + .withDataType(Types.ShortType.get()) + .withNullable(true) + .build(), + ColumnDTO.builder() + .withComment("Score of the user") + .withName("score") + .withDataType(Types.DoubleType.get()) + .withNullable(true) + .build(), + }, + "Create a new Table", + tablePropertiesMap, + new Transform[] 
{ + // Partition by score + Transforms.identity("score") + }, + // CLUSTERED BY id + Distributions.of(Strategy.HASH, 4, NamedReference.field("id")), + // SORTED BY name asc + new SortOrder[] { + SortOrders.of( + NamedReference.field("name"), SortDirection.ASCENDING, NullOrdering.NULLS_LAST), + }); +``` + + +