title | slug | date | keyword | license |
---|---|---|---|---|
Manage table partition using Gravitino |
/manage-table-partition-using-gravitino |
2024-02-03 |
table partition management |
Copyright 2024 Datastrato Pvt Ltd. This software is licensed under the Apache License version 2. |
import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';
Although many catalogs inherently manage partitions automatically, there are scenarios where manual partition management is necessary. Usage scenarios like managing the TTL (Time-To-Live) of partition data, gathering statistics on partition metadata, and optimizing queries through partition pruning. For these reasons, Gravitino provides capabilities of partition management.
- Partition management is based on the partitioned table, so please ensure that you are operating on a partitioned table.
The following table shows the partition operations supported across various catalogs in Gravitino:
Operation | Hive catalog | Iceberg catalog | Jdbc-Mysql catalog | Jdbc-PostgreSQL catalog |
---|---|---|---|---|
Add Partition | ✔ | ✘ | ✘ | ✘ |
Get Partition by Name | ✔ | ✘ | ✘ | ✘ |
List Partition Names | ✔ | ✘ | ✘ | ✘ |
List Partitions | ✔ | ✘ | ✘ | ✘ |
Drop Partition | 🚀(Coming Soon) | 🚀(Coming Soon) | ✘ | ✘ |
:::tip[WELCOME FEEDBACK] If you need additional partition management support for a specific catalog, please feel free to create an issue on the Gravitino repository. :::
You must match the partition types you want to add with the table's partitioning types; Gravitino currently supports adding the following partition types:
Partition Type | Description |
---|---|
identity | An identity partition represents a result of identity partitioning. |
range | A range partition represents a result of range partitioning. |
list | A list partition represents a result of list partitioning. |
For JSON examples:
{
"type": "identity",
"name": "dt=2008-08-08/country=us",
"fieldNames": [
[
"dt"
],
[
"country"
]
],
"values": [
{
"type": "literal",
"dataType": "date",
"value": "2008-08-08"
},
{
"type": "literal",
"dataType": "string",
"value": "us"
}
]
}
:::note
The values of the field values
must be the same ordering as the values of fieldNames
.
When adding an identity partition to a partitioned Hive table, the specified partition name is ignored. This is because Hive generates the partition name based on field names and values. :::
{
"type": "range",
"name": "p20200321",
"upper": {
"type": "literal",
"dataType": "date",
"value": "2020-03-21"
},
"lower": {
"type": "literal",
"dataType": "null",
"value": "null"
}
}
{
"type": "list",
"name": "p202204_California",
"lists": [
[
{
"type": "literal",
"dataType": "date",
"value": "2022-04-01"
},
{
"type": "literal",
"dataType": "string",
"value": "Los Angeles"
}
],
[
{
"type": "literal",
"dataType": "date",
"value": "2022-04-01"
},
{
"type": "literal",
"dataType": "string",
"value": "San Francisco"
}
]
]
}
:::note Each list in the lists must have the same length. The values in each list must correspond to the field definitions in the list partitioning. :::
For Java examples:
Partition partition =
Partitions.identity(
"dt=2008-08-08/country=us",
new String[][] {{"dt"}, {"country"}},
new Literal[] {
Literals.dateLiteral(LocalDate.parse("2008-08-08")), Literals.stringLiteral("us")
},
Maps.newHashMap());
:::note The values are in the same order as the field names.
When adding an identity partition to a partitioned Hive table, the specified partition name is ignored. This is because Hive generates the partition name based on field names and values. :::
Partition partition =
Partitions.range(
"p20200321",
Literals.dateLiteral(LocalDate.parse("2020-03-21")),
Literals.NULL,
Maps.newHashMap());
Partition partition =
Partitions.list(
"p202204_California",
new Literal[][] {
{
Literals.dateLiteral(LocalDate.parse("2022-04-01")),
Literals.stringLiteral("Los Angeles")
},
{
Literals.dateLiteral(LocalDate.parse("2022-04-01")),
Literals.stringLiteral("San Francisco")
}
},
Maps.newHashMap());
:::note Each list in the lists must have the same length. The values in each list must correspond to the field definitions in the list partitioning. :::
You can add a partition to a partitioned table by sending a POST
request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions
endpoint or by using the Gravitino Java client.
The following is an example of adding a identity partition to a Hive partitioned table:
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
"partitions": [
{
"type": "identity",
"fieldNames": [
[
"dt"
],
[
"country"
]
],
"values": [
{
"type": "literal",
"dataType": "date",
"value": "2008-08-08"
},
{
"type": "literal",
"dataType": "string",
"value": "us"
}
]
}
]
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions
GravitinoClient gravitinoClient = GravitinoClient
.builder("http://127.0.0.1:8090")
.withMetalake("metalake")
.build();
// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition addedPartition =
gravitinoClient
.loadCatalog(NameIdentifier.of("metalake", "catalog"))
.asTableCatalog()
.loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
.supportPartitions()
.addPartition(
Partitions.identity(
new String[][] {{"dt"}, {"country"}},
new Literal[] {
Literals.dateLiteral(LocalDate.parse("2008-08-08")), Literals.stringLiteral("us")},
Maps.newHashMap()));
You can get a partition by its name via sending a GET
request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions/{partition_name}
endpoint or by using the Gravitino Java client.
The following is an example of getting a partition by its name:
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions/p20200321
:::tip
If the partition name contains special characters, you should use URL encoding. For example, if the partition name is dt=2008-08-08/country=us
you should use dt%3D2008-08-08%2Fcountry%3Dus
in the URL.
:::
GravitinoClient gravitinoClient = GravitinoClient
.builder("http://127.0.0.1:8090")
.withMetalake("metalake")
.build();
// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition Partition =
gravitinoClient
.loadCatalog(NameIdentifier.of("metalake", "catalog"))
.asTableCatalog()
.loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
.supportPartitions()
.getPartition("partition_name");
You can list all partition names under a partitioned table by sending a GET
request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions
endpoint or by using the Gravitino Java client.
The following is an example of listing all partition names under a partitioned table:
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions
GravitinoClient gravitinoClient = GravitinoClient
.builder("http://127.0.0.1:8090")
.withMetalake("metalake")
.build();
// Assume that you have a partitioned table named "metalake.catalog.schema.table".
String[] partitionNames =
gravitinoClient
.loadCatalog(NameIdentifier.of("metalake", "catalog"))
.asTableCatalog()
.loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
.supportPartitions()
.listPartitionNames();
If you want to get more detailed information about the partitions under a partitioned table, you can list all partitions under a partitioned table by sending a GET
request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions
endpoint or by using the Gravitino Java client.
The following is an example of listing all partitions under a partitioned table:
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions?details=true
// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition[] partitions =
gravitinoClient
.loadCatalog(NameIdentifier.of("metalake", "catalog"))
.asTableCatalog()
.loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
.supportPartitions()
.listPartitions();