Skip to content

Latest commit

 

History

History
378 lines (312 loc) · 13 KB

manage-table-partition-using-gravitino.md

File metadata and controls

378 lines (312 loc) · 13 KB
title slug date keyword license
Manage table partition using Gravitino
/manage-table-partition-using-gravitino
2024-02-03
table partition management
Copyright 2024 Datastrato Pvt Ltd. This software is licensed under the Apache License version 2.

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Introduction

Although many catalogs inherently manage partitions automatically, there are scenarios where manual partition management is necessary. Usage scenarios like managing the TTL (Time-To-Live) of partition data, gathering statistics on partition metadata, and optimizing queries through partition pruning. For these reasons, Gravitino provides capabilities of partition management.

Requirements and limitations

  • Partition management is based on the partitioned table, so please ensure that you are operating on a partitioned table.

The following table shows the partition operations supported across various catalogs in Gravitino:

Operation Hive catalog Iceberg catalog Jdbc-Mysql catalog Jdbc-PostgreSQL catalog
Add Partition
Get Partition by Name
List Partition Names
List Partitions
Drop Partition 🚀(Coming Soon) 🚀(Coming Soon)

:::tip[WELCOME FEEDBACK] If you need additional partition management support for a specific catalog, please feel free to create an issue on the Gravitino repository. :::

Partition operations

Add partition

You must match the partition types you want to add with the table's partitioning types; Gravitino currently supports adding the following partition types:

Partition Type Description
identity An identity partition represents a result of identity partitioning.
range A range partition represents a result of range partitioning.
list A list partition represents a result of list partitioning.

For JSON examples:

{
  "type": "identity",
  "name": "dt=2008-08-08/country=us",
  "fieldNames": [
    [
      "dt"
    ],
    [
      "country"
    ]
  ],
  "values": [
    {
      "type": "literal",
      "dataType": "date",
      "value": "2008-08-08"
    },
    {
      "type": "literal",
      "dataType": "string",
      "value": "us"
    }
  ]
}

:::note The values of the field values must be the same ordering as the values of fieldNames.

When adding an identity partition to a partitioned Hive table, the specified partition name is ignored. This is because Hive generates the partition name based on field names and values. :::

{
  "type": "range",
  "name": "p20200321",
  "upper": {
    "type": "literal",
    "dataType": "date",
    "value": "2020-03-21"
  },
  "lower": {
    "type": "literal",
    "dataType": "null",
    "value": "null"
  }
}
{
  "type": "list",
  "name": "p202204_California",
  "lists": [
    [
      {
        "type": "literal",
        "dataType": "date",
        "value": "2022-04-01"
      },
      {
        "type": "literal",
        "dataType": "string",
        "value": "Los Angeles"
      }
    ],
    [
      {
        "type": "literal",
        "dataType": "date",
        "value": "2022-04-01"
      },
      {
        "type": "literal",
        "dataType": "string",
        "value": "San Francisco"
      }
    ]
  ]
}

:::note Each list in the lists must have the same length. The values in each list must correspond to the field definitions in the list partitioning. :::

For Java examples:

Partition partition =
        Partitions.identity(
            "dt=2008-08-08/country=us",
            new String[][] {{"dt"}, {"country"}},
            new Literal[] {
              Literals.dateLiteral(LocalDate.parse("2008-08-08")), Literals.stringLiteral("us")
            },
            Maps.newHashMap());

:::note The values are in the same order as the field names.

When adding an identity partition to a partitioned Hive table, the specified partition name is ignored. This is because Hive generates the partition name based on field names and values. :::

Partition partition =
        Partitions.range(
            "p20200321",
            Literals.dateLiteral(LocalDate.parse("2020-03-21")),
            Literals.NULL,
            Maps.newHashMap());
Partition partition =
        Partitions.list(
            "p202204_California",
            new Literal[][] {
              {
                Literals.dateLiteral(LocalDate.parse("2022-04-01")),
                Literals.stringLiteral("Los Angeles")
              },
              {
                Literals.dateLiteral(LocalDate.parse("2022-04-01")),
                Literals.stringLiteral("San Francisco")
              }
            },
            Maps.newHashMap());

:::note Each list in the lists must have the same length. The values in each list must correspond to the field definitions in the list partitioning. :::

You can add a partition to a partitioned table by sending a POST request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions endpoint or by using the Gravitino Java client. The following is an example of adding a identity partition to a Hive partitioned table:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "partitions": [
    {
      "type": "identity",
      "fieldNames": [
        [
          "dt"
        ],
        [
          "country"
        ]
      ],
      "values": [
        {
          "type": "literal",
          "dataType": "date",
          "value": "2008-08-08"
        },
        {
          "type": "literal",
          "dataType": "string",
          "value": "us"
        }
      ]
    }
  ]
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions
GravitinoClient gravitinoClient = GravitinoClient
    .builder("http://127.0.0.1:8090")
    .withMetalake("metalake")
    .build();

// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition addedPartition = 
    gravitinoClient
        .loadCatalog(NameIdentifier.of("metalake", "catalog"))
        .asTableCatalog()
        .loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
        .supportPartitions()
        .addPartition(
            Partitions.identity(
              new String[][] {{"dt"}, {"country"}},
              new Literal[] {
              Literals.dateLiteral(LocalDate.parse("2008-08-08")), Literals.stringLiteral("us")},
              Maps.newHashMap()));

Get a partition by name

You can get a partition by its name via sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions/{partition_name} endpoint or by using the Gravitino Java client. The following is an example of getting a partition by its name:

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions/p20200321

:::tip If the partition name contains special characters, you should use URL encoding. For example, if the partition name is dt=2008-08-08/country=us you should use dt%3D2008-08-08%2Fcountry%3Dus in the URL. :::

GravitinoClient gravitinoClient = GravitinoClient
    .builder("http://127.0.0.1:8090")
    .withMetalake("metalake")
    .build();

// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition Partition = 
    gravitinoClient
        .loadCatalog(NameIdentifier.of("metalake", "catalog"))
        .asTableCatalog()
        .loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
        .supportPartitions()
        .getPartition("partition_name");

List partition names under a partitioned table

You can list all partition names under a partitioned table by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions endpoint or by using the Gravitino Java client. The following is an example of listing all partition names under a partitioned table:

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions
GravitinoClient gravitinoClient = GravitinoClient
    .builder("http://127.0.0.1:8090")
    .withMetalake("metalake")
    .build();

// Assume that you have a partitioned table named "metalake.catalog.schema.table".
String[] partitionNames = 
    gravitinoClient
        .loadCatalog(NameIdentifier.of("metalake", "catalog"))
        .asTableCatalog()
        .loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
        .supportPartitions()
        .listPartitionNames();

List partitions under a partitioned table

If you want to get more detailed information about the partitions under a partitioned table, you can list all partitions under a partitioned table by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions endpoint or by using the Gravitino Java client. The following is an example of listing all partitions under a partitioned table:

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions?details=true
// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition[] partitions =
        gravitinoClient
            .loadCatalog(NameIdentifier.of("metalake", "catalog"))
            .asTableCatalog()
            .loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
            .supportPartitions()
            .listPartitions();