[#3365] feat(spark-connector): Support Iceberg metadata tables #3481

caican00 · 2024-05-21T06:13:38Z

What changes were proposed in this pull request?

Support Iceberg metadata tables, such as:

  ENTRIES,
  FILES,
  DATA_FILES,
  DELETE_FILES,
  HISTORY,
  METADATA_LOG_ENTRIES,
  SNAPSHOTS,
  REFS,
  MANIFESTS,
  PARTITIONS,
  ALL_DATA_FILES,
  ALL_DELETE_FILES,
  ALL_FILES,
  ALL_MANIFESTS,
  ALL_ENTRIES,
  POSITION_DELETES
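For illustration, a metadata table is addressed by appending its name to the data table's identifier, as in the iceberg.default_db.test_iceberg_metadata_table.snapshots example discussed in the review. The sketch below only builds such a query string; the class, enum, and method names are hypothetical and not part of this patch.

```java
import java.util.Locale;

final class MetadataTableQuerySketch {
  // A few of the metadata table names listed above; this enum is hypothetical.
  enum MetadataTable { SNAPSHOTS, HISTORY, FILES, MANIFESTS, PARTITIONS }

  // Builds a query against <catalog>.<db>.<table>.<metadata table>.
  static String selectAll(String catalog, String db, String table, MetadataTable metadataTable) {
    String suffix = metadataTable.name().toLowerCase(Locale.ROOT);
    return String.format("SELECT * FROM %s.%s.%s.%s", catalog, db, table, suffix);
  }

  public static void main(String[] args) {
    // Prints: SELECT * FROM iceberg.default_db.test_iceberg_metadata_table.snapshots
    System.out.println(
        selectAll("iceberg", "default_db", "test_iceberg_metadata_table", MetadataTable.SNAPSHOTS));
  }
}
```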

Why are the changes needed?

Support Iceberg metadata tables.

Fix: #3365

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New IT and modified ITs.

caican00 · 2024-05-21T06:56:39Z

This PR adds support for reading Iceberg metadata tables. Could you please help review it when you are free?
cc: @FANNG1

...rg/src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/ops/IcebergTableOps.java

FANNG1 · 2024-05-21T11:34:54Z

.../main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/ops/IcebergTableOpsHelper.java

@@ -351,7 +351,8 @@ public static TableIdentifier buildIcebergTableIdentifier(
   */
  public static TableIdentifier buildIcebergTableIdentifier(NameIdentifier nameIdentifier) {


Could you explain why this was changed?

See #3481 (comment).
The namespace may be a concatenation of the db and table names, so we have to split it here.
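A minimal sketch of the split described in this reply, using plain strings rather than the real NameIdentifier and TableIdentifier types. The class and method names are hypothetical, and the sketch assumes that schema and table names never contain a dot themselves.

```java
import java.util.Arrays;

final class SplitNamespaceSketch {
  // Splits a schema name that may be "db" or "db.table" into its levels
  // and appends the table name as the last level.
  static String[] toIcebergLevels(String schemaName, String tableName) {
    String[] schemaLevels = schemaName.split("\\.");
    String[] levels = Arrays.copyOf(schemaLevels, schemaLevels.length + 1);
    levels[schemaLevels.length] = tableName;
    return levels;
  }

  public static void main(String[] args) {
    // Prints: [default_db, test_iceberg_metadata_table, snapshots]
    System.out.println(
        Arrays.toString(toIcebergLevels("default_db.test_iceberg_metadata_table", "snapshots")));
  }
}
```

For an ordinary table the schema name has a single level, so the result is just the db followed by the table name.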

.../src/main/java/com/datastrato/gravitino/spark/connector/iceberg/GravitinoIcebergCatalog.java

FANNG1 · 2024-05-21T11:43:59Z

.../src/main/java/com/datastrato/gravitino/spark/connector/iceberg/GravitinoIcebergCatalog.java

+      return getCatalogDefaultNamespace();
+    } else if (sparkIdentifier.namespace().length == 1) {
+      return sparkIdentifier.namespace()[0];
+    } else if (sparkIdentifier.namespace().length == 2) {


Under what conditions is length == 2? Could you add a description?

For an Iceberg metadata table such as iceberg.default_db.test_iceberg_metadata_table.snapshots, sparkIdentifier.namespace() is ['default_db', 'test_iceberg_metadata_table']. Because of a limitation in RelationalCatalog, I decided to concatenate sparkIdentifier.namespace() here to avoid an exception.

https://github.com/datastrato/gravitino/blob/ce29d83880a68c34f9aa25a8ced103c4ff9f118c/api/src/main/java/com/datastrato/gravitino/Namespace.java#L165-L170

In this method, the Namespace contains the metalake, catalog, and db.
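The branch structure in the diff above can be sketched in isolation as follows. This is not the patch itself: the class name is hypothetical, DEFAULT_NAMESPACE stands in for getCatalogDefaultNamespace(), the first branch is assumed to handle an empty namespace, and the "." separator is taken from the default_db.test_iceberg_metadata_table value mentioned in this thread.

```java
final class DatabaseNameSketch {
  // Hypothetical stand-in for getCatalogDefaultNamespace() in the diff.
  static final String DEFAULT_NAMESPACE = "default";

  // Maps a Spark namespace array to a single database name.
  static String getDatabase(String[] namespace) {
    switch (namespace.length) {
      case 0:
        return DEFAULT_NAMESPACE;
      case 1:
        return namespace[0];
      case 2:
        // Metadata table case: [db, table] becomes "db.table".
        return String.join(".", namespace);
      default:
        throw new IllegalArgumentException(
            "Unsupported namespace depth: " + namespace.length);
    }
  }

  public static void main(String[] args) {
    // Prints: default_db.test_iceberg_metadata_table
    System.out.println(
        getDatabase(new String[] {"default_db", "test_iceberg_metadata_table"}));
  }
}
```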

for iceberg.default_db.test_iceberg_metadata_table.snapshots, getDatabase will return default_db.test_iceberg_metadata_table? why not default_db?

for iceberg.default_db.test_iceberg_metadata_table.snapshots, getDatabase will return default_db.test_iceberg_metadata_table?

Yes.

why not default_db?

For an Iceberg metadata table such as iceberg.default_db.test_iceberg_metadata_table.snapshots, the real table name is snapshots. Spark's Identifier automatically uses the last part as the table name, and everything between the catalog and the table name as the namespace.

cc: @FANNG1
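The identifier handling described in this reply can be illustrated with plain string arrays. This is a sketch only: it does not call Spark, the class and method names are hypothetical, and it assumes the first part of the name is the catalog.

```java
import java.util.Arrays;

final class MultipartIdentifierSketch {
  // First part is assumed to be the catalog name.
  static String catalogOf(String[] parts) {
    requireAtLeastTwo(parts);
    return parts[0];
  }

  // Everything between the catalog and the last part is the namespace.
  static String[] namespaceOf(String[] parts) {
    requireAtLeastTwo(parts);
    return Arrays.copyOfRange(parts, 1, parts.length - 1);
  }

  // The last part is treated as the table name.
  static String nameOf(String[] parts) {
    requireAtLeastTwo(parts);
    return parts[parts.length - 1];
  }

  private static void requireAtLeastTwo(String[] parts) {
    if (parts.length < 2) {
      throw new IllegalArgumentException("Expected at least <catalog>.<table>");
    }
  }

  public static void main(String[] args) {
    String[] parts = "iceberg.default_db.test_iceberg_metadata_table.snapshots".split("\\.");
    // Prints: [default_db, test_iceberg_metadata_table] snapshots
    System.out.println(Arrays.toString(namespaceOf(parts)) + " " + nameOf(parts));
  }
}
```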

caican00 · 2024-05-21T12:58:03Z

The comments have been addressed. Could you please review again? @FANNG1

pan3793 · 2024-05-22T07:00:16Z

I have a question about the permission management of the metadata table, though it might be out of the scope of this PR.

Suppose there is an Iceberg table iceberg_ctlg.my_db.my_table, when role X is granted permission to the table, does it mean X is granted the same permission for all metatables?

FANNG1 · 2024-05-22T07:02:23Z

The current implementation retrieves metadata tables (like the Iceberg snapshots table) from Gravitino. I'm not sure whether this is within the scope of Gravitino, or whether Gravitino is ready to provide the corresponding abilities (for example, we have to combine database and table as the Gravitino namespace because Gravitino only supports a one-level namespace). For now, how about querying metadata tables from the underlying Iceberg catalog rather than through Gravitino? This may skip the privilege check, though. WDYT? @jerryshao @caican00 @qqqttt123

FANNG1 · 2024-05-22T07:08:41Z

The permission model for the Spark connector has not been considered yet. At first glance, it seems we should keep the same permissions for metatables. Kyuubi may encounter similar problems; @pan3793, could you share your thoughts? cc @qqqttt123 @jerryshao

pan3793 · 2024-05-22T07:14:28Z

There was an argument in the Kyuubi community about that. There are two opinions:

metadata tables are independent resources, should be granted permissions explicitly
metadata tables should inherit permissions from the data table

We have no conclusion yet. Just want to follow the common practices.
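To make the two opinions concrete, here is a toy sketch of how each policy would answer a read check. It is purely illustrative: none of these types exist in Kyuubi or Gravitino, and grants are modeled as simple "role@resource" strings.

```java
import java.util.HashSet;
import java.util.Set;

final class MetadataTablePermissionSketch {
  // EXPLICIT: metadata tables need their own grant.
  // INHERIT: a grant on the data table also covers its metadata tables.
  enum Policy { EXPLICIT, INHERIT }

  private final Policy policy;
  private final Set<String> grants = new HashSet<>();

  MetadataTablePermissionSketch(Policy policy) {
    this.policy = policy;
  }

  void grant(String role, String resource) {
    grants.add(role + "@" + resource);
  }

  boolean canRead(String role, String dataTable, String metadataTable) {
    // An explicit grant on the metadata table is honored under both policies.
    if (grants.contains(role + "@" + dataTable + "." + metadataTable)) {
      return true;
    }
    // Only the INHERIT policy falls back to the grant on the data table.
    return policy == Policy.INHERIT && grants.contains(role + "@" + dataTable);
  }
}
```

With role X granted only on iceberg_ctlg.my_db.my_table, canRead for the snapshots metadata table is true under INHERIT and false under EXPLICIT.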

caican00 · 2024-05-22T07:33:13Z

It seems that not only the privilege check but also auditing would be skipped, if auditing is supported. cc @FANNG1

caican00 · 2024-05-22T07:55:39Z

FYI: Iceberg metadata tables are created implicitly, and it seems more reasonable to inherit the permissions of data tables directly.

qqqttt123 · 2024-05-22T12:25:54Z

We have a load table operation in the REST API. Anyone with read or write privilege on a table can load that table as well.

jerryshao · 2024-05-22T12:30:16Z

I would suggest we think of a thorough and elegant solution in namespace and nameIdentifier to handle this scenario. We don't want a quick and dirty solution; it would be better to see how to adjust the API to support this scenario. Besides, all privileges, audits, and the like should be controlled by Gravitino, so we should not bypass Gravitino.

caican00 · 2024-05-23T01:52:51Z

Got it.

FANNG1 · 2024-06-04T15:28:42Z

@caican00 would you like to continue the work now that #3696 is merged?

caican00 · 2024-06-05T08:21:52Z

@FANNG1 OK, I will continue to work on this.

[apache#3365] feat(spark-connector): Support Iceberg metadata tables

1325938

caican00 mentioned this pull request May 21, 2024

[Subtask] [spark-connector] support Iceberg catalog #1571

Closed

FANNG1 reviewed May 21, 2024

...rg/src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/ops/IcebergTableOps.java

FANNG1 reviewed May 21, 2024

.../src/main/java/com/datastrato/gravitino/spark/connector/iceberg/GravitinoIcebergCatalog.java

FANNG1 reviewed May 21, 2024

update

a0570b8

caican00 requested a review from FANNG1 May 22, 2024 01:49

caican00 added 5 commits June 5, 2024 19:22

Merge branch 'main' of github.com:datastrato/gravitino into iceberg-m…

3ee4949

…etadata-tables

update

686076f

update

2b4a896

update

795620b

update

dac9b60

caican00 marked this pull request as draft August 12, 2024 07:38
