[#3365] feat(spark-connector): Support Iceberg metadata tables #3481
base: main
Conversation
Supported reading Iceberg metadata tables. Could you please help review it when you are free?
...rg/src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/ops/IcebergTableOps.java
@@ -351,7 +351,8 @@ public static TableIdentifier buildIcebergTableIdentifier(
 */
public static TableIdentifier buildIcebergTableIdentifier(NameIdentifier nameIdentifier) {
Could you explain why this was changed?
See #3481 (comment). The namespace may be a concatenation of the db and the table, so we must split it here.
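A minimal sketch of that splitting idea, not the exact PR code (the class and method names below are made up for illustration): when the incoming namespace level is actually a dot-joined db.table, it is split back into levels before building the Iceberg TableIdentifier for a metadata table such as snapshots.

```java
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

// Hypothetical illustration only; names are not from the PR.
public class MetadataTableIdentifierSketch {
  // The namespace level may itself be a dot-joined "db.table"
  // (e.g. "default_db.test_iceberg_metadata_table"), so split it before
  // building the Iceberg TableIdentifier for a metadata table like "snapshots".
  static TableIdentifier build(String namespaceLevel, String tableName) {
    return TableIdentifier.of(Namespace.of(namespaceLevel.split("\\.")), tableName);
  }

  public static void main(String[] args) {
    // Prints: default_db.test_iceberg_metadata_table.snapshots
    System.out.println(build("default_db.test_iceberg_metadata_table", "snapshots"));
  }
}
```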
.../src/main/java/com/datastrato/gravitino/spark/connector/iceberg/GravitinoIcebergCatalog.java
  return getCatalogDefaultNamespace();
} else if (sparkIdentifier.namespace().length == 1) {
  return sparkIdentifier.namespace()[0];
} else if (sparkIdentifier.namespace().length == 2) {
Under what conditions is the length == 2? Could you add a description?
For an Iceberg metadata table, such as iceberg.default_db.test_iceberg_metadata_table.snapshots, the sparkIdentifier.namespace() will be ['default_db', 'test_iceberg_metadata_table']. There is a limitation in RelationalCatalog, so I decided to concatenate the sparkIdentifier.namespace() here to avoid an exception.
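A rough sketch of what that branch does, with illustrative class, method, and helper names rather than the actual connector code: the two namespace levels of a metadata table are joined with "." because the Gravitino side expects a single schema level, and the joined value is split again later when the Iceberg TableIdentifier is built.

```java
import org.apache.spark.sql.connector.catalog.Identifier;

// Illustrative sketch only; not the real connector method.
public class GetDatabaseSketch {
  static String getDatabase(Identifier sparkIdentifier, String catalogDefaultNamespace) {
    String[] namespace = sparkIdentifier.namespace();
    if (namespace.length == 0) {
      // No namespace given: fall back to the catalog's default namespace.
      return catalogDefaultNamespace;
    } else if (namespace.length == 1) {
      // Normal table: the single level is the schema name.
      return namespace[0];
    } else if (namespace.length == 2) {
      // Metadata table: ["default_db", "test_iceberg_metadata_table"]
      // is joined into "default_db.test_iceberg_metadata_table".
      return String.join(".", namespace);
    }
    throw new IllegalArgumentException("Unsupported namespace length: " + namespace.length);
  }
}
```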
In this method, the Namespace will contain the metalake, catalog, and db.
For iceberg.default_db.test_iceberg_metadata_table.snapshots, getDatabase will return default_db.test_iceberg_metadata_table? Why not default_db?
For iceberg.default_db.test_iceberg_metadata_table.snapshots, getDatabase will return default_db.test_iceberg_metadata_table?
Yes.
Why not default_db?
For an Iceberg metadata table, such as iceberg.default_db.test_iceberg_metadata_table.snapshots, the real table name is snapshots. Spark's Identifier automatically uses the last part as the tableName and everything else, except the catalog and the tableName, as the namespace.
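A small sketch, using Spark's public Identifier API, of how such a four-part name ends up split into namespace and table name; the values simply mirror the example above.

```java
import java.util.Arrays;
import org.apache.spark.sql.connector.catalog.Identifier;

// For iceberg.default_db.test_iceberg_metadata_table.snapshots, "iceberg" is the
// catalog, name() is the last part ("snapshots"), and everything in between is
// the namespace().
public class IdentifierParsingSketch {
  public static void main(String[] args) {
    Identifier id =
        Identifier.of(new String[] {"default_db", "test_iceberg_metadata_table"}, "snapshots");
    System.out.println(Arrays.toString(id.namespace())); // [default_db, test_iceberg_metadata_table]
    System.out.println(id.name());                       // snapshots
  }
}
```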
cc: @FANNG1
The comments have been addressed; could you please help review again? @FANNG1
I have a question about the permission management of metadata tables, though it might be out of the scope of this PR. Suppose there is an Iceberg table
The current implementation retrieves metadata tables (like the Iceberg snapshots table) from Gravitino. I'm afraid this may be out of the scope of Gravitino, and I'm not sure whether Gravitino is ready to provide the corresponding abilities (for example, we have to combine the database and table into the Gravitino namespace because Gravitino only supports a one-level namespace). How about querying metadata tables from the underlying Iceberg catalog instead of through Gravitino? But that may skip the privilege check. WDYT? @jerryshao @caican00 @qqqttt123
The permission solution for the Spark connector has not been considered yet. At first glance, it seems we should keep the same permissions for metadata tables. Kyuubi may have encountered similar problems; @pan3793 could you share your thoughts? cc @qqqttt123 @jerryshao
There was an argument in the Kyuubi community about that. There are two opinions:
We have no conclusion yet; we just want to follow common practice.
It seems that not only the privilege check would be skipped; auditing would also be skipped if the audit ability is supported. cc @FANNG1
FYI: Iceberg metadata tables are created implicitly, so it seems more reasonable for them to inherit the permissions of the data tables directly.
We have a load-table operation in the REST API. If someone has the read or write table privilege, they can load the table, too.
I would suggest we think of a thorough and elegant solution in
Got it.
What changes were proposed in this pull request?
Support Iceberg metadata tables, such as the snapshots table of an Iceberg table (for example, iceberg.default_db.test_iceberg_metadata_table.snapshots).
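For example, a query like the following sketch; it assumes a SparkSession already configured with the Gravitino Spark connector and reuses the catalog/schema/table names from the discussion above.

```java
import org.apache.spark.sql.SparkSession;

// Example query against an Iceberg metadata table through the Spark connector;
// the catalog, schema, and table names are illustrative.
public class MetadataTableQueryExample {
  public static void main(String[] args) {
    SparkSession spark =
        SparkSession.builder().appName("metadata-table-example").getOrCreate();
    spark.sql("SELECT * FROM iceberg.default_db.test_iceberg_metadata_table.snapshots").show();
    spark.stop();
  }
}
```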
Why are the changes needed?
Support Iceberg metadata tables.
Fix: #3365
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New ITs and modified existing ITs.