[#4952] feat(hudi-catalog): add implementation of HMSBackend for Hudi catalog #4942

Merged
merged 3 commits into from
Oct 11, 2024

@mchades mchades commented Sep 13, 2024

What changes were proposed in this pull request?

Support read operations for the Hudi catalog's HMS backend.

Why are the changes needed?

Fix: #4952

Does this PR introduce any user-facing change?

no

How was this patch tested?

Unit tests added.

@mchades mchades force-pushed the hudi-hms branch 2 times, most recently from b235c0c to 17f93c8 Compare September 18, 2024 06:48
@mchades mchades changed the title implementation of HMSBackend for Hudi catalog [#4952] feat(hudi-catalog): add implementation of HMSBackend for Hudi catalog Sep 18, 2024
@mchades mchades marked this pull request as ready for review September 18, 2024 06:52
c ->
    c.getTables(schemaIdent.name(), "*").stream()
        .map(table -> NameIdentifier.of(namespace, table))
        .toArray(NameIdentifier[]::new));
Contributor

Will this list all tables, not just Hudi tables? Should we filter out non-Hudi tables?

Contributor Author

fixed

Comment on lines 64 to 74
partitioning = HiveTableConverter.getPartitioning(hmsTable);
sortOrders = HiveTableConverter.getSortOrders(hmsTable);
distribution = HiveTableConverter.getDistribution(hmsTable);
Contributor

Does Hudi store this information in HMS, and is it compatible with a Hive table? As far as I know, for Iceberg we need Iceberg APIs to get the partitioning and sort orders, because Iceberg stores that information in its metadata files, not in HMS. I guess Hudi is similar. Can you please confirm this?

Contributor Author

Comments added to the code.

@mchades mchades force-pushed the hudi-hms branch 2 times, most recently from 3d8cdf2 to 591654b Compare September 26, 2024 09:53
@mchades mchades requested a review from jerryshao September 26, 2024 09:55
mchades commented Sep 27, 2024

All comments resolved. Please review again when you have time, thanks! @jerryshao

.filter(
    t ->
        t.getSd().getInputFormat() != null
            && t.getSd().getInputFormat().startsWith("org.apache.hudi"))
Contributor

Can you please make "org.apache.hudi" a static constant to avoid hard-coding it here? Also, please add a comment explaining its purpose. This is quite hacky: if the Hudi package name ever changes, the assumption here will fail.

Contributor Author

fixed
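
For readers following this thread, here is a minimal, self-contained sketch of the kind of change requested above: the prefix moved into a named constant with a comment, and the null-safe check factored into one helper. The class name `HudiTableFilter`, the constant name `HUDI_PACKAGE_PREFIX`, and the plain `Map` standing in for the metastore are assumptions made for illustration. This is not the code that was merged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HudiTableFilter {
  // Hypothetical constant name. The value is the prefix quoted in the review thread.
  // Heuristic: a table whose input format class lives under this package is treated
  // as a Hudi table. It breaks if Hudi ever relocates its input format classes.
  static final String HUDI_PACKAGE_PREFIX = "org.apache.hudi";

  // Returns true only when the input format is present and starts with the prefix.
  static boolean isHudiTable(String inputFormat) {
    return inputFormat != null && inputFormat.startsWith(HUDI_PACKAGE_PREFIX);
  }

  // The map (table name -> input format class name) is a stand-in for what a
  // metastore client would return; it is not a real HMS API.
  static List<String> listHudiTables(Map<String, String> inputFormatByTable) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, String> entry : inputFormatByTable.entrySet()) {
      if (isHudiTable(entry.getValue())) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> tables = new LinkedHashMap<>();
    tables.put("hudi_tbl", "org.apache.hudi.hadoop.HoodieParquetInputFormat");
    tables.put("hive_tbl", "org.apache.hadoop.mapred.TextInputFormat");
    tables.put("no_format_tbl", null); // e.g. a table without an input format
    System.out.println(listHudiTables(tables)); // prints [hudi_tbl]
  }
}
```

Keeping the check in a single helper means both the listing path and the load path can share the same assumption, so a future change to the detection rule touches one place.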

try {
  Table table =
      clientPool.run(client -> client.getTable(schemaIdent.name(), tableIdent.name()));
  return HudiHMSTable.builder().withBackendTable(table).build();
Contributor

Should we also verify that the loaded table is a Hudi table, and throw an exception if it is not?

Contributor Author

fixed
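
As an illustration of the check suggested above, the following sketch validates a table's input format at load time and throws if the table is not a Hudi table. Everything here is a stand-in: `HudiTableLoader`, `NotAHudiTableException`, and the in-memory `Map` replacing the metastore client are hypothetical names, and the exception type the merged code actually throws is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class HudiTableLoader {
  // Same heuristic prefix as quoted in the review thread.
  static final String HUDI_PACKAGE_PREFIX = "org.apache.hudi";

  // Hypothetical exception type, used only for this sketch.
  static class NotAHudiTableException extends RuntimeException {
    NotAHudiTableException(String message) {
      super(message);
    }
  }

  // Stand-in for the metastore: table name -> input format class name.
  private final Map<String, String> inputFormatByTable;

  HudiTableLoader(Map<String, String> inputFormatByTable) {
    this.inputFormatByTable = inputFormatByTable;
  }

  // Returns the table name if the table exists and looks like a Hudi table.
  String loadTable(String tableName) {
    if (!inputFormatByTable.containsKey(tableName)) {
      throw new IllegalArgumentException("No such table: " + tableName);
    }
    String inputFormat = inputFormatByTable.get(tableName);
    if (inputFormat == null || !inputFormat.startsWith(HUDI_PACKAGE_PREFIX)) {
      // Fail loudly instead of returning a non-Hudi table through the Hudi catalog.
      throw new NotAHudiTableException("Table is not a Hudi table: " + tableName);
    }
    return tableName;
  }

  public static void main(String[] args) {
    Map<String, String> tables = new HashMap<>();
    tables.put("hudi_tbl", "org.apache.hudi.hadoop.HoodieParquetInputFormat");
    tables.put("hive_tbl", "org.apache.hadoop.mapred.TextInputFormat");
    HudiTableLoader loader = new HudiTableLoader(tables);

    System.out.println(loader.loadTable("hudi_tbl")); // prints hudi_tbl
    try {
      loader.loadTable("hive_tbl");
    } catch (NotAHudiTableException e) {
      System.out.println(e.getMessage()); // prints Table is not a Hudi table: hive_tbl
    }
  }
}
```

Whether a non-Hudi table should surface as "not found" or as a distinct error is a design choice. A distinct error is easier to debug, while "not found" keeps the catalog's view consistent with what listing returns.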

protected HudiHMSTable buildFromTable(Table hmsTable) {
  name = hmsTable.getTableName();
  comment = hmsTable.getParameters().get(COMMENT);
  columns = HiveTableConverter.getColumns(hmsTable, HudiColumn.builder());
Contributor

Are Hudi's column types exactly the same as Hive's?

Contributor Author

According to the data type table in the Hudi documentation, Hudi supports fewer data types than Hive, and I assume they are a subset of Hive's. So I think the conversion here should work without problems.

@jerryshao jerryshao merged commit fdb07ab into apache:main Oct 11, 2024
26 checks passed
mchades added a commit to mchades/gravitino that referenced this pull request Oct 11, 2024
…r Hudi catalog (apache#4942)

Development

Successfully merging this pull request may close these issues.

[Subtask] Support read operations of HMSBackend for Hudi catalog