[#2962] feat(spark-connector): Add reserved properties to Table Properties when load an Iceberg table #2964

Closed
wants to merge 15 commits into from

caican00
Collaborator

@caican00 caican00 commented Apr 16, 2024

What changes were proposed in this pull request?

Add reserved properties to the table properties when loading an Iceberg table, such as:

provider,
format,
current-snapshot-id,
location,
format-version,
sort-order,
identifier-fields

The reserved properties are obtained from Iceberg's proxy SparkTable.
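A minimal sketch of the idea described above, not the PR's actual code: start from the Gravitino table properties and overlay only the reserved keys found in the Iceberg SparkTable's properties. The class name `ReservedPropertiesSketch`, the helper `withReservedProperties`, and the sample values are hypothetical and chosen for illustration; only the list of reserved keys comes from the description.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; the real connector logic lives in the PR's properties converter.
class ReservedPropertiesSketch {
  // Reserved keys as listed in the PR description.
  static final List<String> RESERVED_KEYS =
      List.of(
          "provider",
          "format",
          "current-snapshot-id",
          "location",
          "format-version",
          "sort-order",
          "identifier-fields");

  // Copies the Gravitino properties, then adds each reserved key
  // that is present in the Iceberg SparkTable's properties.
  static Map<String, String> withReservedProperties(
      Map<String, String> gravitinoProperties, Map<String, String> sparkTableProperties) {
    Map<String, String> merged = new HashMap<>(gravitinoProperties);
    for (String key : RESERVED_KEYS) {
      String value = sparkTableProperties.get(key);
      if (value != null) {
        merged.put(key, value);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Sample values are made up for the example.
    Map<String, String> gravitino = Map.of("owner", "alice");
    Map<String, String> sparkTable =
        Map.of("format", "iceberg/parquet", "location", "/tmp/warehouse/t", "other", "x");
    // TreeMap only to print the entries in a stable order.
    System.out.println(new TreeMap<>(withReservedProperties(gravitino, sparkTable)));
  }
}
```

Non-reserved keys from the SparkTable side (such as `other` in the sample) are deliberately not copied, so the Gravitino properties remain the base of the result.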

Why are the changes needed?

For example, when executing desc extended IcebergTableName, some of the information in the result is read from the properties of the Iceberg table, so they should contain the properties listed above.

Fix: #2962

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UTs and ITs.

@caican00
Collaborator Author

Hi @FANNG1 could you help review this PR when you are free? Thank you very much.

Map<String, String> properties;
properties =
    propertiesConverter.toSparkTableProperties(
        gravitinoTable.properties(), getSparkTable().properties());
Contributor

@FANNG1 FANNG1 Apr 16, 2024

Combining properties from Gravitino and realCatalog seems hacky. I think the right direction is to provide the properties from Gravitino.

Contributor

Do you know why the reserved properties couldn't be obtained from Gravitino?

Collaborator Author

> Combining properties from Gravitino and realCatalog seems hacky. I think the right direction is to provide the properties from Gravitino.

My original implementation did this, but I was concerned that the reserved properties differ between compute engines such as Spark and Flink, while every engine would receive the same reserved properties from the Gravitino IcebergTable.
For this reason, I changed the solution to retrieve the reserved properties from the realTable in the spark-connector. cc @FANNG1

Collaborator Author

@caican00 caican00 Apr 17, 2024

> Combining properties from Gravitino and realCatalog seems hacky. I think the right direction is to provide the properties from Gravitino.

> My original implementation did this, but I was concerned that the reserved properties differ between compute engines such as Spark and Flink, while every engine would receive the same reserved properties from the Gravitino IcebergTable. For this reason, I changed the solution to retrieve the reserved properties from the realTable in the spark-connector. cc @FANNG1

@FANNG1 Should I add the reserved properties to the Gravitino IcebergTable's properties directly?

Contributor

> Do you know why the reserved properties couldn't be obtained from Gravitino?

@caican00 do you know the reason?

Collaborator Author

@caican00 caican00 Apr 17, 2024

@FANNG1 Putting location into the properties on the Gravitino side causes unexpected problems, for example with Trino:
https://github.com/datastrato/gravitino/actions/runs/8718294890/job/23915239099?pr=2709

Collaborator Author

image

Contributor

@yuqi1129 @diqiu50 could you help point out how to fix the Trino issue after adding the location property for Iceberg tables?

Contributor

@caican00
You can directly change the file lakehouse-iceberg/00000_create_table.txt and use the correct output.

Use the wildcard character '%' if necessary.
image
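To illustrate why a wildcard helps with values such as a table location that changes between runs, here is a rough sketch of wildcard comparison. It assumes '%' matches any run of characters, similar to SQL LIKE; the matching rules of the actual Trino test tooling may differ. The class `WildcardMatchSketch`, the method `matches`, and the sample strings are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of '%' wildcard matching; not the project's test tooling.
class WildcardMatchSketch {
  // Returns true if 'actual' matches 'expected', where each '%' in
  // 'expected' stands for any sequence of characters (assumed semantics).
  static boolean matches(String expected, String actual) {
    String[] literals = expected.split("%", -1);
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < literals.length; i++) {
      if (i > 0) {
        regex.append(".*"); // one '%' between two literal pieces
      }
      regex.append(Pattern.quote(literals[i])); // literal text is matched verbatim
    }
    return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(actual).matches();
  }

  public static void main(String[] args) {
    // Sample strings are made up for the example.
    String expected = "location = '%'";
    String actual = "location = 'hdfs://namenode/warehouse/db/tb01'";
    System.out.println(matches(expected, actual));
  }
}
```

With this kind of matching, the expected-output file can pin the stable parts of a line and leave the environment-specific value open.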

Collaborator Author

> @caican00 You can directly change the file lakehouse-iceberg/00000_create_table.txt and use the correct output.
>
> Use the wildcard character '%' if necessary.

Got it.

@caican00 caican00 requested a review from FANNG1 April 17, 2024 03:26
@caican00 caican00 marked this pull request as draft April 17, 2024 07:33
@caican00
Collaborator Author

Closing this PR, as we have created a new one: #3511

@caican00 caican00 closed this May 23, 2024
Development

Successfully merging this pull request may close these issues.

[Subtask] Add reserved properties to Table Properties when load an Iceberg table
3 participants