Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil #4758

Merged
merged 2 commits into from
May 13, 2022

Conversation

aokolnychyi
Copy link
Contributor

This logic removes the unnecessary reflection to load metadata tables in SparkTableUtil since we have separate modules for Spark versions.

@github-actions github-actions bot added the spark label May 12, 2022
@aokolnychyi
Copy link
Contributor Author

aokolnychyi commented May 12, 2022

}
public static Dataset<Row> loadMetadataTable(SparkSession spark, Table table, MetadataTableType type,
Map<String, String> extraOptions) {
SparkTable metadataTable = new SparkTable(MetadataTableUtils.createMetadataTableInstance(table, type), false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like constructing DataSourceV2Relation directly is the easiest and safest option.
Let me know if I missed a use case where we still need the old code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had this code in that private method in Spark3Util. I just moved it.

Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw this too, great, this is one benefit about splitting the Spark modules.

.orNoop()
.build();

public static Dataset<Row> loadCatalogMetadataTable(SparkSession spark, Table table, MetadataTableType type) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just flagging that this will remove public method, in case anyone may use this method (though I feel there should not be).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I missed it. I wonder why we made it public, though. I'd probably remove the method but I don't mind deprecating and delegating to the one below if folks think this may impact anyone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably deprecate it for at least one release before dropping it.

Copy link
Contributor

@flyrain flyrain May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a grey area, in which these methods actually are APIs, but not considered as APIs. Per our community sync, only the public things in the api module are considered as APIs. We need something(e.g. annotation) to mark these methods as APIs. Otherwise, the discussion of which public method should be deprecated and which shouldn't will keep going.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

@aokolnychyi aokolnychyi merged commit 68f5529 into apache:master May 13, 2022
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
Iceberg 0.13.0.3 - ADT 1.1.7

2022-05-20

PRs Merged

* Internal: Parquet bloom filter support (apache#594 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/594))
* Internal: AWS Kms Client (apache#630 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/630))
* Internal: Core: Add client-side check of encryption properties (apache#626 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/626))
* Core: Align snapshot summary property names for delete files (apache#4766 (apache/iceberg#4766))
* Core: Add eq and pos delete file counts to snapshot summary (apache#4677 (apache/iceberg#4677))
* Spark 3.2: Clean static vars in SparkTableUtil (apache#4765 (apache/iceberg#4765))
* Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil (apache#4758 (apache/iceberg#4758))
* Core: Fix query failure when using projection on top of partitions metadata table (apache#4720) (apache#619 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/619))

Key Notes

Bloom filter support and Client Side Encryption Features can be used in this release. Both features are only enabled with explicit flags and will not effect existing tables or jobs.
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants