-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil #4758
Conversation
} | ||
public static Dataset<Row> loadMetadataTable(SparkSession spark, Table table, MetadataTableType type, | ||
Map<String, String> extraOptions) { | ||
SparkTable metadataTable = new SparkTable(MetadataTableUtils.createMetadataTableInstance(table, type), false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like constructing DataSourceV2Relation
directly is the easiest and safest option.
Let me know if I missed a use case where we still need the old code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had this code in that private method in Spark3Util
. I just moved it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw this too, great, this is one benefit about splitting the Spark modules.
.orNoop() | ||
.build(); | ||
|
||
public static Dataset<Row> loadCatalogMetadataTable(SparkSession spark, Table table, MetadataTableType type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just flagging that this will remove public method, in case anyone may use this method (though I feel there should not be).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I missed it. I wonder why we made it public, though. I'd probably remove the method but I don't mind deprecating and delegating to the one below if folks think this may impact anyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably deprecate it for at least one release before dropping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a grey area, in which these methods actually are APIs, but not considered as APIs. Per our community sync, only the public things in the api module are considered as APIs. We need something(e.g. annotation) to mark these methods as APIs. Otherwise, the discussion of which public method should be deprecated and which shouldn't will keep going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
Iceberg 0.13.0.3 - ADT 1.1.7 2022-05-20 PRs Merged * Internal: Parquet bloom filter support (apache#594 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/594)) * Internal: AWS Kms Client (apache#630 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/630)) * Internal: Core: Add client-side check of encryption properties (apache#626 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/626)) * Core: Align snapshot summary property names for delete files (apache#4766 (apache/iceberg#4766)) * Core: Add eq and pos delete file counts to snapshot summary (apache#4677 (apache/iceberg#4677)) * Spark 3.2: Clean static vars in SparkTableUtil (apache#4765 (apache/iceberg#4765)) * Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil (apache#4758 (apache/iceberg#4758)) * Core: Fix query failure when using projection on top of partitions metadata table (apache#4720) (apache#619 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/619)) Key Notes Bloom filter support and Client Side Encryption Features can be used in this release. Both features are only enabled with explicit flags and will not effect existing tables or jobs.
…apache#4758) (cherry picked from commit 68f5529)
This logic removes the unnecessary reflection to load metadata tables in
SparkTableUtil
since we have separate modules for Spark versions.