Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil #4758

aokolnychyi · 2022-05-12T17:53:19Z

This change removes the unnecessary reflection used to load metadata tables in SparkTableUtil, now that we have separate modules for each Spark version.

aokolnychyi · 2022-05-12T17:54:38Z

cc @RussellSpitzer @flyrain @karuppayya @szehon-ho @rdblue @kbendick

aokolnychyi · 2022-05-12T17:56:21Z

spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java

-    }
+  public static Dataset<Row> loadMetadataTable(SparkSession spark, Table table, MetadataTableType type,
+                                               Map<String, String> extraOptions) {
+    SparkTable metadataTable = new SparkTable(MetadataTableUtils.createMetadataTableInstance(table, type), false);


Looks like constructing DataSourceV2Relation directly is the easiest and safest option.
Let me know if I missed a use case where we still need the old code.

We had this code in that private method in Spark3Util. I just moved it.
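For readers unfamiliar with the trade-off, here is a minimal stdlib-only sketch of a reflective lookup next to a direct call. It is not the Iceberg or Spark code: `ReflectionVsDirect`, `MetadataLoader`, and `load` are hypothetical stand-ins, and the strings merely imitate a table and metadata type. The assumption is that reflection was only worth its cost while one source tree had to work against several Spark versions.

```java
import java.lang.reflect.Method;

class ReflectionVsDirect {
  // Hypothetical stand-in for a version-specific helper; not an Iceberg class.
  static final class MetadataLoader {
    static String load(String table, String type) {
      return table + "." + type;
    }
  }

  // Reflective path: the class and method are resolved by name at runtime,
  // so a typo or signature change only fails when this code actually runs.
  static String loadViaReflection(String table, String type) {
    try {
      Class<?> clazz = Class.forName(ReflectionVsDirect.class.getName() + "$MetadataLoader");
      Method method = clazz.getDeclaredMethod("load", String.class, String.class);
      return (String) method.invoke(null, table, type);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot load metadata table reflectively", e);
    }
  }

  // Direct path: the compiler checks the call, and no lookup happens at runtime.
  static String loadDirectly(String table, String type) {
    return MetadataLoader.load(table, type);
  }

  public static void main(String[] args) {
    System.out.println(loadViaReflection("db.tbl", "files"));
    System.out.println(loadDirectly("db.tbl", "files"));
  }
}
```

Both paths return the same value in this sketch; the direct one is simply checked at compile time, which is possible once each Spark version has its own module.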

szehon-ho

I just saw this too. Great, this is one benefit of splitting the Spark modules.

szehon-ho · 2022-05-12T22:45:43Z

spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java

-      .orNoop()
-      .build();
-
-  public static Dataset<Row> loadCatalogMetadataTable(SparkSession spark, Table table, MetadataTableType type) {


Just flagging that this will remove a public method, in case anyone uses it (though I feel no one should be).

Good point, I missed it. I wonder why we made it public, though. I'd probably remove the method, but I don't mind deprecating it and delegating to the one below if folks think this may impact anyone.

I think we should probably deprecate it for at least one release before dropping it.
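A minimal sketch of the deprecate-and-delegate option discussed here, assuming simplified signatures. The method names mirror the ones in the diff, but the `String` parameters and return values are hypothetical stand-ins for `SparkSession`, `Table`, `MetadataTableType`, and `Dataset<Row>`, so this is an illustration of the pattern rather than the code that was merged.

```java
import java.util.Collections;
import java.util.Map;

class DeprecateAndDelegate {
  /**
   * Old entry point, kept for at least one release.
   *
   * @deprecated use {@link #loadMetadataTable(String, String, Map)} instead
   */
  @Deprecated
  static String loadCatalogMetadataTable(String table, String type) {
    // Delegate so that there is a single implementation to maintain.
    return loadMetadataTable(table, type, Collections.emptyMap());
  }

  // New entry point; the String result is a placeholder for a real dataset.
  static String loadMetadataTable(String table, String type, Map<String, String> extraOptions) {
    return table + "." + type + extraOptions;
  }

  public static void main(String[] args) {
    System.out.println(loadCatalogMetadataTable("db.tbl", "files"));
  }
}
```

Existing callers keep compiling and get a deprecation warning, while the old method can be dropped in a later release without a second code path to keep in sync.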

Yeah, this is a grey area: these methods are effectively APIs but are not considered APIs. Per our community sync, only the public things in the api module are considered APIs. We need something (e.g. an annotation) to mark these methods as APIs. Otherwise, the discussion of which public methods should be deprecated and which shouldn't will keep going.

Sounds good to me.
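One way the annotation idea raised in this thread could look, as a stdlib-only sketch. `PublicApi`, `Util`, and `isApi` are hypothetical names introduced for illustration; nothing here is an existing Iceberg annotation or a decision made in this PR.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class ApiMarkerSketch {
  // Hypothetical marker: only annotated members would carry compatibility guarantees.
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.METHOD, ElementType.TYPE})
  @interface PublicApi {}

  // Hypothetical utility class with one supported and one internal method.
  static final class Util {
    @PublicApi
    public static String supported() {
      return "supported";
    }

    public static String internal() {
      return "internal";
    }
  }

  // Returns true if the first declared method with that name carries the marker.
  static boolean isApi(Class<?> owner, String methodName) {
    for (Method method : owner.getDeclaredMethods()) {
      if (method.getName().equals(methodName)) {
        return method.isAnnotationPresent(PublicApi.class);
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println("supported is API: " + isApi(Util.class, "supported"));
    System.out.println("internal is API: " + isApi(Util.class, "internal"));
  }
}
```

With a marker like this, a build-time check could require a deprecation cycle only for annotated members, which would settle the "is this public method an API?" question mechanically.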

Iceberg 0.13.0.3 - ADT 1.1.7 2022-05-20

PRs Merged
* Internal: Parquet bloom filter support (apache#594 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/594))
* Internal: AWS Kms Client (apache#630 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/630))
* Internal: Core: Add client-side check of encryption properties (apache#626 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/626))
* Core: Align snapshot summary property names for delete files (apache#4766 (apache/iceberg#4766))
* Core: Add eq and pos delete file counts to snapshot summary (apache#4677 (apache/iceberg#4677))
* Spark 3.2: Clean static vars in SparkTableUtil (apache#4765 (apache/iceberg#4765))
* Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil (apache#4758 (apache/iceberg#4758))
* Core: Fix query failure when using projection on top of partitions metadata table (apache#4720) (apache#619 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/619))

Key Notes
Bloom filter support and Client Side Encryption features can be used in this release. Both features are only enabled with explicit flags and will not affect existing tables or jobs.

…apache#4758) (cherry picked from commit 68f5529)

Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil

20e95c4

github-actions bot added the spark label May 12, 2022

aokolnychyi commented May 12, 2022

szehon-ho approved these changes May 12, 2022

Deprecate method

f1bd31f

aokolnychyi merged commit 68f5529 into apache:master May 13, 2022

sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023

Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil (…

69b5c21

…apache#4758) (cherry picked from commit 68f5529)
