-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce Iceberg metadata scans #7367
Conversation
Also |
59413d1
to
809e773
Compare
@sopel39 thanks, will fix it in a separate PR |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/HdfsFileIo.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/HiveTableOperationsProvider.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrackingFileIoProvider.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrackingFileIoProvider.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ForTrackingFileIoProvider.java
Outdated
Show resolved
Hide resolved
...n/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMetadataFileOperations.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableHandle.java
Outdated
Show resolved
Hide resolved
b75337f
to
9355941
Compare
I haven't looked at the code but I recommend looking at https://iceberg.apache.org/javadoc/0.11.0/org/apache/iceberg/CachingCatalog.html and moving the connector to catalog Apis and just using this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
...n/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMetadataFileOperations.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrackingFileIoModule.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/HiveTableOperations.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
return TupleDomain.fromFixedValues(partitionValues); | ||
}); | ||
Iterable<TupleDomain<ColumnHandle>> discreteTupleDomain = Iterables.transform( | ||
// Avoid invoking tableScan.planFiles() eagerly which fetches metadata file and manifest lists. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the comment below for new ConnectorTableProperties
still accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think so. that comment talks about narrowing down enforcedPredicate
by using the discretePredicates
, which we're still not doing - the same reason as before.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
9355941
to
5c99dcc
Compare
Cache Iceberg's TableMetadata object so that contents of metadata file can be reused. Lazily initialize iterator that lists data files in getTableProperties.
5c99dcc
to
79d3919
Compare
@electrum applied comments. |
Merged #7367. |
#7336
This PR
metadata
file on the read path by reusing Iceberg'sTableMetadata
snapshot
file on the read path by lazily initializing file iterator duringIcebergMetadata#getTableProperties
some other related things to discuss:
Iceberg's
TableMetadata
does not provide fully immutable objects (e.g. some lazy loading in BaseSnapshot, Schema etc). On a cursory look, iceberg code seems to have synchronization (or rely on reference assignments being atomic and marking them as volatile), but not sure if it's ideal to trust that as the spec/api-docs don't say anything. Adding a wrapper on Iceberg'sTableMetadata
isn't very useful since we need to use the fullTableMetadata
object in IcebergSplitManager anyway. (It'd require bigger refactoring if we want to create replica of Iceberg objects in trino)getTableStatistics
also invokesplanFiles
, which causes repeated iterations on metadata, snapshot and manifest files. however, invocations of getTableStatistics during planning may also have different predicates. This PR doesn't cache/optimize it.Streams.stream(combinedScanIterable)
seems to cause manifest files to be read twice, but it shouldn't need to, imo this needs to be fixed in Iceberg.