-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Iceberg partition predicate enforcement #13239
Conversation
4b74206
to
e27be8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems reasonable to me
@@ -1769,6 +1769,10 @@ public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(C | |||
else { | |||
Table icebergTable = catalog.loadTable(session, table.getSchemaTableName()); | |||
|
|||
Set<Integer> partitionSpecIds = icebergTable.snapshot(table.getSnapshotId().orElseThrow()).allManifests().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call me ignorant, but under what conditions would table.getSnapshotId() be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. It's only ever empty during new table creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a message so that -- if it throws -- we know what assumption was violated
snapshotId = table.getSnapshotId().orElseThrow(() -> new IllegalStateException("no snapshot id"))
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM expect for the call to allManifests()
.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
@@ -1769,6 +1769,10 @@ public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(C | |||
else { | |||
Table icebergTable = catalog.loadTable(session, table.getSchemaTableName()); | |||
|
|||
Set<Integer> partitionSpecIds = icebergTable.snapshot(table.getSnapshotId().orElseThrow()).allManifests().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a message so that -- if it throws -- we know what assumption was violated
snapshotId = table.getSnapshotId().orElseThrow(() -> new IllegalStateException("no snapshot id"))
@@ -1769,6 +1769,10 @@ public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(C | |||
else { | |||
Table icebergTable = catalog.loadTable(session, table.getSchemaTableName()); | |||
|
|||
Set<Integer> partitionSpecIds = icebergTable.snapshot(table.getSnapshotId().orElseThrow()).allManifests().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think calling allManifests
can be expensive, right?
can we prune them out using the predicates we already know + the new ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting this list doesn't require reading all of the individual Manifests, just the ManifestList, so it's one Avro file to read
@findepi AC, thanks. Had to make one more small change to |
42996c4
to
a14e665
Compare
@@ -129,7 +129,7 @@ public void testDereferencePushdown() | |||
|
|||
getQueryRunner().execute(format( | |||
"CREATE TABLE %s (col0, col1) WITH (partitioning = ARRAY['col1']) AS" + | |||
" SELECT CAST(row(5, 6) AS row(x bigint, y bigint)) AS col0, 5 AS col1 WHERE false", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an implicit change here that we can now enforce all predicates on empty tables. That broke this test because the table is empty but was not expecting a predicate to be enforced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doesn't it mean that now iceberg will fail on valid sql ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spoke with Konrad offline about this, not a problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please squash
Only partition specs which are used by the Snapshot being queried are relevant when determining if partitions can be used to enforce a Predicate.
a14e665
to
6c54592
Compare
Squashed |
Description
Only partition specs which are used by the Snapshot being queried are relevant when determining if partitions can be used to enforce a Predicate.
Improvement
Iceberg connector
Improve filtering on partition columns in Iceberg
Related issues, pull requests, and links
Relates to: #12795
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: