-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subsume non-identity partition predicate in Iceberg #12795
Subsume non-identity partition predicate in Iceberg #12795
Conversation
Still a draft, so just FYI @alexjo2144 @losipiuk @ebyhr @erichwang @phd3 @findinpath @homar |
9f5c58d
to
6316cc7
Compare
CI #12726 |
6316cc7
to
89ad178
Compare
03f49a9
to
997709f
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Show resolved
Hide resolved
86b162b
to
e609691
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Show resolved
Hide resolved
f4275ea
to
50b834d
Compare
a0d139d
to
09f133c
Compare
09f133c
to
69a2446
Compare
6bb399c
to
7e23ddc
Compare
This is ready to review. @alexjo2144 @losipiuk @ebyhr @hashhar @erichwang @martint PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat way of implementing this. I want to give it another read
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nits here, but the logic looks sound.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoTypes.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoTypes.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
* | ||
* @throws IllegalStateException if this type is not {@link Type#isOrderable() orderable} | ||
*/ | ||
public static Optional<Object> getPreviousValue(Type type, Object value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you follow the java Math library naming conventions, this would be called: previousBefore or nextDown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean to follow the java Math library here.
Which particular methods are you referring to?
("nextDown" still sounds odd to me)
anyway, i think this should be provided by the Type
itself, so let's discuss best naming in #12797
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoTypes.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/PartitionTransforms.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoTypes.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good overall
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
7e23ddc
to
15998a5
Compare
AC |
Build is green. Rebasing after #12865 merged. |
Before the change, Iceberg connector accepted all predicates expressible as `TupleDomain` on primitive columns and they were used to filter data files during split generation. However, only predicates defined on identity partitioning columns were subsumed into connector. This commit extends Iceberg capabilities to subsume predicates on partitioning columns. Besides subsuming predicates on identity partitioning columns, it also subsumes predicates if they align with partitioning boundaries. For example, for `truncate(col, 2)` (round to 100s) partitioning, predicates `col >= 1200` OR `col > 1199` are subsumed, while `col > 1200` or `col > 1250` are not. This change is especially important for Iceberg OPTIMIZE table procedure, which requires the `WHERE` condition to be fully subsumed into the connector. It is also helpful for `DELETE`, as it allows to do metadata-only delete in more cases, where we don't really needing to do a row-level delete.
15998a5
to
4b54f26
Compare
Before the change, Iceberg connector accepted all predicates expressible
as
TupleDomain
on primitive columns and they were used to filter datafiles during split generation. However, only predicates defined on
identity partitioning columns were subsumed into connector.
This commit extends Iceberg capabilities to subsume predicates on
partitioning columns. Besides subsuming predicates on identity
partitioning columns, it also subsumes predicates if they align with
partitioning boundaries. For example, for
truncate(col, 2)
(round to100s) partitioning, predicates
col >= 1200
ORcol > 1199
aresubsumed, while
col > 1200
orcol > 1250
are not.This change is especially important for Iceberg OPTIMIZE table
procedure, which requires the
WHERE
condition to be fully subsumedinto the connector. It is also helpful for
DELETE
, as it allows to dometadata-only delete in more cases, where we don't really needing to do
a row-level delete.
Fixes #7905
For #12362