Subsume non-identity partition predicate in Iceberg #12795

Merged
merged 4 commits into from
Jun 20, 2022

findepi
Member

@findepi findepi commented Jun 10, 2022

Before the change, the Iceberg connector accepted all predicates expressible
as TupleDomain on primitive columns and used them to filter data
files during split generation. However, only predicates defined on
identity partitioning columns were subsumed into the connector.

This commit extends the Iceberg connector's capabilities to subsume
predicates on partitioning columns. Besides subsuming predicates on identity
partitioning columns, it also subsumes predicates that align with
partitioning boundaries. For example, for truncate(col, 2) (round to
100s) partitioning, the predicates col >= 1200 or col > 1199 are
subsumed, while col > 1200 or col > 1250 are not.
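
The alignment rule in the example can be sketched roughly as follows. This is only an illustration, not the code from this PR: it assumes an integer column, reads "round to 100s" as buckets of width 100 that start at multiples of 100, and only looks at lower bounds. The class and method names (`TruncateAlignmentSketch`, `lowerBoundAligned`) are hypothetical.

```java
final class TruncateAlignmentSketch
{
    private TruncateAlignmentSketch() {}

    /**
     * Returns true when the predicate "col >= bound" (inclusive) or
     * "col > bound" (exclusive) starts exactly on a bucket boundary,
     * assuming buckets of the given width that start at multiples of width.
     */
    static boolean lowerBoundAligned(long bound, boolean inclusive, long width)
    {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        long inclusiveBound = bound;
        if (!inclusive) {
            if (bound == Long.MAX_VALUE) {
                // "col > Long.MAX_VALUE" matches nothing; conservatively report it as not aligned
                return false;
            }
            // For integers, "col > x" is the same predicate as "col >= x + 1"
            inclusiveBound = bound + 1;
        }
        // floorMod keeps the check correct for negative values as well
        return Math.floorMod(inclusiveBound, width) == 0;
    }

    public static void main(String[] args)
    {
        System.out.println("col >= 1200: " + lowerBoundAligned(1200, true, 100));
        System.out.println("col > 1199:  " + lowerBoundAligned(1199, false, 100));
        System.out.println("col > 1200:  " + lowerBoundAligned(1200, false, 100));
        System.out.println("col > 1250:  " + lowerBoundAligned(1250, false, 100));
    }
}
```

Under these assumptions the first two predicates are reported as aligned and the last two are not, matching the example above.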

This change is especially important for the Iceberg OPTIMIZE table
procedure, which requires the WHERE condition to be fully subsumed
into the connector. It is also helpful for DELETE, as it allows a
metadata-only delete in more cases, where a row-level delete is not
really needed.

Fixes #7905
For #12362

@cla-bot cla-bot bot added the cla-signed label Jun 10, 2022
@findepi
Member Author

findepi commented Jun 10, 2022

Still a draft, so just FYI @alexjo2144 @losipiuk @ebyhr @erichwang @phd3 @findinpath @homar

@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch from 9f5c58d to 6316cc7 Compare June 10, 2022 13:44
@findepi
Member Author

findepi commented Jun 10, 2022

CI #12726

@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch from 6316cc7 to 89ad178 Compare June 10, 2022 14:52
@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch 2 times, most recently from 03f49a9 to 997709f Compare June 10, 2022 15:19
@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch from 86b162b to e609691 Compare June 10, 2022 20:22
@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch 2 times, most recently from f4275ea to 50b834d Compare June 14, 2022 13:10
@findepi findepi marked this pull request as ready for review June 14, 2022 13:11
@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch 3 times, most recently from a0d139d to 09f133c Compare June 15, 2022 09:04
@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch from 09f133c to 69a2446 Compare June 15, 2022 09:18
@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch 2 times, most recently from 6bb399c to 7e23ddc Compare June 15, 2022 12:31
@findepi
Member Author

findepi commented Jun 15, 2022

This is ready to review.
Test-only prefix is extracted to: #12865

@alexjo2144 @losipiuk @ebyhr @hashhar @erichwang @martint PTAL

@findepi findepi requested a review from alexjo2144 June 15, 2022 12:35
Member

@alexjo2144 alexjo2144 left a comment

Neat way of implementing this. I want to give it another read

Contributor

@erichwang erichwang left a comment

Mostly nits here, but the logic looks sound.

*
* @throws IllegalStateException if this type is not {@link Type#isOrderable() orderable}
*/
public static Optional<Object> getPreviousValue(Type type, Object value)
Contributor

if you follow the java Math library naming conventions, this would be called: previousBefore or nextDown.

Member Author

I didn't mean to follow the java Math library here.
Which particular methods are you referring to?
("nextDown" still sounds odd to me)

anyway, i think this should be provided by the Type itself, so let's discuss best naming in #12797
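
For readers following the naming thread, here is a minimal sketch of what a "previous value" helper does for a 64-bit integer, next to `java.lang.Math.nextDown`, which is the JDK method for floating-point values. This is not the `getPreviousValue(Type, Object)` from this PR; `PreviousValueSketch` and `previousLong` are hypothetical names used only for illustration.

```java
import java.util.Optional;

final class PreviousValueSketch
{
    private PreviousValueSketch() {}

    /**
     * Returns the largest long strictly smaller than value,
     * or empty when value is already the smallest representable long.
     */
    static Optional<Long> previousLong(long value)
    {
        if (value == Long.MIN_VALUE) {
            return Optional.empty();
        }
        return Optional.of(value - 1);
    }

    public static void main(String[] args)
    {
        // Integer case: this is what lets "col > 1199" be treated as "col >= 1200"
        System.out.println(previousLong(1200L));
        System.out.println(previousLong(Long.MIN_VALUE));

        // Floating-point analogue from the JDK: the adjacent double towards negative infinity
        System.out.println(Math.nextDown(1.0) < 1.0);
    }
}
```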

Member

@losipiuk losipiuk left a comment

looks good overall

@findepi findepi force-pushed the findepi/iceberg-expression-pushdown branch from 7e23ddc to 15998a5 Compare June 20, 2022 10:53
@findepi
Member Author

findepi commented Jun 20, 2022

AC

@findepi
Member Author

findepi commented Jun 20, 2022

Build is green. Rebasing after #12865 merged.


Successfully merging this pull request may close these issues.

Support predicate pushdown and deletion for non-identity partition columns in Iceberg
6 participants