-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(pruning): Support IS NOT NULL
predicates in PruningPredicate
#9208
chore(pruning): Support IS NOT NULL
predicates in PruningPredicate
#9208
Conversation
PruningPredicate
IS NOT NULL
predicates in PruningPredicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @appletreeisyellow -- this looks great. The PR description is clear and the code is 👍
Can you please one more test of this logic integrated into the overall row group pruning logic (to make sure it works in the context of a predicate). I think you can follow the model of these tests for IS NULL
:
cc @viirya who I think contributed the original IS NULL
rewrite
datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs
Outdated
Show resolved
Hide resolved
chore: remove Debug derive
60ea827
to
5040cf8
Compare
// int > 1 and IsNull(bool) => c1_max > 1 and bool_null_count > 0 | ||
// c1 > 15 and IsNull(c2) => c1_max > 15 and c2_null_count > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment to reflect the examples given in the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 This reads more smoothly.
// int > 1 and IsNull(bool) => c1_max > 1 and bool_null_count > 0 | ||
// c1 > 15 and IsNull(c2) => c1_max > 15 and c2_null_count > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 This reads more smoothly.
datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs
Outdated
Show resolved
Hide resolved
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great - thank you @appletreeisyellow and @viirya
I believe this PR has a logic bug that we did not catch in code review: #9230 |
…redicate` (apache#9208)" This reverts commit cc139c9.
Which issue does this PR close?
Help with #9171 and #7869
Rationale for this change
PruningPredicate
rewrites the original predicate into an expression that references min/max/null_count values of each column in the original predicateCurrently,
IS NOT NULL
is not supported, which means the rewritten predicate is ALWAYS TRUE, e.g.:x IS NOT NULL
true
See the code below,
phys_expr::IsNotNullExpr
is not listed/supported:https://github.com/apache/arrow-datafusion/blob/292865e38cba1275a0f863230d816ee0e08c1318/datafusion/core/src/physical_optimizer/pruning.rs#L1139-L1140
What changes are included in this PR?
Support
IS NOT NULL
predicate rewrite, e.g.:x IS NOT NULL
x_null_count = 0
Are these changes tested?
Yes
Are there any user-facing changes?
No API change. User will experience performance improvements on queries that include
IS NOT NULL
predicates