-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for boolean columns in pruning logic #500
Conversation
@@ -324,42 +417,20 @@ impl<'a> PruningExpressionBuilder<'a> { | |||
self.scalar_expr | |||
} | |||
|
|||
fn is_stat_column_missing(&self, statistics_type: StatisticsType) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic was just refactored into RequiredStatColumns
so I could reuse it
use crate::logical_plan; | ||
let field = schema.field_with_name(column_name).ok()?; | ||
|
||
if matches!(field.data_type(), &DataType::Boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the actual logic / rules
let result = p.prune(&statistics).unwrap_err(); | ||
assert!( | ||
result.to_string().contains( | ||
"Data type Boolean not supported for scalar operation on dyn array" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these aren't great messages, but they are what happens on master today, and I figured I would document them for posterity (and maybe inspire people to help fix them)
// predicate expression can only be a binary expression | ||
let (left, op, right) = match expr { | ||
Expr::BinaryExpr { left, op, right } => (left, *op, right), | ||
Expr::Column(name) => { | ||
if let Some(expr) = | ||
build_single_column_expr(&name, schema, required_columns, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of pattern probably can be written a bit shorter with some combinators. Something like:
build_single_column_expr(&name, schema, required_columns).ok().or(Ok(unhandled))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An excellent idea -- I will do so
Codecov Report
@@ Coverage Diff @@
## master #500 +/- ##
==========================================
+ Coverage 75.92% 76.02% +0.09%
==========================================
Files 154 154
Lines 26195 26421 +226
==========================================
+ Hits 19889 20087 +198
- Misses 6306 6334 +28
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice feature + great refactoring
I rebased this PR and added a few more tests. The code is unchanged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. 👍 Very well documented also 💯
Closes #490
This PR adds support for pruning of boolean predicates such as
flag_col
, andnot flag_col
so that they can be used to prune row groups from parquet files and other predicatesIt does not add code to handle
flag_col = true
andflag_col != false
(which currently error and continue to do so) as those are simplified in the ConstantEvaluation pass.This ended up being a larger change than I wanted because the logic to create
col_min
andcol_max
references was intertwined inPruningExpressionBuilder
Rationale for this change
See #490
What changes are included in this PR?
Major changes:
stat_column_req
into a newRequiredStatColumns
structStatisticsColumns
Are there any user-facing changes?
Additional predicates can be used to prune