Support `=`, `<`, `<=`, `>`, `>=`, `!=`, `is distinct from`, `is not distinct from` for `BooleanArray` #1163
@@ -814,14 +870,68 @@ pub fn binary(
Ok(Arc::new(BinaryExpr::new(l, op, r)))
}

// TODO file a ticket with arrow-rs to include these kernels
- // TODO file a ticket with arrow-rs to include these kernels
+ // When arrow-rs has these kernels, can remove this implementation
+ // see https://github.com/apache/arrow-rs/issues/842
Filed ticket in arrow apache/arrow-rs#842
FWIW @Dandandan has added these kernels upstream to Arrow so we can use 6.1.0 when that comes out (in a week or so): apache/arrow-rs#844
@jimexist has actually implemented operations like `bool_lt` etc. in apache/arrow-rs#860, so when that is available in datafusion (next week) I will update this PR to include those operations as well.
This one is waiting on arrow-rs 6.1 to be released, and then I should be able to clean it up and get it ready for a proper review.
fda35ac to 3cb11a4
Turns out that we forgot to make the required functions public 🤦. Will wait for arrow 6.2 to include apache/arrow-rs#913.
3cb11a4 to 5d09920
5d09920 to c990851
FYI @Dandandan @jimexist
result
)
let result = p.prune(&statistics).unwrap();
assert_eq!(result, expected_true);
pruning works for boolean columns now
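To illustrate why boolean comparisons enable pruning, here is a minimal sketch in plain Rust. It assumes that an equality predicate `col = value` is checked against per-row-group min/max statistics as `min <= value AND value <= max`, with `false < true`. The `BoolStats` type and `may_contain_eq` function are hypothetical names for this example and are not part of DataFusion.

```rust
/// Min/max statistics for a boolean column in one row group.
/// Hypothetical type, used only for this illustration.
#[derive(Debug, Clone, Copy)]
struct BoolStats {
    min: Option<bool>,
    max: Option<bool>,
}

/// Returns `true` if the row group might contain a row where `col = value`.
/// Returns `false` only when the statistics prove that no row can match.
fn may_contain_eq(stats: BoolStats, value: bool) -> bool {
    match (stats.min, stats.max) {
        // `bool` is ordered with `false < true`, so a range check works.
        (Some(min), Some(max)) => min <= value && value <= max,
        // Missing statistics: nothing can be proven, so keep the row group.
        _ => true,
    }
}
```

For example, a row group whose statistics are `min = false, max = false` can be skipped for `col = true`, while a row group with `min = false, max = true` has to be read.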
@@ -276,6 +377,7 @@ macro_rules! binary_array_op_scalar {
DataType::Date64 => {
compute_op_scalar!($LEFT, $RIGHT, $OP, Date64Array)
}
DataType::Boolean => compute_bool_op_scalar!($LEFT, $RIGHT, $OP, BooleanArray),
adding this line and the one below it adds all the new support, which is kind of cool! It is terrifying how many functions end up being called :)
// where a null array is generated for some statistics columns
// int > 1 and bool = true => c1_max > 1 and null
let expr = col("c1").gt(lit(15)).and(col("c2").eq(lit(true)));
// test row group predicate with an unknown (Null) expr
now bool stats don't result in null columns, so I needed to use a constant to get the same effect
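As a rough sketch of what an unknown (null) sub-expression means for pruning, the following plain Rust models `c1_max > 15 AND null`. It assumes SQL three-valued (Kleene) logic for `AND` and assumes that a predicate evaluating to null keeps the row group. The function names `kleene_and` and `keep_row_group` are hypothetical and are not taken from DataFusion.

```rust
/// Three-valued AND: `false` wins over null, otherwise null propagates.
fn kleene_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// A row group is skipped only when the predicate is provably false.
/// An unknown (null) result keeps the row group (assumption for this sketch).
fn keep_row_group(predicate: Option<bool>) -> bool {
    predicate.unwrap_or(true)
}

/// Evaluates `c1_max > 15 AND null` for one row group.
fn keep_for_c1_max(c1_max: i64) -> bool {
    keep_row_group(kleene_and(Some(c1_max > 15), None))
}
```

Under these assumptions, a row group with `c1_max = 10` is pruned because `false AND null` is `false`, while a row group with `c1_max = 20` is kept because `true AND null` is null.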
This PR is ready for review / analysis if/when you get a chance @jimexist / @Dandandan / @houqp / @rdettai. It looks much bigger than it is because of the tests. It is mostly about hooking up some more arrow compute kernels. There are many PRs flying in DataFusion recently 😅 fun times
Great addition @alamb ! thanks !
Co-authored-by: rdettai <[email protected]>
Thanks for the review @rdettai -- I'll plan to merge this one in tomorrow and file arrow-rs tickets if there are no other comments.
.expect("compute_op failed to downcast array");
// generate the scalar function name, such as lt_scalar, from the $OP parameter
// (which could have a value of lt) and the suffix _scalar
Ok(Arc::new(paste::expr! {[<$OP _bool_scalar>]}(
TIL
For the record this pattern is used elsewhere in this file, I was just following it :)
Which issue does this PR close?
Resolves #1159
PR is mostly tests
Sorry for what looks like a large PR :( I blame the tests
Rationale for this change
See #1159
This is mostly interesting so that during constant folding / simplification we can simplify degenerate expressions like `true = false`
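As a minimal sketch of that kind of simplification, the following plain Rust folds an equality between two boolean literals into a single literal. The `Expr` enum and `simplify` function are hypothetical stand-ins written for this example; they are not DataFusion's expression types or its optimizer.

```rust
/// A tiny expression tree, hypothetical and for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(bool),
    Column(String),
    Eq(Box<Expr>, Box<Expr>),
}

/// Folds `literal = literal` into a single literal, recursing into children.
/// Anything that still depends on a column is left untouched.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(l, r) => {
            let (l, r) = (simplify(*l), simplify(*r));
            if let (Expr::Literal(a), Expr::Literal(b)) = (&l, &r) {
                return Expr::Literal(a == b);
            }
            Expr::Eq(Box::new(l), Box::new(r))
        }
        other => other,
    }
}
```

With this sketch, `true = false` folds to the literal `false`, and an expression such as `c2 = true` is returned unchanged because it can only be evaluated against data.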
Now that @jimexist added apache/arrow-rs#860 and @Dandandan added apache/arrow-rs#844 in arrow, this PR hooks that up
Also, it has the nice side effect that parquet row group pruning is now supported for boolean columns as well 🎉
What changes are included in this PR?
`=`, `<`, `<=`, `>`, `>=`, `!=`, `is distinct from`, `is not distinct from` for `BooleanArray` (aka for boolean columns)
`*_scalar_bool`
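To make the semantics of these operators concrete, here is a small sketch over nullable boolean columns modeled as `&[Option<bool>]`. It assumes SQL semantics: ordinary comparisons return null when either side is null, `false` sorts before `true`, and `is distinct from` / `is not distinct from` treat null as a comparable value and never return null. The function names are hypothetical; this does not use the arrow-rs kernels.

```rust
/// Element-wise comparison of two nullable boolean columns.
/// A null on either side yields a null result.
fn compare<F>(left: &[Option<bool>], right: &[Option<bool>], op: F) -> Vec<Option<bool>>
where
    F: Fn(bool, bool) -> bool,
{
    left.iter()
        .zip(right.iter())
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(op(*l, *r)),
            _ => None,
        })
        .collect()
}

/// `IS DISTINCT FROM`: null is treated as a value, so the result is never null.
fn is_distinct_from(left: &[Option<bool>], right: &[Option<bool>]) -> Vec<bool> {
    left.iter().zip(right.iter()).map(|(l, r)| l != r).collect()
}

/// `IS NOT DISTINCT FROM`: the negation of `IS DISTINCT FROM`.
fn is_not_distinct_from(left: &[Option<bool>], right: &[Option<bool>]) -> Vec<bool> {
    left.iter().zip(right.iter()).map(|(l, r)| l == r).collect()
}
```

Passing `|l, r| l == r`, `|l, r| l < r`, and so on as `op` covers the six comparison operators, while the two `distinct` functions need their own code path because their output has no nulls.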
Are there any user-facing changes?
Fewer errors