fix invalid null handling in filter #296
This looks cool! Also, if we can simplify or speed up filters this way, that would be very interesting; I believe there is still quite some optimization potential in the filter kernel.
Codecov Report
@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
+ Coverage   82.49%   82.52%   +0.02%
==========================================
  Files         162      162
  Lines       43980    44029      +49
==========================================
+ Hits        36283    36336      +53
+ Misses       7697     7693       -4
@jhorstmann / @Dandandan do you think this one is ready to go?
I checked out the branch to have a look with a bit more context. The logic looks good and makes this kernel a lot easier to use. The test looks good and should cover exactly this problem. One minor thing: there is a doc comment with a warning about null values above the
One small future improvement might be to avoid creating a new array for filters and to compute / pass the changed buffer instead. But the removal of undefined behavior for this kernel is really good, I would say 👍
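To make the buffer idea concrete, here is a minimal sketch of combining two bit-packed buffers directly instead of building a new boolean array. It is not the kernel's code: `and_buffers` and `get_bit` are hypothetical helpers, and the sketch assumes both buffers start at bit offset 0, have equal length, and use least-significant-bit-first numbering.

```rust
/// Hypothetical helper: read bit `i` from a bit-packed buffer
/// (least-significant bit first within each byte).
fn get_bit(buf: &[u8], i: usize) -> bool {
    (buf[i / 8] >> (i % 8)) & 1 == 1
}

/// Hypothetical helper: AND the value bits with the validity bits,
/// one byte (eight slots) at a time.
/// Assumes both buffers have the same length and a bit offset of 0.
fn and_buffers(values: &[u8], validity: &[u8]) -> Vec<u8> {
    assert_eq!(values.len(), validity.len());
    values.iter().zip(validity).map(|(v, n)| v & n).collect()
}

fn main() {
    // Three slots: value bits are all set, but slot 2 is null (validity bit 0).
    let values = [0b0000_0111u8];
    let validity = [0b0000_0011u8];

    let combined = and_buffers(&values, &validity);

    // Slots 0 and 1 stay selected; the null slot 2 becomes false.
    assert!(get_bit(&combined, 0));
    assert!(get_bit(&combined, 1));
    assert!(!get_bit(&combined, 2));
}
```

A real implementation would also have to handle a non-zero array offset, which this sketch deliberately leaves out.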
    .downcast_ref::<PrimitiveArray<Int64Type>>()
    .unwrap();
assert_eq!(mask0, mask1);
assert_eq!(out_arr0, out_arr1);
This check and test make sense to me (that the result of filtering using the output of `eq` should equal the output of filtering with a boolean mask that has nulls) 👍
I also ran this test with the change in this PR commented out and it failed (as expected) in this way:
---- compute::kernels::filter::tests::test_null_mask stdout ----
thread 'compute::kernels::filter::tests::test_null_mask' panicked at 'assertion failed: `(left == right)`
left: `PrimitiveArray<Int64>
[
1,
2,
null,
]`,
right: `PrimitiveArray<Int64>
[
1,
2,
]`', arrow/src/compute/kernels/filter.rs:606:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
So 👍
Rerunning the CI, as part of it showed red for some reason.
* fix invalid null handling in filter
* take offset into account
* remove incorrect UB warning

Co-authored-by: Ritchie Vink <[email protected]>
Which issue does this PR close?
This fixes #295.
This is a very simple solution to the problem and doesn't change any filtering behavior (though we may simplify the filtering if this PR is ok). In case of `null` values in a boolean mask I do a `mask & null_bitmask` operation and create a new `boolean` predicate that has no null values (all nulls are `false`), due to the AND operation.
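
The idea can be sketched in plain Rust without the arrow crate. This is only an illustration under assumptions, not the code in this PR: `BoolMask`, `null_free_predicate`, and `filter_with` are hypothetical names, and the mask is modelled as one value flag plus one validity flag per slot rather than as packed bitmaps.

```rust
/// Hypothetical stand-in for a boolean mask array: one value flag and one
/// validity flag per slot (validity == false means the slot is null).
struct BoolMask {
    values: Vec<bool>,
    validity: Vec<bool>,
}

/// Build a predicate without nulls: a slot is true only if it is
/// valid AND its value is true, so every null slot becomes false.
fn null_free_predicate(mask: &BoolMask) -> Vec<bool> {
    mask.values
        .iter()
        .zip(mask.validity.iter())
        .map(|(value, valid)| *value && *valid)
        .collect()
}

/// Keep the elements whose predicate slot is true.
fn filter_with(data: &[i64], predicate: &[bool]) -> Vec<i64> {
    data.iter()
        .zip(predicate.iter())
        .filter(|(_, keep)| **keep)
        .map(|(x, _)| *x)
        .collect()
}

fn main() {
    let data = [1, 2, 3];
    // The third slot is null, but its underlying value flag happens to be set.
    let mask = BoolMask {
        values: vec![true, true, true],
        validity: vec![true, true, false],
    };

    // Looking only at the value flags selects the null slot as well.
    let naive = filter_with(&data, &mask.values);
    // ANDing with the validity flags first drops it.
    let fixed = filter_with(&data, &null_free_predicate(&mask));

    println!("naive: {:?}", naive); // naive: [1, 2, 3]
    println!("fixed: {:?}", fixed); // fixed: [1, 2]
}
```

Because the AND happens before filtering, the filter itself never has to look at validity, which is why the filtering behavior stays unchanged.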