Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix invalid null handling in filter #296

Merged
merged 3 commits into from
May 21, 2021
Merged

Conversation

ritchie46
Copy link
Contributor

Which issue does this PR close?

This fixes #295.

This is a very simple solution to the problem and doesn't change any filtering behavior (though we may simplify the filtering if this PR is ok). In case of null values in a boolean mask I do a mask & null_bitmask operation and create a new boolean predicate that has no null values (all nulls are false), due to the AND operation.

@Dandandan
Copy link
Contributor

This looks cool! Also if we can simplify / speed up filters this way, that would be very interesting, there is still quite some optimization potential in the filter kernel I believe.

@codecov-commenter
Copy link

codecov-commenter commented May 16, 2021

Codecov Report

Merging #296 (973a10c) into master (3665296) will increase coverage by 0.02%.
The diff coverage is 89.44%.

❗ Current head 973a10c differs from pull request most recent head 5fc668f. Consider uploading reports for the commit 5fc668f to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
+ Coverage   82.49%   82.52%   +0.02%     
==========================================
  Files         162      162              
  Lines       43980    44029      +49     
==========================================
+ Hits        36283    36336      +53     
+ Misses       7697     7693       -4     
Impacted Files Coverage Δ
arrow/src/array/builder.rs 85.29% <ø> (ø)
arrow/src/ffi.rs 82.61% <87.15%> (+1.38%) ⬆️
arrow/src/array/ffi.rs 100.00% <100.00%> (+14.86%) ⬆️
arrow/src/compute/kernels/filter.rs 92.98% <100.00%> (+0.58%) ⬆️
parquet/src/encodings/encoding.rs 94.85% <0.00%> (-0.20%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3665296...5fc668f. Read the comment docs.

@alamb
Copy link
Contributor

alamb commented May 17, 2021

@jhorstmann / @Dandandan do you think this one is ready to go?

@jhorstmann
Copy link
Contributor

I checked out the branch to have a look with a bit more context. The logic looks good and makes this kernel a lot easier to use. The test looks good and should cover exactly this problem.

One minor thing: There is a doc comment with a warning about null values above the filter function, which is now out of data.

@Dandandan
Copy link
Contributor

One small future improvement might be not creating a new array for filters, but computing / passing the changed buffer instead.

But the removal of undefined behavior for this kernel is really good I would say 👍

@alamb alamb added arrow Changes to the arrow crate bug labels May 18, 2021
.downcast_ref::<PrimitiveArray<Int64Type>>()
.unwrap();
assert_eq!(mask0, mask1);
assert_eq!(out_arr0, out_arr1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check and test makes sense to me (that the result of filtering using the output of eq should equal the output of filtering with a boolean mask that has nulls) 👍

I also ran this test with the change in this PR commented out and it failed (as expected) in this way:

---- compute::kernels::filter::tests::test_null_mask stdout ----
thread 'compute::kernels::filter::tests::test_null_mask' panicked at 'assertion failed: `(left == right)`
  left: `PrimitiveArray<Int64>
[
  1,
  2,
  null,
]`,
 right: `PrimitiveArray<Int64>
[
  1,
  2,
]`', arrow/src/compute/kernels/filter.rs:606:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

So 👍

@Dandandan
Copy link
Contributor

Rerunning the CI as part of it showed red for some reason.

@Dandandan Dandandan merged commit f042191 into apache:master May 21, 2021
alamb pushed a commit that referenced this pull request May 25, 2021
* fix invalid null handling in filter

* take offset into account

* remove incorrect UB warning
alamb added a commit that referenced this pull request May 26, 2021
* fix invalid null handling in filter

* take offset into account

* remove incorrect UB warning

Co-authored-by: Ritchie Vink <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filter has inconsistent null handling
5 participants