Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement boolean equality kernels #844

Merged
merged 4 commits into from
Oct 23, 2021
Merged

Conversation

Dandandan
Copy link
Contributor

Which issue does this PR close?

Closes #842

Rationale for this change

Implementing (fast) equality kernels for booleans.

What changes are included in this PR?

Implement eq_bool, neq_bool , eq_bool_scalar, neq_bool_scalar.

@Dandandan Dandandan requested a review from alamb October 21, 2021 21:05
@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 21, 2021
right,
right_offset_in_bits,
len_in_bits,
|a, b| !(a ^ b),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works per pair of u64 instead of per bool.

};

let data = unsafe {
ArrayData::new_unchecked(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there is a more elegant way of doing this.

@Dandandan
Copy link
Contributor Author

We have some formatting differences in some tests it seems from a rust version upgrade (?).

@Dandandan
Copy link
Contributor Author

Dandandan commented Oct 21, 2021

Also some new (good, unrelated) clippy error.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to review this in the morning. Thank you @Dandandan !

right: &Buffer,
right_offset_in_bits: usize,
len_in_bits: usize| {
bitwise_bin_op_helper(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using bitwise_bin_op_simd_helper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That only should be used together with a simd feature / implementation.
Certainly possible, but could be implemented as follow up work.


/// Perform `left != right` operation on [`BooleanArray`] and a scalar
fn neq_bool_scalar(left: &BooleanArray, right: bool) -> Result<BooleanArray> {
let len = left.len();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could simply be a call to eq_bool_scalar(left, !right)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

@@ -1252,6 +1354,67 @@ mod tests {
);
}

#[test]
fn test_boolean_array_eq() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally since there is special case handling for processing these at 64 bits at a time, I would suggest tests that have at least 65 entries to test the boundary conditions. However, in this case as it is using a pre-existing helper method bitwise_bin_op_helper I am not sure such extra coverage is needed

@alamb
Copy link
Contributor

alamb commented Oct 22, 2021

Very cool -- thanks a lot @Dandandan

@codecov-commenter
Copy link

Codecov Report

Merging #844 (ecad119) into master (4cfe621) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #844      +/-   ##
==========================================
+ Coverage   82.64%   82.67%   +0.03%     
==========================================
  Files         168      168              
  Lines       48088    48129      +41     
==========================================
+ Hits        39742    39791      +49     
+ Misses       8346     8338       -8     
Impacted Files Coverage Δ
arrow/src/compute/kernels/boolean.rs 96.81% <100.00%> (ø)
arrow/src/compute/kernels/cast.rs 94.62% <100.00%> (+0.01%) ⬆️
arrow/src/compute/kernels/comparison.rs 95.55% <100.00%> (+0.35%) ⬆️
arrow/src/ipc/reader.rs 86.08% <100.00%> (+1.33%) ⬆️
parquet/src/schema/parser.rs 86.37% <100.00%> (+0.09%) ⬆️
arrow/src/array/equal_json.rs 84.92% <0.00%> (-0.29%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd9d561...ecad119. Read the comment docs.

@jhorstmann
Copy link
Contributor

Thanks! These look correct to me, including the null handling.

@alamb
Copy link
Contributor

alamb commented Oct 23, 2021

Thanks again @Dandandan

@alamb alamb merged commit 3ea0e18 into apache:master Oct 23, 2021
alamb pushed a commit that referenced this pull request Oct 24, 2021
* Implement boolean equality kernels

* Respect offset

* Simplify
@jimexist
Copy link
Member

@Dandandan @alamb i wonder if you think defining lt et. al. makes sense here, if we allow false < true. if that's defined then i can simplify this branch

@alamb
Copy link
Contributor

alamb commented Oct 25, 2021

@Dandandan @alamb i wonder if you think defining lt et. al. makes sense here, if we allow false < true. if that's defined then i can simplify this branch

I personally think it does make sense to define the other logical comparison operators for BooleanArray (even though the usecase for bool < bool is limited). My primary rationale is for completeness (SQL and Postgres allows for it) and I do think it will avoid having to special case boolean columns for features like DataFusion partition pruning (which converts = into <= and > comparisons for comparing to statistics)

Also, postgres supports it, FWIW

alamb=# select true < false;
 ?column? 
----------
 f
(1 row)

alamb=# select true > false;
 ?column? 
----------
 t
(1 row)

alamb added a commit that referenced this pull request Oct 25, 2021
* Implement boolean equality kernels

* Respect offset

* Simplify

Co-authored-by: Daniël Heres <[email protected]>
@Dandandan
Copy link
Contributor Author

I allgree it makes sense to define lt etc. 👍

@jimexist
Copy link
Member

I allgree it makes sense to define lt etc. 👍

in that case, after #858 I'll try to add it

@jimexist
Copy link
Member

I allgree it makes sense to define lt etc. 👍

see #860

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add boolean equality and inequality kernels
6 participants