-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43187: [C++] Support basic is_in predicate simplification #43761
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great, thanks!
.WithGuarantee( | ||
or_(equal(field_ref("i32"), literal(3)), is_null(field_ref("i32")))) | ||
.ExpectUnchanged(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect this to pass:
Simplify{is_in(field_ref("i32"), int32(), "[1,3,null]", SetLookupOptions::MATCH)} | |
.WithGuarantee( | |
or_(equal(field_ref("i32"), literal(3)), is_null(field_ref("i32")))) | |
.Expect(is_in(field_ref("i32"), int32(), "[3,null]", SetLookupOptions::MATCH)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intention was to not support simplification if nulls in the value set cannot be dropped (either the guarantee is nullable or the null matching behavior is INCONCLUSIVE
). This is because in the optimized implementation where we binary search and slice the value set array, slicing the front would drop nulls (assuming they are placed at the end) so we would have to reallocate a new array for the simplified value set.
Do you think we ought to support nulls in the value set, and if so any thoughts on how we'd continue to support this with the binary search/slice implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filtering approach should be able to support arbitrary value sets, so it could serve as a fallback for the binary search/slice implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that sounds reasonable to me. I added new tests for nullable guarantees and nulls in the value set for all of the different null matching behaviors.
Co-authored-by: Benjamin Kietzman <[email protected]>
@bkietz any ideas on the failing checks? The errors don't appear to be relevant, but lmk if I should look into them. |
Failed CI unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for adding this!
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 44b72d5. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 26 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…pache#43761) ### Rationale for this change Prior to apache#43256, this PR adds a basic implementation that does a linear scan filter over the value set on each guarantee. This isolates the correctness/semantics of `is_in` predicate simplification from the binary search performance optimization. ### What changes are included in this PR? `SimplifyWithGuarantee` now handles `is_in` expressions. ### Are these changes tested? A new unit test was added to arrow-compute-expression-test testing this change. ### Are there any user-facing changes? No. * GitHub Issue: apache#43187 Lead-authored-by: Larry Wang <[email protected]> Co-authored-by: larry98 <[email protected]> Co-authored-by: Benjamin Kietzman <[email protected]> Signed-off-by: Benjamin Kietzman <[email protected]>
…pache#43761) ### Rationale for this change Prior to apache#43256, this PR adds a basic implementation that does a linear scan filter over the value set on each guarantee. This isolates the correctness/semantics of `is_in` predicate simplification from the binary search performance optimization. ### What changes are included in this PR? `SimplifyWithGuarantee` now handles `is_in` expressions. ### Are these changes tested? A new unit test was added to arrow-compute-expression-test testing this change. ### Are there any user-facing changes? No. * GitHub Issue: apache#43187 Lead-authored-by: Larry Wang <[email protected]> Co-authored-by: larry98 <[email protected]> Co-authored-by: Benjamin Kietzman <[email protected]> Signed-off-by: Benjamin Kietzman <[email protected]>
Rationale for this change
Prior to #43256, this PR adds a basic implementation that does a linear scan filter over the value set on each guarantee. This isolates the correctness/semantics of
is_in
predicate simplification from the binary search performance optimization.What changes are included in this PR?
SimplifyWithGuarantee
now handlesis_in
expressions.Are these changes tested?
A new unit test was added to arrow-compute-expression-test testing this change.
Are there any user-facing changes?
No.