Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Support SimplifyWithGuarantee for is_in expressions #43187

Closed
larry98 opened this issue Jul 8, 2024 · 3 comments
Closed

[C++] Support SimplifyWithGuarantee for is_in expressions #43187

larry98 opened this issue Jul 8, 2024 · 3 comments

Comments

@larry98
Copy link
Contributor

larry98 commented Jul 8, 2024

Describe the enhancement requested

We'd like to use parquet predicate pushdown on is_in expressions, but this currently isn't supported in SimplifyWithGuarantee. We implemented a proof of concept where we sort and deduplicate the is_in expression's value set, then have SimplifyWithGuarantee binary search on the inequality bound and slice the value set accordingly. This works well, but I'm not sure what the correct interface for enabling this code path should be. Our current approach adds a new field to SetLookupOptions which allows the user to declare whether the value set is pre-sorted and deduplicated.

Any thoughts? I'd be happy to put up a PR if we agree on an interface.

Component(s)

C++

@larry98 larry98 changed the title Support SimplifyWithGuarantee for is_in expressions [C++] Support SimplifyWithGuarantee for is_in expressions Jul 8, 2024
@mapleFU
Copy link
Member

mapleFU commented Jul 11, 2024

@larry98 feel free to do this and @ me if you finish

This requires some enhancement on KnownFieldValues in this function. Besides the syntax of "in" in SQL with null would be a bit hacking, which should be take care of

@larry98
Copy link
Contributor Author

larry98 commented Jul 15, 2024

@mapleFU I opened a PR (#43256) with the change that I was thinking of. I don't fully follow the point about KnownFieldValues - I think I may be proposing something simpler.

bkietz added a commit that referenced this issue Sep 10, 2024
### Rationale for this change

Prior to #43256, this PR adds a basic implementation that does a linear scan filter over the value set on each guarantee. This isolates the correctness/semantics of `is_in` predicate simplification from the binary search performance optimization.

### What changes are included in this PR?

`SimplifyWithGuarantee` now handles `is_in` expressions.

### Are these changes tested?

A new unit test was added to arrow-compute-expression-test testing this change.

### Are there any user-facing changes?

No.
* GitHub Issue: #43187

Lead-authored-by: Larry Wang <[email protected]>
Co-authored-by: larry98 <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
@bkietz bkietz added this to the 18.0.0 milestone Sep 10, 2024
@bkietz
Copy link
Member

bkietz commented Sep 10, 2024

Issue resolved by pull request 43761
#43761

@bkietz bkietz closed this as completed Sep 10, 2024
khwilson pushed a commit to khwilson/arrow that referenced this issue Sep 14, 2024
…pache#43761)

### Rationale for this change

Prior to apache#43256, this PR adds a basic implementation that does a linear scan filter over the value set on each guarantee. This isolates the correctness/semantics of `is_in` predicate simplification from the binary search performance optimization.

### What changes are included in this PR?

`SimplifyWithGuarantee` now handles `is_in` expressions.

### Are these changes tested?

A new unit test was added to arrow-compute-expression-test testing this change.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#43187

Lead-authored-by: Larry Wang <[email protected]>
Co-authored-by: larry98 <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
zeroshade pushed a commit to zeroshade/arrow that referenced this issue Sep 30, 2024
…pache#43761)

### Rationale for this change

Prior to apache#43256, this PR adds a basic implementation that does a linear scan filter over the value set on each guarantee. This isolates the correctness/semantics of `is_in` predicate simplification from the binary search performance optimization.

### What changes are included in this PR?

`SimplifyWithGuarantee` now handles `is_in` expressions.

### Are these changes tested?

A new unit test was added to arrow-compute-expression-test testing this change.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#43187

Lead-authored-by: Larry Wang <[email protected]>
Co-authored-by: larry98 <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants