GH-36420: [C++] Add An Enum Option For SetLookup Options #36739
Why not improve |
+1. When possible, we typically prefer to add new options to existing functions instead of adding new functions. |
Hi, here are my two considerations:
But these two considerations are not strict obstacles to using an option instead. For 1, we can set If you think the solution is OK or have a better solution, I will change the code to a new option version soon. |
@R-JunmingChen see my comments at #36420 (comment). Based on that, here is what I recommend: In
Alternatively we could name it For backward compatibility:
I believe this should work with both @westonpace does this look OK to you? |
Hi, @ianmcook. I think using an enum option is a good solution, but there are still some things that I think should be discussed.
|
Thanks @R-JunmingChen
It is nice to have
I think it is possible to define 2 explicit public functions of the class
I think that will give backward compatibility. |
Hi @ianmcook, do you mean we remove the member variable I think this way will affect some users who set If we don't remove the member variable `skip_nulls`, the translation problem still exists. For example:
|
Yes, I think this is OK. We can mark this as a breaking change if needed, but I think it should not affect most users. @westonpace or @bkietz can you please confirm? |
I would not expect there to be many direct users of the

    /// Options for IsIn and IndexIn functions
    class ARROW_EXPORT SetLookupOptions : public FunctionOptions {
     public:
      enum NullMatchingBehavior { MATCH, SKIP, EMIT_NULL, INCONCLUSIVE };

      explicit SetLookupOptions(Datum value_set, NullMatchingBehavior = MATCH);
      explicit SetLookupOptions(Datum value_set, bool skip_nulls);
      SetLookupOptions();
      static constexpr char const kTypeName[] = "SetLookupOptions";

      /// The set of values to look up input values into.
      Datum value_set;

      // will be overridden by skip_nulls if that is explicitly assigned
      NullMatchingBehavior null_matching_behavior;

      // DEPRECATED(use null_matching_behavior instead)
      std::optional<bool> skip_nulls;
    };

with this, |
I will implement this version ASAP. Let me turn this PR to a draft temporarily |
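For readers following the proposal above, here is a minimal sketch of the override rule stated in the class comments: an explicitly assigned `skip_nulls` takes precedence over `null_matching_behavior`. It is not the code that was merged. `SetLookupOptionsSketch`, `EffectiveBehavior`, and `ExampleUsage` are hypothetical names, and the struct drops `value_set` and the `FunctionOptions` base so that it compiles without Arrow. The mapping of `skip_nulls=true` to `SKIP` and `skip_nulls=false` to `MATCH` follows the PR description.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the proposed options class. It omits value_set
// and the FunctionOptions base so the sketch builds without Arrow.
struct SetLookupOptionsSketch {
  enum NullMatchingBehavior { MATCH, SKIP, EMIT_NULL, INCONCLUSIVE };

  NullMatchingBehavior null_matching_behavior = MATCH;

  // Deprecated flag. It stays empty unless a caller assigns it explicitly.
  std::optional<bool> skip_nulls;

  // Hypothetical helper: the behavior a kernel would end up applying.
  NullMatchingBehavior EffectiveBehavior() const {
    if (!skip_nulls.has_value()) {
      return null_matching_behavior;
    }
    // An explicitly assigned skip_nulls wins: true -> SKIP, false -> MATCH.
    return *skip_nulls ? SKIP : MATCH;
  }
};

// Usage: new callers set the enum, legacy callers still set skip_nulls.
inline void ExampleUsage() {
  SetLookupOptionsSketch options;
  options.null_matching_behavior = SetLookupOptionsSketch::EMIT_NULL;
  assert(options.EffectiveBehavior() == SetLookupOptionsSketch::EMIT_NULL);

  options.skip_nulls = true;  // legacy code path overrides the enum
  assert(options.EffectiveBehavior() == SetLookupOptionsSketch::SKIP);
}
```

Keeping `skip_nulls` as a `std::optional<bool>` is what lets the sketch tell "never assigned" apart from "assigned false", so old call sites keep their meaning while new ones use the enum.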
This looks good to me in terms of its API and docs. I defer to @bkietz to give a technical approval.
It looks like Python tests are failing. Can you troubleshoot that please? Thanks
LGTM.
CI failures seem unrelated:
Dev failure looks like #37803
Appveyor failure looks like #37790
I've restarted the python test (which seemed to fail inside pip). If that passes I think we can merge
@ursabot please benchmark |
Benchmark runs are scheduled for commit 38c1a04. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit 38c1a04. None of the specified runs were found on the Conbench server. The full Conbench report has more details. |
The Python failure looks like #37803 too. The error in its screenshot is similar to the one failing in our Python CI. |
I'm not sure why Conbench isn't working as expected here. The benchmarks should run after this merges and we can check for regressions there. |
Sorry, this looks to be a bug with the "ursabot please benchmark" workflow. Will fix today. |
That should be fixed. If you push an empty commit (voltrondata-labs/arrow-benchmarks-ci#111) and then re-run the command, you can try benchmarks again. Else, yeah, they will be run post-merge. |
@bkietz if you want to go ahead and merge when you're comfortable with this, we can just look at the benchmarks post-merge |
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 772a01c. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…e#36739)

### Rationale for this change

As apache#36420 says, we want to add an SQL-compatible `is_in` variant, which handles nulls with different logic. After a discussion with @ ianmcook and @ bkietz, we decided to support an enum option `null_matching_behavior` for SetLookup, which adds two null-handling semantics for `is_in` and does not add any new behavior for `index_in`.

The enum option `null_matching_behavior` will replace `skip_nulls` in the future.

### What changes are included in this PR?

Add an enum parameter `null_matching_behavior` for SetLookupOptions.

### Are these changes tested?

Two kinds of tests are implemented:

- Replace the default parameter with `null_matching_behavior` instead of `skip_nulls` for the `is_in` and `index_in` tests
- Add tests for `NullMatchingBehavior::EMIT_NULL` and `NullMatchingBehavior::INCONCLUSIVE` for `is_in`

In addition, since `skip_nulls` is not yet fully deprecated, the old tests with `skip_nulls` are preserved. Once `skip_nulls` is fully deprecated, we can replace the test parameter `skip_nulls=false` with `null_matching_behavior=MATCH` and `skip_nulls=true` with `null_matching_behavior=SKIP` in these old tests.

### Are there any user-facing changes?

No. Currently we support backward compatibility. In the future, we plan to replace `skip_nulls` with `null_matching_behavior` completely.

* Closes: apache#36420

Lead-authored-by: Junming Chen
Co-authored-by: Sutou Kouhei
Co-authored-by: Benjamin Kietzman
Signed-off-by: Benjamin Kietzman
### Rationale for this change

As #36420 says, we want to add an SQL-compatible `is_in` variant, which handles nulls with different logic. After a discussion with @ianmcook and @bkietz, we decided to support an enum option `null_matching_behavior` for SetLookup, which adds two null-handling semantics for `is_in` and does not add any new behavior for `index_in`.

The enum option `null_matching_behavior` will replace `skip_nulls` in the future.

### What changes are included in this PR?

Add an enum parameter `null_matching_behavior` for SetLookupOptions.

### Are these changes tested?

Two kinds of tests are implemented:

- Replace the default parameter with `null_matching_behavior` instead of `skip_nulls` for the `is_in` and `index_in` tests
- Add tests for `NullMatchingBehavior::EMIT_NULL` and `NullMatchingBehavior::INCONCLUSIVE` for `is_in`

In addition, since `skip_nulls` is not yet fully deprecated, the old tests with `skip_nulls` are preserved. Once `skip_nulls` is fully deprecated, we can replace the test parameter `skip_nulls=false` with `null_matching_behavior=MATCH` and `skip_nulls=true` with `null_matching_behavior=SKIP` in these old tests.

### Are there any user-facing changes?

No. Currently we support backward compatibility. In the future, we plan to replace `skip_nulls` with `null_matching_behavior` completely.
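To make the four `NullMatchingBehavior` values concrete for `is_in`, the sketch below models a single lookup with `std::optional`, where `std::nullopt` stands for null. The per-value semantics are an assumption based on my reading of the Arrow documentation, not something spelled out in this thread. `IsInSketch`, `Value`, `Result`, and `ExampleIsIn` are hypothetical names, and none of this is the Arrow kernel implementation.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class NullMatchingBehavior { MATCH, SKIP, EMIT_NULL, INCONCLUSIVE };

using Value = std::optional<int>;    // std::nullopt models a null input value
using Result = std::optional<bool>;  // std::nullopt models a null output

// Toy single-value is_in (assumed semantics, not Arrow's implementation).
Result IsInSketch(const Value& value, const std::vector<Value>& value_set,
                  NullMatchingBehavior behavior) {
  bool set_has_null = false;
  bool found = false;  // true only for a non-null match
  for (const Value& candidate : value_set) {
    if (!candidate.has_value()) {
      set_has_null = true;
    } else if (value.has_value() && *candidate == *value) {
      found = true;
    }
  }
  switch (behavior) {
    case NullMatchingBehavior::MATCH:
      // A null input matches a null in value_set.
      if (!value.has_value()) return set_has_null;
      return found;
    case NullMatchingBehavior::SKIP:
      // Nulls in value_set are ignored; a null input yields false.
      return found;
    case NullMatchingBehavior::EMIT_NULL:
      // Nulls in value_set are ignored; a null input yields null.
      if (!value.has_value()) return std::nullopt;
      return found;
    case NullMatchingBehavior::INCONCLUSIVE:
      // SQL-style: null means "unknown", so a miss is inconclusive
      // whenever value_set contains a null.
      if (!value.has_value()) return std::nullopt;
      if (found) return true;
      if (set_has_null) return std::nullopt;
      return false;
  }
  return std::nullopt;  // unreachable; silences compiler warnings
}

// Usage: look up a null and an unmatched value in the set {1, 2, null}.
inline void ExampleIsIn() {
  const std::vector<Value> value_set = {1, 2, std::nullopt};
  const Value null_input = std::nullopt;

  assert(IsInSketch(null_input, value_set, NullMatchingBehavior::MATCH) == true);
  assert(IsInSketch(null_input, value_set, NullMatchingBehavior::SKIP) == false);
  assert(!IsInSketch(null_input, value_set, NullMatchingBehavior::EMIT_NULL).has_value());
  assert(!IsInSketch(3, value_set, NullMatchingBehavior::INCONCLUSIVE).has_value());
}
```

Under these assumptions `MATCH` and `SKIP` reproduce `skip_nulls=false` and `skip_nulls=true`, while `EMIT_NULL` and `INCONCLUSIVE` are the two additional semantics, the latter mirroring SQL's three-valued `IN`.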