Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35789: [C++] Remove check_overflow from CumulativeSumOptions #35790

Merged
merged 2 commits into from
Jun 1, 2023

Conversation

js8544
Copy link
Collaborator

@js8544 js8544 commented May 26, 2023

Rationale for this change

There are two variants of cumsum, cumulative_sum always wraps around on overflow and cumulative_sum_checked always return Status::Invalid on overflow, regardless of the check_overflow parameter in CumulativeSumOptions.
For example:

CumulativeSumOptions options(/*start=*/0, /*skip_nulls=*/true, /*check_overflow=*/true);
auto res = CallFunction("cumulative_sum", {ArrayFromJSON(int8(), "[127, 1]")}, &options);
std::cout << res->make_array()->ToString() << std::endl;
// prints [127, -128], but user might expect an error returned.

I believe it's more of a ambiguity rather than a bug. The document of cumulative_sum also doesn't mention check_overflow at all, and check_overflow is not exposed to pyarrow. So IMO the best approach to avoid confusion is to remove check_overflow from CumulativeSumOptions.

The only place where check_overflow is used is in the C++ convenience function CumulativeSum defined in api_vector.h, which calls cumulative_sum_checked or cumulative_sum depending on the parameter. It can be replaced by a bool parameter in the C++ convenience function.

What changes are included in this PR?

  1. check_overflow is removed from CumulativeSumOptions.
  2. Add a bool check_overflow parameter to the C++ convenience function CumulativeSum.

Are these changes tested?

It doesn't affect the current tests.

Are there any user-facing changes?

Yes, but only the C++ convenience function because check_overflow is not used in the kernel implementation and not exposed to pyarrow anyways.

@js8544 js8544 requested a review from westonpace as a code owner May 26, 2023 11:59
@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue #35789 has been automatically assigned in GitHub to PR creator.

@pitrou
Copy link
Member

pitrou commented Jun 1, 2023

There are two variants of cumsum, cumulative_sum always wraps around on overflow and cumulative_sum always return Status::Invalid on overflow,

This does not seem to make sense to me, is there a typo?

@pitrou
Copy link
Member

pitrou commented Jun 1, 2023

Regardless, the unused option is misleading, so I agree with removing it.

@js8544
Copy link
Collaborator Author

js8544 commented Jun 1, 2023

There are two variants of cumsum, cumulative_sum always wraps around on overflow and cumulative_sum always return Status::Invalid on overflow,

This does not seem to make sense to me, is there a typo?

Sorry! It's a typo indeed. I meant cumulative_sum_checked.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @js8544 . Let's just wait for CI now.

@js8544 js8544 force-pushed the jinshang/cumsum_check_overflow branch from bcef039 to c6b945a Compare June 1, 2023 15:33
@js8544
Copy link
Collaborator Author

js8544 commented Jun 1, 2023

The CI failure looks weird. I pushed an empty commit to trigger a rerun.

@pitrou
Copy link
Member

pitrou commented Jun 1, 2023

The CI failure was probably unrelated, but let's wait for it.

@pitrou pitrou merged commit d20a1d1 into apache:main Jun 1, 2023
@ursabot
Copy link

ursabot commented Jun 2, 2023

Benchmark runs are scheduled for baseline = e0fd7ef and contender = d20a1d1. d20a1d1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.41% ⬆️0.0%] test-mac-arm
[Finished ⬇️5.56% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.63% ⬆️0.06%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] d20a1d1e ec2-t3-xlarge-us-east-2
[Finished] d20a1d1e test-mac-arm
[Finished] d20a1d1e ursa-i9-9960x
[Finished] d20a1d1e ursa-thinkcentre-m75q
[Finished] e0fd7ef8 ec2-t3-xlarge-us-east-2
[Finished] e0fd7ef8 test-mac-arm
[Finished] e0fd7ef8 ursa-i9-9960x
[Finished] e0fd7ef8 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot
Copy link

ursabot commented Jun 2, 2023

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] cumulative_sum ignores CumulativeSumOptions::check_overflow
3 participants