Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] cumulative_sum ignores CumulativeSumOptions::check_overflow #35789

Closed
js8544 opened this issue May 26, 2023 · 0 comments · Fixed by #35790
Closed

[C++] cumulative_sum ignores CumulativeSumOptions::check_overflow #35789

js8544 opened this issue May 26, 2023 · 0 comments · Fixed by #35790
Assignees
Milestone

Comments

@js8544
Copy link
Collaborator

js8544 commented May 26, 2023

Describe the bug, including details regarding any error messages, version, and platform.

There are two variants of cumsum, cumulative_sum always wraps around on overflow and cumulative_sum always return Status::Invalid on overflow, regardless of the check_overflow parameter in CumulativeSumOptions.
For example:

CumulativeSumOptions options(/*start=*/0, /*skip_nulls=*/true, /*check_overflow=*/true);
auto res = CallFunction("cumulative_sum", {ArrayFromJSON(int8(), "[127, 1]")}, &options);
std::cout << res->make_array()->ToString() << std::endl;
// prints [127, -128], but user might expect an error returned.

I believe it's more of a ambiguity rather than a bug. The document of cumulative_sum also doesn't memtion check_overflow at all, and check_overflow is not exposed to pyarrow. So IMO the best approach to avoid confusion is to remove check_overflow from CumulativeSumOptions.

The only place where check_overflow is used is in the convenience function CumulativeSum defined in api_vector.h, which calls cumulative_sum_checked or cumulative_sum depending on the parameter. It can be replaced by a bool parameter in the convenience function.

Component(s)

C++

pitrou pushed a commit that referenced this issue Jun 1, 2023
### Rationale for this change

There are two variants of cumsum, `cumulative_sum` always wraps around on overflow and `cumulative_sum_checked` always return `Status::Invalid` on overflow, regardless of the `check_overflow` parameter in CumulativeSumOptions.
For example:
```cpp
CumulativeSumOptions options(/*start=*/0, /*skip_nulls=*/true, /*check_overflow=*/true);
auto res = CallFunction("cumulative_sum", {ArrayFromJSON(int8(), "[127, 1]")}, &options);
std::cout << res->make_array()->ToString() << std::endl;
// prints [127, -128], but user might expect an error returned.
```

I believe it's more of a ambiguity rather than a bug. The document of `cumulative_sum` also doesn't mention `check_overflow` at all, and `check_overflow` is not exposed to pyarrow. So IMO the best approach to avoid confusion is to remove `check_overflow` from CumulativeSumOptions.

The only place where `check_overflow` is used is in the C++ convenience function `CumulativeSum` defined in api_vector.h, which calls `cumulative_sum_checked` or `cumulative_sum` depending on the parameter. It can be replaced by a bool parameter in the C++ convenience function.

### What changes are included in this PR?

1. `check_overflow` is removed from CumulativeSumOptions.
2. Add a bool check_overflow parameter to the C++ convenience function CumulativeSum.

### Are these changes tested?

It doesn't affect the current tests.

### Are there any user-facing changes?

Yes, but only the C++ convenience function because check_overflow is not used in the kernel implementation and not exposed to pyarrow anyways.

* Closes: #35789

Authored-by: Jin Shang <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
@pitrou pitrou added this to the 13.0.0 milestone Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants