[C++] sum doesn't check for overflow #37090

Open
pitrou opened this issue Aug 9, 2023 · 9 comments · May be fixed by #37536

@pitrou
Member

pitrou commented Aug 9, 2023

Describe the bug, including details regarding any error messages, version, and platform.

>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> pc.sum(pa.array([2**63, 2**63], type=pa.uint64()))
<pyarrow.UInt64Scalar: 0>
>>> pc.sum(pa.array([2**62, 2**62], type=pa.int64()))
<pyarrow.Int64Scalar: -9223372036854775808>
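
For reference, both results match plain 64-bit wraparound. The sketch below reproduces them with Python's arbitrary-precision integers; `wrap_uint64` and `wrap_int64` are hypothetical helpers written for this illustration, not Arrow functions.

```python
def wrap_uint64(value: int) -> int:
    """Reduce an exact integer modulo 2**64, as unsigned 64-bit arithmetic does."""
    return value % 2**64


def wrap_int64(value: int) -> int:
    """Reinterpret an exact integer as a two's-complement signed 64-bit value."""
    wrapped = value % 2**64
    # Values at or above 2**63 have the sign bit set, so they map to negatives.
    return wrapped - 2**64 if wrapped >= 2**63 else wrapped


# Exact sums from the report, reduced to what a 64-bit accumulator would hold.
print(wrap_uint64(2**63 + 2**63))  # 0
print(wrap_int64(2**62 + 2**62))   # -9223372036854775808
```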

Component(s)

C++

@pitrou
Member Author

pitrou commented Aug 9, 2023

@js8544 @felipecrv

@js8544
Collaborator

js8544 commented Aug 9, 2023

I can add a sum_checked function that returns errors on overflow like cumulative_sum_checked. What do you think?
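
A minimal sketch of the semantics such a function might have, in pure Python rather than as an Arrow kernel. The name `sum_checked_int64`, the `OverflowError` type, and the error message are assumptions made for this illustration; the thread does not specify how the eventual kernel reports the error.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def sum_checked_int64(values):
    """Sum ints, raising instead of wrapping when the total leaves the int64 range."""
    total = 0
    for value in values:
        total += value  # Python ints are exact, so the range test below is reliable.
        if not INT64_MIN <= total <= INT64_MAX:
            raise OverflowError("int64 overflow in sum")
    return total


print(sum_checked_int64([1, 2, 3]))  # 6
try:
    sum_checked_int64([2**62, 2**62])
except OverflowError as exc:
    print("error:", exc)  # error: int64 overflow in sum
```

Because the check runs after every addition, this sketch also rejects inputs whose running total leaves the range temporarily, even if the final total would fit.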

@pitrou
Member Author

pitrou commented Aug 9, 2023

Yes, I think that would be worthwhile.

@pitrou
Member Author

pitrou commented Aug 9, 2023

Of course, we have the same issue with product:

>>> pc.product([2**32,2**32])
<pyarrow.Int64Scalar: 0>
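
The zero here is again wraparound: the exact product is 2**64, which is congruent to 0 modulo 2**64. A small pure-Python check, where `wrap_int64` is a hypothetical helper for this illustration and not part of pyarrow:

```python
import math


def wrap_int64(value: int) -> int:
    """Reinterpret an exact integer as a two's-complement signed 64-bit value."""
    wrapped = value % 2**64
    return wrapped - 2**64 if wrapped >= 2**63 else wrapped


exact = math.prod([2**32, 2**32])
print(exact == 2**64)     # True
print(wrap_int64(exact))  # 0
```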

@js8544
Collaborator

js8544 commented Aug 9, 2023

Sure, I'll add both.

@felipecrv
Contributor

@js8544 ping me for help with the reviews.

js8544 self-assigned this Aug 9, 2023

@js8544
Collaborator

js8544 commented Aug 9, 2023

@felipecrv @pitrou I noticed that mean reuses the sum kernels and therefore also suffers from overflow. It first sums all inputs (where overflow can happen) and then divides by the count. However, since the result of mean is a double, it should add the values as doubles instead of reusing the sum kernels. Do we want to:

  1. Refactor the mean kernel to add the values as doubles.
  2. Also add a mean_checked function.

I'm leaning towards option 1.
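
To illustrate the difference, here is a pure-Python sketch comparing the two accumulation strategies. The function names and the `wrap_int64` helper are hypothetical and only model the behavior described above; they are not the Arrow kernels.

```python
def wrap_int64(value: int) -> int:
    """Reinterpret an exact integer as a two's-complement signed 64-bit value."""
    wrapped = value % 2**64
    return wrapped - 2**64 if wrapped >= 2**63 else wrapped


def mean_via_int64_sum(values):
    """Model of the current behavior: wrapping int64 sum, then divide by the count."""
    total = 0
    for value in values:
        total = wrap_int64(total + value)
    return total / len(values)


def mean_via_double_sum(values):
    """Model of option 1: accumulate in double precision from the start."""
    total = 0.0
    for value in values:
        total += float(value)
    return total / len(values)


values = [2**62, 2**62]
print(mean_via_int64_sum(values) < 0)                # True: the int64 sum wrapped
print(mean_via_double_sum(values) == float(2**62))   # True: the expected mean
```

The trade-off is that doubles cannot represent every integer above 2**53 exactly, so summing as doubles exchanges wraparound for possible rounding on very large inputs.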

@assignUser
Member

We have an issue for the mean overflow: #34909

@js8544
Collaborator

js8544 commented Aug 9, 2023

OK, I'll first make mean use a double for the intermediate result and then add sum_checked, in two separate PRs. The two changes overlap, and splitting them into two PRs should make code review easier.
