[BUG] Integer overflow in df.sum() #8449
Comments
This is because, unlike CuPy, cuDF does not always assume … As a workaround, you could specify the result type:
In [32]: cupy.random.seed(314)
    ...: cudf.DataFrame(cupy.random.randint(2, size=10000, dtype=cupy.int8)).sum(dtype="int64")
Out[32]:
0    5058
dtype: int64
Pandas yields an …
However, it seems to have no actual guards against overflow either.
IMO, what we do about this depends on whether we'd have to internally make an upcast copy of the data to run the reduction over, or whether libcudf's output type provides a workaround for this. Here's the header. From this, it looks like it's clever enough to just accumulate the result into an …
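The difference between the two options can be sketched in plain Python. This is a simulation under stated assumptions, not libcudf's implementation: `wrap`, `sum_with_accumulator` and `sum_via_upcast_copy` are hypothetical helpers, Python ints stand in for device integers, and the input is made-up 0/1 data whose true sum is 5058 (matching the count in this report, not the same random draw). Both wide strategies give the same total; accumulating directly into a 64-bit accumulator simply avoids materializing a widened copy of the column, while an 8-bit accumulator wraps around.

```python
from typing import Iterable, List


def wrap(value: int, bits: int) -> int:
    """Reinterpret an unbounded Python int as a two's-complement integer `bits` wide."""
    value &= (1 << bits) - 1        # keep only the low `bits` bits
    if value >= 1 << (bits - 1):    # high bit set: negative in two's complement
        value -= 1 << bits
    return value


def sum_with_accumulator(values: Iterable[int], acc_bits: int) -> int:
    """Hypothetical reduction whose running total lives in an `acc_bits`-wide integer."""
    acc = 0
    for v in values:
        acc = wrap(acc + v, acc_bits)  # every partial sum is truncated to the accumulator width
    return acc


def sum_via_upcast_copy(values: List[int], out_bits: int) -> int:
    """Hypothetical alternative: materialize a widened copy first, then reduce it."""
    widened = [wrap(v, out_bits) for v in values]  # extra buffer as large as the input
    return sum_with_accumulator(widened, out_bits)


# Made-up int8 data: 5058 ones and 4942 zeros, so the true sum is 5058.
data = [1] * 5058 + [0] * 4942

narrow = sum_with_accumulator(data, acc_bits=8)   # like keeping an int8 result type
wide = sum_with_accumulator(data, acc_bits=64)    # accumulate directly into an int64
copied = sum_via_upcast_copy(data, out_bits=64)   # upcast copy, then reduce

print(narrow, wide, copied)  # -62 5058 5058
```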
We can expect libcudf to do the right thing here. From Python, we should specify …
That seems sensible to me; I just wanted to make sure that wasn't translating to …
Sounds good. It seems like this would also cause issues with floats. A related issue, but in CuPy:
Out: array(inf, dtype=float16)
Out: 0    494430.96875
So sums of floats seem to be cast to float32. Making the sequence larger, I get
0    inf
although it works when specifying the sum dtype, which is very odd:
0    4.999604e+12
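The float16 `inf` above is the same kind of problem: float16 tops out at 65504, so a float16 accumulator overflows long before a wider one would. A minimal pure-Python sketch, assuming made-up data (1000 copies of `100.0`) and simulating float16 rounding with the `struct` module's `'e'` (half-precision) format; `to_float16` and `sum_in` are hypothetical helpers, and the sequential loop does not reproduce the summation order CuPy or libcudf actually use.

```python
import math
import struct
from typing import Callable, Iterable


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct rejects finite values outside the float16 range;
        # round-to-nearest overflow in IEEE arithmetic yields infinity instead.
        return math.copysign(math.inf, x)


def sum_in(values: Iterable[float], round_acc: Callable[[float], float]) -> float:
    """Sum sequentially, rounding the running total after every addition."""
    acc = 0.0
    for v in values:
        acc = round_acc(acc + v)
    return acc


data = [100.0] * 1000  # made-up data; the true sum is 100000.0

f16_total = sum_in(data, to_float16)   # float16 accumulator
f64_total = sum_in(data, lambda x: x)  # Python floats are float64

print(f16_total, f64_total)  # inf 100000.0
```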
Moves this casting logic to Python and updates it so that integer sum and product operations return `int64`, while float cases return the original column dtype. This is a breaking change.

Closes #8449

Authors:
- https://github.com/brandon-b-miller

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Ashwin Srinath (https://github.com/shwina)

URL: #9717
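The result-dtype rule described in this PR can be written down as a small lookup. This is a hypothetical helper for illustration, not the code from #9717: the function name and the dtype sets are made up, and it only covers the cases the description states (signed integer and float columns under `sum`/`product`), raising for anything else, such as unsigned integers, rather than guessing.

```python
SIGNED_INTS = {"int8", "int16", "int32", "int64"}
FLOATS = {"float32", "float64"}


def reduction_result_dtype(op: str, column_dtype: str) -> str:
    """Hypothetical sketch of the result-dtype rule described in the PR."""
    if op not in ("sum", "product"):
        raise ValueError(f"rule only described for sum/product, got {op!r}")
    if column_dtype in SIGNED_INTS:
        return "int64"          # integer sum/product widen to int64
    if column_dtype in FLOATS:
        return column_dtype     # float sum/product keep the column's dtype
    raise ValueError(f"dtype {column_dtype!r} is not covered by the described rule")


print(reduction_result_dtype("sum", "int8"))         # int64
print(reduction_result_dtype("product", "float32"))  # float32
```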
Integer overflow occurs when calculating a sum over a DataFrame column, but not in CuPy.
Steps/Code to reproduce bug
Out: 5058
import cudf
import cupy

cupy.random.seed(314)
cudf.DataFrame(cupy.random.randint(2, size=10000, dtype=cupy.int8)).sum()
Out: 0 -62
dtype: int8
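The `-62` is exactly what wrapping the true total into an 8-bit two's-complement integer gives, assuming the true sum is the 5058 reported by CuPy and by the `dtype="int64"` workaround:

$$
5058 = 19 \cdot 256 + 194, \qquad 194 \ge 2^{7} \;\Rightarrow\; 194 - 2^{8} = -62
$$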
Expected behavior
Out: 5058
Environment overview
conda install of rapids-0.19