SeriesGroupby.cumsum raises on object dtype #13992

Closed
agraboso opened this issue Aug 13, 2016 · 3 comments
Labels: Bug, Groupby, Strings (String extension data type and string data)

Comments

@agraboso
Contributor

import pandas as pd

s = pd.Series(list('ABCDEF'))
grouper = pd.Series([0]*3+[1]*3)

The obvious attempt to obtain

0      A
1     AB
2    ABC
3      D
4     DE
5    DEF
dtype: object

using

s.groupby(grouper).cumsum()

raises a DataError: No numeric types to aggregate. A workaround is available via

s.groupby(grouper).apply(pd.Series.cumsum)
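
For anyone who needs the grouped running concatenation regardless of how a particular pandas version treats object dtype in groupby cumsum, here is a minimal sketch. The helper name `grouped_concat_cumsum` is hypothetical (it is not a pandas API), and it assumes the Series has a unique index. It loops over the groups and builds each running result with `itertools.accumulate`, which applies `+` and therefore concatenates strings.

```python
import itertools

import pandas as pd

s = pd.Series(list("ABCDEF"), dtype=object)
grouper = pd.Series([0] * 3 + [1] * 3)


def grouped_concat_cumsum(values, keys):
    """Hypothetical helper: per-group cumulative '+' over object values.

    Assumes values has a unique index aligned with keys.
    """
    pieces = []
    for _, group in values.groupby(keys, sort=False):
        # accumulate() defaults to operator.add, i.e. string concatenation here
        running = list(itertools.accumulate(group))
        pieces.append(pd.Series(running, index=group.index, dtype=object))
    # Restore the original row order
    return pd.concat(pieces).reindex(values.index)


result = grouped_concat_cumsum(s, grouper)
print(result)
```

This trades speed for predictability: it is a plain Python loop, so it will be slow on large inputs, but it does not rely on the groupby cumsum code path that raises above.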

SeriesGroupby.cumsum should follow the behavior of SeriesGroupby.sum, where both s.groupby(grouper).apply(pd.Series.sum) and s.groupby(grouper).sum() produce the correct output:

0    ABC
1    DEF
dtype: object

This was introduced some time between 0.15.2 and 0.18.1, as observed here.

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
@jreback
Contributor

jreback commented Aug 16, 2016

I guess; we had some discussion that we should actually remove this behavior for .sum. It is pretty confusing, as it's an implicit (and probably unwanted) operation; rather, it should be explicit.

see #13416

@rhshadrach
Member

From #13416 (comment)

Maybe it should be excluded by default even if categorical internal is numeric, and included if numeric_only=True? Thus, groupby agg also should have numeric_only kwd.

groupby(...).sum now has a numeric_only=True argument, so as far as I can tell, the decision was made to have numeric only by default but also support non-numeric. For API consistency, it seems to me that cumsum et al should be the same.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@rhshadrach
Member

numeric_only=False is now the default consistently, and cumsum on object dtype is tracked in #44009. Closing.
