BUG: Nullable integer type ("Int64") lost after summing along columns-index [df.sum(axis=1)] #50438

brobr · 2022-12-26T02:08:19Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

di = pd.DataFrame({'A':[1, 2, 4],'B':[3, 4, -5]}, dtype='Int64')
di.sum().dtypes        # dtype('int64')

di.sum(axis=1).dtypes  # dtype('float64') ??

di.T.sum().T.dtypes    # dtype('int64') 

di.T.sum().T - di.sum(axis=1)
# 0    0.0
# 1    0.0
# 2    0.0
# dtype: float64

df = pd.DataFrame({'A':[1, 2, pd.NA, 4],'B':[3, pd.NA, 4, -5]}, dtype="Int64")
assert (df.sum().dtypes == 'int64'
        and df.sum(axis=1).dtypes == 'float64'
        and df.T.sum().T.dtypes == 'int64')

dg = pd.DataFrame({'A':[1, 2, None, 4],'B':[3, None, 4, -5]}, dtype='int') # FutureWarning
assert dg.sum().dtypes == 'O' and dg.sum(axis=1).dtypes == 'float64'

dh = pd.DataFrame({'A':[1, 2, 4],'B':[3, 4, -5]}, dtype='int')
assert dh.sum().dtypes == dh.sum(axis=1).dtypes == 'int64'

Issue Description

With normal integers (dh.sum(); dh.sum(axis=1)) the obtained sums are integers as well. Once a value is missing, things go odd: dg.sum() gives an object dtype and dg.sum(axis=1) gives a float.

The proposed solution for this, the nullable integer type ('Int64'), only partly works here.
Summing along the index axis (0, the default) with di.sum() keeps an integer dtype ('int64' in the example), as would be expected.
But the integer type is not kept when summing over rows, along the columns axis.
See the code example: with di.sum(axis=1) the resulting sums have dtype 'float64', not 'Int64'.

Relying on the expected behaviour for axis=0, one can keep an integer dtype after summing rows by means of double transposition (di.T.sum().T).

A pd.NA missing value in a dataframe of dtype 'Int64' also yields a float after summing rows (df.sum(axis=1)).
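As a stopgap, the row sums can simply be cast back to the nullable dtype. The snippet below is a minimal sketch of that idea, not a fix for the underlying behaviour. It assumes the sums are small enough to survive the trip through float64 (integers are only exact up to 2**53), and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, pd.NA, 4], "B": [3, pd.NA, 4, -5]}, dtype="Int64")

# Row-wise sum; missing values are skipped by default (skipna=True).
# On pandas 1.5.2 this comes back as float64, as reported above.
row_sums = df.sum(axis=1)

# Cast back to the nullable integer dtype. This works here because every
# sum is a whole number; it is effectively a no-op if the dtype is kept.
row_sums_int = row_sums.astype("Int64")

print(row_sums_int.dtype)     # Int64
print(row_sums_int.tolist())  # [4, 2, 4, -1]
```

The cast hides the detour through float rather than avoiding it, so it is only safe while the values stay within the range float64 represents exactly.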

Expected Behavior

Nullability of 'Int64' should mean that integers do not become floats due to other datatypes or after a normal operation on the dataframe that would not otherwise affect integers, such as summing.

Installed Versions

INSTALLED VERSIONS

commit : 8dab54d
python : 3.9.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.19.17
Version : #1 SMP PREEMPT_DYNAMIC Mon Oct 24 13:00:29 CDT 2022
machine : x86_64
processor : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.5.2
numpy : 1.23.5
pytz : 2022.1
dateutil : 2.8.2
setuptools : 65.1.1
pip : 22.2.2
Cython : 0.29.28
pytest : 7.2.0
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.8.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : 1.4.45
tables : None
tabulate : None
xarray : 2022.12.0
xlrd : 1.1.0
xlwt : None
zstandard : None
tzdata : None


phofl · 2022-12-26T11:06:48Z

Hi, thanks for your report. We have a bunch of open issues discussing this for reduction operations. Please search the issue tracker.

brobr · 2022-12-26T15:30:22Z

Sorry for the bother, but whatever you meant by "discussing this for reduction operations", I did not notice this seemingly quite weird error mentioned among the open issues for 'Int64'.

Could you maybe explain what exactly it was the duplicate of? There is talk of 'reduction operations' in #49603 (which does not mention that summing integer values over one axis should change type; it starts with objects), while #42895, which is referred to there, concerned the bug that a mean of 'Int64' values did not give a float (which was to be expected). By summing 'Int64' values you would expect to keep the type, or at least get consistent output.

Possibly all this stuff is programmatically related but I am not too familiar with the inner workings of pandas. In view of the experimental state of "Int64" I hoped this user-feedback would have been helpful.

Please, don't get me wrong, I appreciate pandas enormously (it made an idea possible I had carried around for years before getting any clue how to do it until I learnt a bit of pandas). Keep up the good work.

phofl · 2022-12-26T15:41:34Z

The underlying reason is 1D EAs (extension arrays), as in #42895; the behavior you observed is off for all reduction operations.
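To make the 1D point concrete: each 'Int64' column is its own one-dimensional array, and reductions within a single column behave as expected. The sketch below is one reading of that remark, not pandas internals. It builds the row sums by adding the columns one at a time so the result never leaves 'Int64'. The helper name row_sum_nullable is hypothetical, and treating NA as 0 is an assumption chosen to mimic the default skipna=True.

```python
import pandas as pd


def row_sum_nullable(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: row-wise sum that stays in the Int64 dtype."""
    # Start from an all-zero nullable integer Series on the same index.
    total = pd.Series(0, index=frame.index, dtype="Int64")
    for _, column in frame.items():
        # Each column is a 1D Int64 Series; NA is replaced by 0 so that
        # missing values are skipped instead of propagating.
        total = total + column.fillna(0)
    return total


df = pd.DataFrame({"A": [1, 2, pd.NA, 4], "B": [3, pd.NA, 4, -5]}, dtype="Int64")

# A reduction inside one 1D column keeps integer semantics.
print(df["A"].sum())  # 7

result = row_sum_nullable(df)
print(result.dtype)     # Int64
print(result.tolist())  # [4, 2, 4, -1]
```

Unlike casting the float result back, this never passes through float64, at the cost of a Python-level loop over the columns.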

brobr · 2022-12-26T16:47:34Z

Thanks, that 1D 'eas' explains it all; sorry I missed that.

brobr added the Bug and Needs Triage labels Dec 26, 2022

phofl closed this as completed Dec 26, 2022

phofl added the Duplicate Report, NA - MaskedArrays and Reduction Operations labels and removed the Needs Triage label Dec 26, 2022
