isna does not work with explicit MultiIndex nan-representation #25

Open
coroa opened this issue Jun 25, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@coroa
Owner

coroa commented Jun 25, 2023

A MultiIndex normally marks missing values with -1 entries in .codes, which isna checks correctly, but .groupby(..., dropna=False) instead appends NaN to the end of .levels and uses its code.

Minimal demonstration:

>>> from numpy import nan
>>> from pandas import Series, MultiIndex
>>> s = (
...     Series(
...         3,
...         MultiIndex.from_tuples(
...             [(1, nan), (1, nan), (1, 2)], names=["a", "b"]
...         ),
...     )
...     .groupby(["a", "b"], dropna=False)
...     .sum()
...     .index
... )
>>> s
MultiIndex([(1, 2.0),
            (1, nan)],
           names=['a', 'b'])
>>> s.levels
FrozenList([[1], [2.0, nan]])
>>> s.codes
FrozenList([[0, 0], [0, 1]])

whereas all the regular MultiIndex constructors consolidate the NaN values to code -1:

>>> s2 = MultiIndex(s.levels, s.codes)
>>> s2.codes
FrozenList([[0, 0], [0, -1]])

It looks like this leads to all sorts of subtle bugs in pandas itself: pandas-dev/pandas#29111, pandas-dev/pandas#36060, pandas-dev/pandas#30750, pandas-dev/pandas#43814.
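
For illustration, a check that covers both representations could look roughly like the sketch below. The helper name `multiindex_isna` is hypothetical and not part of pandas or of this project; it only assumes that a missing entry is either a -1 code or a code pointing at a NaN entry in the level.

```python
import numpy as np
import pandas as pd


def multiindex_isna(index: pd.MultiIndex) -> np.ndarray:
    """Flag missing entries per level as a (len(index), nlevels) boolean array.

    Hypothetical helper: handles both the usual -1 code and a code that
    points at an explicit NaN entry in the level.
    """
    columns = []
    for level, codes in zip(index.levels, index.codes):
        codes = np.asarray(codes)
        level_na = np.asarray(level.isna())
        missing = codes == -1  # usual representation
        valid = ~missing
        # explicit representation: the code points at a NaN level entry
        missing[valid] = level_na[codes[valid]]
        columns.append(missing)
    return np.column_stack(columns)


# Same index as in the demonstration above, built via groupby(dropna=False)
idx = (
    pd.Series(
        3,
        pd.MultiIndex.from_tuples(
            [(1, np.nan), (1, np.nan), (1, 2)], names=["a", "b"]
        ),
    )
    .groupby(level=["a", "b"], dropna=False)
    .sum()
    .index
)
print(multiindex_isna(idx))
```

Because the helper looks at the level values as well as the codes, it should give the same answer whichever representation the installed pandas version produces.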

@coroa
Owner Author

coroa commented Jun 25, 2023

It looks like this happened because pandas-dev/pandas#30584 was implemented as a hack to factorize that effectively hid NaN multi-index entries by defining them as a new category, and afterwards no one cleaned up :/. It would be fixed by pandas-dev/pandas#43943, which made the grouper remove the added NaNs after grouping!
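
Until something like that lands, the cleanup can be approximated on the caller's side. The following is only a sketch under my own assumptions, not the code from that PR: the hypothetical helper `consolidate_nan_codes` drops NaN entries from each level and remaps the codes that pointed at them to -1.

```python
import numpy as np
import pandas as pd


def consolidate_nan_codes(index: pd.MultiIndex) -> pd.MultiIndex:
    """Rebuild a MultiIndex so that missing values are always coded as -1.

    Hypothetical helper, written for illustration only.
    """
    new_levels, new_codes = [], []
    for level, codes in zip(index.levels, index.codes):
        codes = np.asarray(codes)
        level_na = np.asarray(level.isna())
        # New position of every old level entry once NaN entries are dropped
        remap = np.cumsum(~level_na) - 1
        remap[level_na] = -1
        # Trailing -1 so that an existing -1 code maps to itself
        remap = np.append(remap, -1)
        new_levels.append(level[~level_na])
        new_codes.append(remap[codes])
    return pd.MultiIndex(levels=new_levels, codes=new_codes, names=index.names)


idx = (
    pd.Series(
        3,
        pd.MultiIndex.from_tuples(
            [(1, np.nan), (1, np.nan), (1, 2)], names=["a", "b"]
        ),
    )
    .groupby(level=["a", "b"], dropna=False)
    .sum()
    .index
)
clean = consolidate_nan_codes(idx)
print(clean.levels)
print(clean.codes)
```

An index that already uses -1 codes passes through unchanged, so the helper is safe to apply unconditionally after a groupby.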

@coroa coroa added the bug Something isn't working label Jul 25, 2023