API: Remove nan-likes from MultiIndex levels #29111
@jreback, @TomAugspurger, @jorisvandenbossche, any comments? All of the nan-likes are encoded to -1 already, so no information will really be lost from this, and this will unite
In principle, it would certainly be nice to clean that up, I think, as that doesn't look good. Can we think of potential changes that could impact users? Is this also only applicable to object dtype levels?
Yes, I think this should only affect object dtype levels, because other dtypes can only have one nan-like value. I can't think how it could affect indexing, because indexing works using the codes, so all of NaN, NaT, None etc. already translate to the same code (-1). I could start working on it, and if I'm missing some effect that has unexpected implications, we could discuss it again.
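For readers following along, here is a minimal sketch of the point about codes. It assumes a pandas version where the MultiIndex constructor re-codes entries that point at a missing level value; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# An object-dtype level holding three different nan-likes plus two real values.
mi = pd.MultiIndex(
    levels=[[np.nan, None, pd.NaT, 128, 2]],
    codes=[[0, -1, 1, 2, 3, 4]],
)

level = mi.levels[0]
codes = [int(c) for c in mi.codes[0]]

print(level)  # the nan-likes are still stored in the level
print(codes)  # every position that pointed at a nan-like now carries -1
```

Since the codes no longer distinguish the three nan-likes, a lookup based on codes has nothing to tell them apart.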
Can you try some examples with the index you show above?
Eg
Yes, I agree, seems like indexing MultiIndex by nans doesn't work currently. Also, given that all the nan-likes are encoded to -1 in .codes, even if indexing with nans did work, there couldn't, as far as I can tell, be any way to differentiate between nan and None anyway... Also, object dtype seems to be an anomaly, as other dtypes actually don't keep nan-likes in the level:
>>> mi = pd.MultiIndex.from_product([[10, np.nan]])
>>> mi.levels[0]
Int64Index([10], dtype='int64')
>>> mi.codes[0]
FrozenNDArray([0, -1], dtype='int8')
Your proposal sounds reasonable.
On Wed, Oct 23, 2019 at 2:08 PM Terji Petersen ***@***.***> wrote:
Yes, I agree, seems like indexing MultiIndex by nans doesn't work currently.
Also, given that all the nan-likes are encoded to -1 in .codes, even if indexing with nans did work, there couldn't, as far as I can tell, be any way to differentiate between nan and None anyway...
Also, object dtype seems to be an anomaly, as other dtypes actually don't keep nan-likes in the level:
>>> mi = pd.MultiIndex.from_product([[10, np.nan]])
>>> mi.levels[0]
Int64Index([10], dtype='int64')
>>> mi.codes[0]
FrozenNDArray([0, -1], dtype='int8')
cc @toobaz in case you have any experience with NaNs in object MultiIndexes
If it had worked, we might need to ensure it keeps working (even when that specific indicator is no longer in the values), or have some deprecation for it. But OK, since it seems not to work, no need to argue for this ;)
Some random other observations / thoughts: I was wondering what happens if you convert such a MI into a normal index, e.g. by dropping index levels:
So it seems you only get NaNs, and that also does not depend on the order of the missing value indicators in the levels (so if NaN is not the first, you still get NaN). If you create a MultiIndex in a more typical way (not by constructing it with the MI constructor from levels and codes, but eg by setting columns as the index), you get "properly" constructed MIs:
The same is true for
All more reasons to fix this inconsistency. However, on:
It seems not unique to object dtype (you used
So it seems this is a general issue with the MultiIndex constructor.
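The observation above about MultiIndexes built "in a more typical way" can be sketched roughly as follows. This is an illustrative example and not the one from the original comment; the frame and column names are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "a" holds one real value and two nan-likes.
df = pd.DataFrame(
    {"a": ["x", None, np.nan], "b": [1, 2, 3], "v": [0.1, 0.2, 0.3]}
)

# Setting columns as the index builds the MultiIndex by factorizing the values.
mi = df.set_index(["a", "b"]).index

first_level = mi.levels[0]
first_codes = [int(c) for c in mi.codes[0]]

print(first_level)  # only the real value ends up in the level
print(first_codes)  # the missing entries are represented by -1
```

Here the nan-likes never reach the levels, which is the behaviour the raw constructor would need to match.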
Can we mark this as a bug? It seems that there is agreement that
Working on #27138 I've found that MultiIndex keeps nan-likes in the levels, but encodes them all to -1:
All the MultiIndex nan-likes are encoded to -1, so it's not possible to decode them to their constituent values. Hence it's not possible to get more than one nan-like value out of the MultiIndex, so in this case None and NaT disappear when converting:
I think if nan-likes are all encoded to -1, it'd be more consistent to not have them in the levels, similarly to how Categorical does it already.
Is there acceptance to change the MultiIndex API so we get nan-likes out of the levels? That would give them an API more similar to Categorical.
@pandas-dev/pandas-core.
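The original snippets from this description are not preserved here, so the following is only a sketch of the comparison being made, under the assumption that the constructor re-codes nan-like level entries to -1. It contrasts Categorical, which keeps missing values out of its categories, with a MultiIndex built directly from levels and codes.

```python
import numpy as np
import pandas as pd

# Categorical: nan-likes get code -1 and are not stored in the categories.
cat = pd.Categorical(["a", np.nan, None])
cat_codes = [int(c) for c in cat.codes]
print(cat.categories, cat_codes)

# MultiIndex built from levels and codes: the nan-likes stay in the level.
mi = pd.MultiIndex(
    levels=[[np.nan, None, pd.NaT, 128, 2]],
    codes=[[0, 1, 2, 3, 4]],
)
print(mi.levels[0])

# Converting back to values: the three nan-like positions all come out
# missing, and the distinction between them cannot be recovered from the codes.
values = mi.get_level_values(0)
print(values)
```

Dropping the nan-likes from the levels would make the two containers line up: missing values would exist only as the code -1 in both.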