BUG: GroupBy.idxmax breaks when grouping by Categorical with unused categories #10694
Comments
The strange thing is that I get different error messages with the example on SO than with my dummy example:
Also, On the other hand,
Just ran into this bug (in the variant where I insert
A quick fix for this is to handle empty categories yourself:
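One possible reading of "handle empty categories yourself" is sketched below. This is an assumption, not the snippet from the comment: the data is made up, and only the column name `a` comes from the issue text (`b` is a hypothetical value column). It either drops the unused categories before grouping or asks `groupby` to skip them with `observed=True`.

```python
import pandas as pd

# Hypothetical data: 'a' is a Categorical whose category 'z' never occurs.
df = pd.DataFrame({
    "a": pd.Categorical(["x", "x", "y", "y"], categories=["x", "y", "z"]),
    "b": [1.0, 3.0, 2.0, 0.0],
})

# Option 1: drop the unused categories first, so no empty group can exist.
trimmed = df.assign(a=df["a"].cat.remove_unused_categories())
res_trimmed = trimmed.groupby("a", observed=False)["b"].idxmax()

# Option 2: keep the categories, but group only on the observed ones.
res_observed = df.groupby("a", observed=True)["b"].idxmax()

print(res_trimmed)
print(res_observed)
```

Both options return the row label of the maximum of `b` for `x` and `y` only; the unused category `z` does not appear in either result.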
I currently get The only value that seems to make sense for output in the case of a missing category Any thoughts on what the output should be, @jorisvandenbossche?
Note: in pandas 3.0, we'll need to specify
I'm thinking we should use NA for unobserved categories. This agrees with the other reductions (e.g. min), and makes it so that the op doesn't raise on certain values rather than others.
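To illustrate the comparison with `min`, here is a small sketch on made-up data (the column `b` and the values are hypothetical). It shows `min` returning NA for an unobserved category, and one way to emulate the proposed NA output for `idxmax` by hand: compute on observed groups, then reindex over all categories. The emulation is an assumption about what the proposal would look like, not pandas' own implementation.

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.Categorical(["x", "x", "y", "y"], categories=["x", "y", "z"]),
    "b": [1.0, 3.0, 2.0, 0.0],
})

# min over all categories: the unobserved category 'z' comes back as NaN.
mins = df.groupby("a", observed=False)["b"].min()

# Emulate "NA for unobserved categories" for idxmax:
# compute on observed groups only, then reindex over every category.
idx = df.groupby("a", observed=True)["b"].idxmax()
idx.index = idx.index.astype(object)  # plain labels make the reindex unambiguous
idx_with_na = idx.reindex(list(df["a"].cat.categories))

print(mins)
print(idx_with_na)
```

Because a missing label is represented as NaN, the reindexed result is upcast to float; a nullable integer dtype would be one way around that.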
Raising is the expected behavior for consistency with Series and DataFrame (see #33941 (comment)). However, we don't currently raise: we only raise when there is one grouping (e.g.
From SO: http://stackoverflow.com/questions/31690493/idxmax-doesnt-work-on-seriesgroupby-that-contains-nan
`idxmax` works with a normal dataframe.

Also, when 'a' is a categorical there are no problems.

But when it is a categorical with an unused category, `idxmax` and `apply` don't work; only `agg` does.

Others like `first`, `last`, `max`, `mean` do work correctly (`idxmin` also fails).
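A minimal sketch of the setup this report describes. The data values and the column `b` are made up; only the column name `a` comes from the text. What the last call does depends on the pandas version (it may raise, warn, or return NA for the unused category), so it is wrapped rather than assumed to succeed.

```python
import pandas as pd

# Hypothetical data with a plain (non-categorical) grouping column 'a'.
df = pd.DataFrame({"a": list("xxyy"), "b": [1.0, 3.0, 2.0, 0.0]})

# Normal dataframe: idxmax returns the row label of each group's maximum.
plain = df.groupby("a")["b"].idxmax()

# Same data, but 'a' is a Categorical with an extra, unused category 'z'.
df_cat = df.assign(a=pd.Categorical(df["a"], categories=["x", "y", "z"]))

# observed=False asks pandas to include the unused category as a group.
# The outcome is version dependent, so capture an exception instead of
# letting it propagate.
try:
    with_unused = df_cat.groupby("a", observed=False)["b"].idxmax()
except Exception as exc:
    with_unused = exc

print(plain)
print(repr(with_unused))
```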