BUG: DataFrame reductions with object dtype and axis=1 #49603
I'm now thinking that the result should be int instead. Here are the results with the BlockManager
and the ArrayManager
For …, with the ArrayManager, the 1-dim reduction results in a Python object (an integer), which is then inferred to be of integer type when the resulting array is constructed. With the BlockManager, the 2-dim reduction results in a 1-dim NumPy array of object type, which remains object type. In other words, the ArrayManager naturally needs to infer from Python objects, whereas the BlockManager does not. I think these should be consistent, but I could see arguments either way. I would prefer the BlockManager to attempt to convert from object rather than the ArrayManager casting to object.

For …: this result being float looks like a bug to me; we explicitly try to cast to float at the end of …

cc @jorisvandenbossche @jbrockmendel for any thoughts
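The BlockManager/ArrayManager difference described above can be reproduced at the NumPy level. This is a minimal sketch and an analogy, not pandas internals: `block`, `scalars` and `inferred` are hypothetical names, one 2-D reduction stands in for the BlockManager path, and several 1-D reductions stand in for the ArrayManager path.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 2 object-dtype block holding Python ints.
block = np.array([[1, 2], [3, 4]], dtype=object)

# One 2-D reduction over the whole block (BlockManager-like):
# NumPy returns a 1-D ndarray that keeps object dtype.
block_result = block.sum(axis=1)

# Several 1-D reductions, one per row (ArrayManager-like):
# each reduction returns a plain Python int.
scalars = [row.sum() for row in block]

# Building an array from Python ints triggers dtype inference.
inferred = np.array(scalars)

print(block_result.dtype)         # object
print(type(scalars[0]).__name__)  # int
print(inferred.dtype.kind)        # i

# Wrapping the object result in a Series keeps object dtype;
# infer_objects() is one way to "attempt to convert from object".
ser = pd.Series(block_result)
print(ser.dtype)                       # object
print(ser.infer_objects().dtype.kind)  # i
```

Both paths compute the same values (3 and 7); only the dtype differs, which is the inconsistency under discussion.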
The 1-dim reduction in AM (ArrayManager) is tricky (this also shows up in 1D EAs (ExtensionArrays), e.g. #42895). My inclination here is to treat
In the linked issue, the dtype starts as Int64, so it makes sense that object is wrong. However, here the starting dtype is object; does it make sense for the output to be int as opposed to object?
Thanks, I find this compelling. What about for something like
In line with #51205 for groupby, I think object dtypes should remain object across all aggregations.
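The "object in, object out" policy proposed above can be emulated on the user side. This is a minimal sketch under stated assumptions: `sum_keep_object` is a hypothetical helper, not a pandas API, and the frame `df` is made up for illustration. It casts the result of `DataFrame.sum` back to object whenever every input column is object, regardless of which dtype the installed pandas version produced.

```python
import pandas as pd


def sum_keep_object(df: pd.DataFrame, axis: int = 0) -> pd.Series:
    """Hypothetical helper (not part of pandas): sum along `axis`, but
    return object dtype whenever every input column is object dtype."""
    result = df.sum(axis=axis)
    if len(df.columns) > 0 and (df.dtypes == object).all():
        # Whatever dtype pandas chose (object, int64, float64, ...),
        # cast back so that object input gives object output.
        result = result.astype(object)
    return result


# Hypothetical all-object frame of Python ints.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, dtype=object)
print(sum_keep_object(df, axis=0).dtype)  # object
print(sum_keep_object(df, axis=1).dtype)  # object
```

Casting after the fact preserves the values but not their Python types: if pandas returned `float64`, the object result holds floats such as `4.0` rather than ints.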
In pandas 1.5.x and before, the dtype of the result is different when using `axis=1`:

When using `axis=0`, both the examples above result in object dtype. #49551 changed the `numeric_only=False` case to return `float64` in order to preserve the default behavior. However, it seems better for the result to be object dtype in both examples above.