Describe the bug
With pandas 2.0.0, the concat behavior has changed when concatenating a boolean and a numeric dtype. The resulting dtype used to be a numeric dtype, which can be written by mudata. However, this has been changed to object, which results in TypeError: Can't implicitly convert non-string objects to strings. The behavior of bool + nan is also different from the behavior of str + nan, the latter causing no problems.
Warning in pandas 1.5.3:
FutureWarning: Behavior when concatenating bool-dtype and numeric-dtype arrays is deprecated; in a future version these will cast to object dtype (instead of coercing bools to numeric values). To retain the old behavior, explicitly cast bool-dtype arrays to numeric dtype.
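The cast that the warning suggests could look roughly like the sketch below. The series names are made up for illustration, and this is a user-side workaround, not something mudata does internally.

```python
import pandas as pd

# Hypothetical columns: one bool-dtype, one holding only a missing value (float64).
flags = pd.Series([True, False], name="flag")
missing = pd.Series([float("nan")], name="flag")

# Casting the bools to float64 before concatenating keeps the result numeric,
# so the outcome no longer depends on which pandas version is installed.
combined = pd.concat([flags.astype("float64"), missing], ignore_index=True)
print(combined.dtype)
```

The trade-off is that True/False become 1.0/0.0, so the column no longer reads as boolean.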
To Reproduce
With pandas 2.0.0:
Concat na (nan, float64) and na (nan, float64) results in: float64, able to write: True
Concat na (nan, float64) and string (str, category) results in: object, able to write: True
Concat na (nan, float64) and bool (True, bool) results in: object, able to write: False <--
Concat na (nan, float64) and float (1.0, float64) results in: float64, able to write: True
Concat string (str, category) and na (nan, float64) results in: object, able to write: True
Concat string (str, category) and string (str, category) results in: object, able to write: True
Concat string (str, object) and bool (True, bool) results in: object, able to write: False
Concat string (str, object) and float (1.0, float64) results in: object, able to write: False
Concat bool (True, bool) and na (nan, float64) results in: object, able to write: False <--
Concat bool (True, bool) and string (str, object) results in: object, able to write: False
Concat bool (True, bool) and bool (True, bool) results in: bool, able to write: True
Concat bool (True, bool) and float (1.0, float64) results in: object, able to write: False
Concat float (1.0, float64) and na (nan, float64) results in: float64, able to write: True
Concat float (1.0, float64) and string (str, object) results in: object, able to write: False
Concat float (1.0, float64) and bool (True, bool) results in: float64, able to write: True
Concat float (1.0, float64) and float (1.0, float64) results in: float64, able to write: True
Pandas: 2.0.0
anndata: 0.8.0
mudata: 0.2.2
With pandas 1.5.3:
Concat na (nan, float64) and na (nan, float64) results in: float64, able to write: True
Concat na (nan, float64) and string (str, category) results in: object, able to write: True
Concat na (nan, float64) and bool (True, bool) results in: float64, able to write: True <--
Concat na (nan, float64) and float (1.0, float64) results in: float64, able to write: True
Concat string (str, category) and na (nan, float64) results in: object, able to write: True
Concat string (str, category) and string (str, category) results in: object, able to write: True
Concat string (str, object) and bool (True, bool) results in: object, able to write: False
Concat string (str, object) and float (1.0, float64) results in: object, able to write: False
Concat bool (True, bool) and na (nan, float64) results in: float64, able to write: True <--
Concat bool (True, bool) and string (str, object) results in: object, able to write: False
Concat bool (True, bool) and bool (True, bool) results in: bool, able to write: True
Concat bool (True, bool) and float (1.0, float64) results in: object, able to write: False
Concat float (1.0, float64) and na (nan, float64) results in: float64, able to write: True
Concat float (1.0, float64) and string (str, object) results in: object, able to write: False
Concat float (1.0, float64) and bool (True, bool) results in: float64, able to write: True
Concat float (1.0, float64) and float (1.0, float64) results in: float64, able to write: True
Pandas: 1.5.3
anndata: 0.8.0
mudata: 0.2.2
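For reference, the dtype half of the comparison above can be checked without mudata. This is only a minimal sketch with made-up variable names; it does not attempt the write step, and the printed dtype depends on the installed pandas version.

```python
import pandas as pd

# Hypothetical one-element columns matching the "na" and "bool" cases above.
na = pd.Series([float("nan")], name="col")  # float64
flag = pd.Series([True], name="col")        # bool

result_dtypes = []
for left, right in [(na, flag), (flag, na)]:
    result = pd.concat([left, right], ignore_index=True)
    result_dtypes.append(str(result.dtype))
    print(f"pandas {pd.__version__}: {left.dtype} + {right.dtype} -> {result.dtype}")
```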
Expected behaviour
I would not expect a change in behavior.
System
OS: macOS Ventura
Python version: 3.10.10
Versions of libraries involved: see examples above
Additional context
Could be related to scverse/anndata#679 but the issue being reported here is a behavior change so I would flag this as a separate bug (either way the discrepancy between str + nan and bool + nan should be resolved).
Thanks for noticing this change of behaviour with pandas 2.0 and providing a great example to test it.
I've started addressing it in #43 with boolean + nan value combination that you highlighted.
So far I'm taking advantage of nullable boolean arrays.
In case you have any thoughts on what behaviour you would find most intuitive and/or how we can potentially generalise this decision making beyond just bool -> boolean conversion for nullable boolean arrays, I'd be interested to discuss it!
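To illustrate what nullable boolean arrays offer here, the sketch below concatenates two columns that both use the pandas "boolean" extension dtype. It is not the code from #43, and the variable names are assumptions.

```python
import pandas as pd

# A plain bool column converted to the nullable "boolean" extension dtype.
flags = pd.Series([True, False]).astype("boolean")
# A column that only holds a missing value, also stored as "boolean".
missing = pd.Series([pd.NA], dtype="boolean")

# Both inputs share the same extension dtype, so the result stays "boolean"
# and the missing entry is kept as <NA> instead of forcing object or float.
combined = pd.concat([flags, missing], ignore_index=True)
print(combined.dtype)
print(combined.isna().tolist())
```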
By the way, already with pandas 1.5.2 and mudata 0.2.3, float + bool is coerced to an object (same as bool + float).
And a short update is that mudata 0.3.0 will try to be more careful with using nullable boolean arrays to avoid potential issues like scverse/muon#111 (e.g. by using bool when there is no NA in the column in the end).
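One way to read "using bool when there is no NA in the column" is the sketch below. The helper name is hypothetical and this is not the mudata 0.3.0 implementation, just an illustration of the idea.

```python
import pandas as pd


def to_plain_bool_if_complete(col: pd.Series) -> pd.Series:
    """Hypothetical helper: fall back to numpy bool when nothing is missing."""
    if isinstance(col.dtype, pd.BooleanDtype) and not col.isna().any():
        return col.astype(bool)
    # Leave columns with missing values (or other dtypes) untouched.
    return col


complete = pd.Series([True, False], dtype="boolean")
with_na = pd.Series([True, pd.NA], dtype="boolean")

print(to_plain_bool_if_complete(complete).dtype)
print(to_plain_bool_if_complete(with_na).dtype)
```

Columns that still contain NA keep the nullable dtype, so no information is lost by the downcast.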
I think this can be tracked down to this concat: mudata/mudata/_core/mudata.py, lines 543 to 548 in da2de81.