[BUG] Cannot find maxima of a categorical series #15641
Comments
Hmmm. Can you try:

diff --git a/python/cudf/cudf/core/column/categorical.py b/python/cudf/cudf/core/column/categorical.py
index e3e7303504..fc996e6b6a 100644
--- a/python/cudf/cudf/core/column/categorical.py
+++ b/python/cudf/cudf/core/column/categorical.py
@@ -515,6 +515,10 @@ class CategoricalColumn(column.ColumnBase):
     dtype: cudf.core.dtypes.CategoricalDtype
     _codes: Optional[NumericalColumn]
     _children: Tuple[NumericalColumn]
+    _VALID_REDUCTIONS = {
+        "max",
+        "min",
+    }
     _VALID_BINARY_OPERATIONS = {
         "__eq__",
         "__ne__",
@@ -699,6 +703,27 @@ class CategoricalColumn(column.ColumnBase):
             ),
         )
 
+    def _reduce(
+        self,
+        op: str,
+        skipna: Optional[bool] = None,
+        min_count: int = 0,
+        *args,
+        **kwargs,
+    ) -> ScalarLike:
+        # Only valid reductions are min and max
+        if not self.ordered:
+            raise TypeError(
+                "Categorical is not ordered for operation min "
+                "you can use .as_ordered() to change the Categorical "
+                "to an ordered one."
+            )
+        return self._encode(
+            self.codes._reduce(
+                op=op, skipna=skipna, min_count=min_count, *args, **kwargs
+            )
+        )
+
     def _binaryop(self, other: ColumnBinaryOperand, op: str) -> ColumnBase:
         other = self._wrap_binop_normalization(other)
         # TODO: This is currently just here to make mypy happy, but eventually

Aside, it is mind-boggling to me that
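The idea behind the patch (reduce over the integer codes, then map the resulting code back to a category) can be sketched in plain Python. This is only an illustration under stated assumptions: `ToyCategorical` and its `reduce` method are hypothetical names, not cudf classes or APIs, and nulls are modelled as `None`.

```python
from typing import List, Optional, Sequence


class ToyCategorical:
    """Hypothetical stand-in for a categorical column (not cudf's class)."""

    def __init__(
        self,
        values: Sequence[Optional[str]],
        categories: Sequence[str],
        ordered: bool,
    ) -> None:
        self.categories = list(categories)
        self.ordered = ordered
        # Store each value as the index of its category; None marks a null.
        lookup = {cat: i for i, cat in enumerate(self.categories)}
        self.codes: List[Optional[int]] = [
            None if v is None else lookup[v] for v in values
        ]

    def reduce(self, op: str) -> Optional[str]:
        # Only min and max are meaningful, and only for ordered categories.
        if op not in ("min", "max"):
            raise TypeError(f"unsupported reduction: {op}")
        if not self.ordered:
            raise TypeError(f"Categorical is not ordered for operation {op}")
        valid = [c for c in self.codes if c is not None]
        if not valid:
            return None
        code = min(valid) if op == "min" else max(valid)
        # Map the winning code back to its category label.
        return self.categories[code]


sizes = ToyCategorical(
    ["medium", None, "low", "high"],
    categories=["low", "medium", "high"],
    ordered=True,
)
print(sizes.reduce("min"), sizes.reduce("max"))
```

Because the codes are positions in the category list, comparing codes is the same as comparing categories in their declared order, which is why the reduction only makes sense when the dtype is ordered.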
@rjzamora any chance you tried out @wence-'s snippet above?
A weak ordering and we don't like ties??? I don't know, I agree that this seems nonsensical on its face.
I did some digging: it's a combination of matching R's factor API and implementation leaking into semantics. See discussions pandas-dev/pandas#9611, pandas-dev/pandas#9622, and pandas-dev/pandas#12785. Effectively, by having sort-based groupby be a promise, you're backed into a corner by having unordered categoricals as groupby keys. Of course, the right answer is to say "no can do", but ...
Closes #15641
Applies patch suggested by @wence-
Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)
Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
URL: #15701
Describe the bug
While debugging some strange dask-expr + cudf sorting behavior, I realized that I cannot call ser.min() when ser is a categorical Series. This is a problem for the sort_values logic used in dask.
Steps/Code to reproduce bug
Expected behavior
I'd expect min/max to work (assuming the dtype is "ordered").
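For comparison, here is a small sketch of how pandas treats the same reductions. It uses pandas rather than cudf, on the assumption that cudf aims to mirror pandas here. The sample values are made up for illustration and are not the reproducer from this issue.

```python
import pandas as pd

# Ordered categorical: the declared category order defines min/max.
ordered = pd.Series(
    ["medium", "low", "high"],
    dtype=pd.CategoricalDtype(["low", "medium", "high"], ordered=True),
)
smallest = ordered.min()
largest = ordered.max()

# Unordered categorical: pandas refuses to reduce and raises TypeError.
unordered = ordered.cat.as_unordered()
try:
    unordered.min()
    unordered_error = None
except TypeError as exc:
    unordered_error = str(exc)

print(smallest, largest)
print(unordered_error)
```

The ordered series reduces according to the category order rather than alphabetical order, while the unordered one fails with a message pointing at `.as_ordered()`.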