DEPR: inplace keyword for Categorical.set_ordered, setting .categories directly (pandas-dev#47834)

* DEPR: inplace keyword for Categorical.set_ordered, setting .categories directly

* update docs

* typo fixup

* suppress warning

Co-authored-by: Jeff Reback <[email protected]>
2 people authored and noatamir committed Nov 9, 2022
1 parent 5b6b7f8 commit f815233
Showing 12 changed files with 108 additions and 55 deletions.
6 changes: 3 additions & 3 deletions doc/source/user_guide/10min.rst
@@ -680,12 +680,12 @@ Converting the raw grades to a categorical data type:
df["grade"] = df["raw_grade"].astype("category")
df["grade"]
Rename the categories to more meaningful names (assigning to
:meth:`Series.cat.categories` is in place!):
Rename the categories to more meaningful names:

.. ipython:: python
df["grade"].cat.categories = ["very good", "good", "very bad"]
new_categories = ["very good", "good", "very bad"]
df["grade"] = df["grade"].cat.rename_categories(new_categories)
Reorder the categories and simultaneously add the missing categories (methods under :meth:`Series.cat` return a new :class:`Series` by default):

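In the 10-minute guide, in-place assignment to `.cat.categories` is replaced by `rename_categories`, whose result has to be assigned back to the column, and `set_categories` follows the same return-a-new-Series pattern. Below is a minimal sketch of the updated flow, assuming pandas 1.5 or later. The sample `id`/`raw_grade` data is an assumption, since the hunk does not show it.

```python
import pandas as pd

# Sample data assumed for illustration (not shown in the hunk above).
df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6], "raw_grade": ["a", "b", "b", "a", "a", "e"]}
)
df["grade"] = df["raw_grade"].astype("category")  # categories: a, b, e

# Deprecated in 1.5: df["grade"].cat.categories = [...]
# Replacement: rename_categories returns a new Series, so assign it back.
new_categories = ["very good", "good", "very bad"]
df["grade"] = df["grade"].cat.rename_categories(new_categories)

# set_categories also returns a new Series; it reorders the categories and
# adds the missing ones ("bad", "medium") without changing any values.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"]
)
print(df["grade"].cat.categories.tolist())
print(df["grade"].tolist())
```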
17 changes: 8 additions & 9 deletions doc/source/user_guide/categorical.rst
@@ -334,18 +334,16 @@ It's also possible to pass in the categories in a specific order:
Renaming categories
~~~~~~~~~~~~~~~~~~~

Renaming categories is done by assigning new values to the
``Series.cat.categories`` property or by using the
Renaming categories is done by using the
:meth:`~pandas.Categorical.rename_categories` method:


.. ipython:: python
s = pd.Series(["a", "b", "c", "a"], dtype="category")
s
s.cat.categories = ["Group %s" % g for g in s.cat.categories]
s
s = s.cat.rename_categories([1, 2, 3])
new_categories = ["Group %s" % g for g in s.cat.categories]
s = s.cat.rename_categories(new_categories)
s
# You can also pass a dict-like object to map the renaming
s = s.cat.rename_categories({1: "x", 2: "y", 3: "z"})
@@ -365,7 +363,7 @@ Categories must be unique or a ``ValueError`` is raised:
.. ipython:: python
try:
s.cat.categories = [1, 1, 1]
s = s.cat.rename_categories([1, 1, 1])
except ValueError as e:
print("ValueError:", str(e))
@@ -374,7 +372,7 @@ Categories must also not be ``NaN`` or a ``ValueError`` is raised:
.. ipython:: python
try:
s.cat.categories = [1, 2, np.nan]
s = s.cat.rename_categories([1, 2, np.nan])
except ValueError as e:
print("ValueError:", str(e))
@@ -702,7 +700,7 @@ of length "1".
.. ipython:: python
df.iat[0, 0]
df["cats"].cat.categories = ["x", "y", "z"]
df["cats"] = df["cats"].cat.rename_categories(["x", "y", "z"])
df.at["h", "cats"] # returns a string
.. note::
@@ -960,7 +958,7 @@ relevant columns back to ``category`` and assign the right categories and catego
s = pd.Series(pd.Categorical(["a", "b", "b", "a", "a", "d"]))
# rename the categories
s.cat.categories = ["very good", "good", "bad"]
s = s.cat.rename_categories(["very good", "good", "bad"])
# reorder the categories and add missing categories
s = s.cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
df = pd.DataFrame({"cats": s, "vals": [1, 2, 3, 4, 5, 6]})
@@ -1164,6 +1162,7 @@ Constructing a ``Series`` from a ``Categorical`` will not copy the input
change the original ``Categorical``:

.. ipython:: python
:okwarning:
cat = pd.Categorical([1, 2, 3, 10], categories=[1, 2, 3, 4, 10])
s = pd.Series(cat, name="cat")
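The updated categorical guide relies on `rename_categories` for everything the setter used to do. The sketch below exercises its three accepted input forms (list-like, dict-like, callable) along with the two `ValueError` cases the guide shows. It assumes pandas is installed and does not depend on the exact wording of the error messages.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

s = s.cat.rename_categories([1, 2, 3])                 # list-like: positional
s = s.cat.rename_categories({1: "x", 2: "y", 3: "z"})  # dict-like: by key
s = s.cat.rename_categories(lambda c: c.upper())       # callable: per label
print(s.tolist())  # ['X', 'Y', 'Z', 'X']

# Duplicate or missing (NaN) new categories raise ValueError; because the
# result is never assigned, `s` itself is left unchanged.
errors = []
for bad in (["p", "p", "p"], ["p", "q", float("nan")]):
    try:
        s.cat.rename_categories(bad)
    except ValueError as err:
        errors.append(str(err))
print(len(errors))  # 2
```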
3 changes: 2 additions & 1 deletion doc/source/user_guide/io.rst
@@ -558,7 +558,8 @@ This matches the behavior of :meth:`Categorical.set_categories`.
df = pd.read_csv(StringIO(data), dtype="category")
df.dtypes
df["col3"]
df["col3"].cat.categories = pd.to_numeric(df["col3"].cat.categories)
new_categories = pd.to_numeric(df["col3"].cat.categories)
df["col3"] = df["col3"].cat.rename_categories(new_categories)
df["col3"]
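With `dtype="category"`, `read_csv` parses the categories as strings, so the I/O guide converts the categories (not each value) to numbers and rebuilds the column. Below is a standalone sketch of that step, assuming pandas is installed. The CSV text is a hypothetical example.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV input; col3's categories are parsed as the strings "1", "2", "3".
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"
df = pd.read_csv(StringIO(data), dtype="category")

# Convert only the (few) category labels, then rename instead of assigning
# to .cat.categories, which is deprecated as of 1.5.
new_categories = pd.to_numeric(df["col3"].cat.categories)
df["col3"] = df["col3"].cat.rename_categories(new_categories)
print(df["col3"].cat.categories.tolist())  # [1, 2, 3]
```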
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.0.rst
@@ -271,6 +271,7 @@ Individual columns can be parsed as a ``Categorical`` using a dict specification
such as :func:`to_datetime`.

.. ipython:: python
:okwarning:
df = pd.read_csv(StringIO(data), dtype="category")
df.dtypes
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -820,6 +820,8 @@ Other Deprecations
- Deprecated :meth:`Series.rank` returning an empty result when the dtype is non-numeric and ``numeric_only=True`` is provided; this will raise a ``TypeError`` in a future version (:issue:`47500`)
- Deprecated argument ``errors`` for :meth:`Series.mask`, :meth:`Series.where`, :meth:`DataFrame.mask`, and :meth:`DataFrame.where` as ``errors`` had no effect on this methods (:issue:`47728`)
- Deprecated arguments ``*args`` and ``**kwargs`` in :class:`Rolling`, :class:`Expanding`, and :class:`ExponentialMovingWindow` ops. (:issue:`47836`)
- Deprecated the ``inplace`` keyword in :meth:`Categorical.set_ordered`, :meth:`Categorical.as_ordered`, and :meth:`Categorical.as_unordered` (:issue:`37643`)
- Deprecated setting a categorical's categories with ``cat.categories = ['a', 'b', 'c']``, use :meth:`Categorical.rename_categories` instead (:issue:`37643`)
- Deprecated unused arguments ``encoding`` and ``verbose`` in :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` (:issue:`47912`)
- Deprecated producing a single element when iterating over a :class:`DataFrameGroupBy` or a :class:`SeriesGroupBy` that has been grouped by a list of length 1; A tuple of length one will be returned instead (:issue:`42795`)
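For code that passed `inplace=True`, the two new deprecation entries imply one migration: use the returned object. Below is a minimal sketch, assuming pandas 1.5 or later. Calls that omit `inplace` emit no FutureWarning there.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c", "a"])

# Instead of cat.as_ordered(inplace=True), keep the returned copy.
ordered = cat.as_ordered()
print(ordered.ordered, cat.ordered)  # True False

# The same applies to set_ordered and to the Series .cat accessor.
unordered = ordered.set_ordered(False)
ser = pd.Series(cat)
ser = ser.cat.as_ordered()
print(unordered.ordered, ser.cat.ordered)  # False True
```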

70 changes: 48 additions & 22 deletions pandas/core/arrays/categorical.py
@@ -745,15 +745,14 @@ def categories(self) -> Index:

@categories.setter
def categories(self, categories) -> None:
new_dtype = CategoricalDtype(categories, ordered=self.ordered)
if self.dtype.categories is not None and len(self.dtype.categories) != len(
new_dtype.categories
):
raise ValueError(
"new categories need to have the same number of "
"items as the old categories!"
)
super().__init__(self._ndarray, new_dtype)
warn(
"Setting categories in-place is deprecated and will raise in a "
"future version. Use rename_categories instead.",
FutureWarning,
stacklevel=find_stack_level(),
)

self._set_categories(categories)

@property
def ordered(self) -> Ordered:
@@ -814,7 +813,7 @@ def _set_categories(self, categories, fastpath=False):
):
raise ValueError(
"new categories need to have the same number of "
"items than the old categories!"
"items as the old categories!"
)

super().__init__(self._ndarray, new_dtype)
@@ -836,7 +835,9 @@ def _set_dtype(self, dtype: CategoricalDtype) -> Categorical:
return type(self)(codes, dtype=dtype, fastpath=True)

@overload
def set_ordered(self, value, *, inplace: Literal[False] = ...) -> Categorical:
def set_ordered(
self, value, *, inplace: NoDefault | Literal[False] = ...
) -> Categorical:
...

@overload
@@ -848,7 +849,9 @@ def set_ordered(self, value, *, inplace: bool) -> Categorical | None:
...

@deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "value"])
def set_ordered(self, value, inplace: bool = False) -> Categorical | None:
def set_ordered(
self, value, inplace: bool | NoDefault = no_default
) -> Categorical | None:
"""
Set the ordered attribute to the boolean value.
@@ -859,7 +862,22 @@ def set_ordered(self, value, inplace: bool = False) -> Categorical | None:
inplace : bool, default False
Whether or not to set the ordered attribute in-place or return
a copy of this categorical with ordered set to the value.
.. deprecated:: 1.5.0
"""
if inplace is not no_default:
warn(
"The `inplace` parameter in pandas.Categorical."
"set_ordered is deprecated and will be removed in "
"a future version. setting ordered-ness on categories will always "
"return a new Categorical object.",
FutureWarning,
stacklevel=find_stack_level(),
)
else:
inplace = False

inplace = validate_bool_kwarg(inplace, "inplace")
new_dtype = CategoricalDtype(self.categories, ordered=value)
cat = self if inplace else self.copy()
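The switch from `inplace: bool = False` to a `no_default` sentinel lets `set_ordered` tell "not passed" apart from an explicit `inplace=False`, so only callers who pass the keyword are warned. Below is a standalone sketch of that pattern. The single-member enum is modelled loosely on pandas' sentinel and re-created here instead of importing pandas internals. `OrderedFlag` is a hypothetical class.

```python
import warnings
from enum import Enum
from typing import Literal, Optional, Union


class _NoDefault(Enum):
    # Single-member enum used as a "keyword not passed" marker.
    no_default = "NO_DEFAULT"

    def __repr__(self) -> str:
        return "<no_default>"


no_default = _NoDefault.no_default
NoDefault = Literal[_NoDefault.no_default]


class OrderedFlag:
    """Hypothetical stand-in for an object carrying an ``ordered`` flag."""

    def __init__(self, ordered: bool = False) -> None:
        self.ordered = ordered

    def copy(self) -> "OrderedFlag":
        return OrderedFlag(self.ordered)

    def set_ordered(
        self, value: bool, inplace: Union[bool, NoDefault] = no_default
    ) -> Optional["OrderedFlag"]:
        if inplace is not no_default:
            # Any explicit inplace, even inplace=False, triggers the warning.
            warnings.warn(
                "The `inplace` parameter is deprecated; set_ordered will "
                "always return a new object.",
                FutureWarning,
                stacklevel=2,
            )
        else:
            inplace = False
        target = self if inplace else self.copy()
        target.ordered = bool(value)
        return None if inplace else target


flag = OrderedFlag()
copy = flag.set_ordered(True)  # no warning; returns a new object
```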
@@ -869,15 +887,15 @@ def set_ordered(self, value, inplace: bool = False) -> Categorical | None:
return None

@overload
def as_ordered(self, *, inplace: Literal[False] = ...) -> Categorical:
def as_ordered(self, *, inplace: NoDefault | Literal[False] = ...) -> Categorical:
...

@overload
def as_ordered(self, *, inplace: Literal[True]) -> None:
...

@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
def as_ordered(self, inplace: bool = False) -> Categorical | None:
def as_ordered(self, inplace: bool | NoDefault = no_default) -> Categorical | None:
"""
Set the Categorical to be ordered.
@@ -887,24 +905,29 @@ def as_ordered(self, inplace: bool = False) -> Categorical | None:
Whether or not to set the ordered attribute in-place or return
a copy of this categorical with ordered set to True.
.. deprecated:: 1.5.0
Returns
-------
Categorical or None
Ordered Categorical or None if ``inplace=True``.
"""
inplace = validate_bool_kwarg(inplace, "inplace")
if inplace is not no_default:
inplace = validate_bool_kwarg(inplace, "inplace")
return self.set_ordered(True, inplace=inplace)

@overload
def as_unordered(self, *, inplace: Literal[False] = ...) -> Categorical:
def as_unordered(self, *, inplace: NoDefault | Literal[False] = ...) -> Categorical:
...

@overload
def as_unordered(self, *, inplace: Literal[True]) -> None:
...

@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
def as_unordered(self, inplace: bool = False) -> Categorical | None:
def as_unordered(
self, inplace: bool | NoDefault = no_default
) -> Categorical | None:
"""
Set the Categorical to be unordered.
@@ -914,12 +937,15 @@ def as_unordered(self, inplace: bool = False) -> Categorical | None:
Whether or not to set the ordered attribute in-place or return
a copy of this categorical with ordered set to False.
.. deprecated:: 1.5.0
Returns
-------
Categorical or None
Unordered Categorical or None if ``inplace=True``.
"""
inplace = validate_bool_kwarg(inplace, "inplace")
if inplace is not no_default:
inplace = validate_bool_kwarg(inplace, "inplace")
return self.set_ordered(False, inplace=inplace)

def set_categories(
@@ -1108,11 +1134,11 @@ def rename_categories(
cat = self if inplace else self.copy()

if is_dict_like(new_categories):
cat.categories = [new_categories.get(item, item) for item in cat.categories]
new_categories = [new_categories.get(item, item) for item in cat.categories]
elif callable(new_categories):
cat.categories = [new_categories(item) for item in cat.categories]
else:
cat.categories = new_categories
new_categories = [new_categories(item) for item in cat.categories]

cat._set_categories(new_categories)
if not inplace:
return cat
return None
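After this change, `rename_categories` resolves a dict-like or callable argument into a plain list and passes it to `_set_categories`, no longer going through the deprecated setter. Below is a standalone sketch of that resolution step. `resolve_new_categories` is a hypothetical name, and `collections.abc.Mapping` stands in for pandas' broader, duck-typed `is_dict_like` check. The uniqueness check approximates validation that pandas performs elsewhere.

```python
from collections.abc import Mapping


def resolve_new_categories(old, new):
    """Hypothetical helper: turn a mapping, a callable, or a list-like into
    one new label per existing category, then validate the result."""
    old = list(old)
    if isinstance(new, Mapping):
        # Dict-like: look each old label up, keeping unmapped labels as-is.
        resolved = [new.get(item, item) for item in old]
    elif callable(new):
        # Callable: apply it to every old label.
        resolved = [new(item) for item in old]
    else:
        # List-like: used positionally, so the length must match.
        resolved = list(new)
    if len(resolved) != len(old):
        raise ValueError(
            "new categories need to have the same number of "
            "items as the old categories!"
        )
    if len(set(resolved)) != len(resolved):
        raise ValueError("Categorical categories must be unique")
    return resolved


print(resolve_new_categories(["a", "b", "c"], {"a": "x"}))  # ['x', 'b', 'c']
```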
5 changes: 2 additions & 3 deletions pandas/io/stata.py
@@ -1921,9 +1921,8 @@ def _do_convert_categoricals(
categories = list(vl.values())
try:
# Try to catch duplicate categories
# error: Incompatible types in assignment (expression has
# type "List[str]", variable has type "Index")
cat_data.categories = categories # type: ignore[assignment]
# TODO: if we get a non-copying rename_categories, use that
cat_data = cat_data.rename_categories(categories)
except ValueError as err:
vc = Series(categories).value_counts()
repeated_cats = list(vc.index[vc > 1])
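The Stata reader treats a `ValueError` from `rename_categories` as a sign of duplicate value labels, then uses `value_counts` to name the repeated ones. Below is a standalone sketch of that pattern, assuming pandas is installed. The data and labels are hypothetical.

```python
import pandas as pd

cat_data = pd.Categorical([1, 2, 1, 3])  # categories: 1, 2, 3
labels = ["low", "high", "low"]          # hypothetical labels with a duplicate

try:
    cat_data = cat_data.rename_categories(labels)
except ValueError:
    # rename_categories returns a new object, so cat_data is unchanged here.
    vc = pd.Series(labels).value_counts()
    repeated = list(vc.index[vc > 1])
    print("repeated labels:", repeated)  # repeated labels: ['low']
```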
15 changes: 12 additions & 3 deletions pandas/tests/arrays/categorical/test_analytics.py
@@ -323,13 +323,22 @@ def test_validate_inplace_raises(self, value):
f"received type {type(value).__name__}"
)
with pytest.raises(ValueError, match=msg):
cat.set_ordered(value=True, inplace=value)
with tm.assert_produces_warning(
FutureWarning, match="Use rename_categories"
):
cat.set_ordered(value=True, inplace=value)

with pytest.raises(ValueError, match=msg):
cat.as_ordered(inplace=value)
with tm.assert_produces_warning(
FutureWarning, match="Use rename_categories"
):
cat.as_ordered(inplace=value)

with pytest.raises(ValueError, match=msg):
cat.as_unordered(inplace=value)
with tm.assert_produces_warning(
FutureWarning, match="Use rename_categories"
):
cat.as_unordered(inplace=value)

with pytest.raises(ValueError, match=msg):
with tm.assert_produces_warning(FutureWarning):
18 changes: 13 additions & 5 deletions pandas/tests/arrays/categorical/test_api.py
@@ -34,22 +34,30 @@ def test_ordered_api(self):
assert cat4.ordered

def test_set_ordered(self):

msg = (
"The `inplace` parameter in pandas.Categorical.set_ordered is "
"deprecated and will be removed in a future version. setting "
"ordered-ness on categories will always return a new Categorical object"
)
cat = Categorical(["a", "b", "c", "a"], ordered=True)
cat2 = cat.as_unordered()
assert not cat2.ordered
cat2 = cat.as_ordered()
assert cat2.ordered
cat2.as_unordered(inplace=True)
with tm.assert_produces_warning(FutureWarning, match=msg):
cat2.as_unordered(inplace=True)
assert not cat2.ordered
cat2.as_ordered(inplace=True)
with tm.assert_produces_warning(FutureWarning, match=msg):
cat2.as_ordered(inplace=True)
assert cat2.ordered

assert cat2.set_ordered(True).ordered
assert not cat2.set_ordered(False).ordered
cat2.set_ordered(True, inplace=True)
with tm.assert_produces_warning(FutureWarning, match=msg):
cat2.set_ordered(True, inplace=True)
assert cat2.ordered
cat2.set_ordered(False, inplace=True)
with tm.assert_produces_warning(FutureWarning, match=msg):
cat2.set_ordered(False, inplace=True)
assert not cat2.ordered

# removed in 0.19.0
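The updated tests assert each deprecation with `tm.assert_produces_warning(FutureWarning, match=msg)`. Outside pandas' test suite, you can build a comparable check on the standard library. The sketch below approximates that check and does not reproduce pandas' implementation. `expect_warning` and `legacy_as_ordered` are hypothetical names. As with pytest's `match=`, `match` is treated as a regular expression and searched for in the warning message.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category, match):
    # Record every warning emitted inside the block.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield caught
    # Only reached if the block exits without raising.
    if not any(
        issubclass(w.category, category) and re.search(match, str(w.message))
        for w in caught
    ):
        raise AssertionError(f"no {category.__name__} matching {match!r}")


def legacy_as_ordered(inplace=None):
    # Hypothetical function that warns only when ``inplace`` is passed.
    if inplace is not None:
        warnings.warn(
            "The `inplace` parameter in pandas.Categorical.set_ordered is "
            "deprecated and will be removed in a future version.",
            FutureWarning,
            stacklevel=2,
        )


with expect_warning(FutureWarning, match="The `inplace`"):
    legacy_as_ordered(inplace=True)
```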
8 changes: 5 additions & 3 deletions pandas/tests/arrays/categorical/test_indexing.py
@@ -194,7 +194,8 @@ def test_periodindex(self):
def test_categories_assignments(self):
cat = Categorical(["a", "b", "c", "a"])
exp = np.array([1, 2, 3, 1], dtype=np.int64)
cat.categories = [1, 2, 3]
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
cat.categories = [1, 2, 3]
tm.assert_numpy_array_equal(cat.__array__(), exp)
tm.assert_index_equal(cat.categories, Index([1, 2, 3]))

@@ -216,8 +217,9 @@ def test_categories_assignments_wrong_length_raises(self, new_categories):
"new categories need to have the same number of items "
"as the old categories!"
)
with pytest.raises(ValueError, match=msg):
cat.categories = new_categories
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
with pytest.raises(ValueError, match=msg):
cat.categories = new_categories
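In `test_categories_assignments_wrong_length_raises`, the warning check wraps `pytest.raises`. The deprecated setter emits its FutureWarning before `_set_categories` raises the length error, so the ValueError is handled on the inside and the outer warning context still exits normally. You can reproduce the same ordering with the standard library alone. `LegacyCategories` is a hypothetical class that mimics the setter in the diff.

```python
import warnings


class LegacyCategories:
    """Hypothetical stand-in whose ``categories`` setter is deprecated:
    it warns first, then validates the length (as the diff's setter does)."""

    def __init__(self, categories):
        self._categories = list(categories)

    @property
    def categories(self):
        return list(self._categories)

    @categories.setter
    def categories(self, new):
        warnings.warn(
            "Setting categories in-place is deprecated and will raise in a "
            "future version. Use rename_categories instead.",
            FutureWarning,
            stacklevel=2,
        )
        new = list(new)
        if len(new) != len(self._categories):
            raise ValueError(
                "new categories need to have the same number of "
                "items as the old categories!"
            )
        self._categories = new


obj = LegacyCategories(["a", "b", "c"])

# Warning capture outside, exception handling inside: the ValueError is
# caught before the recording block exits, so `caught` is still inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        obj.categories = [1, 2]  # wrong length
    except ValueError as err:
        error_message = str(err)
```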

# Combinations of sorted/unique:
@pytest.mark.parametrize(
12 changes: 8 additions & 4 deletions pandas/tests/series/accessors/test_cat_accessor.py
@@ -110,7 +110,8 @@ def test_categorical_delegations(self):
ser = Series(Categorical(["a", "b", "c", "a"], ordered=True))
exp_categories = Index(["a", "b", "c"])
tm.assert_index_equal(ser.cat.categories, exp_categories)
ser.cat.categories = [1, 2, 3]
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
ser.cat.categories = [1, 2, 3]
exp_categories = Index([1, 2, 3])
tm.assert_index_equal(ser.cat.categories, exp_categories)

@@ -120,7 +121,8 @@ def test_categorical_delegations(self):
assert ser.cat.ordered
ser = ser.cat.as_unordered()
assert not ser.cat.ordered
return_value = ser.cat.as_ordered(inplace=True)
with tm.assert_produces_warning(FutureWarning, match="The `inplace`"):
return_value = ser.cat.as_ordered(inplace=True)
assert return_value is None
assert ser.cat.ordered

@@ -267,8 +269,10 @@ def test_set_categories_setitem(self):
df = DataFrame({"Survived": [1, 0, 1], "Sex": [0, 1, 1]}, dtype="category")

# change the dtype in-place
df["Survived"].cat.categories = ["No", "Yes"]
df["Sex"].cat.categories = ["female", "male"]
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
df["Survived"].cat.categories = ["No", "Yes"]
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
df["Sex"].cat.categories = ["female", "male"]

# values should not be coerced to NaN
assert list(df["Sex"]) == ["female", "male", "male"]
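`test_set_categories_setitem` still exercises the deprecated setter on DataFrame columns and only wraps it in a warning check. In user code, the non-deprecated equivalent renames each column's categories and assigns the result back. Below is a sketch assuming pandas 1.5 or later. `labels` is a hypothetical mapping from column name to new category labels.

```python
import pandas as pd

df = pd.DataFrame({"Survived": [1, 0, 1], "Sex": [0, 1, 1]}, dtype="category")

# Hypothetical mapping: column name -> new labels for its categories [0, 1].
labels = {"Survived": ["No", "Yes"], "Sex": ["female", "male"]}
for col, new in labels.items():
    df[col] = df[col].cat.rename_categories(new)

# Values are relabelled, not coerced to NaN.
print(df["Sex"].tolist())  # ['female', 'male', 'male']
```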