BUG-24971 copying blocks also considers ndim #25521

JustinZhengBC · 2019-03-03T03:00:18Z

- closes Categorical.replace returns unexpected dimensions for length 1 Series #24971
- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

The following returns a Series containing [1] instead of 1:

>>> pd.Series(pd.Categorical('A', categories=['A', 'B'])).replace({'A': 1, 'B': 2})
0    [1]
dtype: object

This bug occurs because, while copying the original categorical block (a copy is needed because the operation is not inplace), the class used to construct the new block defaults to ObjectBlock, whose constructor has a default ndim of 2. This PR alters the block copy function so that the newly constructed block has the same ndim as the block being copied.
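To make the mechanism concrete, here is a minimal toy sketch of the failure mode. It does not use pandas: `ToyBlock`, `make_block_buggy` and `make_block_fixed` are hypothetical names invented for illustration, and the extra list wrapping is only an assumed stand-in for how a wrong ndim can surface as nested values.

```python
# Toy model only: these classes are hypothetical and are NOT pandas internals.


class ToyBlock:
    """Holds values plus the number of dimensions its container expects."""

    def __init__(self, values, ndim=2):
        self.ndim = ndim
        # A 2-D block wraps a flat list in an extra row, mimicking how a
        # wrong ndim can show up as nested values such as [1] instead of 1.
        if ndim == 2 and not (values and isinstance(values[0], list)):
            values = [values]
        self.values = values

    def make_block_buggy(self, values):
        # Relies on the constructor default (ndim=2) and ignores self.ndim.
        return ToyBlock(values)

    def make_block_fixed(self, values):
        # Propagates the dimensionality of the block being copied.
        return ToyBlock(values, ndim=self.ndim)


original = ToyBlock(["A"], ndim=1)
buggy = original.make_block_buggy([1])
fixed = original.make_block_fixed([1])

print(buggy.ndim, buggy.values)  # 2 [[1]]
print(fixed.ndim, fixed.values)  # 1 [1]
```

The only difference between the two copy helpers is whether ndim is passed through explicitly, which is the shape of the change described above.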

jreback · 2019-03-03T03:08:35Z

pandas/tests/series/test_constructors.py

+    def test_copy_categorical_ndim(self):
+        # GH 24971
+        s = Series(Categorical(['A'], categories=['A']))
+        assert not is_list_like(s.replace({'A': 1})[0])


use assert_series_equal

jreback · 2019-03-03T03:09:36Z

doc/source/whatsnew/v0.24.2.rst

@@ -66,7 +66,7 @@ Bug Fixes

 **Categorical**

-
+- Bug where a copy of a categorical could have the wrong dimensions (:issue:`24971`)


Can you reword this to make it clearer? Mention that this is on a Series of categorical dtype. The dimension issue is an internal one; you want to emphasize the user-visible effects.

this is visible when you call .replace()

jreback · 2019-03-03T03:10:42Z

pandas/tests/series/test_constructors.py

@@ -532,6 +532,11 @@ def test_constructor_copy(self):
            assert x[0] == 2.
            assert y[0] == 1.

+    def test_copy_categorical_ndim(self):
+        # GH 24971


Can you put all 3 tests from the OP here?

jreback · 2019-03-03T03:10:51Z

pandas/tests/series/test_constructors.py

+        # GH 24971
+        s = Series(Categorical(['A'], categories=['A']))
+        assert not is_list_like(s.replace({'A': 1})[0])
+


this should go with the replace tests

codecov · 2019-03-03T04:11:12Z

Codecov Report

Merging #25521 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #25521   +/-   ##
=======================================
  Coverage   91.75%   91.75%           
=======================================
  Files         173      173           
  Lines       52960    52960           
=======================================
  Hits        48591    48591           
  Misses       4369     4369

Flag	Coverage Δ
#multiple	`90.32% <100%> (ø)`	⬆️
#single	`41.71% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/internals/blocks.py	`94.08% <100%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c193c6...3cfd808.

codecov · 2019-03-03T04:11:12Z

Codecov Report

Merging #25521 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25521      +/-   ##
==========================================
- Coverage   91.26%   91.26%   -0.01%     
==========================================
  Files         173      173              
  Lines       52982    52982              
==========================================
- Hits        48356    48355       -1     
- Misses       4626     4627       +1

Flag	Coverage Δ
#multiple	`89.83% <100%> (ø)`	⬆️
#single	`41.76% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/internals/blocks.py	`94.08% <100%> (ø)`	⬆️
pandas/util/testing.py	`89.3% <0%> (-0.11%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4663951...f7c9e3a.

jreback · 2019-03-04T18:57:30Z

pandas/tests/series/test_replace.py

+        s = pd.Series(categorical)
+        result = s.replace({'A': 1, 'B': 2})
+        expected = pd.Series(numeric)
+        print(result.dtype)


can you remove the prints

jreback · 2019-03-04T18:57:43Z

pandas/tests/series/test_replace.py

+        expected = pd.Series(numeric)
+        print(result.dtype)
+        print(expected.dtype)
+        tm.assert_series_equal(expected, result, check_dtype=False)


This coerces? (I think we have another issue about this.) Can you find and reference it?

TomAugspurger · Mar 5, 2019

Looks like L292 has the reference now (#23305).

jreback · 2019-03-04T18:57:58Z

pandas/tests/series/test_replace.py

+        (pd.Categorical(('A', ), categories=['A', 'B']), [1]),
+        (pd.Categorical(('A', 'B'), categories=['A', 'B']), [1, 2]),
+    ])
+    def test_copy_categorical_ndim(self, categorical, numeric):


rename to test_replace_categorical

JustinZhengBC · 2019-03-04T23:13:13Z

@jreback I've made the requested changes

jreback · 2019-03-20T02:01:11Z

can you merge master

jreback · 2019-03-20T12:27:02Z

thanks @JustinZhengBC nice patch, keep em coming!

BUG-24971 copying blocks also considers ndim

c482548

jreback requested changes Mar 3, 2019

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label Mar 3, 2019

BUG-24971 clarify whatsnew and move tests

3cfd808

fix test

e22d67c

jreback requested changes Mar 4, 2019

rename test and add message

6c75e77

somiandras mentioned this pull request Mar 4, 2019

Columns lose category dtype after calling replace on the dataframe #23305

Closed

merge master

f7c9e3a

jreback added this to the 0.25.0 milestone Mar 20, 2019

jreback added the Categorical (Categorical Data Type) label Mar 20, 2019

jreback approved these changes Mar 20, 2019

jreback merged commit 27aa9d8 into pandas-dev:master Mar 20, 2019
