Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG-24971 copying blocks also considers ndim #25521

Merged
merged 5 commits into from
Mar 20, 2019

Conversation

JustinZhengBC
Copy link
Contributor

The following gives a series containing [1] instead of 1

>>> pd.Series(pd.Categorical('A', categories=['A', 'B'])).replace({'A': 1, 'B': 2})
0    [1]
dtype: object

This bug occurs because in the process of copying the original categorical block (which is needed as the operation is not inplace), the constructor class for the new object defaults to ObjectBlock, whose constructor has a default ndim of 2. This PR alters the block copy function to specify that the newly constructed block should have the same ndim as the block being copied.

def test_copy_categorical_ndim(self):
# GH 24971
s = Series(Categorical(['A'], categories=['A']))
assert not is_list_like(s.replace({'A': 1})[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use assert_series_equal

@@ -66,7 +66,7 @@ Bug Fixes

**Categorical**

-
- Bug where a copy of a categorical could have the wrong dimensions (:issue:`24971`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you reword to make this more clear. mentione this is on a Series of categorical dtype. The dimenension issue is an internal one. You want to emphasize the user visible effects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is visible when you call .replace()

@@ -532,6 +532,11 @@ def test_constructor_copy(self):
assert x[0] == 2.
assert y[0] == 1.

def test_copy_categorical_ndim(self):
# GH 24971
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put all 3 tests from the OP here.

# GH 24971
s = Series(Categorical(['A'], categories=['A']))
assert not is_list_like(s.replace({'A': 1})[0])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should go with the replace tests

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 3, 2019
@codecov
Copy link

codecov bot commented Mar 3, 2019

Codecov Report

Merging #25521 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #25521   +/-   ##
=======================================
  Coverage   91.75%   91.75%           
=======================================
  Files         173      173           
  Lines       52960    52960           
=======================================
  Hits        48591    48591           
  Misses       4369     4369
Flag Coverage Δ
#multiple 90.32% <100%> (ø) ⬆️
#single 41.71% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 94.08% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c193c6...3cfd808. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 3, 2019

Codecov Report

Merging #25521 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25521      +/-   ##
==========================================
- Coverage   91.26%   91.26%   -0.01%     
==========================================
  Files         173      173              
  Lines       52982    52982              
==========================================
- Hits        48356    48355       -1     
- Misses       4626     4627       +1
Flag Coverage Δ
#multiple 89.83% <100%> (ø) ⬆️
#single 41.76% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 94.08% <100%> (ø) ⬆️
pandas/util/testing.py 89.3% <0%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4663951...f7c9e3a. Read the comment docs.

s = pd.Series(categorical)
result = s.replace({'A': 1, 'B': 2})
expected = pd.Series(numeric)
print(result.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove the prints

expected = pd.Series(numeric)
print(result.dtype)
print(expected.dtype)
tm.assert_series_equal(expected, result, check_dtype=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this coerces? (I think we have another issue about this) can you find & reference it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like L292 has the reference now (#23305)

(pd.Categorical(('A', ), categories=['A', 'B']), [1]),
(pd.Categorical(('A', 'B'), categories=['A', 'B']), [1, 2]),
])
def test_copy_categorical_ndim(self, categorical, numeric):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name to to test_replace_categorical

@JustinZhengBC
Copy link
Contributor Author

@jreback I've made the requested changes

@jreback
Copy link
Contributor

jreback commented Mar 20, 2019

can you merge master

@jreback jreback added this to the 0.25.0 milestone Mar 20, 2019
@jreback jreback added the Categorical Categorical Data Type label Mar 20, 2019
@jreback jreback merged commit 27aa9d8 into pandas-dev:master Mar 20, 2019
@jreback
Copy link
Contributor

jreback commented Mar 20, 2019

thanks @JustinZhengBC nice patch, keep em coming!

thoo added a commit to thoo/pandas that referenced this pull request Mar 20, 2019
* upstream/master: (55 commits)
  PERF: Improve performance of StataReader (pandas-dev#25780)
  Speed up tokenizing of a row in csv and xstrtod parsing (pandas-dev#25784)
  BUG: Fix _binop for operators for serials which has more than one returns (divmod/rdivmod). (pandas-dev#25588)
  BUG-24971 copying blocks also considers ndim (pandas-dev#25521)
  CLN: Panel reference from documentation (pandas-dev#25649)
  ENH: Quoting column names containing spaces with backticks to use them in query and eval. (pandas-dev#24955)
  BUG: reading windows utf8 filenames in py3.6 (pandas-dev#25769)
  DOC: clean bug fix section in whatsnew (pandas-dev#25792)
  DOC: Fixed PeriodArray api ref (pandas-dev#25526)
  Move locale code out of tm, into _config (pandas-dev#25757)
  Unpin pycodestyle (pandas-dev#25789)
  Add test for rdivmod on EA array (GH23287) (pandas-dev#24047)
  ENH: Support datetime.timezone objects (pandas-dev#25065)
  Cython language level 3 (pandas-dev#24538)
  API: concat on sparse values (pandas-dev#25719)
  TST: assert_produces_warning works with filterwarnings (pandas-dev#25721)
  make core.config self-contained (pandas-dev#25613)
  CLN: replace %s syntax with .format in pandas.io.parsers (pandas-dev#24721)
  TST: Check pytables<3.5.1 when skipping (pandas-dev#25773)
  DOC: Fix typo in docstring of DataFrame.memory_usage  (pandas-dev#25770)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Categorical.replace returns unexpected dimensions for length 1 Series
3 participants