ENH union_categoricals supports ignore_order GH13410 #15219

Closed

Conversation

@js3711 js3711 commented Jan 25, 2017

xref #13410 (ignore_order portion)

@js3711 js3711 mentioned this pull request Jan 25, 2017
3 tasks
Member

@jorisvandenbossche jorisvandenbossche left a comment

Looks good!

Added a few comments. Can you add a whatsnew note in v0.20.0.txt and add something to the docs about this (http://pandas.pydata.org/pandas-docs/stable/categorical.html#unioning)?

Can you also add a test case with differing categories (not only a different order)? That is, a case that would raise with ignore_order=False (e.g. the first test case of test_union_categoricals_sort, but with ordered categories).
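
The kind of case requested here can be sketched roughly as follows. This is an illustration, not the test that was added in the PR. It assumes a recent pandas in which union_categoricals is importable from pandas.api.types (the PR itself targeted pandas.types.concat), and the variable names are made up for the example.

```python
import pandas as pd
from pandas.api.types import union_categoricals  # assumed import path (modern pandas)

# Two ordered categoricals whose categories differ (hypothetical data).
c1 = pd.Categorical([1, 2, 3], ordered=True)
c2 = pd.Categorical([4, 5, 6], ordered=True)

# Without ignore_order, ordered categoricals with different categories
# cannot be unioned and a TypeError is raised.
try:
    union_categoricals([c1, c2])
    raised = False
except TypeError:
    raised = True

# With ignore_order=True the ordered flag is dropped and the categories
# are unioned, giving an unordered categorical.
res = union_categoricals([c1, c2], ignore_order=True)
```

The point of the extra case is that ignore_order=True has to reach the "different categories" branch, not only the branch for identical categories.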

tm.assert_categorical_equal(res, exp)

res = union_categoricals([c1, c1], ignore_order=True)
exp = Categorical([1, 2, 3, 1, 2, 3], ordered=False)
Member

What is the difference between this and the test case above? (ordered=False is the default)

Author

This was intended to test two ordered categoricals with identical categories and orders. The above was a mixed ordered and unordered with identical categories.

I did remove ordered=False from this test and other tests since it is the default.

tm.assert_categorical_equal(res, exp)

c1 = Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True)
c2 = Categorical([1, 2, 3], ordered=True)
Member

You can leave this out, and use the c1 and c2 from above (but just swap them in the code [c2, c1])
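
A rough sketch of why the argument order matters in such a test, assuming a recent pandas with union_categoricals importable from pandas.api.types (an assumption, the PR targeted pandas.types.concat). When the categories are unioned they are kept in order of first appearance, so swapping the inputs changes the category order of the result but not its values.

```python
import pandas as pd
from pandas.api.types import union_categoricals  # assumed import path (modern pandas)

# Same values, but the ordered categories are listed in opposite directions.
c1 = pd.Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True)
c2 = pd.Categorical([1, 2, 3], ordered=True)

# ignore_order=True allows the union even though the orderings conflict.
res_12 = union_categoricals([c1, c2], ignore_order=True)
res_21 = union_categoricals([c2, c1], ignore_order=True)

# The first input determines which categories are seen first.
cats_12 = list(res_12.categories)
cats_21 = list(res_21.categories)
```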

@@ -222,6 +222,9 @@ def union_categoricals(to_union, sort_categories=False):
sort_categories : boolean, default False
If true, resulting categories will be lexsorted, otherwise
they will be ordered as they appear in the data.
ignore_order: boolean, default False
If true, ordered categories will be ignored. Results in
Member

"ordered categories" -> "the ordered attribute of the categorical" / "whether the categorical is ordered or not"? The categories themselves are not ignored, only their "orderedness".

raise TypeError("Cannot use sort_categories=True with "
"ordered Categoricals")

if sort_categories and not categories.is_monotonic_increasing:
categories = categories.sort_values()
indexer = categories.get_indexer(first.categories)
new_codes = take_1d(indexer, new_codes, fill_value=-1)
- elif all(not c.ordered for c in to_union):
+ elif ignore_order | all(not c.ordered for c in to_union):
Member

| -> or
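
A small pure-Python sketch of the practical difference between the two operators. Both give the same truth value for plain booleans, but `|` always evaluates its right-hand operand while `or` short-circuits. The helper is_ordered and the item names are hypothetical stand-ins for checking `c.ordered` on each categorical.

```python
calls = []

def is_ordered(name):
    # Hypothetical stand-in for reading `c.ordered`; records each lookup.
    calls.append(name)
    return True

ignore_order = True
items = ["c1", "c2"]

# Bitwise |: the right-hand side is evaluated even though the left is True.
bitwise_result = ignore_order | all(not is_ordered(c) for c in items)
bitwise_calls = list(calls)

calls.clear()

# Boolean or: the right-hand side is skipped once the left is True.
boolean_result = ignore_order or all(not is_ordered(c) for c in items)
boolean_calls = list(calls)
```

Here `all` stops at the first ordered item, so the bitwise version performs one lookup and the boolean version performs none.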

@@ -297,6 +300,9 @@ def _maybe_unwrap(x):
else:
raise TypeError('Categorical.ordered must be the same')

if ignore_order:
ordered = False
Member

I think ordered is already False? (line 263) Is this still needed?

Author

The if statement on line 264 can be entered if the ordered categoricals have the same categories and order.

is_dtype_equal checks categories and ordering
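
The situation described in this reply can be sketched as follows, assuming a recent pandas with union_categoricals importable from pandas.api.types (an assumption, the PR targeted pandas.types.concat). When all inputs are ordered with identical categories, the result normally stays ordered, so ignore_order=True still has to reset the flag afterwards.

```python
import pandas as pd
from pandas.api.types import union_categoricals  # assumed import path (modern pandas)

# An ordered categorical unioned with itself: identical categories and order.
c = pd.Categorical([1, 2, 3], ordered=True)

# Default: the ordered attribute of the inputs is carried over.
kept = union_categoricals([c, c])

# ignore_order=True: same categories and values, but the result is unordered.
dropped = union_categoricals([c, c], ignore_order=True)
```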

@js3711
Author

js3711 commented Jan 26, 2017

Thanks for the comments. Pull request has been updated.

@codecov-io

codecov-io commented Jan 26, 2017

Codecov Report

Merging #15219 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15219      +/-   ##
==========================================
+ Coverage   90.37%   90.37%   +<.01%     
==========================================
  Files         135      135              
  Lines       49464    49466       +2     
==========================================
+ Hits        44702    44705       +3     
+ Misses       4762     4761       -1
Impacted Files            Coverage Δ
pandas/types/concat.py    98.06% <100%> (+0.01%)
pandas/core/common.py     91.36% <ø> (+0.33%)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update be4a63f...e9d00de.

@js3711
Author

js3711 commented Feb 7, 2017

@jorisvandenbossche Any other actions on my part?

@jreback jreback added this to the 0.20.0 milestone Feb 7, 2017
@jreback
Contributor

jreback commented Feb 7, 2017

@js3711 minor corrections and a flake issue you can see

git diff master | flake8 --diff to see locally

ping when pushed / green.

@jreback
Contributor

jreback commented Feb 16, 2017

can you rebase / update

@js3711
Author

js3711 commented Feb 17, 2017

I'll update this weekend. Apologies for the delay.

@js3711 js3711 force-pushed the GH13410-ENHunion_categoricals branch from 6dc5bb8 to d278d62 Compare February 20, 2017 04:47
Contributor

@jreback jreback left a comment

lgtm. just a couple of minor doc changes, and you have a flake error. you can use git diff master | flake8 --diff to see this

@@ -1666,6 +1666,42 @@ def test_union_categoricals_ordered(self):
with tm.assertRaisesRegexp(TypeError, msg):
union_categoricals([c1, c2])

def test_union_categoricals_ignore_order(self):
c1 = Categorical([1, 2, 3], ordered=True)
Contributor

can you add the issue number here as a comment

@@ -146,6 +146,8 @@ Other enhancements
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
- ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`)
- ``pd.merge_asof()`` gained the option ``direction='backward'|'forward'|'nearest'`` (:issue:`14887`)

- ``ignore_ordered`` argument added to ``pd.types.concat.union_categoricals``; setting the argument to true will ignore the ordered attribute of unioned categoricals (:issue:`13410`)
Contributor

can you reverse this

pd.types.concat.....union_categoricals(..) has gained the ignore_order=False argument.

@@ -222,6 +222,9 @@ def union_categoricals(to_union, sort_categories=False):
sort_categories : boolean, default False
If true, resulting categories will be lexsorted, otherwise
they will be ordered as they appear in the data.
ignore_order: boolean, default False
If true, the ordered attribute of the Categoricals will be ignored.
Results in an unordered categorical.
Contributor

add a versionadded 0.20.0 tag
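
For illustration, a docstring carrying such a tag might be laid out as below. The function union_categoricals_stub is a hypothetical placeholder used only to show the numpydoc layout with the Sphinx versionadded directive; it is not the pandas implementation, and the exact wording in the merged docstring may differ.

```python
def union_categoricals_stub(to_union, sort_categories=False, ignore_order=False):
    """
    Hypothetical stub, shown only to illustrate docstring layout.

    Parameters
    ----------
    to_union : list-like of Categorical
    sort_categories : boolean, default False
        If true, resulting categories will be lexsorted, otherwise
        they will be ordered as they appear in the data.
    ignore_order : boolean, default False
        If true, the ordered attribute of the Categoricals will be ignored.
        Results in an unordered categorical.

        .. versionadded:: 0.20.0
    """
    # No behaviour is implemented; this exists for the docstring only.
    raise NotImplementedError("docstring illustration only")
```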

Contributor

can you add the versionadded tag

c2 = Categorical([1, 2, 3], ordered=False)

res = union_categoricals([c1, c2], ignore_order=True)
exp = Categorical([1, 2, 3, 1, 2, 3])
Contributor

can you add an explicit test with ignore_order=False that raises (there are tests in other sections, but there should be one that explicitly specifies it)

@@ -1662,6 +1662,42 @@ def test_union_categoricals_ordered(self):
with tm.assertRaisesRegexp(TypeError, msg):
union_categoricals([c1, c2])

def test_union_categoricals_ignore_order(self):
c1 = Categorical([1, 2, 3], ordered=True)
Contributor

add the issue number as a comment. pls add a test with ignore_order=False (explicitly set), and an additional one with no ignore_order kw passed (to test the defaults)
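
The two requested checks could look roughly like this. It is a sketch, not the tests that were committed. It assumes a recent pandas with union_categoricals importable from pandas.api.types, and raises_type_error is a hypothetical helper used in place of the test suite's own assertion utilities.

```python
import pandas as pd
from pandas.api.types import union_categoricals  # assumed import path (modern pandas)

# Mixed ordered and unordered inputs with identical categories.
ordered = pd.Categorical([1, 2, 3], ordered=True)
unordered = pd.Categorical([1, 2, 3], ordered=False)

def raises_type_error(**kwargs):
    # Hypothetical helper: report whether the union raises TypeError.
    try:
        union_categoricals([ordered, unordered], **kwargs)
    except TypeError:
        return True
    return False

explicit_false = raises_type_error(ignore_order=False)  # keyword set explicitly
default = raises_type_error()                           # keyword omitted
explicit_true = raises_type_error(ignore_order=True)    # ordered flag ignored
```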

@@ -157,6 +157,7 @@ Other enhancements
- HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)

.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
- ``ignore_ordered`` argument added to ``pd.types.concat.union_categoricals``; setting the argument to true will ignore the ordered attribute of unioned categoricals (:issue:`13410`)
Contributor

add in a :ref: to the docs you added.

@js3711
Author

js3711 commented Feb 22, 2017

@jreback thanks for the feedback. I made the suggested changes and submitted a new PR.

@jreback jreback closed this in 14fee4f Feb 22, 2017
@jreback
Contributor

jreback commented Feb 22, 2017

thanks for the PR @js3711

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
xref pandas-dev#13410 (ignore_order portion)

Author: Justin Solinsky <[email protected]>

Closes pandas-dev#15219 from js3711/GH13410-ENHunion_categoricals and squashes the following commits:

e9d00de [Justin Solinsky] GH15219 Documentation fixes based on feedback
d278d62 [Justin Solinsky] ENH union_categoricals supports ignore_order GH13410
9b827ef [Justin Solinsky] ENH union_categoricals supports ignore_order GH13410
Labels
Categorical Categorical Data Type Enhancement