ENH union_categoricals supports ignore_order GH13410 #15219

js3711 · 2017-01-25T04:15:02Z

xref #13410 (ignore_order portion)

jorisvandenbossche

Looks good!

Added a few comments. Can you add a whatsnew note in v0.20.0.txt and add something about this to the docs (http://pandas.pydata.org/pandas-docs/stable/categorical.html#unioning)?

Can you also add a test case with differing categories (not only a different order)? That is, a case that would raise with ignore_order=False (e.g. the first test case of test_union_categoricals_sort, but with ordered categories).
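A sketch of the kind of case being requested, written against the public `pandas.api.types.union_categoricals` of current pandas (an assumption; the PR itself targets `pandas.types.concat.union_categoricals` in 0.20.0). Two ordered categoricals whose categories actually differ cannot be unioned by default, but can be once their ordered attribute is ignored:

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two ordered categoricals whose categories differ (not just their order).
c1 = pd.Categorical([1, 2, 3], ordered=True)
c2 = pd.Categorical([4, 5, 6], ordered=True)

# Default behaviour: ordered inputs with differing categories raise TypeError.
try:
    union_categoricals([c1, c2])
    raised = False
except TypeError:
    raised = True

# With ignore_order=True the categories are unioned and the result is unordered.
res = union_categoricals([c1, c2], ignore_order=True)
expected = pd.Categorical([1, 2, 3, 4, 5, 6])
```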

jorisvandenbossche · 2017-01-25T08:24:41Z

pandas/tools/tests/test_concat.py

+        tm.assert_categorical_equal(res, exp)
+
+        res = union_categoricals([c1, c1], ignore_order=True)
+        exp = Categorical([1, 2, 3, 1, 2, 3], ordered=False)


What is the difference between this and the test case above? (ordered=False is the default)

This was intended to test two ordered categoricals with identical categories and order. The test above mixed an ordered and an unordered categorical with identical categories.

I removed ordered=False from this test and from the other tests, since it is the default.
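The case being distinguished here is whether `ignore_order=True` still drops the ordered attribute when every input is ordered with identical categories. A minimal sketch, assuming current pandas and its `pandas.api.types.union_categoricals` location:

```python
import pandas as pd
from pandas.api.types import union_categoricals

c1 = pd.Categorical([1, 2, 3], ordered=True)

# Identical ordered inputs: by default the result keeps ordered=True.
res_default = union_categoricals([c1, c1])

# With ignore_order=True the same inputs produce an unordered result.
res_ignored = union_categoricals([c1, c1], ignore_order=True)
```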

jorisvandenbossche · 2017-01-25T08:30:41Z

pandas/tools/tests/test_concat.py

+        tm.assert_categorical_equal(res, exp)
+
+        c1 = Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True)
+        c2 = Categorical([1, 2, 3], ordered=True)


You can leave this out, and use the c1 and c2 from above (but just swap them in the code [c2, c1])
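Swapping the inputs exercises something different because, when the ordered attribute is ignored and the categories only differ in order, the result's categories follow their first appearance across the inputs. A sketch assuming current pandas; `sort_categories=True` can be combined with it to get lexsorted categories instead:

```python
import pandas as pd
from pandas.api.types import union_categoricals

c1 = pd.Categorical([1, 2, 3], ordered=True)  # categories [1, 2, 3]
c2 = pd.Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True)

# Category order follows whichever input comes first...
res_12 = union_categoricals([c1, c2], ignore_order=True)
res_21 = union_categoricals([c2, c1], ignore_order=True)

# ...unless sort_categories=True asks for lexsorted categories.
res_sorted = union_categoricals([c2, c1], ignore_order=True, sort_categories=True)
```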

jorisvandenbossche · 2017-01-25T08:35:59Z

pandas/types/concat.py

@@ -222,6 +222,9 @@ def union_categoricals(to_union, sort_categories=False):
    sort_categories : boolean, default False
        If true, resulting categories will be lexsorted, otherwise
        they will be ordered as they appear in the data.
+    ignore_order: boolean, default False
+        If true, ordered categories will be ignored.  Results in


"ordered categories" -> "the ordered attribute of the categorical" / "whether the categorical is ordered or not"? The categories themselves are not ignored, only their "orderedness".

jorisvandenbossche · 2017-01-25T08:36:31Z

pandas/types/concat.py

            raise TypeError("Cannot use sort_categories=True with "
                            "ordered Categoricals")

        if sort_categories and not categories.is_monotonic_increasing:
            categories = categories.sort_values()
            indexer = categories.get_indexer(first.categories)
            new_codes = take_1d(indexer, new_codes, fill_value=-1)
-    elif all(not c.ordered for c in to_union):
+    elif ignore_order | all(not c.ordered for c in to_union):


jorisvandenbossche · 2017-01-25T08:38:02Z

pandas/types/concat.py

@@ -297,6 +300,9 @@ def _maybe_unwrap(x):
        else:
            raise TypeError('Categorical.ordered must be the same')

+    if ignore_order:
+        ordered = False


I think ordered is already False? (line 263) Is this still needed?

The if statement on line 264 can be entered if the ordered categoricals have the same categories and order.

is_dtype_equal checks categories and ordering
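To see why the final `if ignore_order: ordered = False` override in the quoted hunk is still needed, here is a hypothetical, simplified model of the branch structure. It is not the pandas implementation: categories are plain lists, "identical dtype" is modelled as equal category lists plus equal ordered flags (roughly what `is_dtype_equal` checked), and the helper name `resolve_ordered` is made up for illustration.

```python
from typing import Sequence


def resolve_ordered(categories: Sequence[list], ordered: Sequence[bool],
                    ignore_order: bool = False) -> bool:
    """Hypothetical model: return the `ordered` flag of the union, or raise."""
    first_cats, first_ordered = categories[0], ordered[0]
    identical = all(cats == first_cats and flag == first_ordered
                    for cats, flag in zip(categories[1:], ordered[1:]))

    if identical:
        # Fastpath: identical dtypes, so the shared flag (possibly True) is kept.
        result = first_ordered
    elif ignore_order or not any(ordered):
        # Different categories: union and recode; the result is unordered.
        result = False
    elif all(ordered):
        raise TypeError("to union ordered Categoricals, "
                        "all categories must be the same")
    else:
        raise TypeError("Categorical.ordered must be the same")

    # Without this override, identical ordered inputs would stay ordered
    # even though ignore_order=True was requested.
    if ignore_order:
        result = False
    return result
```

For example, `resolve_ordered([[1, 2, 3], [1, 2, 3]], [True, True], ignore_order=True)` enters the fastpath with `result = True`, and only the final override turns it into `False`.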

js3711 · 2017-01-26T03:58:15Z

Thanks for the comments. Pull request has been updated.

codecov-io · 2017-01-26T04:22:06Z

Codecov Report

Merging #15219 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15219      +/-   ##
==========================================
+ Coverage   90.37%   90.37%   +<.01%     
==========================================
  Files         135      135              
  Lines       49464    49466       +2     
==========================================
+ Hits        44702    44705       +3     
+ Misses       4762     4761       -1

Impacted Files	Coverage Δ
pandas/types/concat.py	`98.06% <100%> (+0.01%)`	✅
pandas/core/common.py	`91.36% <ø> (+0.33%)`	✅

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update be4a63f...e9d00de.

js3711 · 2017-02-07T01:42:18Z

@jorisvandenbossche Any other actions on my part?

jreback · 2017-02-07T15:47:36Z

@js3711 a few minor corrections, plus a flake8 issue. You can see it locally with:

git diff master | flake8 --diff

ping when pushed / green.

jreback · 2017-02-16T17:52:36Z

can you rebase / update

js3711 · 2017-02-17T13:03:27Z

I'll update this weekend. Apologies for the delay.

jreback

lgtm, just a couple of minor doc changes. You also have a flake8 error; you can use git diff master | flake8 --diff to see it.

jreback · 2017-02-07T15:42:29Z

pandas/tools/tests/test_concat.py

@@ -1666,6 +1666,42 @@ def test_union_categoricals_ordered(self):
        with tm.assertRaisesRegexp(TypeError, msg):
            union_categoricals([c1, c2])

+    def test_union_categoricals_ignore_order(self):
+        c1 = Categorical([1, 2, 3], ordered=True)


can you add the issue number here as a comment

jreback · 2017-02-07T15:43:09Z

doc/source/whatsnew/v0.20.0.txt

@@ -146,6 +146,8 @@ Other enhancements
 - ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
 - ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`)
 - ``pd.merge_asof()`` gained the option ``direction='backward'|'forward'|'nearest'`` (:issue:`14887`)
+
+- ``ignore_ordered`` argument added to ``pd.types.concat.union_categoricals``; setting the argument to true will ignore the ordered attribute of unioned categoricals (:issue:`13410`)


can you reverse this

pd.types.concat.....union_categoricals(..) has gained the ignore_ordered=False argument.

jreback · 2017-02-07T15:43:43Z

pandas/types/concat.py

@@ -222,6 +222,9 @@ def union_categoricals(to_union, sort_categories=False):
    sort_categories : boolean, default False
        If true, resulting categories will be lexsorted, otherwise
        they will be ordered as they appear in the data.
+    ignore_order: boolean, default False
+        If true, the ordered attribute of the Categoricals will be ignored.
+        Results in an unordered categorical.


add a versionadded 0.20.0 tag

can you add the versionadded tag
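For reference, numpydoc parameter entries use the `name : type` form (with a space before the colon), and the `.. versionadded::` directive usually goes at the end of the parameter description, separated by a blank line. The stub below is a hypothetical illustration of that layout only; the name `union_categoricals_stub` is made up and the body is intentionally not implemented.

```python
def union_categoricals_stub(to_union, sort_categories=False, ignore_order=False):
    """
    Combine list-like of Categorical-like, unioning categories.

    Parameters
    ----------
    to_union : list-like of Categoricals
        Categoricals to combine.
    sort_categories : boolean, default False
        If true, resulting categories will be lexsorted, otherwise
        they will be ordered as they appear in the data.
    ignore_order : boolean, default False
        If true, the ordered attribute of the Categoricals will be ignored.
        Results in an unordered categorical.

        .. versionadded:: 0.20.0
    """
    # Hypothetical stub: only the docstring layout matters here.
    raise NotImplementedError("documentation layout example only")
```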

jreback · 2017-02-07T15:45:27Z

pandas/tools/tests/test_concat.py

+        c2 = Categorical([1, 2, 3], ordered=False)
+
+        res = union_categoricals([c1, c2], ignore_order=True)
+        exp = Categorical([1, 2, 3, 1, 2, 3])


can you add an explicit test with ignore_order=False that raises? (There are tests in other sections, but there should be one that specifies it explicitly.)

jreback · 2017-02-20T14:25:52Z

pandas/tests/tools/test_concat.py

@@ -1662,6 +1662,42 @@ def test_union_categoricals_ordered(self):
        with tm.assertRaisesRegexp(TypeError, msg):
            union_categoricals([c1, c2])

+    def test_union_categoricals_ignore_order(self):
+        c1 = Categorical([1, 2, 3], ordered=True)


add the issue number as a comment. pls add a test with ignore_order=False (explicitly set), and an additional one with no ignore_order kw passed (to test the default).
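A sketch of a test in the requested shape: the issue number as a comment, an explicit `ignore_order=False` that raises, and a call without the keyword to exercise the default. It assumes current pandas (`pandas.api.types.union_categoricals`) and uses a plain `try`/`except` helper instead of the test-suite utilities so it runs standalone; the function names are hypothetical.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def _raises_type_error(func, *args, **kwargs):
    """Return True if func(*args, **kwargs) raises TypeError."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False


def test_union_categoricals_ignore_order_flags():
    # GH 13410
    c1 = pd.Categorical([1, 2, 3], ordered=True)
    c2 = pd.Categorical([1, 2, 3], ordered=False)

    # ignore_order explicitly set to False: mixed ordered flags raise.
    assert _raises_type_error(union_categoricals, [c1, c2], ignore_order=False)

    # No ignore_order keyword: the default (False) behaves the same way.
    assert _raises_type_error(union_categoricals, [c1, c2])

    # ignore_order=True: the union succeeds and the result is unordered.
    res = union_categoricals([c1, c2], ignore_order=True)
    assert res.equals(pd.Categorical([1, 2, 3, 1, 2, 3]))
    assert not res.ordered
```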

jreback · 2017-02-20T14:26:20Z

doc/source/whatsnew/v0.20.0.txt

@@ -157,6 +157,7 @@ Other enhancements
 - HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)

 .. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
+- ``ignore_ordered`` argument added to ``pd.types.concat.union_categoricals``; setting the argument to true will ignore the ordered attribute of unioned categoricals (:issue:`13410`)


add in a :ref: to the docs you added.

js3711 · 2017-02-22T03:04:57Z

@jreback thanks for the feedback. I made the suggested changes and submitted a new PR.

jreback · 2017-02-22T16:32:47Z

thanks for the PR @js3711

xref pandas-dev#13410 (ignore_order portion)

Author: Justin Solinsky

Closes pandas-dev#15219 from js3711/GH13410-ENHunion_categoricals and squashes the following commits:

e9d00de [Justin Solinsky] GH15219 Documentation fixes based on feedback
d278d62 [Justin Solinsky] ENH union_categoricals supports ignore_order GH13410
9b827ef [Justin Solinsky] ENH union_categoricals supports ignore_order GH13410

js3711 mentioned this pull request Jan 25, 2017

ENH: Categorical.from_union #13410

Closed

3 tasks

jorisvandenbossche reviewed Jan 25, 2017

jorisvandenbossche added Categorical Categorical Data Type Enhancement labels Jan 25, 2017

jreback added this to the 0.20.0 milestone Feb 7, 2017

Justin Solinsky added 2 commits February 19, 2017 22:58

ENH union_categoricals supports ignore_order GH13410

9b827ef

ENH union_categoricals supports ignore_order GH13410

d278d62

js3711 force-pushed the GH13410-ENHunion_categoricals branch from 6dc5bb8 to d278d62 February 20, 2017 04:47

jreback approved these changes Feb 20, 2017

GH15219 Documentation fixes based on feedback

e9d00de

jreback closed this in 14fee4f Feb 22, 2017
