dispatch frame methods to series versions instead of re-implementing masking etc #19611

jbrockmendel · 2018-02-09T02:07:33Z

This moves some of DataFrame's dispatching logic into ops, mostly without changing behavior.

The one place where the logic does change is DataFrame._combine_frame. Instead of defining arith_op to mask func and wrapping {col: arith_op(this[col].values, other[col].values) for col in this.columns}, we now dispatch directly to the Series methods and wrap {col: func(this[col], other[col]) for col in this.columns}.
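A minimal sketch of the per-column dispatch described above, written against the public pandas API rather than the internals the PR touches. The helper name combine_frame_by_column is hypothetical. The point is that each aligned column pair is handed to the Series operator, so NaN masking and dtype handling come from the Series code path instead of being re-implemented on raw .values.

```python
import operator

import numpy as np
import pandas as pd


def combine_frame_by_column(left, right, func):
    """Hypothetical helper: apply a binary operator column by column.

    Both frames are first aligned on index and columns (outer join). Each
    aligned column pair is then passed to ``func`` as two Series, so the
    Series implementation handles masking and dtypes.
    """
    this, other = left.align(right, join="outer")
    new_data = {col: func(this[col], other[col]) for col in this.columns}
    return pd.DataFrame(new_data, index=this.index, columns=this.columns)


# Usage: a frame with int, float and object columns.
df1 = pd.DataFrame({"a": [1, 2], "b": [0.5, np.nan], "c": ["x", "y"]})
df2 = pd.DataFrame({"a": [10, 20], "b": [1.5, 2.5], "c": ["z", "w"]})
result = combine_frame_by_column(df1, df2, operator.add)
```

Because every column goes through its own Series op, the int column stays int64 and the object column is concatenated elementwise, matching what df1 + df2 produces for these inputs.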


jbrockmendel · 2018-02-09T02:08:47Z

pandas/core/frame.py

    def _combine_match_index(self, other, func, level=None):
        left, right = self.align(other, join='outer', axis=0, level=level,
                                 copy=False)
-        return self._constructor(func(left.values.T, right.values).T,
+        new_data = func(left.values.T, right.values).T


I think we can do better than operating on left.values.T and right.values. Especially with mixed-dtypes this seems like low-hanging fruit.

most of these types of operations need to be pushed down to internals via block operations. this should dispatch to ._data.eval to do this (like the other match functions)

Yah, but I'm not there yet. These particular cases are tricky because SparseDataFrame implements _combine_match_foo differently
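To illustrate the mixed-dtype concern: a frame's .values has to find one common dtype for all columns (float64 for an int column next to a float column, object once a non-numeric column is involved), so func(left.values.T, right.values).T loses the per-column dtypes. The transpose is what lets NumPy broadcast the aligned Series along the index. This comparison uses only public pandas calls and is not the PR's code.

```python
import operator

import numpy as np
import pandas as pd

frame = pd.DataFrame({"ints": [1, 2, 3], "floats": [0.5, 1.5, 2.5]})
series = pd.Series([10, 20, 30], index=frame.index)

# ndarray approach: shapes are (ncols, nrows) and (nrows,) after the
# transpose, so NumPy broadcasts the Series values down every column,
# i.e. the Series is matched against the frame's index.
raw = operator.mul(frame.values.T, series.values).T
via_values = pd.DataFrame(raw, index=frame.index, columns=frame.columns)

# Dispatching per column keeps each column's own dtype.
per_column = pd.DataFrame(
    {col: operator.mul(frame[col], series) for col in frame.columns}
)
```

Here via_values["ints"] comes back as float64, while per_column["ints"] stays int64.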

jbrockmendel · 2018-02-09T02:09:59Z

pandas/core/frame.py

                result = self._constructor(result, index=new_index, copy=False)
                result.columns = new_columns
-                return result
+            return result

        else:
            result = _arith_op(this.values, other.values)


I'd like to dispatch to Series here, but presumably doing the operation once instead of per-column has a perf benefit. One idea that didn't work on the first try was to ravel() this.values and other.values, wrap them in Series, then operate and re-shape.

this is pretty tricky. yes, that would work for most ufuncs, but it is not generally applicable, e.g. concat is a ufunc too.
so hold off on this; there is other low-hanging fruit to pick without opening up this box.

more to the point, this should be pushed down to the block manager for execution, which handles the mixed-dtypes case; this is just a special case of that.
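A sketch of the ravel-and-reshape idea from the comment above, assuming func is purely elementwise: flatten both 2-D arrays, wrap them in Series, apply the operator once, and reshape back. As the reply notes, this is only valid for elementwise operations, and because it goes through .values it also inherits the common-dtype upcasting of mixed frames. ravel_and_apply is a hypothetical name.

```python
import operator

import numpy as np
import pandas as pd


def ravel_and_apply(left, right, func):
    """Hypothetical helper: apply an elementwise ``func`` to two aligned,
    same-shaped DataFrames with a single Series operation."""
    if left.shape != right.shape:
        raise ValueError("frames must be aligned to the same shape first")
    lvals = left.values.ravel()
    rvals = right.values.ravel()
    # One Series op over all cells instead of one op per column.
    flat = func(pd.Series(lvals), pd.Series(rvals))
    # ravel() flattened in C order, and reshape() uses C order too.
    data = np.asarray(flat).reshape(left.shape)
    return pd.DataFrame(data, index=left.index, columns=left.columns)


df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, np.nan]})
df2 = pd.DataFrame({"a": [10.0, 20.0], "b": [30.0, 40.0]})
result = ravel_and_apply(df1, df2, operator.add)
```

For a non-elementwise function (anything that mixes values across positions) the reshape would silently produce meaningless results, which is why this is not a general replacement.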

codecov · 2018-02-09T03:03:41Z

Codecov Report

Merging #19611 into master will decrease coverage by <.01%.
The diff coverage is 94.23%.

@@            Coverage Diff             @@
##           master   #19611      +/-   ##
==========================================
- Coverage   91.58%   91.57%   -0.01%     
==========================================
  Files         150      150              
  Lines       48867    48856      -11     
==========================================
- Hits        44755    44741      -14     
- Misses       4112     4115       +3

Flag	Coverage Δ
#multiple	`89.95% <94.23%> (-0.01%)`	⬇️
#single	`41.75% <40.38%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.22% <100%> (+0.06%)`	⬆️
pandas/core/indexes/base.py	`96.45% <93.33%> (-0.02%)`	⬇️
pandas/core/ops.py	`96.44% <93.33%> (-0.39%)`	⬇️
pandas/util/testing.py	`83.64% <0%> (-0.21%)`	⬇️
pandas/core/series.py	`94.56% <0%> (-0.01%)`	⬇️
pandas/core/indexes/api.py	`98.78% <0%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2fdf1e2...118cd5d.

jreback · 2018-02-10T17:34:35Z

pandas/core/frame.py

    def _combine_match_index(self, other, func, level=None):
        left, right = self.align(other, join='outer', axis=0, level=level,
                                 copy=False)
-        return self._constructor(func(left.values.T, right.values).T,
+        new_data = func(left.values.T, right.values).T


most of these types of operations need to be pushed down to internals via block operations. this should dispatch to ._data.eval to do this (like the other match functions)

jreback · 2018-02-10T17:36:38Z

The place that logic is changed is in DataFrame._combine_frame where instead of defining arith_op to mask func and wrapping {col: arith_op(this[col].values, other[col].values) for col in this.columns}, we dispatch directly to the Series methods and wrap {col: func(this[col], other[col]) for col in this.columns}

does this change anything from a user pov?

jbrockmendel · 2018-02-10T18:32:14Z

does this change anything from a user pov?

No, but I may have made a mistake here. Pls put a pin in this until I double-check.

jbrockmendel · 2018-02-11T16:31:07Z

I did in fact make a mistake, but there are no existing tests that catch it. Next push will fix both of these.

jbrockmendel · 2018-02-11T17:02:43Z

Will hold off on this until #19613 is closed.

jreback · 2018-02-11T17:57:12Z

pandas/core/ops.py

@@ -998,6 +998,33 @@ def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
 # -----------------------------------------------------------------------------
 # DataFrame

+def _combine_series_frame(self, other, func, fill_value=None, axis=None,
+                          level=None, try_cast=True):


can you add a doc-string

jreback · 2018-02-13T00:19:47Z

rebase


jreback · 2018-02-15T12:11:12Z

did you have another PR which touched this code and we wanted to merge first? or is this one ok? if so rebase

jbrockmendel · 2018-02-15T15:17:50Z

There was: #19613, which was merged a few days ago. This should be good to go (after a rebase).


jbrockmendel · 2018-02-15T15:19:04Z

#19649 and #19582 touch nearby code, but distinct enough that it shouldn't cause a problem.

jreback · 2018-02-16T18:36:07Z

pandas/core/ops.py

+    invalid_op : function
+    """
+    def invalid_op(self, other=None):
+        raise TypeError("cannot perform {name} with this index type: "


you might need to parameterize this on the word 'index', but I'm not sure (IOW, can it be a Series?). or maybe just remove the word 'index'?

ATM it is only used for indexes. I could add an assertion that self is an Index so that we know to change it if/when the wording ceases to be accurate.
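A sketch of the suggested assertion, on a factory in the spirit of the make_invalid_op helper this PR moves into ops. The assertion guards the "index type" wording; the text after "index type: " is an assumption, since the original message is cut off in the diff above.

```python
import pandas as pd


def make_invalid_op(name):
    """Return a method that always raises TypeError.

    Sketch only: the isinstance assertion follows the suggestion that the
    op is currently attached only to Index objects, and the message tail
    after "index type: " is assumed (the original is truncated).
    """

    def invalid_op(self, other=None):
        # Fail loudly if this is ever attached to a non-Index object, so
        # the "index type" wording gets revisited when that happens.
        assert isinstance(self, pd.Index), type(self)
        raise TypeError(
            "cannot perform {name} with this index type: {typ}".format(
                name=name, typ=type(self).__name__
            )
        )

    invalid_op.__name__ = name
    return invalid_op


mul_op = make_invalid_op("__mul__")
```

Calling mul_op(pd.Index(["a", "b"]), 2) raises the TypeError; calling it on a non-Index trips the assertion instead.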

jreback · 2018-02-16T18:37:07Z

pandas/core/ops.py

+    self : DataFrame
+    other : Series
+    func : binary operator
+    fill_value : object (default None)


use the format
fill_value : object, default None

no parens
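For reference, the requested parameter format (name : type, default value, no parentheses) on a simplified, hypothetical stand-in for _combine_series_frame. The body is a sketch that dispatches to Series operations through the public API. It does not reproduce the PR's implementation: rejecting fill_value is an assumption of this sketch, and it aligns the Series to the frame's axis (a left join) rather than the outer join pandas performs.

```python
import operator

import pandas as pd


def combine_series_frame(frame, series, func, fill_value=None, axis=0):
    """
    Apply a binary operator between a DataFrame and a Series.

    Parameters
    ----------
    frame : DataFrame
    series : Series
    func : binary operator
    fill_value : object, default None
        Not supported in this sketch; must be None.
    axis : {0, 1}, default 0
        0 matches the Series against the frame's index, 1 against its
        columns.

    Returns
    -------
    result : DataFrame
    """
    if fill_value is not None:
        raise NotImplementedError("fill_value is not supported in this sketch")
    if axis == 0:
        # Match on the index: every column is combined with the same Series.
        series = series.reindex(frame.index)
        data = {col: func(frame[col], series) for col in frame.columns}
    else:
        # Match on the columns: column `col` is combined with series[col].
        series = series.reindex(frame.columns)
        data = {col: func(frame[col], series[col]) for col in frame.columns}
    return pd.DataFrame(data, index=frame.index, columns=frame.columns)
```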

jreback · 2018-02-16T18:38:19Z

pandas/core/ops.py

+                                  .format(fill=fill_value))
+
+    if axis is not None:
+        axis = self._get_axis_name(axis)


we normally use axis numbers internally (not names). I agree it's a bit easier on the eyes to use names, but we should be consistent (so revert here; it's ok to change it across the code base in another PR).

OK. This isn't a change from the current implementation, just moving it from frame. Will change.
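A minimal sketch of the convention the reviewer asks for: normalize whatever the caller passes into an axis number once, at the boundary, so internal code only ever sees 0 or 1. normalize_axis and its mapping are hypothetical; pandas has its own private helpers for this (such as the _get_axis_name call in the diff above), which this does not reproduce.

```python
def normalize_axis(axis):
    """Hypothetical helper: map an axis label or number to 0 or 1."""
    mapping = {0: 0, 1: 1, "index": 0, "columns": 1}
    try:
        return mapping[axis]
    except (KeyError, TypeError):
        # KeyError: unknown label; TypeError: unhashable input such as a list.
        raise ValueError("No axis named {!r}".format(axis)) from None
```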

jreback · 2018-02-16T18:38:51Z

pandas/tests/frame/test_arithmetic.py

@@ -72,6 +72,23 @@ def test_tz_aware_scalar_comparison(self, timestamps):
 # -------------------------------------------------------------------
 # Arithmetic

+class TestFrameFlexArithmetic(object):
+    def test_df_add_flex_filled_mixed_dtypes(self):


does this represent a change in user facing API?

No, just a case that is not currently tested. I made a mistake in an early commit in this PR; this test would have caught it.
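The body of the new test is not shown in this thread. The following is a hypothetical pytest-style test in the same spirit, not the one added in the PR: it checks that a flex op with fill_value on a mixed-dtype frame agrees with applying the same Series flex method column by column, with NaNs placed so that the fill actually matters.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_df_add_flex_filled_mixed_dtypes_sketch():
    # An int column and a float column; in "B" each side has a NaN where
    # the other side has a value, so fill_value=0 changes the result.
    left = pd.DataFrame({"A": [1, 2], "B": [1.5, np.nan]})
    right = pd.DataFrame({"A": [10, 20], "B": [np.nan, 2.5]})

    result = left.add(right, fill_value=0)

    # Expected: dispatch each column to Series.add with the same fill_value.
    expected = pd.DataFrame(
        {col: left[col].add(right[col], fill_value=0) for col in left.columns}
    )
    tm.assert_frame_equal(result, expected)
    assert result["A"].dtype == np.int64
    assert result["B"].tolist() == [1.5, 2.5]
```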

jreback

small comments for the future

jreback · 2018-02-18T16:33:34Z

pandas/core/frame.py

                result = self._constructor(result, index=new_index, copy=False)
                result.columns = new_columns
-                return result
+            return result

        else:
            result = _arith_op(this.values, other.values)


this is pretty tricky. yes, that would work for most ufuncs, but it is not generally applicable, e.g. concat is a ufunc too.
so hold off on this; there is other low-hanging fruit to pick without opening up this box.

jreback · 2018-02-18T16:34:05Z

pandas/core/frame.py

                result = self._constructor(result, index=new_index, copy=False)
                result.columns = new_columns
-                return result
+            return result

        else:
            result = _arith_op(this.values, other.values)


more to the point, this should be pushed down to the block manager for execution, which handles the mixed-dtypes case; this is just a special case of that.
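A rough sketch of why operating on dtype-homogeneous blocks covers the mixed-dtype case: group columns by dtype, run the operator once per homogeneous 2-D array, then reassemble. This only imitates the idea with public pandas API; it is not the BlockManager, and blockwise_apply is a hypothetical name. It assumes both frames are already aligned and that matching columns share a dtype.

```python
import operator
from collections import defaultdict

import pandas as pd


def blockwise_apply(left, right, func):
    """Hypothetical helper: apply ``func`` once per group of same-dtype
    columns instead of once per column or once on an upcast 2-D array."""
    if not (left.columns.equals(right.columns) and left.index.equals(right.index)):
        raise ValueError("frames must be aligned first")
    groups = defaultdict(list)
    for col, dtype in left.dtypes.items():
        groups[dtype].append(col)

    pieces = []
    for cols in groups.values():
        # Each group is homogeneous, so .values needs no upcasting.
        block = func(left[cols].values, right[cols].values)
        pieces.append(pd.DataFrame(block, index=left.index, columns=cols))
    # Reassemble and restore the original column order.
    return pd.concat(pieces, axis=1)[left.columns]


left = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5], "c": [3, 4]})
right = pd.DataFrame({"a": [10, 20], "b": [2.0, 4.0], "c": [30, 40]})
result = blockwise_apply(left, right, operator.add)
```

Columns "a" and "c" are added in one int64 operation and "b" in one float64 operation, so each keeps its dtype while the number of operator calls scales with the number of dtypes rather than the number of columns.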

jreback · 2018-02-18T16:36:27Z

thanks


jbrockmendel added 2 commits February 8, 2018 18:02

dispatch frame methods to series versions instead of re-implementing masking etc

6d59458

fixup typo

4540e82

jbrockmendel commented Feb 9, 2018


jreback requested changes Feb 10, 2018


jreback added the Numeric Operations and Clean labels Feb 10, 2018

jbrockmendel mentioned this pull request Feb 10, 2018

De-duplicate masking/fallback logic in ops #19613

Merged

jbrockmendel added 2 commits February 11, 2018 09:45

test for mixed flex op with fill

0ffe430

fix filling

92fc179

jreback requested changes Feb 11, 2018


jbrockmendel mentioned this pull request Feb 12, 2018

Reduce redirection in ops #19649

Merged

jbrockmendel added 4 commits February 12, 2018 17:24

Merge branch 'master' of https://github.com/pandas-dev/pandas into ops-frames

ff292f3

docstring

8a82a23

move make_invalid_op to ops

a221fa8

Merge branch 'master' of https://github.com/pandas-dev/pandas into ops-frames

3473910

Merge branch 'master' of https://github.com/pandas-dev/pandas into ops-frames

1a47d83

jbrockmendel added 2 commits February 15, 2018 09:25

harmless commit to force CI

02eceab

better err message

956b9f9

jbrockmendel mentioned this pull request Feb 16, 2018

Parametrize PeriodIndex tests #19659

Merged


jreback requested changes Feb 16, 2018


jbrockmendel added 2 commits February 16, 2018 12:33

docstring edits

7583e7d

fix typo

118cd5d

jbrockmendel mentioned this pull request Feb 17, 2018

Fix wraparound/overflow in date_range #19740

Closed


jreback approved these changes Feb 18, 2018


jreback added this to the 0.23.0 milestone Feb 18, 2018

jreback merged commit 64e155c into pandas-dev:master Feb 18, 2018

jbrockmendel deleted the ops-frames branch February 18, 2018 18:23

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

dispatch frame methods to series versions instead of re-implementing masking etc (pandas-dev#19611)

c0f761d
