-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: collect ops dispatch functions in one place, try to de-duplicate SparseDataFrame methods #23060
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #23060 +/- ##
=======================================
Coverage 92.16% 92.16%
=======================================
Files 166 166
Lines 51224 51224
=======================================
Hits 47212 47212
Misses 4012 4012
Continue to review full report at Codecov.
|
Woops, accidentally pushed some unrelated commits collecting arithmetic tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. just a couple of questions / comments.
""" | ||
# Note: we use iloc to access columns for compat with cases | ||
# with non-unique columns. | ||
import pandas.core.computation.expressions as expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be imported at the top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure, but I think this is a run-time import to make import pandas as pd
faster
if own_default == other_default: | ||
# TOOD: won't this evaluate as False if both are np.nan? | ||
fill_value = own_default | ||
elif np.isnan(own_default) and not np.isnan(other_default): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be isna checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I thought so, but the module-level docstring says only float64 is supported, so I kept the behavior as-is. I think the overall takeaway is that this isn't especially well-maintained, and we should all look forward to Sparse EA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be amendednow that that Sparse EA is here? (followup ok too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend holding off on changing it.
- SparseDataFrame may be going away, so why bother.
- We may have to change the
default_fill_value
if we want its type to match that ofsp_values
(Require the dtype of SparseArray.fill_value and sp_values.dtype to match #23124 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks.
'floatcol': np.random.randn(10), | ||
'stringcol': list(tm.rands(10))}) | ||
df.loc[np.random.rand(len(df)) > 0.5, 'dates2'] = pd.NaT | ||
ops = {'gt': 'lt', 'lt': 'gt', 'ge': 'le', 'le': 'ge', 'eq': 'eq', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should parameterize if you can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, the point of collecting these arithmetic tests is to parametrize/fixturize and especially de-duplicate them in an upcoming pass.
# DataFrame | ||
assert df.eq(df).values.all() | ||
assert not df.ne(df).values.any() | ||
for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs paramaterization!
with tm.assert_raises_regex(ValueError, msg): | ||
f(ndim_5) | ||
|
||
# Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull lthis out to a separatate, parameterized test (future PR is ok for these, though since you are moving around, maybe better here)
lambda x: tm.makeFloatSeries(), | ||
True) | ||
]) | ||
@pytest.mark.parametrize('opname', ['add', 'sub', 'mul', 'floordiv', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally switch to our operators fixture
# == and !=, inequalities should raise | ||
result = x == y | ||
expected = pd.DataFrame({col: x[col] == y[col] | ||
for col in x.columns}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize this (next pass ok)
If it will help, I can separate out the unrelated test parts of this. There is a bunch of test cleanup to do, and already a healthy number of test-touching PRs in play. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment, but can be a followup, ping on green.
if own_default == other_default: | ||
# TOOD: won't this evaluate as False if both are np.nan? | ||
fill_value = own_default | ||
elif np.isnan(own_default) and not np.isnan(other_default): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be amendednow that that Sparse EA is here? (followup ok too)
@TomAugspurger any idea about this? I have no clue |
also happy to merge this and followup later on sparse refactorings (prob better). |
Ping |
can you rebase. the isort is playing havoc :> |
thanks @jbrockmendel nice as always! |
…y_tests * repo_org/master: (52 commits) ENH: Allow rename_axis to specify index and columns arguments (pandas-dev#20046) STY: proposed isort settings [ci skip] [skip ci] [ciskip] [skipci] (pandas-dev#23366) MAINT: Remove extraneous test.parquet file CLN: Follow-up comments to pandas-devgh-23392 (pandas-dev#23401) BUG GH23282 calling min on series of NaT returns NaT (pandas-dev#23289) unpin openpyxl (pandas-dev#23361) REF: collect ops dispatch functions in one place, try to de-duplicate SparseDataFrame methods (pandas-dev#23060) CLN: Remove pandas.tools module (pandas-dev#23376) CLN: Remove some dtype methods from API (pandas-dev#23390) CLN: Cleanup toplevel namespace shims (pandas-dev#23386) DOC: fixup whatsnew note for GH21394 (pandas-dev#23355) Fix import format at pandas/tests/extension directory (pandas-dev#23365) DOC: Remove Series.sortlevel from api.rst (pandas-dev#23395) API: Disallow dtypes w/o frequency when casting (pandas-dev#23392) BUG/TST/REF: Datetimelike Arithmetic Methods (pandas-dev#23215) STYLE: lint add np.nan* funcs to cython_table (pandas-dev#22109) Run Isort on tests/util single PR (pandas-dev#23347) BUG: Fix date_range overflow (pandas-dev#23345) Run Isort on tests/arrays single PR (pandas-dev#23346) ...
… SparseDataFrame methods (pandas-dev#23060)
… SparseDataFrame methods (pandas-dev#23060)
… SparseDataFrame methods (pandas-dev#23060)
No description provided.