Generalize comparison binary operations #9542

vyasr · 2021-10-27T22:44:43Z

This PR is a follow-up to #9357 that consolidates the comparison binops (eq, ne, lt, le, gt, ge) into Frame, enabling their use on DataFrame objects as well as Series. Note that pandas DataFrames do not support the fill_value parameter for these operations (unlike other binop functions), so we could choose to disable it for ours as well but I have not done so at present (I don't see a particularly strong reason to lean either way).

…t all common.

python/cudf/cudf/core/dataframe.py

codecov · 2021-10-28T23:29:41Z

Codecov Report

Merging #9542 (c36a9f9) into branch-21.12 (ab4bfaa) will decrease coverage by 0.12%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-21.12    #9542      +/-   ##
================================================
- Coverage         10.79%   10.66%   -0.13%     
================================================
  Files               116      117       +1     
  Lines             18869    19720     +851     
================================================
+ Hits               2036     2104      +68     
- Misses            16833    17616     +783

Impacted Files	Coverage Δ
python/dask_cudf/dask_cudf/sorting.py	`92.90% <0.00%> (-1.21%)`	⬇️
python/cudf/cudf/io/csv.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/hdf.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/orc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/__init__.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_version.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/abc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/api/types.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/dlpack.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/frame.py	`0.00% <0.00%> (ø)`
... and 66 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0951ba...c36a9f9. Read the comment docs.

brandon-b-miller

I couldn't find anything I didn't like about the implementation here. The only thing that caught my eye was the test - I see that its mostly derived from test_operator_func_dataframe, which is probably what I would have started with as well, but overall I am not a huge fan of testing like this for a few reasons:

IMO it's a little hard to see what is happening vs defining concrete data as a parameter especially if we're setting the random seed anyways.
I don't really think we need 100 rows in the test data, when dealing with nulls there's really only 4 things that can happen anyways TT, TF, FT, FF. Granted I have seen bugs where it only shows up in larger datasets, but I think there should be a different approach entirely for detecting those issues that is specifically designed for that.
I think the scalar test should be its own thing rather than the test inspecting its own parameters to turn into a different test.

Overall we're already at a point (with binops specifically) where I think it is a little hard to understand what coverage we have, and where we have redundant tests, and I would like to position ourselves such that we can rethink how we organize these tests next maintenance release. To that end I would (weakly) prefer to add more diffuse, decoupled, and simpler tests for now that are easier to refactor later even at the expense of coverage, and if something comes up we can add something at that point.

Approved

vyasr · 2021-11-02T18:51:25Z

@brandon-b-miller those are all definitely good points, and I'm very much in favor of an overhaul of our testing infrastructure. That's on the roadmap, so I'm comfortable punting on this change for now and dedicating some effort to this during our next maintenance release.

vyasr · 2021-11-02T18:51:37Z

@gpucibot merge

…names (#1928) cuDF PR rapidsai/cudf#9542 fixes a problem where comparisons between Series objects would return `True` even if the Series names were different, which is inconsistent with Pandas. Some cuGraph tests relied on the old behavior which ignored names and were failing. This PR updates those tests to use the proper comparison utility provided by cuDF to restore the intended behavior of the test assertions. This PR also updates the `gpu` CI script to use `mamba` for an install step to hopefully speed up CI runs. Authors: - Rick Ratzel (https://github.com/rlratzel) Approvers: - Brad Rees (https://github.com/BradReesWork) - Chuck Hastings (https://github.com/ChuckHastings) - Joseph Nke (https://github.com/jnke2016) - AJ Schmidt (https://github.com/ajschmidt8) URL: #1928

This PR resolves #9648, fixing a bug introduced in #9542 wherein name equality was enforced for Series objects compared via `equals`. It looks like pandas requires name equality only for DataFrames, and we simply didn't have tests for this before. This should go in the 21.12 release since it's a bug affecting downstream dependencies. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #9653

…string (#10576) This PR moves all the binary operation methods such as `Frame.add` into `IndexedFrame`. This removes these methods from `Index` objects to match pandas, where Index objects do not have these methods. I have also consolidated the docstrings for these methods using a single template and our `docutils`. Note that this is technically a breaking change that I am implementing without a deprecation cycle; however, I feel comfortable doing so because these methods were introduced incidentally in #9357 and #9542 into 21.12, so just a couple of releases ago, and since they are not pandas APIs I doubt that they have made significant penetration into user workflows. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #10576

vyasr added 4 commits October 26, 2021 17:27

Remove DataFrame.equals and move necessary logic to Frame.

4a5e1b0

Move equality comparisons into Frame and update documentation.

f3b1ebc

Add tests of DataFrame binop funcs and fix cases where columns are no…

3500e87

…t all common.

Fix remaining test.

aa8d545

vyasr added 3 - Ready for Review Ready for review by team code quality Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 27, 2021

vyasr added this to the CuDF Python Refactoring milestone Oct 27, 2021

vyasr requested a review from isVoid October 27, 2021 22:44

vyasr self-assigned this Oct 27, 2021

vyasr requested a review from a team as a code owner October 27, 2021 22:44

vyasr requested a review from brandon-b-miller October 27, 2021 22:44

isVoid reviewed Oct 27, 2021

View reviewed changes

python/cudf/cudf/core/dataframe.py Show resolved Hide resolved

Add developer documentation for binop helpers.

c36a9f9

vyasr requested a review from isVoid October 28, 2021 22:06

brandon-b-miller approved these changes Nov 2, 2021

View reviewed changes

rapids-bot bot merged commit 1c10790 into rapidsai:branch-21.12 Nov 2, 2021

rlratzel mentioned this pull request Nov 4, 2021

Updates to support correct comparisons of cuDF Series with different names rapidsai/cugraph#1928

Merged

This was referenced Nov 10, 2021

[BUG] Different results from cudf.Series.equals #9648

Closed

Fix behavior of equals for non-DataFrame Frames and add tests. #9653

Merged

vyasr deleted the refactor/additional_binops branch January 14, 2022 18:01

vyasr mentioned this pull request Apr 1, 2022

Move binop methods from Frame to IndexedFrame and standardize the docstring #10576

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalize comparison binary operations #9542

Generalize comparison binary operations #9542

vyasr commented Oct 27, 2021

codecov bot commented Oct 28, 2021 •

edited

Loading

brandon-b-miller left a comment •

edited

Loading

vyasr commented Nov 2, 2021

vyasr commented Nov 2, 2021

Generalize comparison binary operations #9542

Generalize comparison binary operations #9542

Conversation

vyasr commented Oct 27, 2021

codecov bot commented Oct 28, 2021 • edited Loading

Codecov Report

brandon-b-miller left a comment • edited Loading

Choose a reason for hiding this comment

vyasr commented Nov 2, 2021

vyasr commented Nov 2, 2021

codecov bot commented Oct 28, 2021 •

edited

Loading

brandon-b-miller left a comment •

edited

Loading