Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize comparison binary operations #9542

Merged
merged 5 commits into from
Nov 2, 2021

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Oct 27, 2021

This PR is a follow-up to #9357 that consolidates the comparison binops (eq, ne, lt, le, gt, ge) into Frame, enabling their use on DataFrame objects as well as Series. Note that pandas DataFrames do not support the fill_value parameter for these operations (unlike other binop functions), so we could choose to disable it for ours as well but I have not done so at present (I don't see a particularly strong reason to lean either way).

@vyasr vyasr added 3 - Ready for Review Ready for review by team code quality Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 27, 2021
@vyasr vyasr added this to the CuDF Python Refactoring milestone Oct 27, 2021
@vyasr vyasr requested a review from isVoid October 27, 2021 22:44
@vyasr vyasr self-assigned this Oct 27, 2021
@vyasr vyasr requested a review from a team as a code owner October 27, 2021 22:44
@vyasr vyasr requested a review from isVoid October 28, 2021 22:06
@codecov
Copy link

codecov bot commented Oct 28, 2021

Codecov Report

Merging #9542 (c36a9f9) into branch-21.12 (ab4bfaa) will decrease coverage by 0.12%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.12    #9542      +/-   ##
================================================
- Coverage         10.79%   10.66%   -0.13%     
================================================
  Files               116      117       +1     
  Lines             18869    19720     +851     
================================================
+ Hits               2036     2104      +68     
- Misses            16833    17616     +783     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.90% <0.00%> (-1.21%) ⬇️
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
... and 66 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0951ba...c36a9f9. Read the comment docs.

Copy link
Contributor

@brandon-b-miller brandon-b-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find anything I didn't like about the implementation here. The only thing that caught my eye was the test - I see that its mostly derived from test_operator_func_dataframe, which is probably what I would have started with as well, but overall I am not a huge fan of testing like this for a few reasons:

  • IMO it's a little hard to see what is happening vs defining concrete data as a parameter especially if we're setting the random seed anyways.
  • I don't really think we need 100 rows in the test data, when dealing with nulls there's really only 4 things that can happen anyways TT, TF, FT, FF. Granted I have seen bugs where it only shows up in larger datasets, but I think there should be a different approach entirely for detecting those issues that is specifically designed for that.
  • I think the scalar test should be its own thing rather than the test inspecting its own parameters to turn into a different test.

Overall we're already at a point (with binops specifically) where I think it is a little hard to understand what coverage we have, and where we have redundant tests, and I would like to position ourselves such that we can rethink how we organize these tests next maintenance release. To that end I would (weakly) prefer to add more diffuse, decoupled, and simpler tests for now that are easier to refactor later even at the expense of coverage, and if something comes up we can add something at that point.

Approved

@vyasr
Copy link
Contributor Author

vyasr commented Nov 2, 2021

@brandon-b-miller those are all definitely good points, and I'm very much in favor of an overhaul of our testing infrastructure. That's on the roadmap, so I'm comfortable punting on this change for now and dedicating some effort to this during our next maintenance release.

@vyasr
Copy link
Contributor Author

vyasr commented Nov 2, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 1c10790 into rapidsai:branch-21.12 Nov 2, 2021
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Nov 4, 2021
…names (#1928)

cuDF PR rapidsai/cudf#9542 fixes a problem where comparisons between Series objects would return `True` even if the Series names were different, which is inconsistent with Pandas.  Some cuGraph tests relied on the old behavior which ignored names and were failing.

This PR updates those tests to use the proper comparison utility provided by cuDF to restore the intended behavior of the test assertions.

This PR also updates the `gpu` CI script to use `mamba` for an install step to hopefully speed up CI runs.

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Brad Rees (https://github.com/BradReesWork)
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Joseph Nke (https://github.com/jnke2016)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1928
rapids-bot bot pushed a commit that referenced this pull request Nov 10, 2021
This PR resolves #9648, fixing a bug introduced in #9542 wherein name equality was enforced for Series objects compared via `equals`. It looks like pandas requires name equality only for DataFrames, and we simply didn't have tests for this before.

This should go in the 21.12 release since it's a bug affecting downstream dependencies.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #9653
@vyasr vyasr deleted the refactor/additional_binops branch January 14, 2022 18:01
rapids-bot bot pushed a commit that referenced this pull request Apr 6, 2022
…string (#10576)

This PR moves all the binary operation methods such as `Frame.add` into `IndexedFrame`. This removes these methods from `Index` objects to match pandas, where Index objects do not have these methods. I have also consolidated the docstrings for these methods using a single template and our `docutils`.

Note that this is technically a breaking change that I am implementing without a deprecation cycle; however, I feel comfortable doing so because these methods were introduced incidentally in #9357 and #9542 into 21.12, so just a couple of releases ago, and since they are not pandas APIs I doubt that they have made significant penetration into user workflows.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10576
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants