-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize comparison binary operations #9542
Generalize comparison binary operations #9542
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #9542 +/- ##
================================================
- Coverage 10.79% 10.66% -0.13%
================================================
Files 116 117 +1
Lines 18869 19720 +851
================================================
+ Hits 2036 2104 +68
- Misses 16833 17616 +783
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find anything I didn't like about the implementation here. The only thing that caught my eye was the test - I see that its mostly derived from test_operator_func_dataframe
, which is probably what I would have started with as well, but overall I am not a huge fan of testing like this for a few reasons:
- IMO it's a little hard to see what is happening vs defining concrete data as a parameter especially if we're setting the random seed anyways.
- I don't really think we need 100 rows in the test data, when dealing with nulls there's really only 4 things that can happen anyways
TT, TF, FT, FF
. Granted I have seen bugs where it only shows up in larger datasets, but I think there should be a different approach entirely for detecting those issues that is specifically designed for that. - I think the scalar test should be its own thing rather than the test inspecting its own parameters to turn into a different test.
Overall we're already at a point (with binops specifically) where I think it is a little hard to understand what coverage we have, and where we have redundant tests, and I would like to position ourselves such that we can rethink how we organize these tests next maintenance release. To that end I would (weakly) prefer to add more diffuse, decoupled, and simpler tests for now that are easier to refactor later even at the expense of coverage, and if something comes up we can add something at that point.
Approved
@brandon-b-miller those are all definitely good points, and I'm very much in favor of an overhaul of our testing infrastructure. That's on the roadmap, so I'm comfortable punting on this change for now and dedicating some effort to this during our next maintenance release. |
@gpucibot merge |
…names (#1928) cuDF PR rapidsai/cudf#9542 fixes a problem where comparisons between Series objects would return `True` even if the Series names were different, which is inconsistent with Pandas. Some cuGraph tests relied on the old behavior which ignored names and were failing. This PR updates those tests to use the proper comparison utility provided by cuDF to restore the intended behavior of the test assertions. This PR also updates the `gpu` CI script to use `mamba` for an install step to hopefully speed up CI runs. Authors: - Rick Ratzel (https://github.com/rlratzel) Approvers: - Brad Rees (https://github.com/BradReesWork) - Chuck Hastings (https://github.com/ChuckHastings) - Joseph Nke (https://github.com/jnke2016) - AJ Schmidt (https://github.com/ajschmidt8) URL: #1928
This PR resolves #9648, fixing a bug introduced in #9542 wherein name equality was enforced for Series objects compared via `equals`. It looks like pandas requires name equality only for DataFrames, and we simply didn't have tests for this before. This should go in the 21.12 release since it's a bug affecting downstream dependencies. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #9653
…string (#10576) This PR moves all the binary operation methods such as `Frame.add` into `IndexedFrame`. This removes these methods from `Index` objects to match pandas, where Index objects do not have these methods. I have also consolidated the docstrings for these methods using a single template and our `docutils`. Note that this is technically a breaking change that I am implementing without a deprecation cycle; however, I feel comfortable doing so because these methods were introduced incidentally in #9357 and #9542 into 21.12, so just a couple of releases ago, and since they are not pandas APIs I doubt that they have made significant penetration into user workflows. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #10576
This PR is a follow-up to #9357 that consolidates the comparison binops (eq, ne, lt, le, gt, ge) into Frame, enabling their use on DataFrame objects as well as Series. Note that pandas DataFrames do not support the fill_value parameter for these operations (unlike other binop functions), so we could choose to disable it for ours as well but I have not done so at present (I don't see a particularly strong reason to lean either way).