Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match Pandas logic for comparing two objects with nulls #7490

Merged

Conversation

brandon-b-miller
Copy link
Contributor

Fixes #7066

@brandon-b-miller brandon-b-miller added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. breaking Breaking change labels Mar 2, 2021
@brandon-b-miller brandon-b-miller requested a review from a team as a code owner March 2, 2021 22:50
@brandon-b-miller brandon-b-miller added the feature request New feature or request label Mar 2, 2021
operator.gt,
operator.ge,
operator.ne,
# comparison ops will temporarily XFAIL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this PR broke tests for series and dataframe mixed arithmetic, because of course comparison ops are now producing different results. However upon investigating further, I realized that dataframe/series mixed arithmetic actually isn't working at all in some situations, and this test has a grievous error (of my own making of course) where it doesn't actually compare the results properly at the end of the test, hence the failure going unnoticed.

This led to PR #7491. However, these PRs are now mutually dependent in the sense that until this PR is merged, I can't actually fix that functionality unless I spend time making the results match the logic this PR is about to change.

It doesn't make sense to me to fix broken code to produce results that will shortly thereafter be changed, forcing me to redo the code/test again.

Let me know if this seems like a sensible way to proceed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced offline and this plan seems like the least painful path forward (without having to merge 3 different PRs into a single one)

@brandon-b-miller brandon-b-miller added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Mar 9, 2021
@brandon-b-miller
Copy link
Contributor Author

@kkraus14 this is a breaking change. What due diligence should be done before merging?

python/cudf/cudf/core/column/column.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/column.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/numerical.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/numerical.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/string.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/dataframe.py Show resolved Hide resolved
python/cudf/cudf/core/dataframe.py Show resolved Hide resolved
python/cudf/cudf/core/dataframe.py Show resolved Hide resolved
@kkraus14
Copy link
Collaborator

kkraus14 commented Mar 9, 2021

@kkraus14 this is a breaking change. What due diligence should be done before merging?

We have it marked as breaking so we should be good to go. Unfortunately this may cause some breakages, but they're necessary to get the correct behavior moving forward.

@codecov
Copy link

codecov bot commented Mar 16, 2021

Codecov Report

Merging #7490 (981457d) into branch-0.19 (7871e7a) will increase coverage by 0.13%.
The diff coverage is 93.60%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #7490      +/-   ##
===============================================
+ Coverage        81.86%   82.00%   +0.13%     
===============================================
  Files              101      101              
  Lines            16884    16989     +105     
===============================================
+ Hits             13822    13931     +109     
+ Misses            3062     3058       -4     
Impacted Files Coverage Δ
python/cudf/cudf/core/index.py 92.86% <ø> (ø)
python/cudf/cudf/core/column/numerical.py 94.83% <87.50%> (-0.20%) ⬇️
python/cudf/cudf/core/column/column.py 87.77% <88.88%> (+0.01%) ⬆️
python/cudf/cudf/core/frame.py 89.00% <89.47%> (-0.02%) ⬇️
python/cudf/cudf/core/column/decimal.py 92.75% <91.30%> (-2.12%) ⬇️
python/cudf/cudf/core/dataframe.py 90.45% <95.65%> (-0.01%) ⬇️
python/cudf/cudf/core/series.py 91.25% <95.83%> (+0.47%) ⬆️
python/cudf/cudf/core/column/categorical.py 91.62% <100.00%> (+0.23%) ⬆️
python/cudf/cudf/core/column/datetime.py 89.09% <100.00%> (ø)
python/cudf/cudf/core/column/string.py 86.55% <100.00%> (+0.05%) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8aeb14e...981457d. Read the comment docs.

@kkraus14
Copy link
Collaborator

@gpucibot merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change feature request New feature or request Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Match pandas behavior for comparisons involving true nulls
4 participants