Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) #8731

Merged

karthikeyann
Contributor

closes #8717

Uses of argmin and argmax depend on sentinel values being present at nulls (ARGMIN_SENTINEL, ARGMAX_SENTINEL), so group_argmin and group_argmax need to guarantee these sentinel values in their output. However, cudf::detail::gather doesn't guarantee that. This PR fixes this.

  • Replace cudf::detail::gather with thrust::gather_if on the indices so that null entries keep their sentinel values for argmin and argmax.
  • Add unit tests.
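
The idea behind a predicated gather can be sketched on the host in plain C++. This is only an illustration of the semantics (copy input[map[i]] to position i only where a predicate holds, and leave the other positions untouched), not the code in this PR. The sentinel value of -1 and the names kSentinel, gather_keep_sentinel, and demo are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel for this sketch; the PR refers to
// ARGMIN_SENTINEL / ARGMAX_SENTINEL, whose value is assumed here.
constexpr int kSentinel = -1;

// Host-side analogue of a predicated gather: for each position i,
// write input[map[i]] only when map[i] is not the sentinel.
// Sentinel positions are left as they are, so they still hold kSentinel.
std::vector<int> gather_keep_sentinel(const std::vector<int>& map,
                                      const std::vector<int>& input)
{
  std::vector<int> out(map);  // start from the map, so sentinel slots are preserved
  for (std::size_t i = 0; i < map.size(); ++i) {
    if (map[i] != kSentinel) { out[i] = input[static_cast<std::size_t>(map[i])]; }
  }
  return out;
}

// Three groups; the middle group is all null and carries the sentinel.
std::vector<int> demo()
{
  std::vector<int> sort_order{4, 2, 0, 3, 1};   // hypothetical sorted-to-original row mapping
  std::vector<int> group_arg{1, kSentinel, 3};  // per-group index into the sorted rows
  return gather_keep_sentinel(group_arg, sort_order);
}
```

In this sketch the all-null group keeps the sentinel in the result, while the other groups are mapped back to original row indices. A plain gather that only marks the row as null gives no such guarantee about the value stored underneath.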

@karthikeyann karthikeyann added the bug, 3 - Ready for Review, libcudf, strings, and non-breaking labels Jul 13, 2021
@karthikeyann karthikeyann requested a review from a team as a code owner July 13, 2021 19:20
@karthikeyann karthikeyann changed the title Fix min/max sorted groupby aggregation Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) Jul 13, 2021
@karthikeyann karthikeyann self-assigned this Jul 13, 2021
@karthikeyann karthikeyann requested a review from jlowe July 13, 2021 19:22
@codecov

codecov bot commented Jul 13, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@3ed87f3).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.08    #8731   +/-   ##
===============================================
  Coverage                ?   10.67%           
===============================================
  Files                   ?      109           
  Lines                   ?    18670           
  Branches                ?        0           
===============================================
  Hits                    ?     1993           
  Misses                  ?    16677           
  Partials                ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3ed87f3...956e12e.

Member

@jlowe jlowe left a comment

Thanks for the quick turnaround, @karthikeyann! The RAPIDS Accelerator tests that were failing due to this before now pass with this change.

Member

@harrism harrism left a comment

Nice work and quick turnaround, @karthikeyann!

@harrism harrism removed the 3 - Ready for Review label Jul 13, 2021
@harrism
Member

harrism commented Jul 13, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d05de97 into rapidsai:branch-21.08 Jul 13, 2021
Labels
bug, libcudf, non-breaking, strings
Development

Successfully merging this pull request may close these issues.

[BUG] min/max sorted aggregation on string columns mishandles all null case
4 participants