Fix an insert count bug #132

Merged 2 commits on Jan 14, 2022

PointKernel (Member)

This PR fixes a bug in which every thread of a cooperative group (CG) updates its thread counter. It also updates the unit tests to exercise this error.
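
The counting problem described above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the cuco kernel code: the function names `count_inserts_buggy` and `count_inserts_fixed` are hypothetical, each group is assumed to insert exactly one key successfully, and the choice of rank 0 as the single counting thread is an assumption about how such a fix is typically written.

```cpp
#include <cassert>
#include <cstddef>

// Host-side simulation (hypothetical, not cuco code): `num_groups` cooperative
// groups of `cg_size` threads each perform one successful insert.

// Buggy pattern: every thread in the group bumps the success counter,
// so one insert is counted `cg_size` times.
std::size_t count_inserts_buggy(std::size_t num_groups, std::size_t cg_size) {
  assert(cg_size > 0);
  std::size_t thread_num_successes = 0;
  for (std::size_t group = 0; group < num_groups; ++group) {
    const bool inserted = true;  // assumption: the group's insert succeeded
    for (std::size_t rank = 0; rank < cg_size; ++rank) {
      if (inserted) { ++thread_num_successes; }
    }
  }
  return thread_num_successes;
}

// Fixed pattern: only one thread per group (rank 0 here) updates the counter,
// so each successful insert is counted once.
std::size_t count_inserts_fixed(std::size_t num_groups, std::size_t cg_size) {
  assert(cg_size > 0);
  std::size_t thread_num_successes = 0;
  for (std::size_t group = 0; group < num_groups; ++group) {
    const bool inserted = true;  // assumption: the group's insert succeeded
    for (std::size_t rank = 0; rank < cg_size; ++rank) {
      if (inserted && rank == 0) { ++thread_num_successes; }
    }
  }
  return thread_num_successes;
}
```

With 4 groups of 8 threads, the buggy variant reports 32 inserts while the fixed variant reports 4, which is the kind of mismatch a unit test on the returned insert count would catch.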

PointKernel added the `type: bug`, `topic: static_map`, and `helps: rapids` labels on Jan 12, 2022
PointKernel merged commit 922a878 into NVIDIA:dev on Jan 14, 2022
rapids-bot (bot) pushed a commit to rapidsai/cudf that referenced this pull request on Feb 2, 2022:
Related to #9413.

This PR adds `unordered_drop_duplicates`/`unordered_distinct_count` APIs by using hash-based algorithms. It doesn't close the original issue since adding `std::unique`-like `drop_duplicates` is not addressed in this PR. It involves several changes:

- [x] Change the behavior of the existing `distinct_count`: it now counts the number of consecutive groups of equivalent rows instead of the total number of unique rows.
- [x] Add hash-based `unordered_distinct_count`: this new API counts unique rows across the whole table by using a hash map. It requires a newer version of `cuco` that includes the bug fixes NVIDIA/cuCollections#132 and NVIDIA/cuCollections#138.
- [x] Add hash-based `unordered_drop_duplicates`: similar to `drop_duplicates`, but this API doesn't support the `keep` option and the output is in an unspecified order.
- [x] Replace all the cpp-side `drop_duplicates`/`distinct_count` use cases with the `unordered_` versions.
- [x] Update and replace the existing compaction benchmark with `nvbench`.
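
The difference between the two counting semantics listed above can be sketched on the host with plain C++ containers. This is an illustration only, not the cudf implementation: the function names are hypothetical, a "row" is reduced to a single `int`, and `std::unordered_set` stands in for the GPU hash map.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Counts runs of consecutive equal values (the `std::unique`-like semantics).
std::size_t consecutive_group_count(const std::vector<int>& rows) {
  std::size_t groups = 0;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (i == 0 || rows[i] != rows[i - 1]) { ++groups; }
  }
  return groups;
}

// Counts unique values across the whole input using a hash set.
std::size_t hash_distinct_count(const std::vector<int>& rows) {
  const std::unordered_set<int> seen(rows.begin(), rows.end());
  return seen.size();
}

// Keeps one copy of each value; the output order is unspecified because it
// follows the hash set's iteration order.
std::vector<int> hash_drop_duplicates(const std::vector<int>& rows) {
  const std::unordered_set<int> seen(rows.begin(), rows.end());
  std::vector<int> out(seen.begin(), seen.end());
  assert(out.size() <= rows.size());
  return out;
}
```

For the input `{1, 1, 2, 1, 3, 3}`, the consecutive-group count is 4 (the second run of `1` starts a new group), while the hash-based count is 3.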

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - https://github.com/brandon-b-miller
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #10030