Optimize and clean countl, countr, popcount, has_single_bit #3414

Status: Open. fbusato wants to merge 41 commits into main.

fbusato (Contributor) commented Jan 16, 2025

(Continuation of #3226. Renamed branch and PR to avoid confusion)
Fixes #2239

Description

Optimize and clean up the following functions:

  • countl_zero, countr_zero
  • countl_one, countr_one
  • popcount
  • has_single_bit

Run-time optimizations are described in #2239.

Features:

  • Add concept-like macro
  • Add assumptions on the return values
  • Code simplification for low-level functions
  • Use C++14 constexpr to avoid several function instantiations
  • Add _CCCL_NODISCARD to all functions
  • Fully qualify function namespace
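
As an illustration of two of the features above (an assumption on the return value and a C++14 constexpr body), here is a minimal sketch. It is not the code from this PR: `SKETCH_ASSUME` and `countl_zero_sketch` are hypothetical names, and the loop is a portable stand-in for whatever builtin the real implementation dispatches to.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical assumption macro (not a CCCL macro): tells the optimizer that
// `cond` always holds. Falls back to a no-op on unknown compilers.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ASSUME(cond) ((cond) ? static_cast<void>(0) : __builtin_unreachable())
#else
#  define SKETCH_ASSUME(cond) static_cast<void>(0)
#endif

// Counts leading zero bits with a single C++14 constexpr function body,
// so no recursive helper templates need to be instantiated.
template <typename T>
constexpr int countl_zero_sketch(T value) noexcept
{
  static_assert(std::is_unsigned<T>::value, "unsigned integer type required");
  constexpr int digits = std::numeric_limits<T>::digits;
  int count = 0;
  // Walk from the most significant bit down until a set bit is found.
  for (T mask = static_cast<T>(T(1) << (digits - 1)); mask != 0 && (value & mask) == 0; mask >>= 1)
  {
    ++count;
  }
  // Assumption on the return value: the result is always in [0, digits].
  SKETCH_ASSUME(count >= 0 && count <= digits);
  return count;
}

// Compile-time checks that the function is usable in constant expressions.
static_assert(countl_zero_sketch(std::uint32_t{1}) == 31, "");
static_assert(countl_zero_sketch(std::uint32_t{0}) == 32, "");
```

The range assumption can let a caller's optimizer drop checks such as `result < 0` or avoid a widening conversion; whether it does so depends on the compiler.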

#### DO NOT MERGE

  • require C++17

fbusato self-assigned this Jan 16, 2025
fbusato requested a review from a team as a code owner January 16, 2025 00:18
fbusato requested a review from ericniebler January 16, 2025 00:18
fbusato added the "3.0" label (Targeted for 3.0 release) Jan 16, 2025
fbusato changed the title from "[DO NOT MERGE] Optimize and clean countl, countr, popcount, has_single_bit" to "Optimize and clean countl, countr, popcount, has_single_bit" Jan 21, 2025

miscco (Collaborator) left a comment


Thanks a lot for working on this ❤️

Could you please split this up into individual PRs? Right now there are too many moving pieces.

libcudacxx/include/cuda/std/__bit/clz.h (outdated, resolved)
libcudacxx/include/cuda/std/__bit/countr.h (resolved)
libcudacxx/include/cuda/std/__bit/ctz.h (outdated, resolved)

  has_single_bit(_Tp __t) noexcept
  {
-   return __has_single_bit(__t);
+   return _CUDA_VSTD::popcount(__t) == 1;

miscco (Collaborator):

Is this really more efficient than the bit trick?

fbusato (Contributor, Author):

yes 🙂, popcount is translated into just one instruction in CUDA and most host CPUs
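
For reference, the two approaches discussed in this thread can be sketched side by side. This is only an illustration: the function names are hypothetical, and `popcount_sketch` uses a portable loop where a real implementation would presumably map to a single instruction through a builtin such as `__popc` on device or `__builtin_popcount` on host.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (clears one set bit per iteration).
// Stand-in for a hardware popcount instruction; hypothetical helper name.
constexpr int popcount_sketch(std::uint32_t x) noexcept
{
  int count = 0;
  while (x != 0)
  {
    x &= x - 1; // clear the lowest set bit
    ++count;
  }
  return count;
}

// Variant proposed in the diff: exactly one bit set means popcount == 1.
constexpr bool has_single_bit_popcount(std::uint32_t x) noexcept
{
  return popcount_sketch(x) == 1;
}

// Classic bit trick: a power of two has no bits in common with itself minus one.
// The x != 0 test is needed because 0 & (0 - 1) is also 0.
constexpr bool has_single_bit_trick(std::uint32_t x) noexcept
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Runtime cross-check that both variants agree on every value below `limit`.
inline void check_variants_agree(std::uint32_t limit)
{
  for (std::uint32_t v = 0; v < limit; ++v)
  {
    assert(has_single_bit_popcount(v) == has_single_bit_trick(v));
  }
}

static_assert(has_single_bit_popcount(64u) && has_single_bit_trick(64u), "");
static_assert(!has_single_bit_popcount(0u) && !has_single_bit_trick(0u), "");
```

Both variants return the same results. The bit trick needs a subtract, an AND, and two comparisons, while the popcount form needs one population count and one comparison when the count is a single instruction.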

fbusato requested a review from miscco January 22, 2025 23:13
Labels: 3.0 (Targeted for 3.0 release)
Projects: Status: In Review
Development: successfully merging this pull request may close the issue "[FEA]: Provide optimized <bit> functions for device"
2 participants