Optimize and clean `countl`, `countr`, `popcount`, `has_single_bit` #3414
base: main
Thanks a lot for working on this ❤️
Could you please split this up into individual PRs? Right now there are too many moving pieces.
 has_single_bit(_Tp __t) noexcept
 {
-  return __has_single_bit(__t);
+  return _CUDA_VSTD::popcount(__t) == 1;
Is this really more efficient than the bit trick?
Yes 🙂, `popcount` is translated into a single instruction in CUDA and on most host CPUs.
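For readers comparing the two formulations discussed in this thread, here is a minimal host-side sketch. The helper names are hypothetical and this is not the libcu++ implementation; `std::bitset::count` stands in for `_CUDA_VSTD::popcount` so the snippet builds with any C++11 compiler. It only illustrates that the population-count check and the classic bit trick give the same answer; it says nothing about which is faster on a given target.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper names for illustration; not the libcu++ code.

// Population-count formulation: a value has a single bit set
// exactly when its population count is 1.
inline bool has_single_bit_popcount(std::uint32_t x) noexcept
{
  // std::bitset::count is an assumption standing in for _CUDA_VSTD::popcount.
  return std::bitset<32>(x).count() == 1;
}

// Classic bit trick: x is nonzero and clearing its lowest set bit leaves zero.
inline bool has_single_bit_trick(std::uint32_t x) noexcept
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Returns true if both formulations agree for every value in [0, limit).
inline bool formulations_agree(std::uint32_t limit) noexcept
{
  for (std::uint32_t x = 0; x < limit; ++x)
  {
    if (has_single_bit_popcount(x) != has_single_bit_trick(x))
    {
      return false;
    }
  }
  return true;
}
```

Calling `formulations_agree(1u << 16)` checks the first 65536 values; whether the popcount form actually compiles to one instruction depends on the target and compiler flags, so it is worth inspecting the generated code for the platforms of interest.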
(Continuation of #3226. Renamed branch and PR to avoid confusion)
Fixes #2239
Description
Optimize and clean up the following functions:
- `countl_zero` / `countr_zero`
- `countl_one`, `countr_one`
- `popcount`
- `has_single_bit`
Run-time optimizations are described in #2239.
Features:
- `constexpr` to avoid several function instantiations
- `_CCCL_NODISCARD` to all functions

#### DO NOT MERGE
require C++17