-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Histogram refactor #1003
Histogram refactor #1003
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's beautiful. One comment to try and avoid a problematic case from mixing macros and templates.
src/cunumeric/stat/histogram_impl.h
Outdated
|
||
auto&& [global_result_size, global_result_ptr] = accessors::get_accessor_ptr(result, result_rect); | ||
|
||
#ifdef CUDA_STREAM_CHECKER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead guard this with something like
if constexpr (!std::is_same_v<exe_policy_t, std::remove_cv_t<decltype(thrust::host)>> &&
!std::is_same_v<exe_policy_t, std::remove_cv_t<decltype(thrust::omp::par)>)
and possibly abstract-out this condition into a helper template struct like is_host_policy_v
? This may be generally useful if we start reusing this pattern in other operations as well.
Using a macro like this to control the instantiation of a template makes me a bit uneasy. It's conceptually possible that another compilation unit could instantiate histogram_wrapper<thrust::cuda::par_nosync, ...>
without defining CUDA_STREAM_CHECKER
, then we end up with two instantiations of the same template with different contents.
This is kind of a far-fetched scenario for this particular case, but if it's easy to avoid it and keep the code robust, might as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if constexpr(...)
cannot be a substitute for a pre-processor macro, like #ifdef
. That's because the compile-time false
path of an if constexpr(...)
still gets compiled .
For example (see https://en.cppreference.com/w/cpp/language/if):
void f()
{
if constexpr(false)
{
int i = 0;
int *p = i; // Error even though in discarded statement
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only option to #ifdef CUDA_STREAM_CHECKER
is specializing a function template
template <typename exe_policy_t> synchronize(exe_policy_t exe_pol){...}
The problem is function template cannot be partially specialized. They need to be fully specialized. And neither the host nor the gpu policies are unique. That means having to explicitly specialize on every possible thrust execution policy.
But, I can provide a template class to that effect, which can be partially specialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the compile-time false path of an if constexpr(...) still gets compiled .
Would histogram_wrapper
fall into the exception mentioned later in https://en.cppreference.com/w/cpp/language/if?
If a constexpr if statement appears inside a templated entity, and if condition is not value-dependent after instantiation, the discarded statement is not instantiated when the enclosing template is instantiated.
I think it does, by virtue of being a function template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to work in one compiler at least https://godbolt.org/z/v6b4veGrz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
histogram.cc
, and histogram_omp.cc
fail to compile for me with the if constexpr(...)
approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In file included from /home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram.cc:21:
/home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram_impl.h: In function 'void cunumeric::detail::histogram_wrapper(exe_policy_t, legate::AccessorRO<elem_t, 1>&, legate::Rect<1>&, legate::AccessorRO<bin_t, 1>&, legate::Rect<1>&, legate::AccessorRO<weight_t, 1>&, legate::Rect<1>&, legate::AccessorRD<Legion::SumReduction<weight_t>, true, 1>&, legate::Rect<1>&, cudaStream_t)':
/home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram_impl.h:145:52: error: there are no arguments to 'CHECK_CUDA_STREAM' that depend on a template parameter, so a declaration of 'CHECK_CUDA_STREAM' must be available [-fpermissive]
145 | if constexpr (!is_host_policy_v<exe_policy_t>) { CHECK_CUDA_STREAM(stream); }
| ^~~~~~~~~~~~~~~~~
/home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram_impl.h:145:52: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram_impl.h:156:52: error: there are no arguments to 'CHECK_CUDA_STREAM' that depend on a template parameter, so a declaration of 'CHECK_CUDA_STREAM' must be available [-fpermissive]
156 | if constexpr (!is_host_policy_v<exe_policy_t>) { CHECK_CUDA_STREAM(stream); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like I was giving if constexpr
too much credit.
But, I can provide a template class to that effect, which can be partially specialized.
Would this still work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can use a sync_policy_t
which then can be specialized and used similarly to segmented_sum_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, clean code we can be proud of
Thanks! |
Refactoring done:
decltype(ex_pol_obj)
, which is less likely to change in future CTK releases of Thrust.