Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec::reduce including a customization for static_thread_pool #992

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

muellan
Copy link

@muellan muellan commented Jul 5, 2023

This adds

  • exec::reduce
  • a parallelized specialization of exec::reduce for running on exec::static_thread_pool's scheduler
  • tests covering basic usage with different reduction operations, input value-range edge cases, ranges of non-fundamental types, different schedulers, etc.

This does not (yet) introduce a common customization point with nvexec::reduce mainly because this would require a common default reduction operation (nvexec::reduce uses cub::Sum by default while exec::reduce uses std::plus<>).
If this is desired then one should probably also provide an entire set of common reduction operations like min, max, sum, product, etc.

@rapids-bot
Copy link

rapids-bot bot commented Jul 5, 2023

Pull requests from external contributors require approval from a NVIDIA organization member with write permissions or greater before CI can begin.

@muellan muellan changed the title added exec::reduce including a customization for static_thread_pool exec::reduce including a customization for static_thread_pool Jul 5, 2023
@ericniebler
Copy link
Collaborator

/ok to test

Copy link
Collaborator

@ericniebler ericniebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks really great, thanks Andre. A few nits, and needs some more type-checking.

auto reducer = [&](auto& input) {
const auto [first, last] = even_share(std::ranges::size(input), tid, total_threads);
if (first != last) {
auto begin = std::ranges::begin(input) + first;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we require that the input range is random-access? Is that over-constraining?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that I still need to add more type checking.
If it were up to me I would constrain it to random access ranges. On the other hand std::reduce requires only input iterators or forward iterators for the execution policy overloads. I don't know if anyone would actually expect exec::reduce to work with an input range.

Comment on lines 455 to 456
// deallocate
sh_state.partials_ = {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't actually free any memory; it'll just set the size to zero. If you really need to free the memory early, the surest way is to swap it with a default-constructed vector.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't believe I fell for this age-old trap.

sh_state.apply(finalize);
}
};
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't appear to be handling exceptions that may be thrown from the reduction function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. I left exception handling out, just wanted stuff to work first. I'll add it.

_InitT,
_RedOp>)
&& (!stdexec::tag_invocable<reduce_t, _Sender, _InitT, _RedOp>)
STDEXEC_DETAIL_CUDACC_HOST_DEVICE __sender<_Sender, _InitT, _RedOp>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should require that _RedOp is callable with (_InitT, _InitT), right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

Comment on lines +128 to +130
template <stdexec::__decays_to<__t> _Self, class _Env>
friend auto tag_invoke(stdexec::get_completion_signatures_t, _Self&&, _Env&&)
-> stdexec::completion_signatures<stdexec::set_value_t(_InitT)>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, we have all the type information we need in order to do full type-checking. We can and should check that the sender can only complete one way, with a single value that is a range whose element type is the same as _InitT.

We also know whether the reduction function is potentially throwing, and if it is, to add set_error_t(exception_ptr) as a possible completion signature.

If you're feeling ambitious, you can see how an adaptor like then detects when the types don't match and tries to return a type that describes the error -- or propagates type errors from inner senders. If that's too much, don't sweat it. We can improve diagnostics in a separate PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen the error handling in other sender adaptors. I think I should add it as part of this PR.

Comment on lines +471 to +482
template <class T>
struct dummy_ {
using __t = T;
};

using inrange_t = //
stdexec::__decay_t<stdexec::__t<stdexec::__t< stdexec::__value_types_of_t<
Sender,
stdexec::env_of_t<Receiver>,
stdexec::__q<dummy_>,
stdexec::__q<dummy_>>>>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is equivalent:

Suggested change
template <class T>
struct dummy_ {
using __t = T;
};
using inrange_t = //
stdexec::__decay_t<stdexec::__t<stdexec::__t< stdexec::__value_types_of_t<
Sender,
stdexec::env_of_t<Receiver>,
stdexec::__q<dummy_>,
stdexec::__q<dummy_>>>>>;
using inrange_t = //
stdexec::__decay_t<
stdexec::__value_types_of_t<
Sender,
stdexec::env_of_t<Receiver>,
stdexec::__q<stdexec::__midentity>,
stdexec::__q<stdexec::__midentity>>>;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I still need to get a better overview over what metaprogramming helpers exist.

#include "../stdexec/execution.hpp"

#include <numeric>
#include <ranges>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok for it to be an error to include exec/reduce.hpp if the stdlib doesn't support <ranges>, but the reduce test should simply not run in that case instead of causing a test failure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

template <
stdexec::__same_as<stdexec::set_value_t> _Tag,
class _Range,
class _Value = stdexec::range_value_t<_Range>>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::ranges::range_value_t ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, that's a leftover I forgot to remove

@muellan muellan force-pushed the exec-reduce branch 2 times, most recently from e88b0fe to cc964a1 Compare July 6, 2023 07:01
@ericniebler
Copy link
Collaborator

/ok to test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants