Initial sender/receiver adaptations for tile algorithms #371
Conversation
Force-pushed from 87cf65f to 34ffb5d (compare)
bors try
try: Build failed
Still a draft, but I've updated a few things based on some of the things I presented yesterday. I've introduced a policy class, templated on the backend. The policy is meant to be similar to an HPX execution policy, but is not currently one (that's not necessary at the moment, since it's not for use with HPX parallel algorithms). In the future it could become a proper execution policy, or be replaced by an HPX execution policy. The reason for making it a proper class, rather than just the Backend enum, is that it can hold properties such as priorities. Since the executor concept in the latest P0443 is very simplified compared to the executors we're currently using, I'm not using the new executors for anything here. I've also added a few more overloads to the tile algorithms (trsm, gemm, etc.). That means that they currently have the following overloads: […]
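As a rough illustration of the policy idea, a backend-templated policy class holding a priority might look like the following sketch (all names are hypothetical, not the actual DLA-Future API):

```cpp
#include <hpx/modules/threading_base.hpp>  // for hpx::threads::thread_priority; header assumed

enum class Backend { MC, GPU };

// Hypothetical policy class: fixes the backend at compile time and carries
// runtime properties such as the priority.
template <Backend B>
class Policy {
public:
  Policy() = default;
  explicit Policy(hpx::threads::thread_priority priority) : priority_(priority) {}

  hpx::threads::thread_priority priority() const noexcept { return priority_; }

private:
  hpx::threads::thread_priority priority_ = hpx::threads::thread_priority::normal;
};
```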
With these changes I think the first two overloads could be moved to […]. Overall, I quite like this direction myself. It aligns quite well with where I think the standard library is possibly headed. The policy overloads are also higher level than the dataflow + executor versions, and give us room to change things below those overloads (either because we want to or because the proposals change); the higher-level ideas are unlikely to change as much. We've also started looking at adding exactly these kinds of overloads to the HPX parallel algorithms. I think from an algorithm writer's perspective the most important thing is how e.g. this looks: https://github.com/msimberg/DLA-Future/blob/e578e959819142021f12439414a30d7aad193e78/include/dlaf/solver/triangular/impl.h#L39-L42. If this looks like a decent direction I'll start tidying up loose ends (for this iteration...). Let me know what you think (including @aurianer, @biddisco).
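One can imagine the call site at those lines looking roughly like the fragment below; this is a guess at the shape, not the linked code, and it assumes the hypothetical `Policy` sketch above plus DLA-Future-style matrix/tile types:

```cpp
// Illustrative fragment: a tile-level trsm invoked with a policy and tile
// senders inside the triangular solver loop. All identifiers are guesses.
tile::trsm(Policy<Backend::MC>(hpx::threads::thread_priority::high),
           blas::Side::Left, blas::Uplo::Lower, blas::Op::NoTrans,
           blas::Diag::NonUnit, T(1.0),
           mat_a.read_sender(LocalTileIndex(k, k)),
           mat_b.readwrite_sender(LocalTileIndex(k, j)));
```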
I like how the code looks; I think it is quite clear and easy to read. Thanks for the effort!
Force-pushed from 64f8a41 to 72009e7 (compare)
Force-pushed from 8762b26 to bb1ed9c (compare)
bors try
try: Build failed
bors try
Force-pushed from 63153f8 to e2c26e2 (compare)
bors try
try: Already running a review
try: Build failed
bors try
try: Merge conflict.
Force-pushed from 7df4a3a to 33d8bed (compare)
bors try
try: Build failed
Force-pushed from d0c2340 to b6f3370 (compare)
bors try
Force-pushed from b6f3370 to 0d8a3d1 (compare)
bors try
try: Already running a review
Force-pushed from 51fc3ed to ef4ae42 (compare)
bors try
I've added (back, in many cases) the priorities as explicit arguments to helper functions. To summarize, the algorithms that have been senderified in this PR are: […]

As a prerequisite, all tile algorithms used in the above matrix algorithms have overloads that work with senders and policies; a sketch of such an overload set follows below.
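A hedged sketch of what sender/policy overloads for one tile algorithm could look like, reusing the hypothetical `Policy` from earlier and assuming DLA-Future's `Tile` type (signatures are illustrative, not the actual declarations):

```cpp
#include <blas.hh>  // blaspp enums; header assumed

// Plain overload: runs immediately on tiles that are already available.
template <class T>
void gemm(blas::Op op_a, blas::Op op_b, T alpha, const Tile<const T>& a,
          const Tile<const T>& b, T beta, const Tile<T>& c);

// Sender/policy overload: the tiles arrive through senders, and the call is
// scheduled on the backend selected by the policy, at the policy's priority.
template <Backend B, class T, typename ASender, typename BSender, typename CSender>
auto gemm(const Policy<B>& policy, blas::Op op_a, blas::Op op_b, T alpha,
          ASender&& a, BSender&& b, T beta, CSender&& c);
```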
bors try
try: Already running a review
bors try-
LGTM!
bors try
try: Build succeeded
@msimberg Are you going to try @albestro's suggestion (https://godbolt.org/z/6TcGvr7e8, i.e. `delete` instead of an unimplemented macro), or is the PR ready to merge?
See my response here: #371 (comment). I tried it in a standalone setting and it does in principle the right thing. However, my preference is still the […]
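For context, the two alternatives under discussion look roughly like this sketch (macro and function names are illustrative, not DLA-Future's):

```cpp
#include <cstdio>
#include <cstdlib>

// Alternative 1 (the godbolt suggestion): delete the overload for
// unsupported cases, so misuse fails at compile time.
void tile_op(int&) {}            // supported case
void tile_op(double&) = delete;  // unsupported case: calling this won't compile

// Alternative 2: keep a definition, but fail loudly if it is ever reached
// at runtime (macro name illustrative).
#define ILLUSTRATIVE_UNIMPLEMENTED() \
  (std::fprintf(stderr, "%s: not implemented\n", __func__), std::abort())

void tile_op_macro(double&) { ILLUSTRATIVE_UNIMPLEMENTED(); }
```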
…7.X and HPX master (#428)

This allows building DLA-Future with HPX 1.7.X and master (that was not the case with #371). Two reasons:

- In 1.7.X, sender and receiver customizations could be done either as member functions (these have priority) or via tag_dispatch. On master we've made tag_dispatch the only customization mechanism (this simplifies the internals).
- In 1.7.X we made the bad decision of renaming tag_invoke to tag_dispatch (because of concerns about us using it in non-intended ways). We realized this was a mistake and renamed it back to tag_invoke on master.

So this PR changes to use tag_invoke/tag_dispatch as the customization mechanism, since that works on both 1.7.X and master. Additionally, it adds a small local compatibility definition for tag_invoke/tag_dispatch so that the correct function name is used depending on the HPX version. Finally, because of the move to tag_invoke/tag_dispatch, the customization for set_value requires SFINAE so that it is not even considered as an overload for set_error and set_done.
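A minimal sketch of what such a compatibility definition and the SFINAE-constrained set_value customization could look like; the version boundary, headers, and names here are assumptions for illustration, not the actual DLA-Future code:

```cpp
#include <type_traits>
#include <utility>

#include <hpx/execution.hpp>  // for set_value_t; header assumed
#include <hpx/version.hpp>    // for HPX_VERSION_FULL; header assumed

// Use tag_dispatch on HPX 1.7.X, tag_invoke on later versions. The exact
// version boundary is an assumption for illustration.
#if HPX_VERSION_FULL >= 0x010800
#define COMPAT_TAG_INVOKE tag_invoke
#else
#define COMPAT_TAG_INVOKE tag_dispatch
#endif

namespace ex = hpx::execution::experimental;

template <typename F>
struct example_receiver {
  F f;

  // set_value customization written as a hidden friend. The enable_if
  // constrains it to the set_value tag only, so this overload is not even
  // considered for set_error or set_done.
  template <typename Tag, typename... Ts,
            typename = std::enable_if_t<std::is_same_v<Tag, ex::set_value_t>>>
  friend void COMPAT_TAG_INVOKE(Tag, example_receiver&& r, Ts&&... ts) {
    std::move(r.f)(std::forward<Ts>(ts)...);
  }
};
```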
This is not meant to be anywhere near complete. It mostly contains a bunch of TODOs and notes for myself about what needs to be, or can be, adapted where. However, I'm opening this to start exposing others to the sender/receiver interfaces, and to try to collect feedback on what sort of utilities/interfaces/functionality is useful/acceptable/harmful in your opinion. I've changed the gemm task in one of the triangular solver implementations to use senders/receivers and it passes tests.
I see two primary benefits to using senders/receivers in DLAF: […] `dataflow_finalize`.

Short summary of the "sender adapters" that appear here:

- `template <Backend B, typename Sender> auto transform(Sender&& sender) {...}`: this is a backend-specific version of `future::then(executor, ...)`/`dataflow(executor, ...)`/`execution::transform` (the last one is from P1897). The reason it does not take an executor/scheduler/policy or similar is that the current proposals don't really talk about the kind of customization we want to do (inserting CUDA streams etc.), and what is there is semantically a bit shaky/bound to change, at least for this use case. The custom CUDA/MPI executors are already a bit questionable.
- `keep_future`: currently the default behaviour of a future used as a sender is to get the value from the future and pass that to the continuation. `keep_future` is an alternative which passes the future itself, similar to how `future::then` takes a callable which takes a future as its argument. We might swap the default to be the `keep_future` behaviour. This functionality is mostly intended for the transition phase from futures to senders, and should not be needed in the end.
- `when_all_lift`: P1897 proposes a `when_all` that takes a pack of senders and triggers a continuation when all the input senders are done. However, it takes only senders, which means that if one has ready values one would have to wrap them in `execution::just` (this is equivalent to how one would have to write `dataflow(f, make_ready_future(3))` if dataflow only supported futures as arguments). It would be nice if the proposed `when_all` were a bit more flexible in this regard (including accepting void senders), and we'll follow up on that. If the standard `when_all` ends up accepting only senders, this is still a useful extension for HPX/DLAF.
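As an illustration of the `when_all_lift` idea, here is a minimal sketch that wraps non-sender arguments in `just` so they can be mixed with senders; the header, namespace, and the `is_sender_v` trait are assumptions, not the actual HPX/DLAF implementation:

```cpp
#include <type_traits>
#include <utility>

#include <hpx/execution.hpp>  // header name assumed

namespace ex = hpx::execution::experimental;

// Wrap plain values in just(); pass senders through unchanged.
// (is_sender_v is assumed to be available as a sender trait.)
template <typename T>
auto lift(T&& t) {
  if constexpr (ex::is_sender_v<std::decay_t<T>>) {
    return std::forward<T>(t);
  }
  else {
    return ex::just(std::forward<T>(t));
  }
}

// when_all over a mix of senders and ready values, e.g.
// when_all_lift(3, some_sender) instead of when_all(just(3), some_sender).
template <typename... Ts>
auto when_all_lift(Ts&&... ts) {
  return ex::when_all(lift(std::forward<Ts>(ts))...);
}
```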