
Initial sender/receiver adaptations for tile algorithms #371

Merged · 20 commits · Nov 9, 2021

Conversation

@msimberg (Collaborator) commented May 11, 2021

This is not meant to be anywhere near complete. It mostly contains a bunch of TODOs and notes to myself about what needs to be, or can be, adapted where. However, I'm opening this to start exposing others to the sender/receiver interfaces, and to collect feedback on what sort of utilities/interfaces/functionality is useful/acceptable/harmful in your opinion.

I've changed the gemm task in one of the triangular solver implementations to use senders/receivers and it passes tests.

I see two primary benefits to using senders/receivers in DLAF:

  • More flexible sender/receiver customization points should allow us to remove some of the hacks currently in place, e.g. for tile lifetime management, which with senders/receivers can be handled in real customization points rather than our ad-hoc dataflow_finalize.
  • Potential optimizations when chaining work (I think there are actually few, if any, cases where this applies in DLAF), and potentially fewer heap allocations (although dataflow already does a decent job of this).

Short summary of the "sender adapters" that appear here:

  • template <Backend B, typename Sender> auto transform(Sender&& sender) {...}: this is a backend-specific version of future::then(executor, ...)/dataflow(executor, ...)/execution::transform (the last one is from P1897). The reason it does not take an executor/scheduler/policy or similar is that the current proposals don't really talk about the kind of customization we want to do (inserting cuda streams etc.) and what is there is semantically a bit shaky/bound to change at least for this use case. The custom CUDA/MPI executors are already a bit questionable.
  • keep_future: currently the default behaviour of a future as a sender is to get the value from the future and pass that to the continuation. keep_future is an alternative which passes the future itself, similar to how future::then takes a callable which takes a future as its argument. We might swap the default to be the keep_future behaviour. This functionality is mostly intended for the transition phase from futures to senders, and should not be needed in the end.
  • when_all_lift: P1897 proposes a when_all that takes a pack of senders and triggers a continuation when all the input senders are done. However, it takes only senders, which means that if one has ready values one would have to wrap them in execution::just (this is equivalent to how one would have to do dataflow(f, make_ready_future(3)) if dataflow only supported futures as arguments). It would be nice if the proposed when_all was a bit more flexible in this regard (including accepting void senders), and we'll follow up on that. If the standard when_all ends up accepting only senders, this is still a useful extension for HPX/DLAF.

@msimberg force-pushed the sender-receiver-adaptation branch from 87cf65f to 34ffb5d on June 3, 2021 15:26
@msimberg commented Jun 3, 2021

bors try

bors bot added a commit that referenced this pull request Jun 3, 2021
bors bot commented Jun 3, 2021

try

Build failed:

@msimberg commented Jun 15, 2021

Still a draft, but I've updated a few things based on some of the things I presented yesterday.

I've introduced a policy class, templated on the backend. The policy is meant to be similar to an HPX execution policy, but currently isn't one (that's not necessary at the moment, and it's not for use with HPX parallel algorithms). In the future it could become a proper execution policy, or be replaced by an HPX execution policy. The reason for making it a proper class, rather than just the Backend enum, is that it can hold properties such as priorities. Since the executor concept in the latest P0443 is very simplified compared to the executors we're currently using, I'm not using the new executors for anything here.

I've also added a few more overloads to the tile algorithms (trsm, gemm, etc.). That means that they currently have the following overloads:

  • plain blocking CPU overload dispatching to blaspp/lapackpp
  • plain asynchronous GPU overload dispatching to cuBLAS/cuSOLVER, which takes a stream/handle
  • an overload that takes a policy and is blocking (this is equivalent to e.g. hpx::for_loop(hpx::execution::par, ...))
  • an overload that takes a policy and one sender and returns a sender (somewhat close to hpx::dataflow(hpx::for_loop, hpx::execution::par, ...) except it has lazy semantics; with senders it would be used e.g. as hpx::sync_wait(hpx::for_loop(hpx::execution::par, hpx::execution::when_all(...))))
  • an overload that takes only a policy (this is only for use with operator|, e.g. hpx::execution::when_all(...) | hpx::for_loop(hpx::execution::par) | hpx::execution::sync_wait())

With these changes I think the first two overloads could be moved to internal, and the trsm_o etc. callable objects would never show up in the algorithms (they can be kept around for convenience, but still be internal).

Overall, I quite like this direction. It aligns well with where I think the standard library is possibly headed. The policy overloads are also higher level than the dataflow + executor versions, and give us room to change things below those overloads (either because we want to or because the proposals change). The higher-level ideas are unlikely to change as much. We've also started looking at adding exactly these kinds of overloads to the HPX parallel algorithms.

I think from an algorithm-writer's perspective the most important thing is how e.g. this looks: https://github.com/msimberg/DLA-Future/blob/e578e959819142021f12439414a30d7aad193e78/include/dlaf/solver/triangular/impl.h#L39-L42. If this looks like a decent direction I'll start tidying up loose ends (for this iteration...).

Let me know what you think (including @aurianer, @biddisco).

@albestro (Collaborator) commented:

I think from an algorithm-writer's perspective the most important thing is how e.g. this looks: https://github.com/msimberg/DLA-Future/blob/e578e959819142021f12439414a30d7aad193e78/include/dlaf/solver/triangular/impl.h#L39-L42. If this looks like a decent direction I'll start tidying up loose ends (for this iteration...).

I like how the code looks; I think it is quite clear and easy to read.
For me it's still a long way to fully understanding all these things in detail, so maybe I'm not very helpful in commenting on the internal mechanisms, but from the user perspective they're really nice.

Thanks for the effort!

@msimberg

bors try

bors bot added a commit that referenced this pull request Jul 12, 2021
bors bot commented Jul 12, 2021

try

Build failed:

@msimberg

bors try

@msimberg

bors try

bors bot commented Jul 13, 2021

try

Already running a review

bors bot added a commit that referenced this pull request Jul 13, 2021
bors bot commented Jul 13, 2021

try

Build failed:

@msimberg

bors try

bors bot commented Jul 13, 2021

try

Merge conflict.

@msimberg

bors try

bors bot added a commit that referenced this pull request Jul 13, 2021
bors bot commented Jul 13, 2021

try

Build failed:

@msimberg force-pushed the sender-receiver-adaptation branch 2 times, most recently from d0c2340 to b6f3370 on July 14, 2021 09:25
@msimberg

bors try

@msimberg

bors try

bors bot commented Jul 14, 2021

try

Already running a review

@msimberg commented Nov 8, 2021

bors try

bors bot added a commit that referenced this pull request Nov 8, 2021
@msimberg commented Nov 8, 2021

I've added (back in many cases) the priorities as explicit arguments to helper functions.

To summarize, the algorithms that have been senderified in this PR are:

  • solver/triangular
  • factorization/cholesky
  • eigensolver/gen_to_std
  • copy for Matrix

As a prerequisite all tile algorithms used in the above matrix algorithms have overloads that work with senders and policies.

@msimberg commented Nov 8, 2021

bors try

bors bot commented Nov 8, 2021

try

Already running a review

@msimberg commented Nov 8, 2021

bors try-

@teonnik (Contributor) left a comment

LGTM!

@msimberg commented Nov 8, 2021

bors try

bors bot added a commit that referenced this pull request Nov 8, 2021
bors bot commented Nov 8, 2021

try

Build succeeded:

@rasolca (Collaborator) left a comment

@msimberg Are you going to try @albestro's suggestion (https://godbolt.org/z/6TcGvr7e8, i.e. delete instead of an unimplemented macro), or is the PR ready to merge?

@msimberg commented Nov 9, 2021

@msimberg Are you going to try @albestro's suggestion (https://godbolt.org/z/6TcGvr7e8, i.e. delete instead of an unimplemented macro), or is the PR ready to merge?

See my response here: #371 (comment). I tried it in a standalone setting and in principle it does the right thing. However, my preference is still the static_assert, because it lets us put a pretty obvious error message there.

@rasolca merged commit 89ca6a0 into eth-cscs:master on Nov 9, 2021
@msimberg deleted the sender-receiver-adaptation branch on November 9, 2021 17:04
@teonnik mentioned this pull request Nov 10, 2021
@msimberg mentioned this pull request Nov 22, 2021
rasolca pushed a commit that referenced this pull request Nov 22, 2021
…7.X and HPX master (#428)

This allows building DLA-Future with HPX 1.7.X and master (that was not the case with #371). Two reasons:

- In 1.7.X sender and receiver customizations could be done as either member functions (these have priority) or tag_dispatch. On master we've made tag_dispatch the only customization mechanism (simplifies internals).
- In 1.7.X we made the bad decision of renaming tag_invoke to tag_dispatch (because of concerns of us using it in non-intended ways). We realized this was a mistake and renamed it back to tag_invoke on master.

So this PR changes to use tag_invoke/tag_dispatch as the customization mechanism, since that works on both 1.7.X and master. Additionally, it adds a small local compatibility definition for tag_invoke/tag_dispatch so that the correct function name is used depending on the HPX version.

Finally, because of the move to tag_invoke/dispatch the customization for set_value requires SFINAE to not even be considered as an overload for set_error and set_done.
@msimberg mentioned this pull request Dec 6, 2021
@rasolca mentioned this pull request Sep 29, 2023