task: implement `JoinMap` #4538

hawkw · 2022-02-24T20:29:27Z

Depends on #4530.

Motivation

In many cases, it is desirable to spawn a set of tasks associated with
keys, with the ability to cancel them by key. As an example use case for
this sort of thing, see Tower's ReadyCache type.

Now that PR #4530 adds a way of cancelling tasks in a
tokio::task::JoinSet, we can implement a map-like API based on the
same IdleNotifiedSet primitive.

Solution

This PR adds an implementation of a JoinMap type to tokio::task,
using the IdleNotifiedSet that backs JoinMap, and a HashMap of the
AbortHandles added in #4530. Keys are stored in both the HashMap
and alongside the task's JoinHandle in the IdleNotifiedSet in
order to identify tasks as they complete. This allows returning the key
in join_one regardless of whether a task was cancelled, panicked, or
completed successfully, and removing the corresponding AbortHandle
from the HashMap. Of course, this requires keys to be Clone.

Overall, I think the way this works is pretty straightforward; much of
this PR is just API boilerplate to implement the union of applicable
APIs from JoinSet and HashMap.

Individual tasks can be aborted by key using the JoinMap::abort
method, and a set of tasks whose key match a given predicate can be
aborted using JoinMap::abort_matching.

When tasks complete, JoinMap::join_one returns their associated key
alongside the output from the spawned future, or the key and the
JoinError if the task did not complete successfully.

Bikesheds & Future Work

I used a mixture of HashMap and JoinSet naming. Tasks are
spawned rather than inserted, and aborted rather than removed,
but other methods use map-inspired naming. Open to changing this, of
course!
We also could change the join_one method to only return the future's
output, and not the task's key, but I thought it was nicer to do this
rather than not. In particular, this allows the user to determine
which task panicked or was cancelled when a JoinError is returned,
which seems kind of important to me.

Signed-off-by: Eliza Weisman <[email protected]>

Co-authored-by: Alice Ryhl <[email protected]>

Signed-off-by: Eliza Weisman <[email protected]>

hawkw · 2022-02-24T20:32:14Z

Note that I still need to add some tests + more examples, but I wanted to go ahead and put the PR up so we could agree on the API.

tokio-util/src/task/join_map.rs

This rewrites `JoinMap` to use the same `IdleNotifiedSet` primitive as `JoinSet` directly, rather than wrapping a `JoinSet`. This way, we can just store the keys in the `IdleNotifiedSet`, along with the `JoinHandle`s, rather than having to move them into the task. This means that we now *always* get a key back when a task completes, regardless of whether that task was cancelled, panicked, or completed successfully. There are a couple good implications of always getting a key back: - It lets us change the API to always include the key in `join_one`. This way, the user can determine _which_ task completed, even if is cancelled or panics. This was not previously possible with the old API. - Because we always get the key back, we can always perform an O(1) `HashMap::remove` to clear out abort handles for dead tasks, rather than an O(_n_) `HashMap::retain`. This improves performance when tasks panic or are cancelled significantly. - We no longer need to expose task status APIs on `AbortHandle`, which we may not want to commit to. The tradeoff is that this code now has to live in `tokio` rather than `tokio-util`. I think that's probably acceptable, though.

This does, unfortunately, mean that keys cannot be returned with cancelled tasks, as they are only stored in the map and may have been evicted when replacing a task...however, we could change this behavior by not returning cancelled join errors for tasks that were aborted by replacing in the map. This would mean removing a task from the `IdleNotifiedSet` as well when it is replaced by `insert`. On the other hand, this does make the API closer to `JoinSet`'s, even if it's somewhat less nice.

Signed-off-by: Eliza Weisman <[email protected]>

hawkw · 2022-03-04T23:04:24Z

Okay, I've rewritten the implementation again based on Alice's comments in #4538 (comment). This way, we no longer need to clone the keys into the tasks, which should be much more efficient when cloning keys is relatively expensive (e.g. strings). Would love another look --- I think this is the best iteration so far.

Signed-off-by: Eliza Weisman <[email protected]>

Noah-Kennedy

I think one comment needs cleaned up a bit, otherwise this looks good imo

tokio/src/task/join_map.rs

nylonicious

Just 3 small typos.

tokio/src/task/join_map.rs

Signed-off-by: Eliza Weisman <[email protected]>

hawkw · 2022-03-11T18:01:54Z

@nylonicious thanks for catching those --- that was a find-and-replace mistake, I definitely wouldn't have noticed it otherwise! :)

hawkw · 2022-03-11T18:18:09Z

Hmm, the Hyper CI job is failing, on Windows only: https://github.com/tokio-rs/tokio/runs/5514991976?check_suite_focus=true. I'm gonna restart it.

turns out no doctests for unstable APIs were actually running lmao Signed-off-by: Eliza Weisman <[email protected]>

Signed-off-by: Eliza Weisman <[email protected]>

carllerche · 2022-03-11T22:53:04Z

The patch itself looks good. Others have commented there.

I have already expressed my hesitation with adding JoinMap to Tokio proper in Discord. I will restate some here.

First, adding an explicit JoinMap requires us to match a "map" API. Which one? Should we copy the stdlib HashMap? That would be the obvious API to copy, but it has limitations. First, K: Borrow<Q> has limitations (I won't rehash here, but this is one reason why IndexMap moved away from this. What happens when the user hits these limitations or they are missing "Map" apis that we did not implement on JoinMapor they want to use something likeIndexMap` anyway? How can they solve these?

Separetely, having two types means the user has to pick between them. Do they pick JoinSet or JoinMap? It adds a new decision for them to make and the docs need to have a clear explanation of when to use each one.

If JoinSet generates an "identifier" with each spawned task. We can expose the identifier on the AbortHandle. We also can provide the identifier when a value is joined on JoinSet (e.g. add a join_one_with_id() method). These changes would let the user implement JoinMap logic themselves using whatever map structure they want, either std::HashMap or IndexMap.

My vote is to hold back on JoinMap and add the id concept to JoinSet. We can provide JoinMap in tokio-util.

That said, if everyone else thinks I am wrong, I wont force the issue. Worst case is, we deprecate JoinMap at a later time if we think it sin't the right option post stabilization.

@carllerche

## Motivation PR #4538 adds a prototype implementation of a `JoinMap` API in `tokio::task`. In [this comment][1] on that PR, @carllerche pointed out that a much simpler `JoinMap` type could be implemented outside of `tokio` (either in `tokio-util` or in user code) if we just modified `JoinSet` to return a task ID type when spawning new tasks, and when tasks complete. This seems like a better approach for the following reasons: * A `JoinMap`-like type need not become a permanent part of `tokio`'s stable API * Task IDs seem like something that could be generally useful outside of a `JoinMap` implementation ## Solution This branch adds a `tokio::task::Id` type that uniquely identifies a task relative to all currently spawned tasks. The ID is internally represented as a `NonZeroUsize` that's based on the address of the task's header. I thought that it was better to use addresses to generate IDs than assigning sequential IDs to tasks, because a sequential ID would mean adding an additional usize field to the task data somewhere, making it a word bigger. I've added methods to `JoinHandle` and `AbortHandle` for returning a task's ID. In addition, I modified `JoinSet` to add a `join_with_id` method that behaves identically to `join_one` but also returns an ID. This can be used to implement a `JoinMap` type. Note that because `join_with_id` must return a task ID regardless of whether the task completes successfully or returns a `JoinError` (in order to ensure that dead tasks are always cleaned up from a map), it inverts the ordering of the `Option` and `Result` returned by `join_one` --- which we've bikeshedded about a bit [here][2]. This makes the method's return type inconsistent with the existing `join_one` method, which feels not great. As I see it, there are three possible solutions to this: * change the existing `join_one` method to also swap the `Option` and `Result` nesting for consistency. * change `join_with_id` to return `Result<Option<(Id, T)>, (Id, JoinError)>>`, which feels gross... * add a task ID to `JoinError` as well. [1]: #4538 (comment) [2]: #4335 (comment)

@carllerche

## Motivation PR #4538 adds a prototype implementation of a `JoinMap` API in `tokio::task`. In [this comment][1] on that PR, @carllerche pointed out that a much simpler `JoinMap` type could be implemented outside of `tokio` (either in `tokio-util` or in user code) if we just modified `JoinSet` to return a task ID type when spawning new tasks, and when tasks complete. This seems like a better approach for the following reasons: * A `JoinMap`-like type need not become a permanent part of `tokio`'s stable API * Task IDs seem like something that could be generally useful outside of a `JoinMap` implementation ## Solution This branch adds a `tokio::task::Id` type that uniquely identifies a task relative to all currently spawned tasks. The ID is internally represented as a `NonZeroUsize` that's based on the address of the task's header. I thought that it was better to use addresses to generate IDs than assigning sequential IDs to tasks, because a sequential ID would mean adding an additional usize field to the task data somewhere, making it a word bigger. I've added methods to `JoinHandle` and `AbortHandle` for returning a task's ID. In addition, I modified `JoinSet` to add a `join_with_id` method that behaves identically to `join_one` but also returns an ID. This can be used to implement a `JoinMap` type. Note that because `join_with_id` must return a task ID regardless of whether the task completes successfully or returns a `JoinError` (in order to ensure that dead tasks are always cleaned up from a map), it inverts the ordering of the `Option` and `Result` returned by `join_one` --- which we've bikeshedded about a bit [here][2]. This makes the method's return type inconsistent with the existing `join_one` method, which feels not great. As I see it, there are three possible solutions to this: * change the existing `join_one` method to also swap the `Option` and `Result` nesting for consistency. * change `join_with_id` to return `Result<Option<(Id, T)>, (Id, JoinError)>>`, which feels gross... * add a task ID to `JoinError` as well. [1]: #4538 (comment) [2]: #4335 (comment)

@carllerche

## Motivation PR #4538 adds a prototype implementation of a `JoinMap` API in `tokio::task`. In [this comment][1] on that PR, @carllerche pointed out that a much simpler `JoinMap` type could be implemented outside of `tokio` (either in `tokio-util` or in user code) if we just modified `JoinSet` to return a task ID type when spawning new tasks, and when tasks complete. This seems like a better approach for the following reasons: * A `JoinMap`-like type need not become a permanent part of `tokio`'s stable API * Task IDs seem like something that could be generally useful outside of a `JoinMap` implementation ## Solution This branch adds a `tokio::task::Id` type that uniquely identifies a task relative to all other spawned tasks. Task IDs are assigned sequentially based on an atomic `usize` counter of spawned tasks. In addition, I modified `JoinSet` to add a `join_with_id` method that behaves identically to `join_one` but also returns an ID. This can be used to implement a `JoinMap` type. Note that because `join_with_id` must return a task ID regardless of whether the task completes successfully or returns a `JoinError`, I've also changed `JoinError` to carry the ID of the task that errored, and added a `JoinError::id` method for accessing it. Alternatively, we could have done one of the following: * have `join_with_id` return `Option<(Id, Result<T, JoinError>)>`, which would be inconsistent with the return type of `join_one` (which we've [already bikeshedded over once][2]...) * have `join_with_id` return `Result<Option<(Id, T)>, (Id, JoinError)>>`, which just feels gross. I thought adding the task ID to `JoinError` was the nicest option, and is potentially useful for other stuff as well, so it's probably a good API to have anyway. [1]: #4538 (comment) [2]: #4335 (comment) Closes #4538 Signed-off-by: Eliza Weisman <[email protected]>

## Motivation In many cases, it is desirable to spawn a set of tasks associated with keys, with the ability to cancel them by key. As an example use case for this sort of thing, see Tower's [`ReadyCache` type][1]. Now that PR #4530 adds a way of cancelling tasks in a `tokio::task::JoinSet`, we can implement a map-like API based on the same `IdleNotifiedSet` primitive. ## Solution This PR adds an implementation of a `JoinMap` type to `tokio_util::task`, using the `JoinSet` type from `tokio::task`, the `AbortHandle` type added in #4530, and the new task IDs added in #4630. Individual tasks can be aborted by key using the `JoinMap::abort` method, and a set of tasks whose key match a given predicate can be aborted using `JoinMap::abort_matching`. When tasks complete, `JoinMap::join_one` returns their associated key alongside the output from the spawned future, or the key and the `JoinError` if the task did not complete successfully. Overall, I think the way this works is pretty straightforward; much of this PR is just API boilerplate to implement the union of applicable APIs from `JoinSet` and `HashMap`. Unlike previous iterations on the `JoinMap` API (e.g. #4538), this version is implemented entirely in `tokio_util`, using only public APIs from the `tokio` crate. Currently, the required `tokio` APIs are unstable, but implementing `JoinMap` in `tokio-util` means we will never have to make stability commitments for the `JoinMap` API itself. [1]: https://github.com/tower-rs/tower/blob/master/tower/src/ready_cache/cache.rs Signed-off-by: Eliza Weisman <[email protected]>

github-actions bot added the R-loom Run loom tests on this PR label Feb 24, 2022

hawkw and others added 10 commits February 24, 2022 12:29

rt: add internal AbortHandle type

8992cdc

rt: add abort handle to task combination tests

48dbc38

Signed-off-by: Eliza Weisman <[email protected]>

rt: make AbortHandle::abort consume self

f18bb86

Signed-off-by: Eliza Weisman <[email protected]>

rt: expose JoinHandle with tokio-unstable

87fe715

Signed-off-by: Eliza Weisman <[email protected]>

task: add AbortHandle to JoinSet

2a61f48

Signed-off-by: Eliza Weisman <[email protected]>

fix missing tokio_unstable docs

87d4ed0

Signed-off-by: Eliza Weisman <[email protected]>

fix wasm tests i guess

107f78c

Signed-off-by: Eliza Weisman <[email protected]>

+rustfmt

9dac5b6

Signed-off-by: Eliza Weisman <[email protected]>

Update tokio/src/runtime/task/join.rs

b6dd2fd

Co-authored-by: Alice Ryhl <[email protected]>

cleanup feature flags, add abort module

d77c9be

Signed-off-by: Eliza Weisman <[email protected]>

hawkw force-pushed the eliza/abort-handle branch from fe8158f to d77c9be Compare February 24, 2022 20:29

hawkw added 4 commits February 24, 2022 12:30

rough draft JoinMap API

81a1658

Signed-off-by: Eliza Weisman <[email protected]>

don't leak panicked/canceled tasks; add task status

32c3a7c

Signed-off-by: Eliza Weisman <[email protected]>

add boilerplate hashmap capacity management

03d1bf9

Signed-off-by: Eliza Weisman <[email protected]>

add abort_matching

45baa74

Signed-off-by: Eliza Weisman <[email protected]>

hawkw force-pushed the eliza/join-map branch from 20d6416 to 45baa74 Compare February 24, 2022 20:30

hawkw requested review from carllerche and Darksonn February 24, 2022 20:31

hawkw self-assigned this Feb 24, 2022

hawkw requested a review from tobz February 24, 2022 20:31

hawkw added A-tokio-util Area: The tokio-util crate C-enhancement Category: A PR with an enhancement or bugfix. M-task Module: tokio/task labels Feb 24, 2022

hawkw mentioned this pull request Feb 24, 2022

task: stabilize JoinSet #4535

Closed

2 tasks

Base automatically changed from eliza/abort-handle to master February 24, 2022 21:08

Merge branch 'master' into eliza/join-map

ec7e142

Darksonn reviewed Feb 25, 2022

View reviewed changes

tokio-util/src/task/join_map.rs Outdated Show resolved Hide resolved

hawkw added 4 commits March 4, 2022 11:57

go back to always returning keys, as god intended

ec3b1f3

Signed-off-by: Eliza Weisman <[email protected]>

nicer naming

437897f

Signed-off-by: Eliza Weisman <[email protected]>

improve naming and add comments

c9fd8ab

Signed-off-by: Eliza Weisman <[email protected]>

hawkw requested a review from Darksonn March 4, 2022 23:03

hawkw added 3 commits March 4, 2022 15:05

use loom arc

f609a44

Signed-off-by: Eliza Weisman <[email protected]>

remove hashbrown dependency

69a1c80

Signed-off-by: Eliza Weisman <[email protected]>

add comments, use HashSet rather than HashMap

3bce69b

Signed-off-by: Eliza Weisman <[email protected]>

Noah-Kennedy requested changes Mar 11, 2022

View reviewed changes

tokio/src/task/join_map.rs Outdated Show resolved Hide resolved

nylonicious suggested changes Mar 11, 2022

View reviewed changes

tokio/src/task/join_map.rs Outdated Show resolved Hide resolved

tokio/src/task/join_map.rs Outdated Show resolved Hide resolved

tokio/src/task/join_map.rs Outdated Show resolved Hide resolved

docs cleanup

07f0122

Signed-off-by: Eliza Weisman <[email protected]>

hawkw requested a review from Noah-Kennedy March 11, 2022 18:01

Noah-Kennedy approved these changes Mar 11, 2022

View reviewed changes

hawkw added 4 commits March 11, 2022 10:25

add examples to docs

f1a01d0

fixup doctests

eab0f7a

turns out no doctests for unstable APIs were actually running lmao Signed-off-by: Eliza Weisman <[email protected]>

remove bonus backticks

a46ddc0

Signed-off-by: Eliza Weisman <[email protected]>

Merge branch 'master' into eliza/join-map

e02a605

nylonicious approved these changes Mar 13, 2022

View reviewed changes

hawkw mentioned this pull request Apr 20, 2022

task: add task IDs #4630

Merged

hawkw closed this in #4630 Apr 25, 2022

hawkw mentioned this pull request Apr 25, 2022

util: implement JoinMap #4640

Merged

Darksonn deleted the eliza/join-map branch June 28, 2022 09:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

task: implement `JoinMap` #4538

task: implement `JoinMap` #4538

hawkw commented Feb 24, 2022 •

edited

Loading

hawkw commented Feb 24, 2022

hawkw commented Mar 4, 2022 •

edited

Loading

Noah-Kennedy left a comment

nylonicious left a comment •

edited

Loading

hawkw commented Mar 11, 2022

hawkw commented Mar 11, 2022

carllerche commented Mar 11, 2022

task: implement JoinMap #4538

task: implement JoinMap #4538

Conversation

hawkw commented Feb 24, 2022 • edited Loading

Motivation

Solution

Bikesheds & Future Work

hawkw commented Feb 24, 2022

hawkw commented Mar 4, 2022 • edited Loading

Noah-Kennedy left a comment

Choose a reason for hiding this comment

nylonicious left a comment • edited Loading

Choose a reason for hiding this comment

hawkw commented Mar 11, 2022

hawkw commented Mar 11, 2022

carllerche commented Mar 11, 2022

task: implement `JoinMap` #4538

task: implement `JoinMap` #4538

hawkw commented Feb 24, 2022 •

edited

Loading

hawkw commented Mar 4, 2022 •

edited

Loading

nylonicious left a comment •

edited

Loading