-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
task: implement JoinMap
#4538
task: implement JoinMap
#4538
Conversation
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Co-authored-by: Alice Ryhl <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
fe8158f
to
d77c9be
Compare
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
20d6416
to
45baa74
Compare
Note that I still need to add some tests + more examples, but I wanted to go ahead and put the PR up so we could agree on the API. |
This rewrites `JoinMap` to use the same `IdleNotifiedSet` primitive as `JoinSet` directly, rather than wrapping a `JoinSet`. This way, we can just store the keys in the `IdleNotifiedSet`, along with the `JoinHandle`s, rather than having to move them into the task. This means that we now *always* get a key back when a task completes, regardless of whether that task was cancelled, panicked, or completed successfully. There are a couple good implications of always getting a key back: - It lets us change the API to always include the key in `join_one`. This way, the user can determine _which_ task completed, even if is cancelled or panics. This was not previously possible with the old API. - Because we always get the key back, we can always perform an O(1) `HashMap::remove` to clear out abort handles for dead tasks, rather than an O(_n_) `HashMap::retain`. This improves performance when tasks panic or are cancelled significantly. - We no longer need to expose task status APIs on `AbortHandle`, which we may not want to commit to. The tradeoff is that this code now has to live in `tokio` rather than `tokio-util`. I think that's probably acceptable, though.
This does, unfortunately, mean that keys cannot be returned with cancelled tasks, as they are only stored in the map and may have been evicted when replacing a task...however, we could change this behavior by not returning cancelled join errors for tasks that were aborted by replacing in the map. This would mean removing a task from the `IdleNotifiedSet` as well when it is replaced by `insert`. On the other hand, this does make the API closer to `JoinSet`'s, even if it's somewhat less nice.
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Okay, I've rewritten the implementation again based on Alice's comments in #4538 (comment). This way, we no longer need to clone the keys into the tasks, which should be much more efficient when cloning keys is relatively expensive (e.g. strings). Would love another look --- I think this is the best iteration so far. |
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one comment needs cleaned up a bit, otherwise this looks good imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just 3 small typos.
Signed-off-by: Eliza Weisman <[email protected]>
@nylonicious thanks for catching those --- that was a find-and-replace mistake, I definitely wouldn't have noticed it otherwise! :) |
Hmm, the Hyper CI job is failing, on Windows only: https://github.com/tokio-rs/tokio/runs/5514991976?check_suite_focus=true. I'm gonna restart it. |
turns out no doctests for unstable APIs were actually running lmao Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
The patch itself looks good. Others have commented there. I have already expressed my hesitation with adding First, adding an explicit Separetely, having two types means the user has to pick between them. Do they pick If My vote is to hold back on That said, if everyone else thinks I am wrong, I wont force the issue. Worst case is, we deprecate |
## Motivation PR #4538 adds a prototype implementation of a `JoinMap` API in `tokio::task`. In [this comment][1] on that PR, @carllerche pointed out that a much simpler `JoinMap` type could be implemented outside of `tokio` (either in `tokio-util` or in user code) if we just modified `JoinSet` to return a task ID type when spawning new tasks, and when tasks complete. This seems like a better approach for the following reasons: * A `JoinMap`-like type need not become a permanent part of `tokio`'s stable API * Task IDs seem like something that could be generally useful outside of a `JoinMap` implementation ## Solution This branch adds a `tokio::task::Id` type that uniquely identifies a task relative to all currently spawned tasks. The ID is internally represented as a `NonZeroUsize` that's based on the address of the task's header. I thought that it was better to use addresses to generate IDs than assigning sequential IDs to tasks, because a sequential ID would mean adding an additional usize field to the task data somewhere, making it a word bigger. I've added methods to `JoinHandle` and `AbortHandle` for returning a task's ID. In addition, I modified `JoinSet` to add a `join_with_id` method that behaves identically to `join_one` but also returns an ID. This can be used to implement a `JoinMap` type. Note that because `join_with_id` must return a task ID regardless of whether the task completes successfully or returns a `JoinError` (in order to ensure that dead tasks are always cleaned up from a map), it inverts the ordering of the `Option` and `Result` returned by `join_one` --- which we've bikeshedded about a bit [here][2]. This makes the method's return type inconsistent with the existing `join_one` method, which feels not great. As I see it, there are three possible solutions to this: * change the existing `join_one` method to also swap the `Option` and `Result` nesting for consistency. * change `join_with_id` to return `Result<Option<(Id, T)>, (Id, JoinError)>>`, which feels gross... * add a task ID to `JoinError` as well. [1]: #4538 (comment) [2]: #4335 (comment)
## Motivation PR #4538 adds a prototype implementation of a `JoinMap` API in `tokio::task`. In [this comment][1] on that PR, @carllerche pointed out that a much simpler `JoinMap` type could be implemented outside of `tokio` (either in `tokio-util` or in user code) if we just modified `JoinSet` to return a task ID type when spawning new tasks, and when tasks complete. This seems like a better approach for the following reasons: * A `JoinMap`-like type need not become a permanent part of `tokio`'s stable API * Task IDs seem like something that could be generally useful outside of a `JoinMap` implementation ## Solution This branch adds a `tokio::task::Id` type that uniquely identifies a task relative to all currently spawned tasks. The ID is internally represented as a `NonZeroUsize` that's based on the address of the task's header. I thought that it was better to use addresses to generate IDs than assigning sequential IDs to tasks, because a sequential ID would mean adding an additional usize field to the task data somewhere, making it a word bigger. I've added methods to `JoinHandle` and `AbortHandle` for returning a task's ID. In addition, I modified `JoinSet` to add a `join_with_id` method that behaves identically to `join_one` but also returns an ID. This can be used to implement a `JoinMap` type. Note that because `join_with_id` must return a task ID regardless of whether the task completes successfully or returns a `JoinError` (in order to ensure that dead tasks are always cleaned up from a map), it inverts the ordering of the `Option` and `Result` returned by `join_one` --- which we've bikeshedded about a bit [here][2]. This makes the method's return type inconsistent with the existing `join_one` method, which feels not great. As I see it, there are three possible solutions to this: * change the existing `join_one` method to also swap the `Option` and `Result` nesting for consistency. * change `join_with_id` to return `Result<Option<(Id, T)>, (Id, JoinError)>>`, which feels gross... * add a task ID to `JoinError` as well. [1]: #4538 (comment) [2]: #4335 (comment)
## Motivation PR #4538 adds a prototype implementation of a `JoinMap` API in `tokio::task`. In [this comment][1] on that PR, @carllerche pointed out that a much simpler `JoinMap` type could be implemented outside of `tokio` (either in `tokio-util` or in user code) if we just modified `JoinSet` to return a task ID type when spawning new tasks, and when tasks complete. This seems like a better approach for the following reasons: * A `JoinMap`-like type need not become a permanent part of `tokio`'s stable API * Task IDs seem like something that could be generally useful outside of a `JoinMap` implementation ## Solution This branch adds a `tokio::task::Id` type that uniquely identifies a task relative to all other spawned tasks. Task IDs are assigned sequentially based on an atomic `usize` counter of spawned tasks. In addition, I modified `JoinSet` to add a `join_with_id` method that behaves identically to `join_one` but also returns an ID. This can be used to implement a `JoinMap` type. Note that because `join_with_id` must return a task ID regardless of whether the task completes successfully or returns a `JoinError`, I've also changed `JoinError` to carry the ID of the task that errored, and added a `JoinError::id` method for accessing it. Alternatively, we could have done one of the following: * have `join_with_id` return `Option<(Id, Result<T, JoinError>)>`, which would be inconsistent with the return type of `join_one` (which we've [already bikeshedded over once][2]...) * have `join_with_id` return `Result<Option<(Id, T)>, (Id, JoinError)>>`, which just feels gross. I thought adding the task ID to `JoinError` was the nicest option, and is potentially useful for other stuff as well, so it's probably a good API to have anyway. [1]: #4538 (comment) [2]: #4335 (comment) Closes #4538 Signed-off-by: Eliza Weisman <[email protected]>
## Motivation In many cases, it is desirable to spawn a set of tasks associated with keys, with the ability to cancel them by key. As an example use case for this sort of thing, see Tower's [`ReadyCache` type][1]. Now that PR #4530 adds a way of cancelling tasks in a `tokio::task::JoinSet`, we can implement a map-like API based on the same `IdleNotifiedSet` primitive. ## Solution This PR adds an implementation of a `JoinMap` type to `tokio_util::task`, using the `JoinSet` type from `tokio::task`, the `AbortHandle` type added in #4530, and the new task IDs added in #4630. Individual tasks can be aborted by key using the `JoinMap::abort` method, and a set of tasks whose key match a given predicate can be aborted using `JoinMap::abort_matching`. When tasks complete, `JoinMap::join_one` returns their associated key alongside the output from the spawned future, or the key and the `JoinError` if the task did not complete successfully. Overall, I think the way this works is pretty straightforward; much of this PR is just API boilerplate to implement the union of applicable APIs from `JoinSet` and `HashMap`. Unlike previous iterations on the `JoinMap` API (e.g. #4538), this version is implemented entirely in `tokio_util`, using only public APIs from the `tokio` crate. Currently, the required `tokio` APIs are unstable, but implementing `JoinMap` in `tokio-util` means we will never have to make stability commitments for the `JoinMap` API itself. [1]: https://github.com/tower-rs/tower/blob/master/tower/src/ready_cache/cache.rs Signed-off-by: Eliza Weisman <[email protected]>
Depends on #4530.
Motivation
In many cases, it is desirable to spawn a set of tasks associated with
keys, with the ability to cancel them by key. As an example use case for
this sort of thing, see Tower's
ReadyCache
type.Now that PR #4530 adds a way of cancelling tasks in a
tokio::task::JoinSet
, we can implement a map-like API based on thesame
IdleNotifiedSet
primitive.Solution
This PR adds an implementation of a
JoinMap
type totokio::task
,using the
IdleNotifiedSet
that backsJoinMap
, and aHashMap
of theAbortHandle
s added in #4530. Keys are stored in both theHashMap
and alongside the task's
JoinHandle
in theIdleNotifiedSet
inorder to identify tasks as they complete. This allows returning the key
in
join_one
regardless of whether a task was cancelled, panicked, orcompleted successfully, and removing the corresponding
AbortHandle
from the
HashMap
. Of course, this requires keys to beClone
.Overall, I think the way this works is pretty straightforward; much of
this PR is just API boilerplate to implement the union of applicable
APIs from
JoinSet
andHashMap
.Individual tasks can be aborted by key using the
JoinMap::abort
method, and a set of tasks whose key match a given predicate can be
aborted using
JoinMap::abort_matching
.When tasks complete,
JoinMap::join_one
returns their associated keyalongside the output from the spawned future, or the key and the
JoinError
if the task did not complete successfully.Bikesheds & Future Work
I used a mixture of
HashMap
andJoinSet
naming. Tasks arespawn
ed rather thaninsert
ed, andabort
ed rather thanremove
d,but other methods use map-inspired naming. Open to changing this, of
course!
We also could change the
join_one
method to only return the future'soutput, and not the task's key, but I thought it was nicer to do this
rather than not. In particular, this allows the user to determine
which task panicked or was cancelled when a
JoinError
is returned,which seems kind of important to me.