Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task: implement JoinMap #4538

Closed
wants to merge 43 commits into from
Closed

task: implement JoinMap #4538

wants to merge 43 commits into from

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Feb 24, 2022

Depends on #4530.

Motivation

In many cases, it is desirable to spawn a set of tasks associated with
keys, with the ability to cancel them by key. As an example use case for
this sort of thing, see Tower's ReadyCache type.

Now that PR #4530 adds a way of cancelling tasks in a
tokio::task::JoinSet, we can implement a map-like API based on the
same IdleNotifiedSet primitive.

Solution

This PR adds an implementation of a JoinMap type to tokio::task,
using the IdleNotifiedSet that backs JoinMap, and a HashMap of the
AbortHandles added in #4530. Keys are stored in both the HashMap
and alongside the task's JoinHandle in the IdleNotifiedSet in
order to identify tasks as they complete. This allows returning the key
in join_one regardless of whether a task was cancelled, panicked, or
completed successfully, and removing the corresponding AbortHandle
from the HashMap. Of course, this requires keys to be Clone.

Overall, I think the way this works is pretty straightforward; much of
this PR is just API boilerplate to implement the union of applicable
APIs from JoinSet and HashMap.

Individual tasks can be aborted by key using the JoinMap::abort
method, and a set of tasks whose key match a given predicate can be
aborted using JoinMap::abort_matching.

When tasks complete, JoinMap::join_one returns their associated key
alongside the output from the spawned future, or the key and the
JoinError if the task did not complete successfully.

Bikesheds & Future Work

  • I used a mixture of HashMap and JoinSet naming. Tasks are
    spawned rather than inserted, and aborted rather than removed,
    but other methods use map-inspired naming. Open to changing this, of
    course!

  • We also could change the join_one method to only return the future's
    output, and not the task's key, but I thought it was nicer to do this
    rather than not. In particular, this allows the user to determine
    which task panicked or was cancelled when a JoinError is returned,
    which seems kind of important to me.

@github-actions github-actions bot added the R-loom Run loom tests on this PR label Feb 24, 2022
@hawkw hawkw force-pushed the eliza/abort-handle branch from fe8158f to d77c9be Compare February 24, 2022 20:29
@hawkw hawkw requested review from carllerche and Darksonn February 24, 2022 20:31
@hawkw hawkw self-assigned this Feb 24, 2022
@hawkw hawkw requested a review from tobz February 24, 2022 20:31
@hawkw hawkw added A-tokio-util Area: The tokio-util crate C-enhancement Category: A PR with an enhancement or bugfix. M-task Module: tokio/task labels Feb 24, 2022
@hawkw
Copy link
Member Author

hawkw commented Feb 24, 2022

Note that I still need to add some tests + more examples, but I wanted to go ahead and put the PR up so we could agree on the API.

@hawkw hawkw mentioned this pull request Feb 24, 2022
2 tasks
Base automatically changed from eliza/abort-handle to master February 24, 2022 21:08
This rewrites `JoinMap` to use the same `IdleNotifiedSet` primitive as
`JoinSet` directly, rather than wrapping a `JoinSet`. This way, we can
just store the keys in the `IdleNotifiedSet`, along with the
`JoinHandle`s, rather than having to move them into the task. This
means that we now *always* get a key back when a task completes,
regardless of whether that task was cancelled, panicked, or completed
successfully.

There are a couple good implications of always getting a key back:

- It lets us change the API to always include the key in `join_one`.
  This way, the user can determine _which_ task completed, even if is
  cancelled or panics. This was not previously possible with the old
  API.

- Because we always get the key back, we can always perform an O(1)
  `HashMap::remove` to clear out abort handles for dead tasks, rather
  than an O(_n_) `HashMap::retain`. This improves performance when tasks
  panic or are cancelled significantly.

- We no longer need to expose task status APIs on `AbortHandle`, which
  we may not want to commit to.

The tradeoff is that this code now has to live in `tokio` rather than
`tokio-util`. I think that's probably acceptable, though.
hawkw added 4 commits March 4, 2022 11:57
This does, unfortunately, mean that keys cannot be returned with
cancelled tasks, as they are only stored in the map and may have been
evicted when replacing a task...however, we could change this
behavior by not returning cancelled join errors for tasks that were
aborted by replacing in the map. This would mean removing a task from
the `IdleNotifiedSet` as well when it is replaced by `insert`.

On the other hand, this does make the API closer to `JoinSet`'s, even if
it's somewhat less nice.
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw requested a review from Darksonn March 4, 2022 23:03
@hawkw
Copy link
Member Author

hawkw commented Mar 4, 2022

Okay, I've rewritten the implementation again based on Alice's comments in #4538 (comment). This way, we no longer need to clone the keys into the tasks, which should be much more efficient when cloning keys is relatively expensive (e.g. strings). Would love another look --- I think this is the best iteration so far.

hawkw added 3 commits March 4, 2022 15:05
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Copy link
Contributor

@Noah-Kennedy Noah-Kennedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one comment needs cleaned up a bit, otherwise this looks good imo

tokio/src/task/join_map.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@nylonicious nylonicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 3 small typos.

tokio/src/task/join_map.rs Outdated Show resolved Hide resolved
tokio/src/task/join_map.rs Outdated Show resolved Hide resolved
tokio/src/task/join_map.rs Outdated Show resolved Hide resolved
Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw requested a review from Noah-Kennedy March 11, 2022 18:01
@hawkw
Copy link
Member Author

hawkw commented Mar 11, 2022

@nylonicious thanks for catching those --- that was a find-and-replace mistake, I definitely wouldn't have noticed it otherwise! :)

@hawkw
Copy link
Member Author

hawkw commented Mar 11, 2022

Hmm, the Hyper CI job is failing, on Windows only: https://github.com/tokio-rs/tokio/runs/5514991976?check_suite_focus=true. I'm gonna restart it.

hawkw added 4 commits March 11, 2022 10:25
turns out no doctests for unstable APIs were actually running lmao

Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@carllerche
Copy link
Member

The patch itself looks good. Others have commented there.

I have already expressed my hesitation with adding JoinMap to Tokio proper in Discord. I will restate some here.

First, adding an explicit JoinMap requires us to match a "map" API. Which one? Should we copy the stdlib HashMap? That would be the obvious API to copy, but it has limitations. First, K: Borrow<Q> has limitations (I won't rehash here, but this is one reason why IndexMap moved away from this. What happens when the user hits these limitations or they are missing "Map" apis that we did not implement on JoinMapor they want to use something likeIndexMap` anyway? How can they solve these?

Separetely, having two types means the user has to pick between them. Do they pick JoinSet or JoinMap? It adds a new decision for them to make and the docs need to have a clear explanation of when to use each one.

If JoinSet generates an "identifier" with each spawned task. We can expose the identifier on the AbortHandle. We also can provide the identifier when a value is joined on JoinSet (e.g. add a join_one_with_id() method). These changes would let the user implement JoinMap logic themselves using whatever map structure they want, either std::HashMap or IndexMap.

My vote is to hold back on JoinMap and add the id concept to JoinSet. We can provide JoinMap in tokio-util.

That said, if everyone else thinks I am wrong, I wont force the issue. Worst case is, we deprecate JoinMap at a later time if we think it sin't the right option post stabilization.

hawkw added a commit that referenced this pull request Apr 20, 2022
## Motivation

PR #4538 adds a prototype implementation of a `JoinMap` API in
`tokio::task`. In [this comment][1] on that PR, @carllerche pointed out
that a much simpler `JoinMap` type could be implemented outside of
`tokio` (either in `tokio-util` or in user code) if we just modified
`JoinSet` to return a task ID type when spawning new tasks, and when
tasks complete. This seems like a better approach for the following
reasons:

* A `JoinMap`-like type need not become a permanent part of `tokio`'s
  stable API
* Task IDs seem like something that could be generally useful outside of
  a `JoinMap` implementation

## Solution

This branch adds a `tokio::task::Id` type that uniquely identifies a
task relative to all currently spawned tasks. The ID is internally
represented as a `NonZeroUsize` that's based on the address of the
task's header. I thought that it was better to use addresses to generate
IDs than assigning sequential IDs to tasks, because a sequential ID
would mean adding an additional usize field to the task data
somewhere, making it a word bigger. I've added methods to `JoinHandle`
and `AbortHandle` for returning a task's ID.

In addition, I modified `JoinSet` to add a `join_with_id` method that
behaves identically to `join_one` but also returns an ID. This can be
used to implement a `JoinMap` type.

Note that because `join_with_id` must return a task ID regardless of
whether the task completes successfully or returns a `JoinError` (in
order to ensure that dead tasks are always cleaned up from a map), it
inverts the ordering of the `Option` and `Result` returned by `join_one`
--- which we've bikeshedded about a bit [here][2]. This makes the
method's return type inconsistent with the existing `join_one` method,
which feels not great. As I see it, there are three possible solutions
to this:

* change the existing `join_one` method to also swap the `Option` and
  `Result` nesting for consistency.
* change `join_with_id` to return `Result<Option<(Id, T)>, (Id,
  JoinError)>>`, which feels gross...
* add a task ID to `JoinError` as well.

[1]: #4538 (comment)
[2]: #4335 (comment)
@hawkw hawkw mentioned this pull request Apr 20, 2022
hawkw added a commit that referenced this pull request Apr 22, 2022
## Motivation

PR #4538 adds a prototype implementation of a `JoinMap` API in
`tokio::task`. In [this comment][1] on that PR, @carllerche pointed out
that a much simpler `JoinMap` type could be implemented outside of
`tokio` (either in `tokio-util` or in user code) if we just modified
`JoinSet` to return a task ID type when spawning new tasks, and when
tasks complete. This seems like a better approach for the following
reasons:

* A `JoinMap`-like type need not become a permanent part of `tokio`'s
  stable API
* Task IDs seem like something that could be generally useful outside of
  a `JoinMap` implementation

## Solution

This branch adds a `tokio::task::Id` type that uniquely identifies a
task relative to all currently spawned tasks. The ID is internally
represented as a `NonZeroUsize` that's based on the address of the
task's header. I thought that it was better to use addresses to generate
IDs than assigning sequential IDs to tasks, because a sequential ID
would mean adding an additional usize field to the task data
somewhere, making it a word bigger. I've added methods to `JoinHandle`
and `AbortHandle` for returning a task's ID.

In addition, I modified `JoinSet` to add a `join_with_id` method that
behaves identically to `join_one` but also returns an ID. This can be
used to implement a `JoinMap` type.

Note that because `join_with_id` must return a task ID regardless of
whether the task completes successfully or returns a `JoinError` (in
order to ensure that dead tasks are always cleaned up from a map), it
inverts the ordering of the `Option` and `Result` returned by `join_one`
--- which we've bikeshedded about a bit [here][2]. This makes the
method's return type inconsistent with the existing `join_one` method,
which feels not great. As I see it, there are three possible solutions
to this:

* change the existing `join_one` method to also swap the `Option` and
  `Result` nesting for consistency.
* change `join_with_id` to return `Result<Option<(Id, T)>, (Id,
  JoinError)>>`, which feels gross...
* add a task ID to `JoinError` as well.

[1]: #4538 (comment)
[2]: #4335 (comment)
@hawkw hawkw closed this in #4630 Apr 25, 2022
hawkw added a commit that referenced this pull request Apr 25, 2022
## Motivation

PR #4538 adds a prototype implementation of a `JoinMap` API in
`tokio::task`. In [this comment][1] on that PR, @carllerche pointed out
that a much simpler `JoinMap` type could be implemented outside of
`tokio` (either in `tokio-util` or in user code) if we just modified
`JoinSet` to return a task ID type when spawning new tasks, and when
tasks complete. This seems like a better approach for the following
reasons:

* A `JoinMap`-like type need not become a permanent part of `tokio`'s
  stable API
* Task IDs seem like something that could be generally useful outside of
  a `JoinMap` implementation

## Solution

This branch adds a `tokio::task::Id` type that uniquely identifies a
task relative to all other spawned tasks. Task IDs are assigned
sequentially based on an atomic `usize` counter of spawned tasks.

In addition, I modified `JoinSet` to add a `join_with_id` method that
behaves identically to `join_one` but also returns an ID. This can be
used to implement a `JoinMap` type.

Note that because `join_with_id` must return a task ID regardless of
whether the task completes successfully or returns a `JoinError`, I've
also changed `JoinError` to carry the ID of the task that errored, and 
added a `JoinError::id` method for accessing it. Alternatively, we could
have done one of the following:

* have `join_with_id` return `Option<(Id, Result<T, JoinError>)>`, which
  would be inconsistent with the return type of `join_one` (which we've
  [already bikeshedded over once][2]...)
* have `join_with_id` return `Result<Option<(Id, T)>, (Id, JoinError)>>`,
  which just feels gross.

I thought adding the task ID to `JoinError` was the nicest option, and
is potentially useful for other stuff as well, so it's probably a good API to
have anyway.

[1]: #4538 (comment)
[2]: #4335 (comment)

Closes #4538

Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw mentioned this pull request Apr 25, 2022
hawkw added a commit that referenced this pull request Apr 26, 2022
## Motivation

In many cases, it is desirable to spawn a set of tasks associated with
keys, with the ability to cancel them by key. As an example use case for
this sort of thing, see Tower's [`ReadyCache` type][1].

Now that PR #4530 adds a way of cancelling tasks in a
`tokio::task::JoinSet`, we can implement a map-like API based on the
same `IdleNotifiedSet` primitive.

## Solution

This PR adds an implementation of a `JoinMap` type to
`tokio_util::task`, using the `JoinSet` type from `tokio::task`, the
`AbortHandle` type added in #4530, and the new task IDs added in #4630.

Individual tasks can be aborted by key using the `JoinMap::abort`
method, and a set of tasks whose key match a given predicate can be
aborted using `JoinMap::abort_matching`.

When tasks complete, `JoinMap::join_one` returns their associated key
alongside the output from the spawned future, or the key and the
`JoinError` if the task did not complete successfully.

Overall, I think the way this works is pretty straightforward; much of
this PR is just API boilerplate to implement the union of applicable
APIs from `JoinSet` and `HashMap`. Unlike previous iterations on the
`JoinMap` API (e.g. #4538), this version is implemented entirely in
`tokio_util`, using only public APIs from the `tokio` crate. Currently,
the required `tokio` APIs are unstable, but implementing `JoinMap` in
`tokio-util` means we will never have to make stability commitments for
the `JoinMap` API itself.

[1]: https://github.com/tower-rs/tower/blob/master/tower/src/ready_cache/cache.rs

Signed-off-by: Eliza Weisman <[email protected]>
@Darksonn Darksonn deleted the eliza/join-map branch June 28, 2022 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-enhancement Category: A PR with an enhancement or bugfix. M-task Module: tokio/task R-loom Run loom tests on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants