
RFC: executor design for futures 0.2 #3

Merged (5 commits) on Feb 28, 2018

Conversation

@aturon (Contributor) commented Feb 10, 2018

This RFC proposes a design for futures-executors, including both executor traits and built-in executors. In addition, it sets up a core expectation for all tasks that they are able to spawn additional tasks, while giving fine-grained control over what executor that spawning is routed to.

NOTE: this RFC assumes that RFC #2 is accepted.

Rendered
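For readers without the rendered RFC handy, here is a rough sketch of the surface area under discussion, reconstructed from the excerpts quoted throughout this thread (signatures are abbreviated the way the RFC abbreviates them; the bounds on `with_spawn`'s closure are a guess, not the RFC's text):

```rust
// The core executor abstraction proposed for futures-core.
trait Spawn {
    fn spawn(&self, f: Box<Future<Item = (), Error = ()> + Send>);
}

// Task contexts carry a default spawner.
impl task::Context {
    // Spawn onto the current default executor.
    fn spawn<F>(&self, f: F)
        where F: Future<Item = (), Error = ()> + Send + 'static;

    // Run the given closure with a context using a new default executor.
    fn with_spawn<S, F, R>(&mut self, spawn: S, f: F) -> R
        where S: Spawn, F: FnOnce(&mut task::Context) -> R;
}

// Built-in executors proposed for futures-executor.
struct ThreadPool { ... }  // work-stealing pool of worker threads
struct LocalPool { ... }   // single-threaded; not `Send` or `Sync`
```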

@aturon (Contributor, Author) commented Feb 10, 2018

cc @carllerche

executors.md Outdated
```rust
where F: Future, S: Spawn;

// a future that resolves when *all* spawned futures have resolved
fn all_done(&self) -> impl Future<Item = (), Error = ()>;
```
Collaborator:

IMO it's a little bit confusing right now to figure out how to actually run all_done-- it almost looks like you'd need a separate LocalPool for it. The trick is that you can do pool.run_until(pool.all_done()). It might be nice to offer a run_until_all_done() method that offers this same functionality in a more visible way.
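Something like this (hypothetical; `run_until_all_done` is the name suggested above, built from the `run_until` and `all_done` methods proposed in the RFC):

```rust
impl LocalPool {
    // Hypothetical convenience wrapper for `pool.run_until(pool.all_done(), ...)`.
    fn run_until_all_done<S: Spawn>(&mut self, spawn: S) {
        // `all_done()` resolves once every future spawned on this pool has
        // resolved, so driving it with `run_until` drains the pool. (Whether
        // `all_done`'s borrow of the pool permits this is exactly the
        // question raised below.)
        let all_done = self.all_done();
        let _ = self.run_until(all_done, spawn);
    }
}
```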

Contributor Author:

Yeah, I think this design might be a bit too clever for its own good.

Comment:

I'm probably being pedantic, but technically, it should be:

```rust
let pool = LocalPool::new();
pool.spawn_local(...); // start some future
pool.run_until(pool.all_done(), pool); // this doesn't seem possible, since spawn is by value
```

Collaborator:

@ahmedcharles I'd imagine there'd be an impl of Spawn for &LocalPool as @carllerche suggested above, which would let you do pool.run_until(pool.all_done(), &pool);. Still, that seems pretty unergonomic to me-- I think it'd be good to offer a version of run_until which uses &self as the spawner, so that you could write pool.run_until(my_fut); rather than pool.run_until(my_fut, &pool);
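A minimal sketch of that idea, assuming the `Spawn` trait quoted later in this thread (with `spawn` taking `&self`):

```rust
// Let a shared reference to the pool act as a spawner in its own right.
impl<'a> Spawn for &'a LocalPool {
    fn spawn(&self, f: Box<Future<Item = (), Error = ()> + Send>) {
        // `spawn_local` accepts any future, so a boxed `Send` one is fine.
        self.spawn_local(f);
    }
}
```

With that impl, `pool.run_until(pool.all_done(), &pool)` type-checks, and a `run_until` variant that defaults the spawner to `&self` would reduce it to `pool.run_until(my_fut)`.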

Comment:

I see. Makes sense. I'm fine with the current design, since it's easy to add a function which does the common thing later.

executors.md Outdated
```rust
where F: Future<Item = (), Error = ()> + Send + 'static;

// Run the given closure with a context using a new default executor
fn with_spawn<S, F, R>(&mut self, spawn: S, f: F) -> R
```

Comment:

Could you explain why with_spawn is a fn directly on context?

My initial thought on the matter is that you would want to set the spawn at context creation time (perhaps a builder) in the executor.

If one wants to swap a spawn in the midst of a task being polled, would it be possible to do that with a free function? This would add a bit of API pepper and also remove noise from the Context API docs.

I also haven't seen in the RFCs how with_notify will be handled. I would guess it would be similar to whatever strategy with_spawn goes with.

Comment:

I see why tasks and executors are decoupled, but I'm still not sure why spawning is something a task does.

I know it kinda seems silly but I feel the same way about this situation as I do about regex libraries which have match, search, etc as methods on a regex object, unlike C++, which has match, search, etc as free functions.

And I'm not sure why the comment says that it's using a default executor... isn't spawn the executor? The comment would imply that there are now two executors involved somehow, spawn and the default one.

It's also not obvious why the spawn function takes a future and with_spawn takes a function (which will inevitably call poll with the context on a captured future).

Collaborator:

I had envisioned with_notify and with_spawn working the same way. I personally think it makes sense to have them directly on Context since they provide core functionality for modifying Context, and having them available as methods makes them easily discoverable.

If one wants to swap a spawn in the midst of a task being polled, would it be possible to do that with a free function?

Sorry, can you give an example of what you mean? It should be possible to do something like this:

```rust
fn poll(...) -> ... {
    try_ready!(self.first_future.poll(ctx));
    ctx.with_spawn(self.my_spawn, |ctx| self.second_future.poll(ctx))
}
```

Is that what you had in mind?

Comment:

I'm slowly beginning to understand the overall design.

@cramertj You didn't address why the functions have different parameters, both of which are isomorphic (given the existence of poll_fn, and the lambda you wrote above).

Collaborator:

@ahmedcharles Sorry, I'm not sure I understand-- I'd expect that with_spawn and with_notify would have the same parameters, e.g.:

```rust
fn poll(...) -> ... {
    try_ready!(self.first_future.poll(ctx));
    ctx.with_notify(self.my_custom_notify, |ctx| self.second_future.poll(ctx));
    ctx.with_spawn(self.my_custom_spawn, |ctx| self.third_future.poll(ctx))
}
```

Comment:

Context::spawn and Context::with_spawn have different parameters. I wasn't comparing it to with_notify.

Collaborator:

@ahmedcharles To clarify-- spawn spawns a future onto the existing Spawn, while with_spawn runs a closure inside a new context with a new Spawn.
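In code, the distinction looks roughly like this (`some_subtask`, `other_spawn`, and `inner_future` are placeholders):

```rust
// Inside a `poll`, given `ctx: &mut task::Context`:

// `spawn`: hand a new task to whatever `Spawn` the context already carries.
ctx.spawn(some_subtask);

// `with_spawn`: install a different `Spawn` for the duration of the closure,
// so anything `inner_future` spawns goes to `other_spawn` instead.
ctx.with_spawn(other_spawn, |ctx| inner_future.poll(ctx));
```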

executors.md Outdated

```rust
impl LocalPool {
    // runs the executor until `f` is resolved, spawning subtasks onto `spawn`
```
@carllerche commented Feb 10, 2018:

I don't entirely follow this. Are you saying that subtasks are then spawned onto the Spawn instance provided by Context? If so, this seems good.

Collaborator:

Yes, that was the idea.

executors.md Outdated
```rust
fn all_done(&self) -> impl Future<Item = (), Error = ()>;

// spawns a possibly non-Send future, possible due to single-threaded execution.
fn spawn_local<F>(&self, F)
```

Comment:

Is there a way for a future currently being run on a LocalPool to spawn !Send futures on the current LocalPool, but spawn futures that are Send onto the Spawn set on the context?

Comment:

What does 'spawn set on the context' mean? I know contexts have to know which executor is currently required to be notified, but that behavior isn't specified by this document. Should it be?

Collaborator:

@carllerche Does pool.run_until(my_fut, &cpu_pool) look like what you had in mind? This would be possible with the run_until API specified above, although as I commented I think there should be a method which defaults to spawning onto the LocalPool itself.

Comment:

I don't see how run_until can do what I think he's asking. Here's code (cause code makes the world better, or something):

```rust
struct MixedFuture<L, P> {
    local: L,
    pool: P,
}

impl<L, P, LT, PT, E> Future for MixedFuture<L, P>
where L: Future<Item = LT, Error = E>, P: Future<Item = PT, Error = E> + Send {
    type Item = (LT, PT);
    type Error = E;
    fn poll(&mut self, ctx: &mut task::Context) -> Poll<(LT, PT), E> {
        // What should this poll function do? And which spawn is associated
        // with ctx when it's passed here?
    }
}
```

@cramertj (Collaborator) commented Feb 12, 2018:

```rust
struct SpawnBoth<L, P> {
    future_to_spawn_on_local_pool: Option<L>,
    future_to_spawn_on_ctx_pool: Option<P>,
    local_pool_ref: Option<Rc<LocalPool>>,
}

impl<L, P> Future for SpawnBoth<L, P>
    where L: Future, P: Future + Send
{
    type Item = ();
    type Error = ();
    fn poll(&mut self, ctx: &mut task::Context) -> Poll<(), ()> {
        match (
            self.future_to_spawn_on_local_pool.take(),
            self.future_to_spawn_on_ctx_pool.take(),
            self.local_pool_ref.take(),
        ) {
            (
                Some(future_to_spawn_on_local_pool),
                Some(future_to_spawn_on_ctx_pool),
                Some(local_pool_ref),
            ) => {
                local_pool_ref.spawn_local(future_to_spawn_on_local_pool);
                ctx.spawn(future_to_spawn_on_ctx_pool);
            }
            _ => { panic!("polled SpawnBoth after completion"); }
        }
        Ok(Async::Ready(()))
    }
}
```

@carllerche commented:

Over all, a very solid proposal. I am happy to see how this turned out. I like the decoupling between the generic Spawn being tracked on Context and the local pool.

This proposal makes me wonder about the global Tokio reactor being tracked with a thread local. If the ecosystem moves to an explicit context, then it seems sensible for the Tokio reactor handle to be tracked by the context object in a similar way as the Spawn handle.

If the strategy used by Spawn on context could be generalized, this would enable Tokio and other futures-based libs to behave similarly.

I thought briefly about it, and if it is assumed that extensions (like the Tokio reactor) define themselves when a Context is first created (i.e. when an executor is started), it should be plausible to allow extending Context with minimal overhead.

If this is something you are interested in pursuing, I can try to write up my thoughts in more detail. If it isn't, I'd be interested in hearing how you imagine something like the Tokio reactor providing a default in the presence of an explicit context argument.

@carllerche commented Feb 10, 2018

I'd also offer for consideration that this:

Add a core assumption that tasks are always able to spawn additional tasks, avoiding the need to reflect this at the API level.

isn't always true. At least in my prior experience, it is common to shut down a thread pool by preventing any new tasks from being spawned but allowing existing ones to complete.

In this case, what would a Spawn implementation do if it is no longer permitted to spawn new tasks and how would the caller of spawn be aware that something went wrong during spawn?

edit: Another similar situation is a bounded thread pool.

executors.md Outdated
```rust
impl task::Context {
    // Spawn onto the current default executor
    fn spawn<F>(&self, F)
```
@carllerche commented Feb 10, 2018:

edit: I commented on the wrong line.

However, I think that the same comment as above applies here. This should probably take &mut self since &mut Context is being passed around.

Comment:

I'd prefer if this function were not part of the RFC (its functionality should be possible with the with_spawn function, right?). I think it would be a good idea to get experience with people having to pick a default before specifying one, especially since specifying one creates friction when it needs to be changed.

Collaborator:

@ahmedcharles A major benefit of this RFC is to provide the guarantee that all std-only Future implementations have the ability to spawn subtasks. This removes the current separation that exists between functions that take an Executor handle and those that do not. When a deeply nested future wants to spawn tasks as part of its internal implementation, it currently needs to have an Executor handle threaded down through arguments. This means that whether a Future spawns or not is "leaked" all throughout its call stack, which makes it hard to change down the line.

Comment:

Ok, so now I'm confused about what default means. What you're suggesting is that this function spawns with the 'current' executor (which I'd be fine with), but not with the 'default' executor.

'current' meaning, the executor which is currently associated with context.
'default' meaning, the executor which is some form of global default.

I'm against a function using a global default. I'm fine with a function using whatever executor is currently stored inside context, which by definition, can't be forced into a single global default.

Collaborator:

@ahmedcharles ctx.spawn(fut) will spawn on the Spawn currently associated with the context. my_local_pool.spawn_local(fut) will spawn onto the local pool. There is no global executor in this proposal.

executors.md Outdated

```rust
trait Spawn {
    fn spawn(&self, f: Box<Future<Item = (), Error = ()> + Send>);
```

Comment:

Is there a reason this isn't &mut self? I've been finding that using &self ended up hitting problems in a number of cases. Also, since this is stored on Context, you have mutable access anyway.

Spawn instances that do require &self can always impl Spawn for &Self, like TcpStream and co. w/ Read.

@ahmedcharles commented:

My (perhaps naive) initial model for executors was: Executor -> Task -> Future. However, that implies that tasks don't get to move from executor to executor, and that mobility is desirable.

Given that, I can see why the task context has spawn functionality. However, I don't see how the original context is created. Could that be added to the RFC?

executors.md Outdated
fn all_done(&self) -> impl Future<Item = (), Error = ()>;

// spawns a possibly non-Send future, possible due to single-threaded execution.
fn spawn_local<F>(&self, F)

Comment:

I assume that the Spawn trait for LocalPool is as easy as:

```rust
impl Spawn for LocalPool {
    fn spawn(&self, f: Box<Future<Item = (), Error = ()> + Send>) {
        self.spawn_local(f);
    }
}
```

I.e. spawn_local is a superset of the requirements of Spawn::spawn.

@cramertj (Collaborator) commented:

@carllerche

At least in my prior experience, it is common to shutdown a thread pool by preventing any new tasks from being spawned but allowing existing ones to complete.

Would it be enough to say that they can spawn those new tasks, but that those tasks might not necessarily be allowed to complete? For this behavior, you could do something like pool.run_until(pool.all_done(), short_lived_pool). This would run all tasks on pool until completion, but new tasks would be spawned onto the short_lived_pool which would not be completed.

@carllerche commented:

@cramertj I'm not following the proposal. Before we deal with specifics, perhaps we can talk at a high level w/o code.

As mentioned above, I believe it is common for executors to enter states that prevent them from accepting a spawned task. In those cases, what should happen?

  • Panic
  • Err returned
  • Task is silently dropped.

Two cases are an executor in the process of shutting down, and an executor that has reached a pre-configured capacity.

@cramertj (Collaborator) commented Feb 11, 2018

@carllerche

Two cases are an executor in the process of shutting down, and an executor that has reached a pre-configured capacity.

The first case here seems dubious to me: if an executor in the process of shutting down is trying to close out the remaining tasks, it may be the case that the remaining tasks need to be able to spawn new tasks in order to complete themselves-- how would you handle this situation? It seems like you would need a way to signal to the existing tasks that they are entering a "shutdown state", in which new tasks should not be spawned. If you have that ability, it seems fine to allow them to start new tasks, as long as the tasks don't actually do so.

I'm interested in what use-cases you're imagining for having a pre-configured task capacity. I don't see a way to make this work-- if your existing tasks need to spawn new tasks in order to complete, then they will deadlock.

Perhaps you're imagining an invariant like "no task shall require spawning new tasks in order to complete"? This seems appealing since it would guide you towards fn spawn(...) -> Result<(), CantSpawn>, and tasks could error or shutdown on a failed spawn. Still, I think this is an overly burdensome requirement. I believe that, much like starting a thread, spawning a task should be an assumed capability of all tasks. Without this guarantee, libraries and their users would have to concern themselves with the details of which routines required spawning, and which did not. This makes spawning a "leaky" behavior and discourages its use.

Overall, I feel that it's reasonable for all std-only Futures to assume the ability to spawn. Do you disagree? If not, do you have specific executors or platforms in mind which would support spawning, but which would need to enforce a fixed-size pool of tasks?

executors.md Outdated
```rust
fn new() -> ThreadPool;

// starts up the worker threads
fn run();
```

Comment:

Is the intended way of using ThreadPool, the following:

```rust
let f = ...; // some arbitrarily complex future.
let tp = ThreadPool::new();
tp.run();
let lp = LocalPool::new();
lp.run_until(f, tp);
```

Currently, there's no way to run stuff on the thread pool without blocking the current thread. And run seems to make the situation worse, since forgetting to call it here will result in run_until never returning as long as something tries to execute another task using spawn.

Collaborator:

I don't know that run is necessary-- it seems like you could have ThreadPool configured to start up when the first task is spawned.

@cramertj (Collaborator) commented Feb 11, 2018:

WRT running on a LocalPool, I think that unless you need to spawn !Send tasks, you shouldn't need to create a LocalPool to wait on the result of a Future. I had imagined modifying wait to take a Spawn so that you could do something like this:

```rust
let future = ...; // some arbitrarily complex future.
let thread_pool = ThreadPool::new();
let res = future.wait(&thread_pool);
```

This would run future on the current thread, but any spawned tasks would go onto the thread_pool.

If, on the other hand, you need to spawn !Send tasks, you could do it like this:

```rust
let local_pool = LocalPool::new();
let future = async || {
    ...
    local_pool.spawn_local(sub_future);
    ...
};

let thread_pool = ThreadPool::new();
let res = local_pool.run_until(future, &thread_pool);
```

This would spawn sub_future onto the LocalPool, but any normally-spawned tasks would go onto thread_pool.

Comment:

What does that code look like without using async?

Comment:

And about Future::wait, that's just a convenience function that does what I suggested. Hence it doesn't really answer the underlying question, right?

Collaborator:

@ahmedcharles

What does that code look like without using async?

```rust
let local_pool = LocalPool::new();
let future = create_some_future_that_holds_a_reference_to_local_pool_and_uses_it_to_spawn(&local_pool);

let thread_pool = ThreadPool::new();
let res = local_pool.run_until(future, &thread_pool);
```

And about Future::wait, that's just a convenience function that does what I suggested. Hence it doesn't really answer the underlying question, right?

Your original question was:

Currently, there's no way to run stuff on the thread pool without blocking the current thread. And run seems to make the situation worse, since forgetting to call it here will result in run_until never returning as long as something tries to execute another task using spawn.

How do you want to receive the result of your data onto the main thread without blocking it? Do you want to poll for the data? Do you want to receive some kind of thread wakeup? All of these can be accomplished via normal thread-communication mechanisms (channels, mutexes, condition variables, etc).
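For example, a plain std channel is enough to get a result back to a non-blocking main thread. A rough sketch, assuming `thread_pool` implements the proposed `Spawn` trait and `do_work()` is some `Future<Item = u64, Error = ()> + Send`:

```rust
use std::sync::mpsc;

let (tx, rx) = mpsc::channel();

// Run the work on the pool; the task itself sends the result back.
thread_pool.spawn(Box::new(do_work().map(move |value| {
    let _ = tx.send(value); // the receiver may be gone; ignore the error
})));

// The main (e.g. GUI) thread checks for the result without blocking:
match rx.try_recv() {
    Ok(value) => println!("result: {}", value),
    Err(mpsc::TryRecvError::Empty) => { /* nothing yet; keep running the UI */ }
    Err(mpsc::TryRecvError::Disconnected) => { /* the task died */ }
}
```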

Comment:

At the end of the day, all programs have to block (for user input) or exit, so in a sense, you're right. There are nuances however.

The C++ Networking TS, for example, has four groups of functions that progress the executor:

  • run(timeout) - blocking, runs until no work remains or the timeout expires. (This seems to be what we currently have, without the timeout.)
  • run_one(timeout) - blocking, runs until at most one handler has completed or the timeout expires.
  • poll - non-blocking, runs all ready handlers and then returns.
  • poll_one - non-blocking, runs at most one ready handler and returns.

This is enough flexibility that it's possible to mix these options along with a GUI that isn't future-aware without much inefficiency. The alternative, which works either way, is to use a future which understands how to bridge the gap between the non-future-aware GUI and the rest of the system. With poll/poll_one, the interaction could be as simple as calling poll when the GUI is idle every hundred milliseconds or so. With run(timeout) you could block the GUI thread for at most a hundred milliseconds or so before starting to respond to input again. With run_one (no timeout), you could have a future which returns when a GUI input is received.

With only run, it's all futures, all the time, for everything, which doesn't seem very incremental and hence would be harder to adopt for sufficiently complex applications. Or is there something I'm missing?
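Translated into this RFC's vocabulary, the kind of surface being asked for might look like this (purely hypothetical signatures mirroring the Networking TS names; not part of the RFC):

```rust
impl LocalPool {
    // Blocking: run until no work remains (roughly what the RFC proposes).
    fn run<S: Spawn>(&mut self, spawn: S);

    // Non-blocking: run every currently ready task, then return.
    fn poll<S: Spawn>(&mut self, spawn: S);

    // Non-blocking: run at most one ready task; returns whether one ran.
    fn poll_one<S: Spawn>(&mut self, spawn: S) -> bool;
}
```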

@alkis commented Feb 11, 2018

How can I build the following future with this design (without using channels)?

  1. start some IO on thread A
  2. on IO readiness, run on A and update some data structure without synchronization
  3. immediately spawn N cpu heavy futures in a common thread pool (perhaps the default)
  4. when all N futures are done, resume on A and update the data structure without synchronization
  5. immediately spawn a future on a specific thread (the UI thread for example)

Bonus points for being able to do the above with steps 2 and 4 able to run on any thread (whichever thread they happened to be woken up on), with the guarantee that their code never runs concurrently on multiple threads. Basically this means 4 can run on the same thread where the last of the N futures finished, as long as no step 2 or 4 is running on any other thread - effectively saving a context switch on each transition.

@cramertj (Collaborator) commented:

@alkis This proposal is specifically a way of handling task-spawning onto a shared executor. Is there a specific concern or thought you have about the design being proposed here?

WRT your question-- in step 4, how do you want A to wake up once N futures have completed without using channels or synchronization primitives? You'll need some sort of AtomicTask that A will register on so that the last of the N futures completing can wake up A. The rest of the operations seem like they can all be run on the main thread in serial order.
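A rough sketch of that last-of-N wakeup, using an atomic counter plus futures 0.1's `AtomicTask` for concreteness (a 0.2-style version would register through the `Context` instead):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use futures::task::AtomicTask;

struct Shared {
    remaining: AtomicUsize, // starts at N
    parent: AtomicTask,     // the task running step 4 on A
}

// Each of the N CPU-heavy futures calls this as it finishes:
fn child_done(shared: &Shared) {
    if shared.remaining.fetch_sub(1, Ordering::AcqRel) == 1 {
        // This was the last child: wake the parent so step 4 can run.
        shared.parent.notify();
    }
}

// The parent registers itself before checking, to avoid a lost wakeup:
fn parent_is_ready(shared: &Shared) -> bool {
    shared.parent.register();
    shared.remaining.load(Ordering::Acquire) == 0
}
```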

@ahmedcharles commented:

One question I have is, if I have a GUI app with a 'main' thread and a thread pool, how do I execute something on the thread pool without blocking the main thread? (I suppose this boils down to whether the Spawn::spawn trait method blocks or not, which isn't documented.)

@ahmedcharles commented:

Concerning executor shutdown, I decided to look at prior art (ASIO/C++ Networking TS).

The only requirement when spawning a task (ASIO has dispatch/post/defer options) is that allocation succeeds. Generally speaking, allocation failures just panic and aren't even documented in Rust. So, I don't see a reason to require that spawning be prevented.

That said, ASIO supports stopping all currently running tasks. Note, this doesn't preempt them, it just doesn't start running any new tasks. The tasks remain in the queues and the entire system can be restarted later. Alternatively, the destructor can run, which will clean up every unfinished task.

This combination seems to be supported by prior art/experience and solves the issue of spawning never explicitly failing (outside of memory exhaustion) while having support for things stopping efficiently.
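In Rust terms, the ASIO model described above would amount to something like this (hypothetical methods, not part of the RFC):

```rust
impl ThreadPool {
    // Stop dispatching: tasks already polling finish their current poll,
    // queued tasks stay queued, and nothing new is started.
    fn stop(&self);

    // Clear the stopped state so the pool can be driven again; queued
    // tasks pick up where they left off.
    fn restart(&mut self);
}
// Dropping the pool instead destroys every unfinished task.
```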

The capacity problem seems like a hard one to solve, in general, but from an interface perspective, it seems easier. There's no guarantee that an executor make progress on tasks at a specified rate, so whether or not new tasks are spawned should be irrelevant to whether they get executed. The executor is always allowed to not execute them and delay that as much as required. Never scheduling a task may result in no one using the executor, but it certainly wouldn't break the contract.

@alkis commented Feb 11, 2018

@cramertj Let me try to describe what I want to see from Futures+Tokio in Rust and perhaps this will answer why I think the API proposed here is insufficient.

Futures+Tokio should be the way to build scalable applications for Rust. A good overview of how to build scalable applications is found on Dmitry Vyukov's website. Futures/Tokio aside, the whole site is a highly recommended read. My premise is that it should be natural to build applications that fall into that design through the APIs provided. I think it is understood that today this is not the case.

Dmitry presents the General Recipe of scalable architecture. This can be summarized to the following:

  1. threads are the wrong abstraction, they should be one level below application logic (this already hints at why this API is perhaps not what we want). This also means that we should not have threads doing a specific thing, like threads for I/O or threads for timers.
  2. we need some distribution/balancing mechanisms. The generally accepted solution today is work stealing.
  3. minimal use of mutexes.
  4. minimal use of mutable shared state.

What do we have so far?

  • Futures: they describe fragments of computation. They also describe asynchronous operations, where they signal they can't make progress and get parked until they are again ready to compute. Futures can be composed into DAGs of computation to make larger/more complex Futures.

What do we need?

  • An executor which drives futures and their children in parallel and schedules them in a way that extracts the maximum parallelism from the system. Let us assume the generally accepted solution of a work-stealing thread pool, with LIFO scheduling. The number of threads across the whole application should be constant, otherwise we either over- or under-subscribe, causing loss of efficiency or loss of utilization, respectively (see General Recipe). The current proposal promotes this by exposing the thread pool and also giving APIs to drive an Executor to completion (an executor should complete when the application is complete).
  • Some way to minimize uses of mutexes and mutable shared state.
  • To make it a bit more complex, we also need a way to say a Future can only run on a specific thread. This is useful when working with frameworks that have hard requirements on where things run: GUI frameworks and their UI thread are a prime example.

How can we achieve the above?

  • provide a common work stealing pool as described above
  • instead of controlling where we are going to spawn, control how futures are scheduled by defining constraints:
    • unconstrained: can run anywhere (by default on common thread pool)
    • singlethread: can run anywhere but only one at a time can run. If another future with the same constraint is already running, the new one must wait (see the next paragraph).
    • pinned: can run on a specific thread only (UI)

The above solves 3 + 4 above as well. When we want to share mutable data, we spawn futures that mutate said data with a singlethreaded constraint. This way we know that future is going to run on a single thread only. In addition, when other futures try to run with the same constraint and find that some other thread is already running under it, we can park this future on the thread that is currently mutating the data. This way, when the parked future is unparked, it will be on the same thread, avoiding cache migrations that would otherwise need to happen between the threads.

How do we express these constraints?

Spawn can stay as is but the executors should go. Spawn will provide constraints as described above.
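A strawman of what that could look like as an API (entirely hypothetical types; `GroupId` and `ThreadId` are placeholders):

```rust
// Constraint-based scheduling instead of explicit executor choice.
enum Constraint {
    // May run on any worker in the common pool.
    Unconstrained,
    // At most one future in the group runs at a time (guards shared state).
    SingleThread(GroupId),
    // Must run on one specific thread (e.g. the UI thread).
    Pinned(ThreadId),
}

trait Spawn {
    fn spawn(&self,
             f: Box<Future<Item = (), Error = ()> + Send>,
             constraint: Constraint);
}
```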

@ahmedcharles commented Feb 12, 2018

I noticed that rayon provides a thread pool. Does the thread pool proposed here result in applications using both having two thread pools which conflict or will there be some sort of bridge so that this isn't the case?

The one thing that this brings to mind is that the executor abstraction doesn't have anything to do with futures. Executors could be implemented to run arbitrary tasks with notifications. Then futures and rayon could be implemented on top. (And on systems like Windows and Mac, you would want the thread pool to use the primary pool on that platform rather than creating another one, which would decrease the chances of oversubscribing the system's resources.)

@aturon (Contributor, Author) commented Feb 12, 2018

@carllerche Thanks much for the feedback! I'll wait until you've had a chance to reply to @cramertj's questions before talking more about shutdown.

In the meantime, I am interested in exploring how the "pattern" of this RFC could be applied to other things like the Tokio global event loop.

This approach is essentially task-local storage (which will also live on the Context), with two differences:

  • Rather than providing a universal "default" value, executors are required to provide their own value when creating a Context.
  • The data is stored directly in the Context, rather than via a typemap. (I don't think this is a significant point).

So I think the main thing is the first bullet-- but that's also one that would only make sense for truly universal assumptions that we want to provide. I don't see a way to meaningfully do it with event loops.

But I'm also wondering if that first bullet is all that important even for executors. An alternative would be to use pure task-local storage, with a global default of a generic thread pool. Executors would retain the ability to optionally set up the Context with a different default if they desire. This would also allow us to do everything within the futures-executor crate, rather than baking it in to the task system at a core level -- which means we'd retain more flexibility to try different approaches over time.

wdyt?

@carllerche commented:

@alkis

Let us assume the generally accepted solution of a work-stealing thread pool, with LIFO scheduling

Work-stealing yes, but definitely not LIFO for multiplexed computations. LIFO works where a single work stealing pool is being used to obtain a single end computation, i.e. the rayon use case.

In the Tokio case, you are multiplexing many unrelated computations. LIFO would result in significant unfairness, potentially even critical tasks being 100% starved and never executing.

@carllerche commented:

@alkis I also do not follow how "Spawn can stay as is but the executors should go" follows from your comment.

Spawn is the only trait being proposed in the futures-core.

@alkis commented Feb 12, 2018

@carllerche It is likely that I misunderstand the proposal and more specifically the API and implementation of LocalPool/ThreadPool. It seems that LocalPool is targeting the non-shared-state case I am talking about above. How does it solve the singlethread and pinned scheduling constraints? It seems that it can only support pinning. Also, how do I get a LocalPool from task::Context?

@carllerche commented:

@alkis

Also how do I get a LocalPool from task::Context?

You don't. LocalPool is provided as an optional utility that you can use in your app. Basically all the points you are mentioning can be explored / solved out of crate. The only fundamental thing proposed here is to say that, given a Context, you can spawn futures that are Send.

@carllerche commented:

@cramertj Given that spawn is located on Context, I don't see a proposed way to spawn from a combinator-only context, i.e. the following:

```rust
my_listener.incoming().for_each(|socket| {
    spawn(process(socket));
    Ok(())
})
```

Of course, one option would be for the spawn function to return a future that completes when the task is spawned... that would let you get rid of the Ok(()).

```rust
fn spawn<F: Future<...>>(f: F) -> impl Future<Item = (), Error = ()> {
    // ....
}
```

```rust
my_listener.incoming().for_each(|socket| {
    spawn(process(socket))
})
```

@cramertj (Collaborator) commented:

@carllerche Yes, the proposed change would require that spawn in combinators return a Future so that it would have access to the ctx. This comes at a slight loss of ergonomics, but I personally think the tradeoff is worth it-- the ergonomics are no worse than using any other combinator-style function that returns a Future. What do you think?

@alkis commented Feb 12, 2018

@carllerche Thanks for clarifying. Is it fair to say that:

  • futures-core gets Spawn which we believe is the API we want
  • futures-executors gets ThreadPool and LocalPool which should be treated like stubs because it is not clear they are the right tools for the job

@aturon (Contributor, Author) commented Feb 12, 2018

@carllerche So to clarify: are you thinking that all executor errors should be treated as "catastrophic", i.e. as saying that the executor will never accept tasks again?

@carllerche commented:

@aturon No, definitely not.

At a high level, I see an executor failing to spawn due to two potential scenarios.

  1. No capacity -- this is a transient error.
  2. Shutdown -- This is permanent (well, unless the executor can start back up again!).

I was saying that, if executors are modeled as Sink, then "no capacity" results in poll_ready() -> Async::Pending, and shutdown results in poll_ready() -> Err.

In that specific case, poll_ready returning an error is "catastrophic" and the executor will never spawn again.
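For concreteness, the Sink-as-executor mapping being discussed (and, below, rejected) would look roughly like this:

```rust
// Sketch only: an executor modeled as a Sink of boxed tasks.
impl Sink for MyExecutor {
    type SinkItem = Box<Future<Item = (), Error = ()> + Send>;
    type SinkError = SpawnError; // an Err from poll_ready is permanent

    fn poll_ready(&mut self, ctx: &mut task::Context) -> Poll<(), SpawnError> {
        // At capacity: return Ok(Async::Pending) and notify the task later.
        // That queued notification is exactly the objection raised below.
        ...
    }

    fn start_send(&mut self, task: Self::SinkItem) -> Result<(), SpawnError> {
        ... // actually enqueue the task
    }
}
```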

@aturon (Contributor, Author) commented Feb 12, 2018

@carllerche Right, that's how I was seeing things too. But I'm trying to square this with the idea that poll_ready is problematic due to notification queuing. The only way I can make sense of this is if we deem it OK to have a transient error with no notification mechanism.

@carllerche commented:

IMO transient errors with no notification are fine in this situation. Ideally one never hits them because more controllable limits kick in first. If this error is hit, it is a load shedding situation where the caller terminates work as it can and moves on.

@aturon (Contributor, Author) commented Feb 13, 2018

@carllerche in that case, if we did use Sink, we'd want to do so with an error variant that allows us to signal a transient error for poll_ready.

@tailhook's point about race conditions does hold, though in that case we'd just get the error at the start_send point.

@cramertj (Collaborator) commented Feb 13, 2018

"No capacity" errors can also be permanent, too, because they can result in deadlock.That is why I was arguing above that failure to spawn should be considered a failure of the task currently underway. Otherwise, there's no guarantee that progress can ever be made.

@carllerche commented:

I'm personally inclined to not have executors be modeled by Sink for the reasons described above.

On a bike shedding level, I would suggest naming the trait Executor as this has a lot of prior art and people are generally familiar with the concept. The trait fn can still be spawn:

```rust
trait Executor {
    fn spawn<T: Future<...>>(&mut self, future: T) -> Result<(), SpawnError>;
}
```

This would also free up Spawn to be used as the free fn return value:

```rust
fn spawn<T: Future<...>>(future: T) -> Spawn<T> {
    // ...
}
```

@ahmedcharles commented:

I suppose my position on spawn failure is that it seems that the only reason you want to do this is because Linux doesn't let you bound memory in a graceful way. On a system (like windows) which does that, running out of memory will result in a panic. So, shouldn't all resource exhaustion resulting in a panic be fine?

The fact that Linux never fails to allocate seems to give people a license to pretend allocations never fail and therefore, Rust panics or aborts on memory allocation failure. But now, that same root cause means that we're considering making an API change, where the only failure mode is memory allocation (I suppose you could run out of other resources but none of those are required to simply spawn a task) and instead of going with tried and true panic, we want something else?

@carllerche commented:

So, shouldn't all resource exhaustion resulting in a panic be fine

A panic is not graceful handling of the failure in my situation (resilient networking services).

where the only failure mode is memory allocation

If the process swaps, this is probably even worse than a panic as it will delay the process getting restarted.

@ahmedcharles commented:

I suppose you're missing my point. Platforms exist where one can guarantee memory doesn't swap and that if physical memory isn't available, allocation will fail. Platforms even exist that have APIs that tell you when there's memory pressure so that you can release memory used for caching back to the OS so other applications can run. But memory allocation in Rust (using your words) doesn't fail gracefully, because it uses panic. If panic is fine for platforms with deterministic memory subsystems, then why is it not good enough for a bounded queue trying to work around a nondeterministic memory subsystem?

@ahmedcharles commented:

And for the record, my point is not that spawn should not return Result. My point is that allocation failure and failures which result from allocation failure should be treated the same way and consistently. I'd prefer a world where every potential allocation failure returned Result, but that isn't this world. I don't see why spawn is special.

@aturon (Contributor, Author) commented Feb 13, 2018

@carllerche

re: Sink, I agree that if we don't want transient failures to involve notification, Sink doesn't make sense -- it's not buying you anything in that world.

And I think I buy the argument around notification too. The point, AIUI, is that reaching capacity on an executor is a pretty extreme situation representing system distress, and at that point you want to shed load as quickly as possible, so there's really no use in queuing up notifications (not to mention the extra cost that entails).

My main remaining concern is just that this use of Result will wind up a bit like lock poisoning: where the vast majority of people do an unwrap or equivalent, so it's effectively a tax on everyone that only people writing very specific kinds of services benefit from. I think that's probably acceptable, but an alternative would be to have the people who care about spawning failure use panics and catch_unwind to handle it.
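That alternative would look roughly like this at a shutdown-aware call site (sketch; assumes `Context::spawn` panics on failure, as discussed below):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Ordinary callers just write `ctx.spawn(task)` and accept the panic.
// A load-shedding service that wants to survive spawn failure could do:
let spawned = catch_unwind(AssertUnwindSafe(|| ctx.spawn(task)));
if spawned.is_err() {
    // The executor refused the task (shutting down or at capacity).
}
```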

@alkis commented Feb 13, 2018

@alkis Can you elaborate on LocalPool not being the right tool for the job? It doesn't seem like there is another possible implementation.

In order to implement singlethreaded I need a different LocalPool per shared state (much like a mutex).
In order to implement pinned I need a LocalPool for a specific system thread.

LocalPool seems to fit only the specific usecase of spawning children Future on the current thread and joining on them.

On a bike shedding level, I would suggest naming the trait Executor as this has a lot of prior art and people are generally familiar with the concept. The trait fn can still be spawn:

Dispatcher and dispatch are also good candidates with similar prior art.

@carllerche commented:

@aturon

The point, AIUI, is that reaching capacity on an executor is a pretty extreme situation representing system distress, and at that point you want to shed load as quickly as possible, so there's really no use in queuing up notifications (not to mention the extra cost that entails).

Correct. The goal is to allow libs to shed load as quickly as possible as gracefully as possible (i.e. not random failure, swap, etc...)

My main remaining concern is just that this use of Result will wind up a bit like lock poisoning: where the vast majority of people do an unwrap or equivalent, so it's effectively a tax on everyone that only people writing very specific kinds of services benefit from.

I empathize with this concern, and I am not 100% confident that adding a Result is the right trade off (though I am leaning heavily that way personally). However, I do not think the analogy to lock poisoning is correct.

With Mutex, lock poisoning is not fundamentally part of mutex and isn't strictly required for using a mutex. It was originally thought that it would be a nice feature add. However, if lock poisoning had not been added, it would have been possible to layer it after the fact on top of Mutex when desired: Mutex<PoisonOnPanic<T>>. So, the poison capability could have been opted in. (Incidentally, while I always just unwrap, I think it is the right thing for me to do. I think in most cases, propagating the panic is correct, but that is for another discussion).

In our case, we are discussing a fundamental abstraction: spawning tasks. There will be a number of implementations of the trait with different behaviors. Some implementations will fail to spawn under certain conditions. This is a fact. The question now is, how should this failure be exposed to the end user.

So there are two questions.

  1. How much should the potential failure of spawning be exposed to the end user?
  2. What should the default handling of a failed spawn be?

My initial thought re: 2) was to silently drop and log the failure. Then, I considered that this would result in very unexpected behavior, so perhaps the better strategy would be to panic on spawn failure. This matches with Erlang's default to crash when an unhandled error happens. A panic on runtime error can only be sanely accomplished if executors can isolate the panic to that single task. I haven't thought this through yet.

Re 1), again I'm not sure what the right spot for this is. If a good default failure handling strategy is picked and we assume that a spawn failure is an exceptional circumstance, then the Result return value of spawn can be tucked away out of sight to most users while still being available to callers that wish to handle a spawn failure differently.

Here is one way to achieve this:

```rust
impl Context {
    pub fn spawn<T: Future<...>>(&mut self, future: T) {
        self.executor().spawn(Box::new(future)).unwrap();
    }

    pub fn executor(&mut self) -> &mut Executor { ... }
}

trait Executor {
    /// Provides a best effort **hint** to whether or not `spawn` will succeed.
    ///
    /// This allows a caller to avoid creating the task if the call to `spawn` will fail. This is
    /// similar to `Sink::poll_ready`, but does not provide any notification when the state changes
    /// nor does it provide a **guarantee** of what `spawn` will do.
    fn status(&self) -> Result<(), SpawnError> {
        Ok(())
    }

    fn spawn<T: Future<...>>(&mut self, future: T) -> Result<(), SpawnError>;
}
```

So, this makes Context::spawn not return Result and unwraps internally, but if a caller wishes to handle failure differently, they can call spawn on executor() directly.

Another advantage here is that it allows a caller to get a &mut reference to the executor directly, so additional capability can be provided and it allows attempting to downcast to a concrete executor.

I added status as an illustration of what additional capability could be. The Executor trait is not going to be super common to implement. Whether or not status is included is pretty minor. I doubt many people would actually use it, but 🤷‍♀️
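Usage under that split would look like this (sketch; `background_work()` is a placeholder):

```rust
// Common case: no Result in sight; panics if the executor refuses the task.
ctx.spawn(background_work());

// A caller that would rather shed load goes through the executor directly:
if let Err(_e) = ctx.executor().spawn(Box::new(background_work())) {
    // Transient (at capacity) or permanent (shut down): drop the work here.
}
```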

@aturon (Contributor, Author) commented Feb 13, 2018

Thanks @carllerche, this looks great, and I had been thinking along similar lines (including the best-effort status piece). I'll update the RFC.

@aturon (Contributor, Author) commented Feb 13, 2018

I've pushed an update incorporating @carllerche's latest suggestion. I think the remaining surface-level API questions are probably best settled via implementation and usage.

@carllerche left a review:

👍 LGTM.

I added inline comments where updates were probably omitted, but the plan looks good to me.

executors.md Outdated

The design in this RFC has the following goals:

- Add a core assumption that tasks are *always* able to spawn additional tasks,

Comment:

Now that spawn returns Result, this probably should be reworded.

executors.md Outdated

# Proposed design

## The core executor abstraction: `Spawn`

Comment:

Spawn is now Executor.

executors.md Outdated
```rust
impl task::Context {
    // A convenience for spawning onto the current default executor,
    // **panicking** if the executor fails to spawn
    fn spawn<F>(&self, F) -> Result<(), SpawnError>
```

Comment:

You probably intended to remove the Result here.

executors.md Outdated

```rust
struct ThreadPool { ... }
impl Spawn for ThreadPool { ... }
```

Comment:

Spawn -> Executor.

executors.md Outdated
```rust
// Note: not `Send` or `Sync`
struct LocalPool { ... }
impl Spawn for LocalPool { ... }
```

Comment:

Spawn -> Executor

executors.md Outdated
especially with borrowing + async/await, most spawning should go through the
default executor, which will usually be a thread pool.

Finally, the `Spawn` trait is more restrictive than the futures 0.1 executor

Comment:

Spawn -> Executor.

@aturon (Contributor, Author) commented Feb 13, 2018

@cramertj wdyt?

@cramertj (Collaborator) commented:

@aturon I think it looks great!

@aturon (Contributor, Author) commented Feb 14, 2018

Initial PR is here, which includes some API changes. I'll update the RFC to match tomorrow.

@crlf0710 commented:

ping @aturon, any news?

@aturon (Contributor, Author) commented Feb 28, 2018

Oy, I totally neglected to update this RFC with the APIs that actually landed for 0.2, and which have subsequently been tweaked a bit further.

However, the APIs are all very much in the spirit of this RFC, with just minor changes, so I'm going to merge as-is for expediency.

You can see the latest code on the 0.2 branch, which is currently at Release Candidate stage (announcement coming soon)!

@aturon merged commit a3ca882 into rust-lang-nursery:master on Feb 28, 2018.
@aturon added the 0.2 label on Feb 28, 2018.