Yield closures #49

Closed
samsartor opened this issue Aug 28, 2020 · 8 comments
Labels
disposition-close: The FCP starter wants to close this; final-comment-period: The FCP has started, most (if not all) team members are in agreement; major-change: Major change proposal; T-lang; to-announce: Not yet announced MCP proposals

Comments

@samsartor
Contributor

Proposal

Summary and problem statement

Rust has the ability to yield and resume function calls by transforming functions into state machines. However, this ability is currently available to users only in a very limited fashion (async blocks and functions) because of the complex design choices required to generalize the capability. I believe that we have now found a very simple version of "stackless coroutines" which will resolve this.

In short, ordinary closures should be allowed to yield in addition to return. For example, to skip alternate elements of an iterator:

iter.filter(|_| {
    yield true;
    false
})
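
For comparison, the state that this closure would carry can be sketched by hand in today's stable Rust. This is only an illustration of the intended behavior, not the proposed desugaring; the helper name skip_alternate and the suspended_at_yield flag are made up for the example.

```rust
/// Hand-written stand-in for `|_| { yield true; false }`.
/// The name and the flag are hypothetical; the proposal would generate
/// equivalent state automatically.
fn skip_alternate<T>() -> impl FnMut(&T) -> bool {
    // `true` means the previous call stopped at `yield true`.
    let mut suspended_at_yield = false;
    move |_: &T| {
        if suspended_at_yield {
            // Resume after the yield, run to the end, and return `false`.
            suspended_at_yield = false;
            false
        } else {
            // Start from the top and stop at `yield true`.
            suspended_at_yield = true;
            true
        }
    }
}
```

Passing skip_alternate::<i32>() to filter on the range 1..=5 keeps the first, third, and fifth elements.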

As expected, arguments can be moved by the closure at any point. If an argument is not moved prior to yield or return, it will be dropped. When the closure is resumed after either yield or return, all arguments are reassigned:

|x| {
    // <-- x gets (re)assigned
    let y = x;
    yield;
    // <-- x gets (re)assigned
    dbg!(x, y);
}
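
A rough stable-Rust picture of that reassignment, assuming the closure is lowered to an enum holding whatever must survive the yield. The names Resume and reassigning_closure are invented for this sketch, and the final dbg! is replaced by a returned tuple so the values can be observed.

```rust
/// Where the hypothetical closure resumes on its next call.
enum Resume<T> {
    /// Run from the top of the body.
    Start,
    /// Continue after `yield`; `y` was saved before suspending.
    AfterYield { y: T },
}

/// Sketch of `|x| { let y = x; yield; dbg!(x, y); }`.
/// Returns `None` at the yield and `Some((x, y))` where the original
/// would print, purely so the result is visible to callers.
fn reassigning_closure<T>() -> impl FnMut(T) -> Option<(T, T)> {
    let mut resume = Resume::Start;
    move |x: T| match std::mem::replace(&mut resume, Resume::Start) {
        Resume::Start => {
            // `let y = x; yield;` moves the first argument into saved state.
            resume = Resume::AfterYield { y: x };
            None
        }
        // `x` is the argument of *this* call, `y` came from the previous one.
        Resume::AfterYield { y } => Some((x, y)),
    }
}
```

Calling the result with 1 and then 2 gives None followed by Some((2, 1)): the second call sees a fresh x while y keeps the earlier value.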

From the outside, yield closures work the same as non-yield closures: they implement any applicable Fn* traits. Since a yield closure must at least mutate a discriminant within the closure state, it would not implement Fn. Yield closures which require stack-pinning would additionally be !FnMut, instead implementing a new FnPin trait. Note that all FnMut + Unpin types should also implement FnPin.

pub trait FnPin<Args>: FnOnce<Args> {
    extern "rust-call" fn call_pin(self: Pin<&mut Self>, args: Args) -> Self::Output;
}
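
The relationship between FnMut + Unpin and FnPin can be approximated on stable Rust with an ordinary trait. The sketch below is an assumption-laden stand-in: CallPin is a made-up name, it takes a single argument instead of an argument tuple, and it carries its own Output type because the real Fn* traits cannot be implemented or extended on stable.

```rust
use std::pin::Pin;

/// Hypothetical stable-Rust analogue of the proposed `FnPin` trait.
pub trait CallPin<Arg> {
    type Output;
    fn call_pin(self: Pin<&mut Self>, arg: Arg) -> Self::Output;
}

/// Every `FnMut + Unpin` callable can be called through a pinned reference,
/// because `Unpin` lets us get the plain `&mut F` back out of the `Pin`.
impl<F, Arg, Out> CallPin<Arg> for F
where
    F: FnMut(Arg) -> Out + Unpin,
{
    type Output = Out;

    fn call_pin(self: Pin<&mut Self>, arg: Arg) -> Out {
        let f: &mut F = self.get_mut();
        (*f)(arg)
    }
}
```

With this in scope, Pin::new(&mut closure).call_pin(arg) behaves like an ordinary FnMut call for any closure that does not need pinning.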

Motivation, use-cases, and solution sketches

Yield closures would act as the fundamental "coroutine" in the Rust language which in-language sugars and user-defined macros could use to build futures, iterators, streams, sinks, etc. However, those abstractions should not be the focus of this proposal. Yield closures should be justified as a language feature on their own merits. To that end, below are some example use-cases.

Since yield closures are simply functions, they can be used with existing combinators. Here a closure is used with a char iterator to decode string escapes:

escaped_text.chars().filter_map(|c| {
    if c != '\\' {
        // Not escaped
        return Some(c);
    }

    // Go past the \
    yield None;

    // Unescaped-char
    Some(match c {
        // Hexadecimal
        'x' => {
            yield None; // Go past the x
            let most = c.to_digit(16);
            yield None; // Go past the first digit
            let least = c.to_digit(16);
            // Produce the decoded char, or None if the escape is invalid
            char::from_u32(most? << 4 | least?)?
        },
        // Simple escapes
        'n' => '\n',
        'r' => '\r',
        't' => '\t',
        '0' => '\0',
        '\\' => '\\',
        // Unnecessary escape
        _ => c,
    })
})
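
To show what the yield points stand for, here is roughly the same decoder written as an explicit state machine in stable Rust. It is a sketch under assumptions: the Escape enum, its variant names, and the unescape wrapper are invented here, and an invalid hex escape is simply dropped.

```rust
/// Resume points of the hypothetical yield closure, written out by hand.
enum Escape {
    /// Not inside an escape sequence.
    Normal,
    /// The previous char was a backslash.
    AfterBackslash,
    /// Saw `\x`, waiting for the first hex digit.
    HexHigh,
    /// Holds the parsed first hex digit (if valid), waiting for the second.
    HexLow(Option<u32>),
}

fn unescape(text: &str) -> String {
    let mut state = Escape::Normal;
    text.chars()
        .filter_map(move |c| {
            // Take the current state; every arm that suspends stores the next one.
            match std::mem::replace(&mut state, Escape::Normal) {
                Escape::Normal => {
                    if c == '\\' {
                        state = Escape::AfterBackslash;
                        None
                    } else {
                        Some(c)
                    }
                }
                Escape::AfterBackslash => match c {
                    'x' => {
                        state = Escape::HexHigh;
                        None
                    }
                    'n' => Some('\n'),
                    'r' => Some('\r'),
                    't' => Some('\t'),
                    '0' => Some('\0'),
                    // Covers `\\` and unnecessary escapes alike.
                    other => Some(other),
                },
                Escape::HexHigh => {
                    state = Escape::HexLow(c.to_digit(16));
                    None
                }
                Escape::HexLow(most) => {
                    let least = c.to_digit(16);
                    char::from_u32(most? << 4 | least?)
                }
            }
        })
        .collect()
}
```

The hand-written version needs one enum variant per yield, which is the bookkeeping the yield closure would hide.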

Here is a similar pushdown parser utility which assists in base64 decoding:

|sextet, output| {
    let a = sextet;
    yield;
    let b = sextet;
    output.push(a << 2 | b >> 4); // aaaaaabb
    yield;
    let c = sextet;
    output.push((b & 0b1111) << 4 | c >> 2); // bbbbcccc
    yield;
    output.push((c & 0b11) << 6 | sextet) // ccdddddd
}
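
The same sextet packer can be spelled out manually on stable Rust, which makes the saved values at each yield explicit. Treat it as a sketch: Sextets and sextet_decoder are hypothetical names, and the input is assumed to be already-decoded 6-bit values (0..=63).

```rust
/// Which sextet of the current group of four is expected next,
/// plus the previous sextet whose low bits are still unused.
#[derive(Clone, Copy)]
enum Sextets {
    First,
    Second(u8),
    Third(u8),
    Fourth(u8),
}

/// Hand-written counterpart of the yield closure: each call consumes one
/// sextet and pushes a byte once eight bits are available.
fn sextet_decoder() -> impl FnMut(u8, &mut Vec<u8>) {
    let mut state = Sextets::First;
    move |sextet: u8, output: &mut Vec<u8>| {
        state = match state {
            Sextets::First => Sextets::Second(sextet),
            Sextets::Second(a) => {
                output.push(a << 2 | sextet >> 4); // aaaaaabb
                Sextets::Third(sextet)
            }
            Sextets::Third(b) => {
                output.push((b & 0b1111) << 4 | sextet >> 2); // bbbbcccc
                Sextets::Fourth(sextet)
            }
            Sextets::Fourth(c) => {
                output.push((c & 0b11) << 6 | sextet); // ccdddddd
                Sextets::First
            }
        };
    }
}
```

Feeding the sextets 19, 22, 5, 46 (the base64 text "TWFu") produces the bytes of "Man".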

Since yield closures are a very concise way of writing state machines, they could be very useful for describing agent behavior in games and simulations:

|is_opponent_near, my_health| loop {
    // Find opponent
    while !is_opponent_near {
        yield Wander;
    }

    // Do battle!
    let mut min_health = my_health;
    while my_health > 1 && is_opponent_near {
        yield Attack;
        if my_health < min_health {
            min_health = my_health;
            yield Evade;
        }
    }

    // Recover
    if my_health < 5 {
        yield Heal;
    }
}
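
Written without yield, the same agent needs its resume points tracked by hand. The following stable-Rust sketch is one possible manual lowering; the Action and Resume enums, the u32 health type, and the agent constructor are all assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Wander,
    Attack,
    Evade,
    Heal,
}

/// Points at which the hand-written agent continues on the next call.
#[derive(Clone, Copy)]
enum Resume {
    /// Top of the outer loop (the `while !is_opponent_near` check).
    FindOpponent,
    /// The `while my_health > 1 && is_opponent_near` check.
    BattleCheck { min_health: u32 },
    /// Right after `yield Attack`.
    AfterAttack { min_health: u32 },
    /// The `if my_health < 5` check.
    Recover,
}

fn agent() -> impl FnMut(bool, u32) -> Action {
    let mut resume = Resume::FindOpponent;
    move |is_opponent_near: bool, my_health: u32| loop {
        let current = resume;
        match current {
            Resume::FindOpponent => {
                if !is_opponent_near {
                    return Action::Wander; // resume at the same check
                }
                resume = Resume::BattleCheck { min_health: my_health };
            }
            Resume::BattleCheck { min_health } => {
                if my_health > 1 && is_opponent_near {
                    resume = Resume::AfterAttack { min_health };
                    return Action::Attack;
                }
                resume = Resume::Recover;
            }
            Resume::AfterAttack { min_health } => {
                if my_health < min_health {
                    resume = Resume::BattleCheck { min_health: my_health };
                    return Action::Evade;
                }
                resume = Resume::BattleCheck { min_health };
            }
            Resume::Recover => {
                resume = Resume::FindOpponent;
                if my_health < 5 {
                    return Action::Heal;
                }
            }
        }
    }
}
```

Each call passes the latest is_opponent_near and my_health, mirroring the reassignment of arguments on resume.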

And of course, yield closures make it easy to write all kinds of async primitives which are difficult to describe with async/await. Here is an async reader → byte stream combinator:

pub fn read_to_stream(read: impl AsyncRead) -> impl TryStream<Item=u8> {
    stream::poll_fn(move mut |ctx: &mut Context| {
        let mut buffer = [0u8; 4096];
        pin_mut!(read);

        loop {
            let n = await_with!(AsyncRead::poll_read, read.as_mut(), ctx, &mut buffer)?;

            if n == 0 {
                return Ready(None);
            }

            for &byte in buffer.iter().take(n) {
                yield Ready(Some(Ok(byte)));
            }
        }
    })
}

Once closures

Some closures consume captured data and thus cannot be restarted. Currently such closures avoid restart by exclusively implementing FnOnce. However, an FnOnce-only yield closure is useless even if unrestartable, since it still needs to be resumed an arbitrary number of times. Thankfully, there is a different way to prevent restart: a closure could enter a "poisoned" state after returning or panicking. This behavior is generally undesirable for non-yield closures but could be switched on when needed. I recommend a mut modifier for this purpose since it (a) is syntactically unambiguous and (b) evokes the idea that an FnMut implementation is being requested:

fn closure(only_one_copy: Foo) -> impl FnMut() {
    move mut || {
        yield;
        drop(only_one_copy);
    }
}

Alternatively, all yield closures could be poisoned by default and opt-out with loop:

|| loop {
    yield true;
    yield false;
}

Poisoned-by-default is closer to the current behavior of generators but breaks the consistency between yield and non-yield closures. I believe the better consistency of the mut modifier will make the behavior of yield dumber and less surprising. However, that trade-off should be discussed further.
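
The poisoned state discussed in this section can be imitated on stable Rust to see how it would behave. This is only a sketch of one plausible lowering of the move mut example: the State enum and the poisoning_closure name are invented, and panicking on a resume after completion is an assumption about what "poisoned" would mean in practice.

```rust
/// Hand-written states for `move mut || { yield; drop(only_one_copy); }`.
enum State<T> {
    /// Not started yet; still owns the captured value.
    Start(T),
    /// Suspended at the `yield`; still owns the captured value.
    Suspended(T),
    /// Returned or panicked; the captured value is gone.
    Poisoned,
}

fn poisoning_closure<T>(only_one_copy: T) -> impl FnMut() {
    let mut state = State::Start(only_one_copy);
    // Mark the closure as poisoned up front, so a panic in the body
    // also leaves it poisoned; arms that suspend restore a live state.
    move || match std::mem::replace(&mut state, State::Poisoned) {
        State::Start(value) => state = State::Suspended(value),
        State::Suspended(value) => drop(value),
        State::Poisoned => panic!("yield closure resumed after completion"),
    }
}
```

The first call stops at the yield, the second consumes the captured value, and any later call panics instead of restarting with data that no longer exists.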

GeneratorState-wrapping

A try { } block produces a Result by wrapping outputs in Ok/Err. An async { } block produces a Future by wrapping outputs in Pending/Ready. Similarly an iterator! { } block could produce an Iterator by wrapping outputs in Some/None and a stream! { } block could produce a Stream by wrapping outputs in Pending/Ready(Some)/Ready(None).

However, there is a common pattern here. Users often want to discriminate values output by yield (Pending, Some, etc) from values output by return (Ready, None, etc). Because of this, it may make sense to have all yield-closures automatically wrap values in a GeneratorState enum in the same way as the existing, unstable generator syntax.

Although this should be discussed, I believe that enum-wrapping is a separate concern better served by higher-level try/async/iterator/stream blocks.
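
As a small illustration of wrapping done at a higher level, the sketch below adapts a resumable callable into an Iterator on stable Rust. The local GeneratorState enum only mirrors the shape of the unstable standard-library type, and iter_from is a hypothetical helper, not something an iterator! macro is known to expand to.

```rust
/// Local stand-in with the same shape as the unstable generator state enum.
pub enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

/// Maps `Yielded(y)` to `Some(y)` and `Complete(())` to `None`, and never
/// resumes the callable again once it has completed.
pub fn iter_from<F, Y>(resume: F) -> impl Iterator<Item = Y>
where
    F: FnMut() -> GeneratorState<Y, ()>,
{
    let mut resume = Some(resume);
    std::iter::from_fn(move || {
        // After completion the callable has been dropped, so stay at `None`.
        let f = resume.as_mut()?;
        match f() {
            GeneratorState::Yielded(y) => Some(y),
            GeneratorState::Complete(()) => {
                resume = None;
                None
            }
        }
    })
}
```

Because the adapter owns the Some/None mapping, the underlying closure does not need to know which abstraction it is feeding.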

Async closures

There is an open question regarding the behavior of async + yield closures. The obvious behavior of such a closure is to produce futures, in the same way that a non-yield async closure produces a future. However, the natural desugaring of async || { yield ... } into || async { yield ... } doesn't make a whole lot of sense (how should a Future yield anything other than Pending?) and it is not clear if an alternate desugar along the lines of || yield async { ... } is even possible.

For now I would recommend disallowing such closures since async closures are unstable anyway.

Prioritization

In addition to the general ergonomic wins for all kinds of tasks involving closures, a general way of accessing coroutines gives users a far less frustrating way to implement more complex futures and streams. It will also allow crates like async-stream and propane to implement useful syntax sugars for all kinds of generators, iterators, sinks, streams, etc.

Links and related work

The effort to generalize coroutines has been going on ever since the original coroutines eRFC. This solution is very closely related to Unified coroutines a.k.a. Generator resume arguments (RFC-2781). Further refinement of that proposal with the goal of fully unifying closures and generators can be found under a draft RFC authored by @CAD97.

Initial people involved

What happens now?

This issue is part of the experimental MCP process described in RFC 2936. Once this issue is filed, a Zulip topic will be opened for discussion, and the lang-team will review open MCPs in its weekly triage meetings. You should receive feedback within a week or two.

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

@samsartor added the T-lang and major-change labels Aug 28, 2020
@rustbot
Collaborator

rustbot commented Aug 28, 2020

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

@rustbot added the to-announce label Aug 28, 2020
@programmerjake
Member

I have a concern that changing the closure argument variables after each yield is very unintuitive (action at a distance) and resume arguments should instead be retrieved from the yield expression like so:

let my_generator = |arg1: i64| {
    // can get initial resume arg by instead starting
    // with something like (syntax up for bikeshedding):
    // |arg1: i64| initial_resume: &'static str = yield {
    //
    dbg!(&arg1);
    let yielded1 = 123;
    // key part of concern:
    // yields yielded1 to caller, gets resume_arg back,
    // does *NOT* modify arg1 since arg1 wasn't mentioned
    let resume_arg: &'static str = yield yielded1;
    dbg!(&arg1); // always prints same thing as previous dbg!
    123.45f64
};
// exact generator type up for debate
let _: Fn(i64) -> (FnPin(&'static str) -> YieldOrReturn<i32, f64>) = my_generator;

@samsartor
Contributor Author

@programmerjake I get it. I was in the yield-expression camp for a long time too. But trust me, assign-on-yield really is the better option! I'll list a few reasons here but if anyone is unconvinced, they should drop by Zulip and duke it out with me there.

  1. yield is more like return than like anything else. In fact, under this proposal the only difference between the two is where the closure resumes: after the statement or at the beginning. return reassigns arguments so yield does too.

  2. No action at a distance! People are afraid of the "magic mutation" but Rust as a language is really good at dealing with unexpected mutation. Imagine some code:

|items| {
    for _ in &items {
        yield;
    }
}

This will error out with something like:

error[E0506]: cannot pass new `items` because it is borrowed
 --> src/lib.rs:3:9
  |
2 |     for _ in &items {
  |              ------
  |              |
  |              borrow of `items` occurs here
  |              borrow later used here
3 |         yield;
  |         ^^^^^ assignment to borrowed `items` occurs here
  |
  = help: consider moving `items` into a new binding before borrowing

So even if a user totally misunderstands the behavior of yield, their data still can't change out from under them.

  3. Assign-on-yield handles the common case far better and is almost always more ergonomic. In my prototyping, it is rare that I want to save a resume argument from reassignment and use it later. Take a look at my examples above! In all of them there are only two places where I choose not to discard the previous resume argument: let mut min_health = my_health; and let most = c.to_digit(16);. And even in those places, a yield expression gets in the way more than anything else. let most = (yield None).to_digit(16) isn't great, but the only way I can get the AI example to work at all is by pointlessly re-inventing assign-on-yield:
|mut is_opponent_near, mut my_health| loop {
    // Find opponent
    while !is_opponent_near {
        let state = yield Wander;
        is_opponent_near = state.0;
        my_health = state.1;
    }

    // Do battle!
    let mut min_health = my_health;
    ...

@nikomatsakis
Contributor

Discussed in the rust-lang meeting.

There is some amount of excitement and enthusiasm for taking this approach eventually. However, we feel like we don't presently have the design bandwidth to oversee a project of this kind, so we're going to put this into final comment period with disposition close.

We do expect at some point in the future to be talking about the ability to have built-in syntax for iterators or streams, and it would make sense to revisit this discussion at that time.

To that end, one thing that would really be appreciated is if someone wanted to try and capture the state of the design discussion here, including unknowns and challenges. We could put it under the lang-team.rust-lang.org website "design notes" section.

@nikomatsakis added the disposition-close and final-comment-period labels Aug 31, 2020
@nikomatsakis
Contributor

Placing in final comment period and we'll revisit next triage meeting.

@samsartor
Contributor Author

Sounds good! Thanks for hearing me out.

one thing that would really be appreciated is if someone wanted to try and capture the state of the design discussion here

I'll go ahead and put something comprehensive together in the next week when I have some free time.

@Mark-Simulacrum
Member

#52 is up.

@joshtriplett
Member

Closing this issue. People should review #52 asynchronously, and we can talk about it for a future roadmap item.


6 participants