-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A yield
construct in the vein of C#
#388
Comments
I believe |
@errordeveloper: It's not implemented with a compile-time state machine in Python and Ruby. C# in the title is definitely appropriate, because (AFAIK) there isn't another language with a comparable feature. The proposal is not about generators / coroutines built on context switches. |
@thestinger sure, I didn't fully realise that. |
A decently in depth discussion of |
@sfackler great find! |
If Am I understanding correct that someone needs to make a pull-request with an actual RFC write-up, and if so, has anyone here already started on that or at least intending to do so? |
@errordeveloper I have started thinking about the implications of a 'yield' construct for borrow checking, which is something that past proposals have sort of glossed over. I haven't been too anxious to write up an RFC, because there is no chance that it will be considered before 1.0. |
Is the fact that it's implemented as a compile-time state machine significant? The usage pattern to the developer is the same as in any other language, albeit with explicit typing. |
It is, as it makes this feature most useful. State machines generated at On Wed, 19 Nov 2014 20:46 Lucretiel [email protected] wrote:
|
That is an interesting bunch of libraries there! It looks like your monad macro has bind as a primitive like Haskell's. I wonder if performance could be improved by using map and join alone as primitives to remove some "administrative lambdas". Either way (as pipes will contain |
Here the latest Kohlhoff paper "Resumable Expressions", a new lower-level c++ alternative to await that allows implementing yield at the library level and combines stackless and stackfull coroutines: |
I can't claim to be much of a fan of this proposal. He completely ignores
what I think is the major advantage of the await/async (in python: yield
from/coroutine) model: explicit control flow shifts. One of the major side
benefits to single-threaded asynchronous design, as opposed to
multithreaded synchronous, is that the control flow is only ever in one
place at once. This means that a lot of the state-sync and atomicity issues
in multithreaded design can be avoided. Compounding this advantage is the
fact that, with await/yield from, all context shifts are explicit. In
effect this means the developer can guarantee that between this await and
that one, no other coroutine will touch any shared state.
As an example, I used this to my advantage when writing an IRC-like chat
room using python asyncio, where I had the list of logged in users as a
dict mapping user ids to connection objects. Because I could guarantee
exclusive access to this object in a given coroutine' between yield froms,
I didn't have to worry about locking it, creating a local copy of it, or
anything like that.
Unfortunately, in the proposed resumable model, this advantage is lost. Any
function could conceivably result in a context switch. The situation is
slightly improved by the fact that such functions are explicitly tagged
"resumable", but because merely calling the function can cause it to begin
running or suspending, there's no way to separate the act of instantiating
the function with that of stepping through it.
|
Worth noting: Python's PEP process just accepted an async/await PEP: https://www.python.org/dev/peps/pep-0492/ |
PEP 492 is pretty cool, I'm excited to try the next pre-release of Python 3.5. In addition to generators and async functions in Javascript, there's also some work being done on async generators as well: https://github.com/jhusain/asyncgenerator. The author gave a talk about some related concepts in this talk, though the syntax is hidden on the last slide and not really explained: https://www.youtube.com/watch?v=gawmdhCNy-A. |
@Lucretiel i find that being able to reuse existing functions_as is_ is a huge advantage of the "resumable expressions" proposal. The other proposals split the world and that is in my opinion a very bad thing. |
The trouble is that you then lose what I think is one of the primary benefits of async programming: explicit context switch points. When literally any function can cause a context switch, you're basically not any better than mutlithreading- constantly having to use locks and other primitives to ensure all your state is safe. God help you if you get it wrong and have to try to debug it. With explict async, you know that no other coroutine will execute between one yield and the next, so you're safe to do whatever needs to be done, and be guaranteed that it will appear atomic to other coroutines. I'll grant that being able to reuse everything has some surface appeal, but I think it's a problem waiting to happen. The reality is that concurrency requires a fundamentally different model of thinking- look at all the classic multithreading problems to see why- and explicit async reflects that model. Just my two cents, though. It could be that rust, with its emphasis on static memory safety, doesn't need to worry as much as, say, C++. |
Just to provide a counterpoint, introducing async/await keywords is (IMO) a growing language smell. It means that you're effectively partitioning all code into two competing classes (blocking/async), and that really causes some significant problems for large codebases. Please see http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ for more. |
Yes, that basically splits the world. On Thu, Sep 17, 2015 at 7:17 AM, JT Olds [email protected] wrote:
|
I completely understand your point of view, having debated it a great deal myself, but I have to disagree. It is a totally different model with different semantics, and you cannot reasonably pretend to do async work synchronously and have be something that a mediocre programmer, such as myself, can reason about. While a simple function is just a function, an async function isn't a function at all. The flow is like a function, but the The fact that the later we have explicit async and await the more we will have libraries that end up with two versions of different functions, one synchronous and one asynchronous, is an indication that it's important to solve this sooner rather than later, so that library authors can have the tools they need to make that decision sooner rather than later. Then maybe they can choose to have only one version, whichever is appropriate for their work. Because of Rust's memory safety guarantees, it's easy to conclude that we don't need explicit suspension points, and that Rust's ownership rules will sort out this mess, as it does for memory access. But there are plenty of race conditions that aren't memory-access data races, and programmers need as much help with those as they do with memory races. So far, I haven't seen how Rust's ownership could make that part of explicit async programming unnecessary. If somebody can solve that, then I'd probably consider further whether explicit |
This seems like a surprising position to take. This doesn't strike me as a very good reason, but more akin to rationalization. As long as there's other threads running, you definitely don't have explicit suspension points. Your argument against yourself here is a good one. Rust is designed to make explicit suspension points unnecessary, which is a good thing. Single-threaded event loops don't make good use of multi-core parallelism.
Again, a surprising position to take. If the language handles async IO for you, then every library always works for you, and there's no bifurcation. Imagine the following scenario: you're writing some small app at low scale. It's a killer app, it does well, and you need to scale it up. Bam, you need to switch to an async IO model (as indicated by https://news.ycombinator.com/item?id=10232131). Well shoot, that sucks. Okay, so you start porting everything, but you start hitting the long-tail of libraries. There's only one Rust UPNP library perhaps, and its author didn't need to scale like you do, and it blocks. Great, so now you either get to write a new library or make a threadpool. But it's not just the UPNP library, there's 20 other little libraries like this. You switch to the async version of some database library and everything starts looking good. A few weeks later you notice your app just keeps blocking up and no longer responding. Turns out one small error case in the database library uses the blocking version of the library and now your entire event loop is hosed. This entire problem could be avoided completely. |
Implementation-wise, it is not hard to convert an This would give people the benefits of Go-like concurrency, without taking away the control and flexibility of |
Heh, parts of this rant have been a long time coming :) To those that want to combine everything, I am sympathetic, but it's very important to remember that these different implementation strategies have different tradeoffs, and that matter for a language like Rust. Much talk of continuation, threads, etc misses out that. For example, we can use:
The solution is not to try to paper over these distinction, but provide ways to be agnostic/polymorphic to which is used. For a first step, it's good to make higher-order functions agnostic in their arguments. For example a suspended thread in my forked libfringe can be treated as a normal closure if the current continuation is prepended as the first argument. Even better however, would to write polymorphic code that itself can be specialized to one of those types. We can go either the monadic or the algebraic affects route for this. For those that care, I'd argue the root of the problem is that Rust (and C) implicitly manipulate the call stack, unlike a hypothetical typed assembly language. In such a typed assembly language, a contiguous call stack, a segmented call stack, webs of dynamically dispatched closures, finite state machines, etc, can all be on equal footing. Then everything we want here could be left to libraries and not the compiler, and all would be well. It's infeasible to ditch LLVM and get really fancy dependent types in the at-all-near future (let me not suggest otherwise), so we will need compiler extensions/plugins. But I think with enough cleverness, we can integrate that with langauge-level abstractions like monad trait / effect system. I have some hunches on how that might be done, but they're pretty WIP, and I've already said some odd stuff, so I'll call it quits here. |
This ability to run async code from synchronous code is somewhat assumed. The split the world issue still exists, though, and this doesn't do anything to solve it, unfortunately. The link (shared twice above by different people) titled What Color is Your Function?, explains that this correspondence exists. Though it could be a neat language feature if we did end up with
Using threads implies implicit suspension points. That doesn't mean there aren't some explicit suspension points in there too, but you have to consider both complexities together when you do that, so it's something you likely wouldn't want to do without serious consideration. Rust isn't designed to make explicit suspension points unnecessary AFAIK, it simply doesn't have the feature. A post-1.0 feature. Perhaps there's some future enhancement that would make it unnecessary, but I haven't heard of it yet.
You are correct that single threaded event loops can't use multi-core parallelism. That's the whole point. And if you're using them, they don't need to, because the CPU isn't your bottleneck, IO is.
bifurcation. And I learn a new word, thank you. 😃 It depends, of course, on what you mean by "handles it for you". Rust's type system can make things memory-safe, but there are other conditions that you have to solve yourself, because Rust does not solve it for you. Consider the dual lock deadlock. In single-threaded explicit-suspension-points world, handling such a situation is trivial (python-ish pseudo-code, I'm overlooking some problems Rust saves you from to show you one it doesn't): import get_big_thing # A big IO bound task.
# Both locks are required simultaneously to safely run get_big_thing
mutex1, mutex2 = Mutex(), Mutex()
async def this_way():
lock1 = mutex1.lock()
lock2 = mutex2.lock()
return (await get_big_thing())
# Locks released at end of scope
async def that_way():
lock2 = mutex2.lock()
lock1 = mutex1.lock()
return (await get_big_thing())
# Locks released at end of scope
# Some run loop runs these two coroutines The problem here is a dead-lock. If a synchronous version of this code is run on multiple threads to make it parallel, it would be easy for the the first lock to be obtained by Using a single thread to run these coroutines (because our bottleneck is IO, not CPU) works really well, because the explicit It's important to note that the purpose of this is not to make it easy to convert synchronous code to async code. The purpose is to allow the author to reason about async code. The explicit suspension points help the author to do that. This point is so important that Python chose not to implement the
Yes, the long-tail effect really exists. The longer we wait before providing the choice, the worse it gets, because there's more code that has to be considered retroactively. I expect this to be a big burden in the Python community going forward. They still added the feature. Unless there's another alternative that can make these deadlocks and race conditions that the borrow checker does not solve easier to reason about, this solution does indeed help programmers reason about what will happen in their program. Unless there is a better solution to the problem, all we would do by waiting is exacerbate the problem. |
Not necessarily. Event loop/promise processing adds overhead, and there's a maximum amount of overhead a single core can handle, which limits the number of (possibly small) IO operations that can be handled. This isn't academic; the reason my company switched from Python+Twisted to Go was this reason. We were CPU bound when we should have been IO bound. That said, it was Python + Twisted, so I admit the overhead was unnecessarily large.
In this example, I assume you mean to have lines like
Not to belabor the point but so far I've only seen more confusion.
The initial release of Twisted was 12 years ago. I know they're finally adding deferreds and event loops to the standard lib, but there's been prior art and experience with this stuff for a long time.
I know this isn't news or anything, but it is impossible to solve all deadlocks, in a mathematical proof sort of way. Anyway, I guess I should point out I understand the reasons why Rust decided to not do green threads and why that results in people discussing async IO, but it's just super disappointing. When something is so close to perfect you really care about those last few blemishes. |
Also, I want to point out that I totally give you that the async style does allow the programmer to see when she has full control of the thread of execution. That is a benefit. But everything is tradeoffs, and if I had to order my priorities that benefit would be very far down the priority list, certainly underneath being able to use all the third party libraries I need. Additionally, I have seen subtle synchronization bugs introduced when someone has some ordered logic depending on complete control of the thread of execution, and then some later refactor comes in and turns a synchronous CPU-bound call into an async call, and then the synchronization logic isn't correspondingly updated. I think relying on having full control of the CPU is the wrong kind of thing to rely on, and personally I'd rather not fool people into thinking about it. But yes, it is all about tradeoffs. |
@erickt There are two reasons why a port is unnecessary/undesirable:
|
How are recursive generators handled? In languages that aren't Rust, the generator can make another instance of itself and then yield from it using a for loop. I think lifetimes can be used to prevent it, but maybe it's something we want to allow. Python has a @erickt |
@camlorn Such recursion would become a type recursion and require boxing to avoid the state having an "infinite" size. |
@eddyb |
@camlorn: stateful doesn't do something like I bet a sufficiently smart optimizer could inline these generators to get you down to O(1) jumps. |
Considering that Rust is built on LLVM, have you been following the coroutine discussion on LLVM-Dev? |
Sorry to double comment, but this page might be more useful as to LLVM 4.0 coroutines: http://llvm.org/docs/Coroutines.html |
@l0calh05t / @camlorn: Yeah, I'm somewhat aware of llvm's proposal. As best as I can tell, under the covers they're effectively doing the same transformations that stateful is doing - compiling down functions into state machines and a reified stack to store stack variables. One thing that's slightly dodgy from my first glance is that they seem to require that the coroutine frame might need to be allocated, but that's not required for stateful (once we get impl trait). Really it comes down where we'd get the best advantage. If LLVM implement some crazy optimizations that we don't want to duplicate, it might make sense to use LLVM's implementation. However, it might be faster, or integrate better with our type system and borrow checker if we lower this transformation on the Rust side. I'm not sure what'd be best. |
The proposal includes a "heap allocation elision" optimization (CoroElide) pass that will store the coroutine in its parent stackframe when possible. So if you only create such coroutines you never get heap allocations. It is also planed to allow this kind of optimizations across ABI boundaries. If anything it could be worth it to chim in there and clarify a bit more under which conditions heap allocation might be necessary |
@gnzlbg That only works for OS stuff if you can guarantee that the heap allocation will never happen (and get a compile-time error when it cannot be elided). |
Do we want to be able to put DSTs on a coroutines stack frame?(I want to) Do we want to store coroutine stack frames in e.g. a Vec? (I want to) If So while I agree it is useful to be able to ensure that in some cases On Monday, 3 October 2016, Demi Marie Obenour [email protected]
|
@gnzlbg Boxing the generator is as efficient as having the generator be a pointer to a stack frame, as far as I'm aware. Boxing the dsts shouldn't be that much of an overhead either. Can you provide an example of code that puts a dst on the stack that you think might be both important and problematic? To be honest, I find the points about avoiding allocation to be more convincing than otherwise; if you don't care about such things, use a higher level language with GC and whatnot and call it a day. If being able to allocate dsts next to each other for cache friendliness is important, I can think of ways to write libraries that try to do this, as well. They are slightly less convenient than it just working, but maintain the same sorts of advantages. |
In kernel code, or hard real-time code, you might not be allowed to allocate On Oct 4, 2016 17:09, "Austin Hicks" [email protected] wrote:
|
If I need to If I cannot use DSTs inside a coroutine, you are essentially splitting the language into two parts, the part that can be used within a coroutine and the part that cannot be used within a coroutine. I think this is also bad.
I think that avoiding allocation is very important but in case somebody has a coroutine that must allocate, that should be a zero-cost abstraction as well. I think it would be bad to have to come up with a completely different coroutine system to address this use case, and hence why I am raising this issue.
Then don't use language features that allocate memory (like I am sure we can find ways of disabling coroutines that allocate memory from kernel code such that coroutines that do not allocate can be used safely. |
@DemiMarie @gnzlbg The thing is that someone has to know the size eventually. Even with Box, the thing that goes into |
I'm not understanding why you can't know the size of a generator/coroutine at compile time? Assuming it's non-recursive, of course. If it makes function calls, those can just live on the normal call stack since they'll only be invoked when control is inside the generator. If it invokes other generators/coroutines, those would just live in the static memory of the coroutine itself. Circular dependencies would be detected the same way they're detected in structs. The size of a coroutine would just be pessimistically the minimum size needed to hold all the variables in the local control flow stack, excluding function calls. My understanding is it'd look something like this:
In this example, the storage required for this generator (before compiler optimizations, and ignoring alignments) would be:
Plus a small, fixed amount of extra space to store where the last yield was so we can resume iteration. So I don't understand the argument that we can't know the size of a generator at compile time. It's true that the size of a generator might be large at compile time, to account for deeply-nested but non-recursive generators, but that's no different than the deeply nested struct that would be required to implement the same pattern. I've elided ownership & lifetime concerns here, but I don't see any obvious reasons in principle that it would be different than a lambda or a struct. |
@Lucretiel Now, the really fun part: work is going on to eliminate unused variables. I think your example might actually only have to include one of I'm starting to think I should throw an RFC out which does what we can do and leaves room for streaming in future whenever we get HKT. I'm also looking at LLVM, and probably it's best to do this as LLVM coroutines if we can. There's a pretty long todo list, however, including a few things which might be deal breakers (specifically alignment being ignored by the allocation/freeing intrinsics). This documentation isn't currently very good, but it looks like the pointer for the dynamically allocated frame is supplied by you, not just made by LLVM. So probably it could be worked out such that Rust generates a struct with a |
Actually I take that back. I bet you can't move the frame once you allocate it, which would sadly make this heap only in some cases. The only saving grace is that you can tell beforehand. |
Sure, because of dangling pointers, but that's how everything in Rust is, right? Like, that's the whole point of the ownership / borrow / mutable borrow system. It's no different than being unable to move a stack-allocated struct once you've allocated it on the stack; you can move it by bitwise copy, but only when there are no living borrows of that struct in existence. The point is that a generator "frame" is essentially the same thing as a struct, in terms of ownership, lifetime, and sizing constraints. |
The only problem I can see is that there might be some interior references, like this:
Which would make it impossible to move the struct in a way that the borrow will accept (again, before compiler optimizations or a potentially more relaxed borrow checker). But it's still essentially syntax sugar over a struct. The challenge will be communicating borrow checker violations to the user in a useful way, but I don't see any reasons in principle that it's different than a struct frame. |
Yikes. Sounds like any borrows of stack allocated variables at a yield Obviously, this would be a major usability problem. On Oct 6, 2016 3:35 PM, "Lucretiel" [email protected] wrote:
|
I'm betting that you can't even move LLVM coroutines by bitwise copy safely. The implementation assumes that you allocate the pointer for it in some cases, but doesn't seem to allow you to set the pointer if it changes. I.e. if you use This is fixable if you just allocate the frame on the heap, but having Rust language features allocate implicitly is way against the language's philosophy, imho. In terms of references, What doesn't work is specifically borrowing a variable on the "stack" of the generator and then yielding the reference. You can borrow them all you want, you just can't yield the reference. This is the streaming iterator problem, and the latest RFC is #1598. As for borrow checker errors, you do that first as well. If the coroutine tries to yield anything with a lifetime less than the return type of the coroutine, you error. And then you make the lifetime elision rules apply like it's a function. I.e. you're allowed to yield references to/from any argument to the generator, but not from inside the generator. In general, you do any safety checks before lowering the generator to a struct and iterator implementation. We might need to wait for #1598, though, if only because hitting the non-streaming case and the streaming case at the same time might be necessary. I sort of favor dedicated syntax for a streaming generator: we can implicitly detect that it needs to be streaming but the difference would be one tiny change in the code, anywhere at all. This would lead to cryptic errors that are hard to fix. |
offtopic: The cppcon2016 video about LLVM coroutines is up: https://www.youtube.com/watch?v=8C8NnE1Dg4A |
I want to breathe some fresh air into this RFC, with my yield function to state machine (as an iterator) concept. Gist. It's not very good - it was quickly hacked together, but it actually compiles - you can play around with it! Explanation: The commented-out function I didn't follow this RFC much, but it appears that the current discussion is mostly about LLVM's built in coroutine support. I guess my concept could be considered an opposite to LLVM coroutines, since my concept could be implemented at a very high compiler level, possibly even procedural macros. Feedback very welcome! |
Experimental RFC #2033 has been accepted and implemented in nightly. |
Closing in favor of rust-lang/rust#43122 |
+1
…On Sun, 7 Oct 2018, 8:42 am Mazdak Farrokhzad, ***@***.***> wrote:
Closing in favor of rust-lang/rust#43122
<rust-lang/rust#43122>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#388 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPWS80F_HlVsD55ir-nBzB5NcqsVwDQks5uibBagaJpZM4CtyWG>
.
|
Issue by bstrie
Friday Jul 12, 2013 at 13:46 GMT
For earlier discussion, see rust-lang/rust#7746
This issue was labelled with: A-an-interesting-project, E-hard, I-wishlist in the Rust repository
@thestinger has mentioned this several times as a wishlist item. This is a feature that greatly simplifies the implementation of external iterators by allowing them to be compiled to state machines at compile time (did I get that right?). See http://msdn.microsoft.com/en-us/library/vstudio/9k7k7cf0.aspx for reference.
While we shouldn't expect this for 1.0, it might be prescient to reserve the keyword right now.
Nominating for far-future milestone.
The text was updated successfully, but these errors were encountered: