Incompatibility of Rust's stdlib with Coroutines #33368
Comments
You could add yet another way to panic, as per rust-lang/rfcs#1513, in a way which circumvents any |
If |
@nagisa Would that work across crates? I'm also not sure if that would be easier than simply splitting up

@arielb1 Yeah, it would solve the reference problem, but then it would set the |
The RFC requires only one panicking mechanism to be used in the final binary.
If we want to think about that at all, we gotta act fast, because 1.9 with a stable
Something towards this would be my preferred solution, but keep in mind that |
I don't think another panic behaviour is the right choice here @nagisa, since changing the "observable" behaviour of panic isn't really what coroutine implementations need. Rather, the API or the implementation has to change. Maybe I misunderstood your idea though... Hooking into

Apart from that I think that @arielb1's idea is quite good and could be implemented right now without causing any regressions. |
I still cannot get the idea behind the

If we can remove the

I don't know why we need to accept panic in |
|
@sfackler Yeah, making it a bool probably wouldn't solve it. But why is this implemented in the "runtime" instead of doing it like C++ with its

EDIT: Thanks GitHub for putting the close button right beside the comment button, without adding any confirmation dialogue. UI Design? Anyone? 😐 |
@sfackler Well, yes. Changing it to bool won't solve all the problems. But, at least, it could look something like this:

```rust
// Sketch against libstd internals: util, intrinsics and imp are private libstd modules.
thread_local! { pub static IS_PANICKING: Cell<bool> = Cell::new(false); }

// Here is a function that will be called when panic! happens,
// just like the one in https://github.com/rust-lang/rust/blob/master/src/libstd/panicking.rs#L198
fn on_panic(obj: &(Any + Send), file: &'static str, line: u32) {
    // Read the old flag and set it to true in one go.
    let is_panicking = IS_PANICKING.with(|s| {
        let orig = s.get();
        s.set(true);
        orig
    });
    if is_panicking {
        // We are already panicking: abort right here.
        util::dumb_print(format_args!("thread panicked while processing \
                                       panic. aborting.\n"));
        unsafe { intrinsics::abort() }
    }
    // ...
}

// https://github.com/rust-lang/rust/blob/master/src/libstd/sys/common/unwind/mod.rs#L159
pub fn panicking() -> bool {
    IS_PANICKING.with(|s| s.get())
}

// The catch_panic implementation,
// https://github.com/rust-lang/rust/blob/master/src/libstd/sys/common/unwind/mod.rs#L131
unsafe fn inner_try(f: fn(*mut u8), data: *mut u8)
    -> Result<(), Box<Any + Send>> {
    if panicking() {
        // It should not be allowed to catch a panic while panicking.
        unsafe { intrinsics::abort() }
    }
    let mut payload = imp::payload();
    let r = intrinsics::try(f, data, &mut payload as *mut _ as *mut _);
    // Clear the flag because we already caught the panic.
    IS_PANICKING.with(|s| s.set(false));
    if r == 0 {
        Ok(())
    } else {
        Err(imp::cleanup(payload))
    }
}
```
I think this solution can also fulfill this purpose and |
Because unwind in Rust is always a cold path and we do not want to generate extra drop glue just to accommodate unwinds from drop glue.
Because it has perfectly sensible use-cases. |
Hmm... Your statement sounds reasonable, but is that based on actual benchmarking?
I kind of don't understand that though, because if your destructor can panic, it will sometimes correctly unwind the thread and sometimes crash the whole program, because the destructor was called as part of an ongoing unwind. I can't say I like that "wonkiness". If you have time: would you care to point out a valid use case? It's purely optional and out of interest though. |
@lhecker The compiler's own rust/src/libsyntax/errors/mod.rs, lines 354 to 364 (at 3157691)
|
Implementing nounwind drop glue would simply double the size of binary code used by drop glues. There’s nothing to benchmark here.
One case, as pointed out by @jonas-schievink, would be ensuring that all objects go out of scope in a valid state. You cannot drop a

You still do not want these to just

Basically, IME, disallowing panicking in |
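To make that use case concrete, here is a small sketch of my own (not from the thread): a guard whose destructor panics when it goes out of scope in an invalid state, while checking `std::thread::panicking()` so an already-running unwind is not turned into an abort:

```rust
use std::thread;

/// A guard that must be explicitly committed before it goes out of scope.
struct Transaction {
    committed: bool,
}

impl Transaction {
    fn new() -> Transaction {
        Transaction { committed: false }
    }

    fn commit(mut self) {
        self.committed = true;
        // `self` is dropped here, in the committed state, so Drop stays quiet.
    }
}

impl Drop for Transaction {
    fn drop(&mut self) {
        // Panic only if we are *not* already unwinding; a panic during an
        // ongoing unwind would abort the whole process.
        if !self.committed && !thread::panicking() {
            panic!("Transaction dropped without being committed");
        }
    }
}

fn main() {
    let t = Transaction::new();
    t.commit(); // removing this line makes the program panic on drop
}
```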
Aaaah... That makes sense! That's actually a really good argument for panicking drops. Thanks! 😊 |
I edited the issue because @alexcrichton finally brought me to my senses regarding the widespread use of TLS in the stdlib. There is actually a specific reason why I always thought that it's not an issue and easily solvable, but I'm too embarrassed to disclose that dumb idea. 😐 |
How much code is used by drop glue? It might even be worth doing RTTI-based drop glue for unwinding. |
@arielb1 I’m not sure if I did it correctly, but it seems like librustc has about 53591 bytes of drop glue. libsyntax = 33031B. (command used: |
That's small potatoes. |
@arielb1 I wouldn’t call librustc drop-glue intensive, though. Far from it. |
I still do wonder why the binary size would blow up that much though... since all the "nounwind drop glue" would do is literally call a single method. Isn't that just one EH entry plus one

Well, if the exception handling in Rust were a bit less "cumbersome" than it is now, it would already solve most of the problems with coroutines. All that would be left after that is afaik

I think the only other option is to add a compile-time option akin to RFC 1513. But if we did that I might as well write an RFC to add full opt-in coroutine support to Rust, because the difference in effort is probably negligible. (In fact |
If a function (e.g. drop glue) marked as nounwind actually begins unwinding, you have undefined behaviour; therefore you must replace all the occurrences of panicking in the nounwind glue with some other non-terminating side effect (e.g.

This basically means the compiler would have to generate two kinds of drop glue:
This is basically a 100% or close to 100% increase in drop glue size. |
Uhm... Why don't you just call abort in

You can see it here as LLVM IR: http://llvm.org/docs/ExceptionHandling.html#new-exception-handling-instructions |
The problem is that then, any panic in a destructor will cause the program to OTOH, with the current situation, if there is a |
I think they mean we should abort on unwind edges from drop glue that is executed as part of unwinding already.
That’s a fair point. I guess there’s a few factors here:
|
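For what it's worth, library code can already approximate "abort on the unwind edge" for a region today without any new drop glue, using a guard whose destructor aborts. A rough sketch of that pattern (my illustration, not something proposed in the thread):

```rust
use std::mem;
use std::process;

/// While this guard is alive, any unwind that crosses the enclosing scope
/// runs its destructor, which aborts the process instead of continuing to unwind.
struct AbortOnUnwind;

impl Drop for AbortOnUnwind {
    fn drop(&mut self) {
        process::abort();
    }
}

/// Run `f`; if it unwinds, abort instead of propagating the panic.
fn run_nounwind<F: FnOnce()>(f: F) {
    let bomb = AbortOnUnwind;
    f();
    // Normal path: defuse the bomb so its destructor never runs.
    mem::forget(bomb);
}

fn main() {
    run_nounwind(|| println!("this completes normally"));
}
```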
But then won't we need the double drop glue? Either
Doubling up all drop-glue will probably require Rust 2.0 - because the destructor for |
I'm unfortunately not that familiar with the terminology in this community, so I don't fully understand what you meant by that. (I'm sorry for that.) But I think @arielb1 understood me correctly. To say it as plainly as possible again: I do like C++'s exception rules for destructors more than Rust's complicated system in
Yeah... I understand why panicking destructors are a thing that can be quite useful. So it's not like I don't understand it. But it still just feels so incredibly "wrong" that it's possible in a "safe" language for a program to sometimes survive a thread unwind (after which it could spawn the thread again) and sometimes have the whole process crash. And all of that is decided on a "whim" over whether a destructor is run as part of the regular flow or as part of an active unwind. You know? That just isn't deterministic, and I personally think that determinism should be something very valuable in programming languages. |
I do believe that this is a hard change, but depending on the solution for this it's not a "substantial" one. The only thing right now that might require an RFC would be "abort on panic inside destructors", since it does change semantics (where you can safely panic). (I don't think that this will affect many people though.) I do think that I should have opened this issue in the RFC repo though.
Which is why it'd be cool if @rust-lang/libs could (finally) say something or at least give some hints about this. 😕
It's only
Thanks! Didn't know that. |
cc @rust-lang/libs |
I'm just gonna take a moment to point out that 64-bit Windows has UMS threads, which give you proper OS threads that support TLS and such, but use a user-mode scheduler, so you effectively have coroutines, and also have other advantages like returning control to your scheduler whenever you block in a syscall. Doesn't solve the problem on other platforms though. |
Well, obviously this is not a good solution. I am happy to make any PRs for this, but what I am asking is for the Rust team to tell me which solution would be acceptable. |
I'd be in favor of either
Again related to 'compilation scenarios' you could easily imagine a global switch that turned on 'slow' TLS and did the right thing everywhere. |
Is it really so unlikely that coroutine aware TLS could be achieved? I can see UMS becoming a thing going forward which would make this much easier so maybe it's not the worst idea in the world to investigate this? |
What's the "dependency graph" of TLS within the standard lib? That is:
Without this information explicitly stated it is very hard to know which way is the best way forward. In my opinion the best solution to this problem is not to fork the standard library, but to:
Depending on how big the dependency graph of TLS in the standard lib is, we might be able to split that out in a

EDIT: the objective should not be to encourage people to fork libstd to implement green threading, but to make it really easy to implement different green-threading solutions as a library that can work "as seamlessly as possible" with libstd. |
@gnzlbg TLS is not a global variable, it's very much local. You could think of it like a hidden parameter that is passed to all functions. |
Isn't the TLS variable available during the whole lifetime of a thread (independently of which function you are in)? (In C++ it is, at least.) Need to read more on Rust thread-local variables but I assumed they would work the same. |
@gnzlbg global variables have to deal with concurrency, TLS does not. TLS gets away with a |
Is a closure using a thread local variable

EDIT: and if that is the case, does the compiler treat TLS as volatile (i.e. does it reload the address of the TLS variable on every access)? |
@gnzlbg Right now TLS does not have to be treated as volatile because during execution of a function you'll never suddenly be on a different thread. If a closure is sent to a different thread then it'll see the TLS of the thread it was sent to and is running on. |
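A small self-contained example of that behaviour (my own illustration, not from the thread): every thread observes its own, independently initialised copy of a `thread_local!` value, so a closure moved to another thread simply sees that thread's copy:

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    static COUNTER: Cell<u32> = Cell::new(0);
}

fn main() {
    // Set the main thread's copy of the thread-local.
    COUNTER.with(|c| c.set(41));

    thread::spawn(|| {
        // The spawned thread sees its own, freshly initialised copy (0),
        // not the value set on the main thread.
        COUNTER.with(|c| assert_eq!(c.get(), 0));
    })
    .join()
    .unwrap();

    // The main thread's copy is untouched by the other thread.
    COUNTER.with(|c| assert_eq!(c.get(), 41));
}
```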
@retep998 Right, but if a coroutine yields before finishing, and gets sent to another thread where it resumes execution, when it accesses TLS it will see the values of the thread it was sent from (which might no longer exist, and that is unsafe!). I can only think of two different situations involving coroutines and TLS, and none of them make sense to me in practice. |
In Case A, when execution is resumed, the TLS variables still refer to the storage of the thread the coroutine was migrated from. There are two options: either we update them to refer to the TLS the coroutine was migrated to, or we prevent sending coroutines that access TLS between threads. I think that updating TLS to refer to completely different storage on resumption is a recipe for disaster, since it makes it very hard to reason about what is going on. So in my opinion, the best thing would be to forbid Case A completely by forbidding coroutines that access TLS from being migrated between threads. That probably means making thread local storage

Case B makes no sense to me either, since in this case the variable should not be thread local in the first place. Still, Case B is safe as long as the thread the coroutine was migrated from outlives the thread the coroutine has been migrated to. So this could be allowed if the lifetimes can be enforced. |
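A sketch of Case A. The `coroutine` module below is purely hypothetical, a stub standing in for a real stackful-coroutine library's suspension point; the point is only to show where a reference into the current thread's TLS would be live across a migration:

```rust
use std::cell::RefCell;

thread_local! {
    static BUF: RefCell<String> = RefCell::new(String::new());
}

// Stand-in for a real stackful-coroutine library: a suspension point after
// which a scheduler could resume the rest of the function on another thread.
mod coroutine {
    pub fn yield_now() { /* hypothetical: suspend the current coroutine */ }
}

fn coroutine_body() {
    BUF.with(|buf| {
        buf.borrow_mut().push_str("written before the yield; ");
        // Case A: `buf` points into the TLS of the thread we are running on
        // right now. If the coroutine were suspended here and resumed on a
        // different thread, `buf` would still refer to the old thread's
        // storage, which may no longer exist.
        coroutine::yield_now();
        buf.borrow_mut().push_str("written after the yield");
    });
}

fn main() {
    coroutine_body();
}
```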
Well, it is not a wise move to forbid all coroutines from using TLS values. Adding that constraint on TLS values will confuse users, and it also becomes an obstacle for users migrating their programs to coroutines. For your first case, there is a good solution: make TLS volatile. If a coroutine is switched out and migrated to another thread, then when execution is resumed on that thread, all subsequent TLS accesses will go to the TLS of the thread the coroutine was migrated to. We can do that by adding a flag to the compiler. |
I suggested adding volatile above as well, but I don't like it myself: at every suspension point of a coroutine, every time the coroutine is resumed, all the values of all TLS might have changed. I think this will make it really hard to reason about what a coroutine is going to do, since it is impossible to reason about its current state (think about a generator suspending inside a loop, where at every loop iteration you might have different TLS values). I am pretty sure that there are valid use cases for wanting to silently update TLS on coroutine resumption, but I think that forbidding it will prevent hard-to-debug user errors (the scheduler might migrate coroutines at will), and when the user really wants the coroutine state to change on resumption, there are more explicit alternatives (e.g. get a handle to the thread in which the coroutine is currently running and use that to access some "thread local" state). Another reason to be against making all TLS volatile, even when implemented as opt-in via a compiler flag, is that it pessimizes all the code that accesses TLS and is not migrated between threads. Still, the most important reason is being able to reason about the code. In C, C++, and D, migrating a coroutine/fiber that accesses thread-local storage between threads is undefined behavior, and they cannot catch it at compile time. Is there any low-level language that allows migrating coroutines that access thread-local storage between threads? What semantics do they choose? |
As far as I know, none of them (systems programming languages) allows migrating coroutines that access thread-local storage between threads. Take Go as an example: the only way you can use TLS is via cgo, and they run FFI calls in a separate thread (can't be sure). Your case about suspending inside of a loop is very convincing! In that case, making TLS volatile is not a good choice; or rather, it may make the result of the program completely wrong! Back to this issue, is there a way to get rid of those TLS usages in libstd? If we want to make TLS |
So I just checked, and in C++'s Coroutines Technical Specification reading a

And no. You initiate the coroutine on a particular thread. The coroutine runs on that thread until its first suspension point. Then it gets suspended. When you resume the coroutine (in whatever thread you decide to do so), resuming the coroutine is just a function call that calls the system scheduler. The system scheduler then "possibly" migrates the coroutine to a different thread, which resumes the coroutine by calling a function that continues after the suspension point of the coroutine. When the coroutine after this point reads a

For this to work the compiler only needs to avoid caching / reordering reads of |
It seems that someone is going to add coroutine support directly to LLVM, which means that it is possible to tell LLVM not to inline TLS calls between context swaps. https://internals.rust-lang.org/t/llvm-coroutines-to-bring-awarness |
I think that Rust should never support M:N goroutines as Go implements them. This was decided a long time ago. Kernel-assisted UMS-style solutions should be fine, however. |
@pcwalton Your comment literally left @zonyitoo and me speechless...

P.S.: Just call them suspend-down coroutines. |
@pcwalton I can totally understand the reason Rust's team does not want to support coroutines officially. But could you please open a door for us to give it a try? Or could you please give us a chance to implement anything like I admit |
To be clear, the door isn't closed to non-callback style. A lot of folks do want JS-style generators in the language. That, and/or

@pcwalton's comment was about the Go model specifically. The Go model does have plenty of costs associated with it that all programs will have to pay. Rust can solve the same problems without implementing the Go model. I personally am hoping for generator syntax or async/await to clean this up.

The door is open here; you need a better proposal than "stop using TLS in the standard library" (or "mark all TLS as !Send", which is not backwards compatible). This is very similar to the issues we had with libgreen in the first place; folks had to pay an extra cost for it even if they didn't need green threads, which is antithetical to Rust's zero-cost-abstraction philosophy.

Something like Brian's proposal would work (#33368 (comment)). There are other proposals in this thread (some of them yours) that might work as well. I suggest making a comment listing all the viable proposals with their pros and cons, and perhaps making a discussion post on internals.rust-lang.org to figure out what folks like best. Then, make an RFC. (Discussion of proposals on Rust issues rarely gets anywhere; Rust issues do not have that kind of visibility. This issue tracker is for tracking implementation work that needs to be done on rustc itself, where the user-facing design decisions have already been made.) |
That's the point I don't get. You can't implement 1:1 scheduling on top of N:M anyway, so the discussion of whether Go's model fits Rust is beside the point. This is only about N:M scheduling and coroutines specifically, which can be implemented as a library on top of 1:1 scheduling without hurting the performance of anyone else whatsoever.
@Manishearth Can you give me an idea what that might be? The Rust stdlib comes prebuilt, which makes it impossible to fix the TLS problem in a way that's comfortable for Rust users. Is there possibly anything I missed (seriously)? P.S.: |
You could possibly have your own marker trait that works similarly to

A pluggable TLS is a viable solution, and you can try to flesh that out into a pre-RFC. The problem with an "alternate" stdlib is that it can end up being incompatible with large parts of the ecosystem, which we don't want. Even a flag for volatile TLS sounds like it could work here, though I'm not sure. The change has to be one that can't affect existing libraries, and if it's a flag or pluggable solution the only effect it can have on existing libraries is a performance difference. That's the standard anything of this form has to go through. There are probably other solutions this thread hasn't explored. |
Triage: we have generators as an experimental RFC, implemented in nightly. |
Should we close this issue for now?
What do you think? |
The issue

`thread_local!` is used in the stdlib, which does not work well with coroutines. Repeated accesses to a TLS variable inside a method might be cached and optimized by LLVM. If a coroutine is transferred from one thread to another, this can lead to problems, due to the coroutine still using the cached reference to the TLS storage from the previous thread.
What is not the issue

TLS being incompatible with coroutines for the most part (e.g. here) is well known and not an issue per se. You want to use `rand::thread_rng()` with coroutines? Just use `rand::StdRng::new()` instead! Most of the time it's quite easy to circumvent the TLS by simply using something different. This is not true for the stdlib though: one way or the other you're probably using it somewhere.
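A minimal sketch of that substitution, assuming the rand 0.3-era API current at the time (`StdRng::new()` seeds from the OS, and the RNG is owned by the caller rather than living in TLS):

```rust
extern crate rand;

use rand::{Rng, StdRng};

fn roll_die() -> u32 {
    // An RNG owned by the caller instead of the TLS-backed thread_rng(),
    // so a coroutine using it can migrate between threads freely.
    let mut rng = StdRng::new().expect("failed to seed StdRng");
    rng.gen_range(1, 7)
}

fn main() {
    println!("rolled a {}", roll_die());
}
```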
Possible solutions

- … `thread_local!`. I think that this could be hard to achieve in a performant way though.
- … `PANIC_COUNT` and its wonky implementation, and still make entirely sure that a stack is never unwound twice. Other uses of TLS inside the stdlib could be wrapped inside `inline(never)` without causing large overheads (see the sketch below).

I hope we can find a solution for this, as this is really a huge problem for using stackful coroutines with Rust, and who doesn't want "Go" but with Rust's syntax, eh? 😉
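A rough sketch of the `inline(never)` idea (illustrative only; the real `PANIC_COUNT` lives inside libstd and is not shaped exactly like this):

```rust
use std::cell::Cell;

thread_local! {
    static PANIC_COUNT: Cell<usize> = Cell::new(0);
}

// Keeping every access behind a function that is never inlined means the
// caller recomputes the TLS address on each call, so LLVM cannot cache a
// pointer to one thread's storage across a coroutine switch in the caller.
#[inline(never)]
pub fn panic_count() -> usize {
    PANIC_COUNT.with(|c| c.get())
}

#[inline(never)]
pub fn increment_panic_count() -> usize {
    PANIC_COUNT.with(|c| {
        let n = c.get() + 1;
        c.set(n);
        n
    })
}

fn main() {
    increment_panic_count();
    assert_eq!(panic_count(), 1);
}
```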