Incompatibility of Rust's stdlib with Coroutines #33368

Closed
lhecker opened this issue May 3, 2016 · 60 comments
Labels
C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@lhecker

lhecker commented May 3, 2016

The issue

thread_local! is used throughout the stdlib, which does not work well with coroutines. LLVM may cache and optimize repeated accesses to a TLS variable inside a function. If a coroutine is transferred from one thread to another, this can lead to problems, because the coroutine may keep using the cached reference to the TLS storage of the previous thread.
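
To make the hazard concrete, here is a minimal sketch; yield_now is a stub standing in for a hypothetical coroutine library's suspension point (which, in a real stackful-coroutine library, could resume the coroutine on a different OS thread):

use std::cell::Cell;

thread_local! {
    static COUNTER: Cell<u64> = Cell::new(0);
}

// Stub standing in for a coroutine library's suspension point. In a real
// stackful-coroutine library this could return on a *different* OS thread.
fn yield_now() {}

fn coroutine_body() {
    // First access: the compiler computes the address of this thread's slot.
    COUNTER.with(|c| c.set(c.get() + 1));

    // Suspension point: the coroutine may be migrated to another thread here.
    yield_now();

    // Second access: because nothing tells LLVM that the thread may have
    // changed, it is allowed to reuse the TLS address computed above, so a
    // migrated coroutine would keep touching the *old* thread's slot.
    COUNTER.with(|c| c.set(c.get() + 1));
}

fn main() {
    coroutine_body();
    COUNTER.with(|c| assert_eq!(c.get(), 2));
}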

What is not the issue

That TLS is largely incompatible with coroutines (e.g. here) is well known and not an issue per se. You want to use rand::thread_rng() with coroutines? Just use rand::StdRng::new() instead! Most of the time it's easy to circumvent TLS by simply using something different. This is not true for the stdlib though: one way or another, you are probably using its TLS somewhere.
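
For example (a sketch against the rand 0.3-era API that was current when this issue was filed; later rand versions renamed these items):

extern crate rand;

use rand::{Rng, StdRng};

fn roll_die() -> u8 {
    // rand::thread_rng() hands out a handle to a *thread-local* RNG, exactly
    // the kind of hidden TLS a migrating coroutine has to avoid.
    // An owned StdRng lives in the coroutine's own state instead.
    let mut rng = StdRng::new().expect("failed to seed RNG");
    rng.gen_range(1, 7)
}

fn main() {
    println!("rolled {}", roll_die());
}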

Possible solutions

  • Add an option akin to RFC 1513 to replace the stdlib with one that has a built-in "libgreen". This might actually be much more practical than it sounds at first, since the effort of implementing it is not much larger than that of the other options.
  • Add an option akin to RFC 1513 to control inlining of TLS accesses at compile time.
  • Make it possible to hook into thread_local!. I think that this could be hard to achieve in a performant way though.
  • Reduce the usage of TLS inside the stdlib and instead let crates use it as they please. Panic and unwind semantics could, for instance, be changed to match C++. This would obviate PANIC_COUNT and its wonky implementation while still making entirely sure that a stack is never unwound twice. Other uses of TLS inside the stdlib could be wrapped in inline(never) without causing large overheads (see the sketch after this list).
  • Possibly some other way? I read that TLS variables are rendered as globals inside LLVM. If we mark the suspension function as "can write to any memory location", we could make LLVM stop caching the TLS access...
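
As a rough illustration of the inline(never) idea from the fourth bullet (only a sketch, not how libstd is actually structured): keeping every TLS lookup behind a non-inlined function means a caller never holds a cached thread-local address across a suspension point.

use std::cell::Cell;

thread_local! {
    static PANICKING: Cell<bool> = Cell::new(false);
}

// The TLS lookup stays out of line, so callers only ever get back a plain
// value and cannot cache the thread-local address across a coroutine switch.
#[inline(never)]
fn is_panicking() -> bool {
    PANICKING.with(|p| p.get())
}

#[inline(never)]
fn set_panicking(value: bool) {
    PANICKING.with(|p| p.set(value));
}

fn main() {
    assert!(!is_panicking());
    set_panicking(true);
    assert!(is_panicking());
}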

I hope we can find a solution for this, as it is a huge problem for using stackful coroutines with Rust. And who doesn't want "Go", but with Rust's syntax, eh? 😉

@nagisa
Member

nagisa commented May 3, 2016

You could add yet another way to panic as per rust-lang/rfcs#1513, one which circumvents any catch_unwind.

@arielb1
Contributor

arielb1 commented May 3, 2016

@lhecker

If catch_unwind did not run user code within the with, would it be better? In that case you still could not switch coroutines during a panic (which you can't do currently anyway, because of PANIC_COUNT), but you would not have the reference problem.

@lhecker
Author

lhecker commented May 3, 2016

@nagisa Would that work across crates? I'm also not sure if that would be easier than simply splitting up catch_unwind into its two atomic parts.

@arielb1 Yeah, it would solve the reference problem, but then it would set the prev on the wrong thread. I think this could be the easiest solution though, as long as you make sure that thread::panicking() is false before you call catch_unwind(). If no other alternative gets accepted, I'll submit a PR for that.

@nagisa
Member

nagisa commented May 3, 2016

Would that work across crates?

The RFC requires only one panicking mechanism to be used in the final binary.

I'm also not sure if that would be easier than simply splitting up catch_unwind into it's 2 atomic parts.

If we want to think about that at all we gotta act fast, because 1.9 with a stable catch_unwind is being released on 2016-05-26.

Make it possible to hook into thread_local!, thus solving the problem entirely. I think that this could be hard to achieve in a performant way though.

Something towards this would be my preferred solution, but keep in mind that thread_local! itself is a wrapper over many layers of abstractions over TLS (any of which could be used anywhere and by anyone).

@lhecker
Author

lhecker commented May 3, 2016

I don't think another panic behaviour is the right choice here @nagisa, since changing the "observable" behaviour of panic isn't really what coroutine implementations need. Rather, the API or the implementation has to change. Maybe I misunderstood your idea though...

Hooking into thread_local! would also be my preferred choice if it can be done in a way that doesn't hinder performance. To get good performance it could be done like rust-lang/rfcs#1513 though, right? (I.e. replacing the TLS lookup function at compile time.)

Apart from that I think that @arielb1's idea is quite good and could be implemented right now without causing any regressions.

@zonyitoo

zonyitoo commented May 4, 2016

I still cannot grasp the idea behind PANIC_COUNT, because it just allows people to panic inside Drop::drop while panicking. But why is that?

If we removed PANIC_COUNT and just stored a boolean flag in TLS to indicate whether this thread is panicking, this issue could be solved. In catch_unwind, you can restore the flag so that you can panic again.

I don't know why we need to accept panics in drop.

@sfackler
Member

sfackler commented May 4, 2016

PANIC_COUNT doesn't "allow" people to panic inside drop while panicking - it allows the runtime to determine that this has happened so it can abort instead of re-unwinding.

PANIC_COUNT is a number stored in TLS, and I'm not sure how changing that to a bool would materially affect the issues here.
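
For readers following along, a deliberately simplified model of what PANIC_COUNT does (this is not the actual libstd implementation, just the idea):

use std::cell::Cell;
use std::process;

thread_local! {
    // Simplified stand-in for libstd's PANIC_COUNT: how many panics are
    // currently being processed on this thread.
    static PANIC_COUNT: Cell<usize> = Cell::new(0);
}

// Called when a panic starts unwinding.
fn on_panic_start() {
    let count = PANIC_COUNT.with(|c| {
        c.set(c.get() + 1);
        c.get()
    });
    if count > 1 {
        // A destructor panicked while we were already unwinding; the stack
        // cannot be unwound twice, so the process aborts instead.
        eprintln!("thread panicked while panicking. aborting.");
        process::abort();
    }
}

// Called when catch_unwind catches the panic (or unwinding finishes).
fn on_panic_caught() {
    PANIC_COUNT.with(|c| c.set(c.get() - 1));
}

fn main() {
    on_panic_start();  // a panic begins unwinding
    on_panic_caught(); // catch_unwind catches it
    PANIC_COUNT.with(|c| assert_eq!(c.get(), 0));
}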

@lhecker
Author

lhecker commented May 4, 2016

@sfackler Yeah, making it a bool probably wouldn't solve it. But why is this implemented in the "runtime" instead of doing it like C++ with its noexcept destructors? Is there any technical reason for this? And what about the other solutions I listed in the initial comment?

EDIT: Thanks GitHub for putting the close button right beside the comment button, without adding any confirmation dialogue. UI Design? Anyone? 😐

@lhecker closed this as completed May 4, 2016
@lhecker reopened this May 4, 2016
@zonyitoo

zonyitoo commented May 4, 2016

@sfackler Well, yes. Changing it to bool won't solve all the problems. But, at least, catch_panic will work:

thread_local! { pub static IS_PANICKING: Cell<bool> = Cell::new(false); }

// Here is a function that will be called when panic! happens
// Just like the one in https://github.com/rust-lang/rust/blob/master/src/libstd/panicking.rs#L198
fn on_panic(obj: &(Any+Send), file: &'static str, line: u32) {
    let is_panicking = IS_PANICKING.with(|s| {
        let orig = s.get();
        s.set(true);
        orig
    });

    if is_panicking {
        // Abort right here
        util::dumb_print(format_args!("thread panicked while processing \
                                       panic. aborting.\n"));
        unsafe { intrinsics::abort() }
    }

    // ...
}

// https://github.com/rust-lang/rust/blob/master/src/libstd/sys/common/unwind/mod.rs#L159
pub fn panicking() -> bool {
    IS_PANICKING.with(|s| s.get())
}

// The catch_panic
// https://github.com/rust-lang/rust/blob/master/src/libstd/sys/common/unwind/mod.rs#L131
unsafe fn inner_try(f: fn(*mut u8), data: *mut u8)
                    -> Result<(), Box<Any + Send>> {
    if panicking() {
        // It should not be allowed (to catch panic while panicking)
        unsafe { intrinsics::abort() }
    }

    let mut payload = imp::payload();
    let r = intrinsics::try(f, data, &mut payload as *mut _ as *mut _);

    // Clear the flag because we already caught the panic
    IS_PANICKING.with(|s| s.set(false));

    if r == 0 {
        Ok(())
    } else {
        Err(imp::cleanup(payload))
    }
}

it allows the runtime to determine that this has happened so it can abort instead of re-unwinding.

I think this solution can also fulfill this purpose, and catch_panic can then be used in coroutines.

@nagisa
Member

nagisa commented May 4, 2016

But why is this implemented in the "runtime" instead of doing it like C++ with its noexcept destructors?

Because unwind in Rust is always a cold path and we do not want to generate extra drop glue just to accommodate unwinds from drop glue.

I don't know why we need to accept panic in drop?

Because it has perfectly sensible use-cases.

@lhecker
Author

lhecker commented May 4, 2016

Because unwind in Rust is always a cold path and we do not want to generate extra drop glue just to accommodate unwinds from drop glue.

Hmm... Your statement sounds reasonable, but is that based on actual benchmarking?

Because it has perfectly sensible use-cases.

I kind of don't understand that though: if your destructor can panic, it will sometimes correctly unwind the thread and sometimes crash the whole program, because the destructor was called as part of an ongoing unwind. I can't say I like that "wonkiness". If you have time: would you care to point out a valid use case? It's purely optional and out of interest though.

@jonas-schievink
Contributor

@lhecker The compiler's own DiagnosticBuilder can panic in its destructor:

impl<'a> Drop for DiagnosticBuilder<'a> {
    fn drop(&mut self) {
        if !panicking() && !self.cancelled() {
            self.emitter.borrow_mut().emit(&MultiSpan::new(),
                                           "Error constructed but not emitted",
                                           None,
                                           Bug);
            panic!();
        }
    }
}

@nagisa
Member

nagisa commented May 4, 2016

Hmm... Your statement sounds reasonable, but is that based on actual benchmarking?

Implementing nounwind drop glue would simply double the size of binary code used by drop glues. There’s nothing to benchmark here.

If you have time: Would you care to point out a valid use case?

One case, as pointed out by @jonas-schievink, would be ensuring that all objects go out of scope in a valid state. You cannot drop a DiagnosticBuilder without either emitting or cancelling it, because it's a compiler bug otherwise. Something similar could apply to a hypothetical GuaranteedFlushBufWriter which only allows being dropped once it has been properly flushed. There are many more things one could think of.

You still do not want these to just abort because somewhere down the drop chain there might be a drop which would write some restore state/debug logs/user configuration/etc (but it is fine if the application aborts in case those drops panic, because saving such state when panicking is a last ditch effort anyway).

Basically, IME, disallowing panicking in Drop::drop would be a weird and unpleasant restriction. Whereas an application panicking twice and then aborting as a result of a double panic won't make the application any more broken, because it is already broken.
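
GuaranteedFlushBufWriter is hypothetical, but here is a minimal sketch of what such a type could look like (using thread::panicking() to avoid turning a single panic into a double panic, just like DiagnosticBuilder above):

use std::io::{self, Write};
use std::thread;

// Hypothetical type from the comment above: a buffered writer that treats
// being dropped without an explicit flush() as a bug.
struct GuaranteedFlushBufWriter<W: Write> {
    inner: W,
    buf: Vec<u8>,
    flushed: bool,
}

impl<W: Write> GuaranteedFlushBufWriter<W> {
    fn new(inner: W) -> Self {
        GuaranteedFlushBufWriter { inner, buf: Vec::new(), flushed: true }
    }

    fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        self.flushed = false;
        self.buf.extend_from_slice(data);
        Ok(())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.write_all(&self.buf)?;
        self.inner.flush()?;
        self.buf.clear();
        self.flushed = true;
        Ok(())
    }
}

impl<W: Write> Drop for GuaranteedFlushBufWriter<W> {
    fn drop(&mut self) {
        // Like DiagnosticBuilder: only panic if we are not already unwinding,
        // so a forgotten flush() during a panic does not become a double panic.
        if !self.flushed && !thread::panicking() {
            panic!("GuaranteedFlushBufWriter dropped without being flushed");
        }
    }
}

fn main() {
    let mut w = GuaranteedFlushBufWriter::new(Vec::new());
    w.write_all(b"hello").unwrap();
    w.flush().unwrap(); // forgetting this would panic when `w` is dropped
}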

@lhecker
Author

lhecker commented May 4, 2016

One case, as pointed out by @jonas-schievink, would be ensuring that all objects go out of scope in a valid state. You cannot drop a DiagnosticBuilder without either emitting or cancelling it, because it's a compiler bug otherwise.

Aaaah... That makes sense! That's actually a really good argument for panicking drops. Thanks! 😊

@lhecker changed the title from "Incompatibility of std::panic with coroutines" to "Incompatibility Rust's stdlib with Coroutines" on May 6, 2016
@lhecker changed the title from "Incompatibility Rust's stdlib with Coroutines" to "Incompatibility of Rust's stdlib with Coroutines" on May 6, 2016
@lhecker
Author

lhecker commented May 6, 2016

I edited the issue because @alexcrichton finally brought me to my senses regarding the widespread use of TLS in the stdlib. There is actually a specific reason why I always thought that it's not an issue and easily solvable, but I'm too embarrassed to disclose that dumb idea. 😐

@arielb1
Contributor

arielb1 commented May 7, 2016

Implementing nounwind drop glue would simply double the size of binary code used by drop glues. There’s nothing to benchmark here.

How much code is used by drop glue? It might even be worth doing RTTI-based drop glue for unwinding.

@nagisa
Member

nagisa commented May 7, 2016

@arielb1 I’m not sure if I did it correctly, but it seems like librustc has about 53591 bytes of drop glue. libsyntax = 33031B.

(command used: objdump -t libblah.rlib | grep '$10drop.' | awk '{ sum += strtonum(gensub(/^0*/,"0x","g",$5)) } END { print sum }')

@arielb1
Contributor

arielb1 commented May 7, 2016

That's small potatoes.

@nagisa
Member

nagisa commented May 7, 2016

@arielb1 I wouldn’t call librustc drop-glue intensive, though. Far from it.

@lhecker
Author

lhecker commented May 7, 2016

I still do wonder why the binary size would blow up that much though, since all the "nounwind drop glue" would do is literally call a single method. Isn't that just one EH entry plus one call in the binary? Compared to the size of drop itself, I can't imagine this resulting in a 100% size increase.

Well, if the exception handling in Rust were a bit less "cumbersome" than it is now, that would already solve most of the problems with coroutines. All that would be left after that is, afaik, LOCAL_STDOUT.

I think the only other option is to add a compile-time option akin to RFC 1513. But if we did that, I might as well write an RFC to add full opt-in coroutine support to Rust, because the difference in effort is probably negligible. (In fact coio is already much faster than the old libgreen anyway.)

@nagisa
Member

nagisa commented May 7, 2016

I still do wonder why the binary size would blow up that much though...

If a function (e.g. drop glue) marked as nounwind actually begins unwinding, you have undefined behaviour; therefore you must replace all the occurrences of panicking in the nounwind glue with some other non-unwinding side effect (e.g. abort, exit, unreachable).

This basically means the compiler would have to generate two kinds of drop glue:

  • One as we do currently, which allows unwinding from it, for various reasons (backcompat being one of them);
  • Another with nounwind, which would get called only as a handler for unwinds from other drop glue (i.e. the drops that run during the PANIC_COUNT=1 case).

This is basically a 100% or close to 100% increase in drop glue size.

@lhecker
Author

lhecker commented May 7, 2016

Uhm... Why don't you just call abort in drop()'s landing pad? C++ does that as well, and it (or rather "C++ compilers") doesn't replace anything. It just emits a call to _std_terminate (or whatever the symbol is called) in the landing pad and that's it. That's one instruction plus the exception handling data for the landing pad.

You can see it here as LLVM IR: http://llvm.org/docs/ExceptionHandling.html#new-exception-handling-instructions

@arielb1
Contributor

arielb1 commented May 7, 2016

@lhecker

The problem is that then, any panic in a destructor will cause the program to abort.

OTOH, with the current situation, if there is a DiagnosticEmitter on the stack during a panic, the program will abort anyway, so maybe it's not that bad.

@nagisa
Member

nagisa commented May 7, 2016

The problem is that then, any panic in a destructor will cause the program to abort.

I think they mean we should abort on unwind edges from drop glue that is executed as part of unwinding already.

Why don't you just call abort in drop()'s landing pad?

That’s a fair point. I guess there’s a few factors here:

  • PANIC_COUNT is/was easier to implement overall and stays all within the library which also provides the unwinding implementation;
  • I guess that would mean making whatever abort we use for that nounwind, which is some extra work.

@arielb1
Contributor

arielb1 commented May 7, 2016

I think they mean we should abort on unwind edges from drop glue that is executed as part of unwinding already.

But then won't we need the double drop glue? Either

  1. All destructors are nounwind - DiagnosticEmitter aborts when abandoned.
  2. Double-unwinding aborts using TLS - this uses TLS which interacts badly with coroutines
  3. Unwinding pushes a stack frame that aborts when unwound past - potential for infinite recursion?
  4. Double-panicking works somehow - must figure out how

Doubling up all drop-glue will probably require Rust 2.0 - because the destructor for Vec etc. also needs to be doubled to call the right drop glue.

@lhecker
Author

lhecker commented May 7, 2016

I think they mean we should abort on unwind edges from drop glue that is executed as part of unwinding already.

I'm unfortunately not that familiar with the terminology in this community, so I don't fully understand what you meant by that. (I'm sorry for that.) But I think @arielb1 understood me correctly.

To say it as plainly as possible again: I like C++'s exception rules for destructors more than Rust's complicated system in std::panic (and the unwind module). There, any exception inside a destructor will unwind the stack but be caught at the destructor's boundary and lead to a process abort. It's thus impossible for a stack to be unwound twice.

The problem is that then, any panic in a destructor will cause the program to abort.

Yeah... I understand why panicking destructors are a thing and that they can be quite useful. So it's not like I don't understand the concept.

But it still just feels incredibly "wrong" that in a "safe" language a program can sometimes survive a thread unwind (after which it could spawn the thread again) and sometimes crash the whole process. And all of that is decided on a "whim", namely whether a destructor runs as part of the regular flow or as part of an active unwind.

You know? That just isn't deterministic, and I personally think that determinism should be something very valuable in programming languages.

@lhecker
Author

lhecker commented Jun 1, 2016

Choosing a solution for non-trivial problems is exactly what the RFC process is for.

I do believe that this is a hard change, but depending on the solution it's not a "substantial" one. The only thing right now that might require an RFC would be "abort on panic inside destructors", since it does change semantics (where you can safely panic). (I don't think that this will affect many people though.) I do think that I should have opened this issue in the RFC repo though.

I’m pretty sure a proposal to reimplement standard library parts which currently use TLS to use
something else would be accepted provided thread-safety and speed are not compromised on.
If you're not writing that RFC, I have to, and that means it will be filled with "Okay boys... now you tell me what to write here, because I can't possibly decide it. 😉".

The question is what you’d replace TLS with? I, personally, have no idea provided the requirements.

Which is why it'd be cool if @rust-lang/libs could (finally) say something or at least give some hints about this. 😕

libstd also uses thread locals in ways which aren’t related to panicking

It's only LOCAL_STDOUT, THREAD_RNG_KEY and THREAD_INFO. Should be doable.

--extern std=yourlib.rlib

Thanks! Didn't know that.

@nagisa added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jun 1, 2016
@nagisa
Member

nagisa commented Jun 1, 2016

cc @rust-lang/libs

@retep998
Member

retep998 commented Jun 1, 2016

I'm just gonna take a moment to point out that 64-bit Windows has UMS threads, which give you proper OS threads that support TLS and such, but with a user-mode scheduler, so you effectively have coroutines. They also have other advantages, like returning control to your scheduler whenever you block in a syscall. Doesn't solve the problem on other platforms though.

@zonyitoo

zonyitoo commented Jun 2, 2016

Doesn't solve the problem on other platforms though.

Well, obviously this is not a good solution.

I am happy to make any PRs for this, but what I am asking is for the Rust team to tell me which solution would be acceptable.

@brson
Contributor

brson commented Jun 7, 2016

I'd be in favor of either

  • Pursuing anew a std built on green threads. Just because we discarded it once doesn't mean we can't do it again, and I personally always had intentions to look into it. This would be another sort of global scenario that crates could opt into.
  • Making minimally-invasive changes to std to be coroutine compatible.

Again, related to 'compilation scenarios', you could easily imagine a global switch that turned on 'slow' TLS and did the right thing everywhere.

@mitsuhiko
Contributor

If you have an idea how to get rid of all TLS in the standard library without compromising on what TLS provides us, the fastest way is to submit relevant pull requests against this repository containing said changes.

Is it really so unlikely that coroutine-aware TLS could be achieved? I can see UMS becoming a thing going forward, which would make this much easier, so maybe it's not the worst idea in the world to investigate this.

@gnzlbg
Contributor

gnzlbg commented Jun 29, 2016

What's the "dependency graph" of TLS within the standard lib? That is:

  • where exactly is TLS used (list all the places), and
  • which other std lib functionality uses these directly or indirectly (list all the places).

Without this information explicitly stated it is very hard to know which way is the best way forward.

In my opinion the best solution to this problem is not to fork the standard library, but to:

  • implement a way to detect usage of Thread Local Storage in functions/methods/types. TLS is essentially a global variable (at the thread level), so anything using it should not be Send. Maybe we need a CoroutineSend trait that detects this. That way you can constrain your library API and prevent users from using TLS. The main issue I see here is that traits do not apply to functions, only to types, but a function using TLS is no different from a closure (it has some state at the thread level).
  • offer your own Coroutine Local Storage (CLS) type so that your users can use that instead, e.g., Boost.Fiber in Boost 1.62 provides Fiber Local Storage (FLS) that can be used in place of TLS and it explicitly allows migrating coroutines with FLS between threads (migrating coroutines using TLS is undefined behavior though).

Depending on how big the dependency graph of TLS in the standard lib is, we might be able to split it out into a std::tls module and abstract around it, so that users can avoid it by using your coroutine::cls equivalents when interfacing with the rest of std. I think we need more info here to assess this properly.

EDIT: the objective should not be to encourage people to fork libstd to implement green threading, but to make it really easy to implement different green threading solutions as a library that can work "as seamlessly as possible" with libstd.
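
As a tiny sketch of the coroutine-local-storage direction (Coroutine here is a hypothetical stand-in, not a real library type): the per-task value is owned by the coroutine's state instead of living in a hidden thread-local slot, so it migrates together with the coroutine.

struct Coroutine<T: 'static> {
    // user code for this coroutine (boxed so it can be stored and resumed)
    body: Box<dyn FnMut(&mut T)>,
    // "CLS": travels with the coroutine across threads
    local: T,
}

impl<T: 'static> Coroutine<T> {
    fn new(local: T, body: impl FnMut(&mut T) + 'static) -> Self {
        Coroutine { body: Box::new(body), local }
    }

    fn resume(&mut self) {
        // A real library would switch stacks here; we just call the body and
        // hand it its coroutine-local state instead of letting it reach for TLS.
        (self.body)(&mut self.local);
    }
}

fn main() {
    let mut c = Coroutine::new(0u64, |counter| *counter += 1);
    c.resume(); // imagine the scheduler resuming this on thread A
    c.resume(); // ...and later on thread B; `local` travels with the coroutine
    assert_eq!(c.local, 2);
}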

@mitsuhiko
Contributor

mitsuhiko commented Jun 29, 2016

@gnzlbg TLS is not a global variable, it's very much local. You could think of it like a hidden parameter that is passed to all functions.

@gnzlbg
Contributor

gnzlbg commented Jun 29, 2016

Isn't the TLS variable available during the whole lifetime of a thread, independently of which function you are in? (In C++ that is at least the case.) I need to read more on Rust thread-local variables, but I assumed they would work the same.

@mitsuhiko
Contributor

@gnzlbg Global variables have to deal with concurrency; TLS does not. TLS gets away with a RefCell and without unsafe. A global variable, however, needs a mutex or something comparable.

@gnzlbg
Contributor

gnzlbg commented Jun 29, 2016

Is a closure using a thread local variable Send ? (I suppose so)

EDIT: and if that is the case, does the compiler treat TLS as volatile (i.e. does it reload the address of the TLS variable on every access)?

@retep998
Member

@gnzlbg Right now TLS does not have to be treated as volatile because during execution of a function you'll never suddenly be on a different thread. If a closure is sent to a different thread then it'll see the TLS of the thread it was sent to and is running on.

@gnzlbg
Contributor

gnzlbg commented Jun 29, 2016

@gnzlbg Right now TLS does not have to be treated as volatile because during execution of a function you'll never suddenly be on a different thread.

@retep998 Right, but if a coroutine yields before finishing and gets sent to another thread where it resumes execution, then when it accesses TLS it will see the values of the thread it was sent from (which might no longer exist, which is unsafe!).

I can only think of two different situations involving coroutines and TLS, and none of them make sense to me in practice.

  • Case A: when a coroutine is migrated, and execution is resumed, the coroutine wants to access the TLS of the thread it has been migrated to.
  • Case B: when a coroutine is migrated, and execution is resumed, the coroutine wants to access the TLS of the thread it was migrated from.

In Case A, when execution is resumed, the TLS variables still refer to the storage of the thread the coroutine was migrated from. There are two options: either we update them to refer to the TLS of the thread the coroutine was migrated to, or we prevent sending coroutines that access TLS between threads.

I think that updating TLS to refer to completely different storage on resumption is a recipe for disaster, since it makes it very hard to reason about what is going on. So in my opinion, the best thing would be to forbid Case A completely by forbidding coroutines that access TLS from being migrated between threads. That probably means making thread-local storage !Send, such that closures/types that refer to or contain it also become !Send.

Case B makes no sense to me either, since in this case the variable should not be thread local in the first place. Still, Case B is safe as long as the thread the coroutine was migrated from outlives the thread the coroutine has been migrated to. So this could be allowed if the lifetimes can be enforced.

@zonyitoo

zonyitoo commented Jun 29, 2016

Well, it is not a wise move to forbid all coroutines from using TLS values. Adding that constraint on TLS values will confuse users, and it also becomes an obstacle for users migrating their programs to coroutines.

For your first case, there is a good solution: make TLS volatile. If a coroutine is switched out and migrated to another thread, then when execution resumes on that thread, all subsequent TLS accesses will go to the TLS of the thread the coroutine was migrated to.

We can do that by adding a flag to the compiler.

@gnzlbg
Contributor

gnzlbg commented Jun 29, 2016

For your first case, there is a good solution: make TLS volatile. If a coroutine is switched out and migrated to another thread, then when execution resumes on that thread, all subsequent TLS accesses will go to the TLS of the thread the coroutine was migrated to.

I suggested adding volatile above as well, but I don't like that myself: at every suspension point of a coroutine, every time the coroutine is resumed, the values of all TLS variables might have changed. I think this will make it really hard to reason about what a coroutine is going to do, since it is impossible to reason about its current state (think about a generator suspending inside of a loop, where at every loop iteration you might have different TLS values).

I am pretty sure that there are valid use cases for wanting to silently update TLS on coroutine resumption, but I think that forbidding it will prevent hard-to-debug user errors (the scheduler might migrate coroutines at will), and when the user really wants the coroutine state to change on resumption, there are more explicit alternatives (e.g. get a handle to the thread in which the coroutine is currently running and use that to access some "thread local" state).

Another reason to be against making all TLS volatile, even when implemented as opt-in via a compiler flag, is that it pessimizes all the code that accesses TLS but is never migrated between threads.

Still, the most important reason is being able to reason about the code. In C, C++, and D, migrating a coroutine/fiber that accesses thread-local storage between threads is undefined behavior, and they cannot catch it at compile time. Is there any low-level language that allows migrating coroutines that access thread-local storage between threads? What semantics do they choose?

@zonyitoo

As far as I know, none of them (systems programming languages) allows migrating coroutines that access thread-local storage between threads. Take Go as an example: the only way you can use TLS is via cgo, and they run FFI calls in a separate thread (I can't be sure).

Your case about suspending inside of a loop is very convincing! In that case, making TLS volatile is not a good choice; it may make the result of the program completely wrong!

Back to this issue, is there a way to get rid of those TLS usages in libstd? If we want to make TLS !Send, all of them have to go.

@gnzlbg
Contributor

gnzlbg commented Jun 29, 2016

So I just checked, and in C++'s Coroutines Technical Specification, reading a thread_local from a coroutine returns the value that the thread_local has on the thread currently running the coroutine. It is defined behavior.

And no, thread_local variables are not volatile. What happens is the following.

You initiate the coroutine on a particular thread. The coroutine runs on that thread until its first suspension point. Then it gets suspended.

When you resume the coroutine (in whatever thread you decide to do so), resuming the coroutine is just a function call that calls the system scheduler. The system scheduler then "possibly" migrates the coroutine to a different thread, which resumes the coroutine by calling a function that continues after the suspension point of the coroutine. When the coroutine after this point reads a thread_local variable, it reads the variable from the current thread running the coroutine (in C++ that value might not be initialized in that thread, so doing so might introduce undefined behavior due to a read of uninitialized memory, but not due to a read of a thread_local variable per se).

For this to work the compiler only needs to avoid caching or reordering reads of thread_local variables across suspension points in a coroutine; volatile is not needed. C++ coroutines are actually implemented as state machines internally.

@zonyitoo

It seems that someone is going to add coroutine support directly to LLVM, which means that it is possible to tell LLVM not to inline TLS calls between context swaps.

https://internals.rust-lang.org/t/llvm-coroutines-to-bring-awarness

@pcwalton
Contributor

pcwalton commented Jan 13, 2017

I think that Rust should never support M:N goroutines as Go implements them. This was decided a long time ago.

Kernel-assisted UMS-style solutions should be fine, however.

@lhecker
Author

lhecker commented Jan 13, 2017

@pcwalton Your comment literally left @zonyitoo and me speechless...
First the stdlib is deliberately crippled to make it impossible to implement userland coroutines and then it's made impossible to add the functionality back. Oh man...

P.S.: Just call them suspend-down coroutines.
P.P.S.: x64 userland context switching consistently takes about 7ns. Not 179ns like LPC. Or even on the order of µs. Rust's implementation back then was laughable at best and doesn't stand up to comparison.

@zonyitoo

@pcwalton I can totally understand why the Rust team does not want to support coroutines officially. But could you please open a door for us to give it a try? Or could you please give us a chance to implement something like tokio for comparison?

I admit futures-rs is a big step in the Rust world for asynchronous programming, but I am sure you can see that not everyone prefers this callback style. Both coroutines and futures require lots of work to build the supporting libraries, such as tokio. Why not just give it a shot? Is it because the last libgreen implementation made you feel coroutines will run very slowly? Could you please take a glance at coio-rs? It is now stuck at a very early stage due to this issue, but it works fine for some benchmarks.

@Manishearth
Member

To be clear, the door isn't closed to non-callback style. A lot of folks do want JS-style generators in the language. That, and/or async/await syntax, may happen to make this easier. Given that the ecosystem is converging on tokio and futures, that's probably the direction that we'll take.

@pcwalton's comment was about the Go model specifically. The Go model does have plenty of costs associated with it that all programs will have to pay. Rust can solve the same problems without implementing the Go model. I personally am hoping for generator syntax or async/await to clean this up.


The door is open here; you need a better proposal than "stop using TLS in the standard library" (or "mark all TLS as !Send", which is not backwards compatible). This is very similar to the issues we had with libgreen in the first place; folks had to pay an extra cost for it even if they didn't need green threads, which is antithetical to Rust's zero-cost-abstraction philosophy. Something like Brian's proposal would work (#33368 (comment)). There are other proposals in this thread (some of them yours) that might work as well. I suggest making a comment listing all the viable proposals with their pros and cons, and perhaps making a discussion post on internals.rust-lang.org to figure out what folks like best. Then, make an RFC.

(Discussion of proposals on Rust issues rarely gets anywhere, Rust issues do not have that kind of visibility. This issue tracker is for tracking implementation work that needs to be done on rustc itself, where the user-facing design decisions have already been made.)

@lhecker
Author

lhecker commented Jan 14, 2017

@pcwalton's comment was about the Go model specifically. The Go model does have plenty of costs associated with it that all programs will have to pay.

That's the point I don't get. You can't implement 1:1 scheduling on top of N:M anyway, so the discussion of whether Go's model is fit for Rust is out of the window. This is only about N:M scheduling and coroutines specifically, which can be implemented as a library on top of 1:1 scheduling without hurting anyone else's performance whatsoever.

The door is open here; you need a better proposal than "stop using TLS in the standard library" (or "mark all TLS as !Send", which is not backwards compatible).

@Manishearth Can you give me an idea of what that might be? The Rust stdlib comes prebuilt, which makes it impossible to fix the TLS problem in a way that's comfortable for Rust users.
Literally the only thing I can imagine is to make the TLS part pluggable, like the allocator has been made (which is one of the things Brian recommended, right?). I mean, what else is possible? Nothing, right?
But if you make TLS usage in the stdlib pluggable, you might as well provide an entire alternative stdlib providing a "Go-like" environment, because that would ironically be easier to implement and to use.
And in that case I've already got you covered here.

Is there possibly anything I missed (seriously)?

P.S.:
I'm sorry, but I have to insist on it, because I don't want to deal with the negative connotation associated with Go here… "Go's model" is commonly called the suspend-down version of coroutines. It can be identified by methods not needing an explicit declaration that they are async; suspension happens seemingly automatically inside the called method. Furthermore, it's possible to have specialized user-land schedulers here.
The opposite - "async/await" - is called suspend-up, because here you explicitly suspend (await) while the async action executes. This can be implemented with "zero overhead", because suspend-up coroutines can be translated 1:1 by the compiler into simple state machines (which allows function inlining). Since it's zero-overhead, I believe this has a proper right to be officially supported by Rust. The "Goroutine", on the other hand, is really just a marketing term and nothing new.
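
For what it's worth, here is a hand-written version of the kind of state machine such a compiler transformation could produce (names and shape are illustrative only, not rustc's actual lowering):

// Each await point becomes an enum variant; resuming is an ordinary,
// inlinable method call. No separate stack, no TLS, no userland scheduler.
enum Step {
    Start,
    AwaitingValue, // suspended at the single await point
    Done,
}

enum Poll {
    Pending,
    Ready(u64),
}

struct AddAwaited {
    state: Step,
    total: u64,
}

impl AddAwaited {
    // One resumption of the coroutine; the caller drives it.
    fn resume(&mut self, input: Option<u64>) -> Poll {
        match self.state {
            Step::Start => {
                self.state = Step::AwaitingValue;
                Poll::Pending // "await": suspend by returning to the caller
            }
            Step::AwaitingValue => match input {
                Some(v) => {
                    self.total += v;
                    self.state = Step::Done;
                    Poll::Ready(self.total)
                }
                None => Poll::Pending,
            },
            Step::Done => Poll::Ready(self.total),
        }
    }
}

fn main() {
    let mut c = AddAwaited { state: Step::Start, total: 1 };
    assert!(matches!(c.resume(None), Poll::Pending)); // ran up to the await
    assert!(matches!(c.resume(Some(41)), Poll::Ready(42))); // resumed with the awaited value
}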

@Manishearth
Member

Manishearth commented Jan 14, 2017

Can you give me an idea what that might be?

You could possibly have your own marker trait that works similarly to Send but bounds the closures used in your coroutine library, and make sure it's not implemented on the TLS keys. This marker trait could be part of the stdlib. It's restrictive, but a solution (I'm not sure if it's actually feasible, just an idea).
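
A rough sketch of that marker-trait idea, assuming nightly-only auto traits and negative impls (CoroutineSend and this LocalKey are illustrative stand-ins, not existing std items):

// Requires nightly: auto traits and negative impls are unstable features.
#![feature(auto_traits, negative_impls)]

// Hypothetical marker: "safe to migrate between threads mid-execution".
pub auto trait CoroutineSend {}

// Stand-in for a TLS key type (not std::thread::LocalKey).
pub struct LocalKey<T: 'static> {
    _marker: std::marker::PhantomData<T>,
}

// TLS keys opt out, so any closure capturing one stops being CoroutineSend.
impl<T: 'static> !CoroutineSend for LocalKey<T> {}

pub fn spawn_coroutine<F>(_f: F)
where
    F: FnOnce() + CoroutineSend + 'static,
{
    // A real library would hand `_f` to its work-stealing scheduler here.
}

fn main() {
    // A plain closure is CoroutineSend automatically, so this compiles;
    // a closure capturing a LocalKey would be rejected at compile time.
    spawn_coroutine(|| println!("hello from a migratable coroutine"));
}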

A pluggable TLS is a viable solution, and you can try to flesh that out into a pre-RFC. The problem with an "alternate" stdlib is that it can end up being incompatible with large parts of the ecosystem, which we don't want.

Even a flag for volatile TLS sounds like it could work here, though I'm not sure.

The change has to be one that can't affect existing libraries, and if it's a flag or pluggable solution the only effect it can have on existing libraries is a performance difference. That's the standard anything of this form has to go through.

There are probably other solutions this thread hasn't explored.

@steveklabnik
Member

Triage: we have generators as an experimental RFC, implemented in nightly.

@lhecker
Author

lhecker commented May 27, 2020

Should we close this issue for now?

  • There haven't been any comments for a long while
  • async/await is stable (and it's hella nice 😍)
  • I don't want to clutter this poor project with my poorly crafted issues 😄

What do you think?
