Undefined behavior invoked by moving stacks between threads #6

vorner · 2017-12-23T20:53:47Z

A coroutine's stack can be moved from one thread to another. However, a stack may always contain things that are not Send, breaking things.

This is one example:

extern crate may;

use std::cell::RefCell;
use std::rc::Rc;
use std::thread::ThreadId;

use may::coroutine;
use coroutine::yield_now;

thread_local!(static ID: RefCell<Option<Rc<ThreadId>>> = RefCell::new(None));

fn main() {
    may::config().set_io_workers(60);
    may::config().set_workers(60);
    let h = coroutine::spawn(move || {
        let v = (0..10000)
            .map(|i| {
                coroutine::spawn(|| {
                    let handle = Rc::new(std::thread::current().id());
                    ID.with(|id| {
                        *id.borrow_mut() = Some(Rc::clone(&handle));
                    });
                    for _ in 0..10000 {
                        if *handle != std::thread::current().id() {
                            println!("Access to Rc content without a mutex from a different thread, {:?} vs {:?}",
                                     *handle, std::thread::current().id());
                        }
                        yield_now();
                    }
                })
            })
            .collect::<Vec<_>>();
        for i in v {
            i.join().unwrap();
        }
    });
    h.join().unwrap();
}

If something else was accessing the Rc (like making copis of it) from the original thread (which it could, because it is accessible in the thread local storage), it would be undefined behaviour ‒ both the coroutine, that moved, and the thing in that thread (possibly other coroutine) could be accessing the counters in the Rc at the same time, or the data inside, which could be for example a RefCell.

Now, suggesting not to use thread local storage doesn't solve anything, because:

I might not use thread local storage myself, but I can't certainly be expected to audit all the libraries I use not to use thread local storage. Even the standard library uses thread local storage internally.
There are things that are not Send for other reasons. One example might be Zero-MQ sockets, which explode the whole application if ever touched from a different thread then they were created in.

I believe this problem is fundamental to any attempt to move stacks between threads in Rust. Such thing just breaks the Rust contract.

So my only suggestion is to create the coroutine in one thread (it is possible to check nothing Send crosses a closure boundary, so the closure can be safely sent to another thread) and then pin it there to that thread.

The text was updated successfully, but these errors were encountered:

Xudong-Huang · 2017-12-24T07:31:59Z

I didn't see UB by running the following code.
you can ref this blog: Don’t visit thread local storage in coroutine context

#[macro_use]
extern crate may;

use std::cell::RefCell;
use std::rc::Rc;
use std::thread::ThreadId;

use may::coroutine;
use coroutine::yield_now;

coroutine_local!(static ID: RefCell<Option<Rc<ThreadId>>> = RefCell::new(None));

fn main() {
    may::config().set_io_workers(4).set_workers(4);
    let h = coroutine::spawn(move || {
        let v = (0..10000)
            .map(|i| {
                coroutine::spawn(|| {
                    let handle = Rc::new(std::thread::current().id());
                    ID.with(|id| {
                        *id.borrow_mut() = Some(Rc::clone(&handle));
                    });
                    for _ in 0..10000 {
                        if *handle != std::thread::current().id() {
                            println!("Access to Rc content without a mutex from a different thread, {:?} vs {:?}",
                                     *handle, std::thread::current().id());
                        }
                        yield_now();
                    }
                })
            })
            .collect::<Vec<_>>();
        for i in v {
            i.join().unwrap();
        }
    });
    h.join().unwrap();
}

zonyitoo · 2017-12-24T08:12:44Z

Unfortunately, you cannot bypass the TLS problem in Rust. Because libstd uses thread_local every where.

My coio project implemented exactly the same scheduling strategy and APIs as may, and I have found this problem long ago. As far as I can tell, there is no way to bypass.

vorner · 2017-12-24T08:29:17Z

What do you mean, by „seeing“ UB?

No, this code doesn't contain UB itself, only points to a place where it could happen. But if I, for example, added ID.with(|id| Rc::clone(id.borrow())); Rc::clone(handle) into the inner loop, just above the conditional, that would already be UB. I'd would modifying the reference counts inside the Rc from multiple threads without synchronisation, possibly making it wrong and freeing the insides too early or something.

Note that UB can do anything. That includes pretending to work correctly when you watch. That's the tricky part about UB ‒ you don't have to see it, it can be lurk in the program and do arbitrarily bad things only from time to time (when the customer watches).

And I know I don't want to use TLS from coroutines. The problem is, a blog post is not the appropriate solution. TLS isn't the only problem, there are others ‒ system calls that are expected to happen on specific thread, FFI libraries, etc. You never know if a library you use might be using TLS and not telling you. You are not supposed to have to watch for these things in Rust if you're not using unsafe.

If your library exposes a „safe“ interface which allows your user to invoke UB by it, the library is broken and dangerous to use. So you can either ensure the UB can never happen (by eg. not moving coroutines between threads after they are created), or mark the spawn method as unsafe, because it's the exactly correct mechanism to tell the user of the library he has to check, that there's something not covered by the compiler/library.

Xudong-Huang · 2017-12-24T08:31:18Z

@zonyitoo I see this problem now. I have some questions about the issue:

what kind of std API internally using thread_local?
Is it safe to finish calling the API just before yield for another scheduling?
do you have any sensible code that can reproduce the issue?

zonyitoo · 2017-12-24T08:50:01Z

For instance, panic! and println!
No. It is not safe. Because compiler don't know that you are going to switch this execution context to another thread, so compiler will cache the pointer of TLS in register without actually reload it from memory, which is safe in non-coroutine environment. Reloading from TLS every time you want to access it will have very poor performance.
Well, it is simple, just call println! many many times, and you will see the crash. Because it is a UB, so I can't tell you when it happens, but it will happen. Issue

Also, because Rust is a system programming language, libraries that contains FFI calls to some existing C/C++ libraries dominated most of the Rust's world. You may can control all your libraries to be "thread_local free", but you cannot control all those external libraries in dependencies.

At least, you have no idea when libstd add another thread_local global state in future releases. And you have no way to control it.

I have worked on stackful coroutine with work-stealing scheduler for almost 2 years, and I have already found that this is a dead end. I have tried to

contact one of the LLVM's maintainer to help with generating IRs that can "tell" compiler to re-fetch from TLS, but they refused.
contact alex, the maintainer of Rust's standard library. And we also have many discussion about how libstd can support this kind of stackful coroutines library, and the answer is Rust's libstd won't help.

I am not damping your enthusiasm of making coroutine libraries, but I have to tell you that the way you are going is not very promising. Feel free to contact me, I think we will have lots of common topics.

KiChjang · 2017-12-24T08:50:02Z

@Xudong-Huang We've actually been discussing your crate and its issues in the Rust China community QQ group. We'd like to invite you to join our discussion, please add this QQ group 303838735.

Xudong-Huang · 2017-12-24T09:00:40Z

@KiChjang thanks! will see you there.

Xudong-Huang · 2017-12-24T13:37:15Z

I use following test and can't see any problem, maybe the test is too simple.
I use the library for sometime and didn't see problem when calling libstd API.

extern crate may;

use std::sync::Arc;
use may::sync::Semphore;

fn main() {
    let sem = Arc::new(Semphore::new(10000));
    may::config().set_workers(100);
    loop {
        sem.wait();
        let s = sem.clone();
        may::coroutine::spawn(move || ->() {
            s.post();
            may::coroutine::yield_now();
            panic!("i'm dying");
        });
    }
}

Xudong-Huang · 2017-12-24T14:15:20Z

Access TLS in coroutine is not safe unless:

Before schedule another coroutine the value of the TLS is a fixed one, so that schedule coroutine at any thread has the same TLS value, which is safe.

Just like calling panic, before you schedule another coroutine you must have catch the panic, and at that time the panic count(TLS) will be always 0.

Xudong-Huang · 2017-12-26T05:34:28Z

I upload may-caveat doc. I think calling libstd API is safe within may coroutine. and tests like calling println!() and panic!() massively all passed.

The basic goal of MAY is support all libstd APIs be safely called in coroutine. I think this goal is accomplished.

I will close this issue. But if you find some libstd API that is not safely called in coroutine, welcome to reopen the issue.

@vorner @zonyitoo

vorner · 2017-12-26T07:56:29Z

I'd like to point out that this is not about just libstd (but I'll try to reproduce it with libstd).

Anyway, if you really want to go the way to tell user „don't do this“ and „do this“, that's fine, but you should make corresponding methods unsafe, to ensure they read and follow the instructions. That's the purpose of unsafe, it tells people they need to be careful. Unlike C++ people don't expect to have to watch out for things.

osa1 · 2017-12-26T08:10:15Z

@Xudong-Huang is it fair to say that we don't have to avoid TLS use as long as we run may with one thread only? Can we update the may_caveat document with this info?

Another question (sorry this is kind of unrelated) is, when calling libraries from a MAY coroutine, I have to make sure the library uses MAY's IO functions rather than std IO functions, correct (e.g. should use may::net::TcpStream instead of std::net::TcpStream) ?

Xudong-Huang · 2017-12-26T08:17:25Z

@osa1

is it fair to say that we don't have to avoid TLS use as long as we run may with one thread only?

I'm afraid not, MAY schedule coroutines at least on one worker thread and one io_worker thread. that is at least two threads, change the scheme would hurt IO performance.

when calling libraries from a MAY coroutine, I have to make sure the library uses MAY's IO functions rather than std IO functions, correct (e.g. should use may::net::TcpStream instead of std::net::TcpStream)?

correct, and this is the first rule that list in may_caveat

Xudong-Huang · 2017-12-26T08:21:07Z

@vorner

Anyway, if you really want to go the way to tell user „don't do this“ and „do this“, that's fine, but you should make corresponding methods unsafe, to ensure they read and follow the instructions. That's the purpose of unsafe, it tells people they need to be careful. Unlike C++ people don't expect to have to watch out for things.

this would mark the core APIs as unsafe, which would

lost optimization by compiler
very annoying to use the API

and I don't have a plan to change API specs.

vorner · 2017-12-26T09:10:59Z

Your API already is unsafe. Pretending it's safe won't make it so, it'll only hurt users.

Very good explanation about unsafe in rust was publish just recently: https://manishearth.github.io/blog/2017/12/24/undefined-vs-unsafe-in-rust/.

To cite:

Basically, in Rust a bit of code is “safe” if it cannot exhibit undefined behavior under all circumstances of that code being used.

Xudong-Huang · 2017-12-26T09:23:39Z

@vorner You are right. Thanks for the sharing

Restioson · 2017-12-26T12:22:58Z

very annoying to use the API

It's also very annoying to encounter UB in 'safe' code...

Xudong-Huang · 2017-12-26T13:31:01Z

thanks all, I'm considering change the api spec to add unsafe declaration for spawning a coroutine, since it's in an early stage, it's a good time to do that. Apply the rust unsafe rules is important for users.

osa1 mentioned this issue Dec 24, 2017

Adding other chat backends? osa1/tiny#47

Open

Xudong-Huang mentioned this issue Dec 26, 2017

add unsafe property for coroutine creation #8

Closed

Xudong-Huang closed this as completed Dec 27, 2017

Xudong-Huang added the question label Dec 28, 2017

matklad mentioned this issue Feb 3, 2018

May is mentioned in the docs but is not used rust-crates/ergo_sync#5

Open

jmaargh mentioned this issue Jun 1, 2024

The go! macro is unsound #109

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Undefined behavior invoked by moving stacks between threads #6

Undefined behavior invoked by moving stacks between threads #6

vorner commented Dec 23, 2017

Xudong-Huang commented Dec 24, 2017

zonyitoo commented Dec 24, 2017

vorner commented Dec 24, 2017 •

edited

Loading

Xudong-Huang commented Dec 24, 2017 •

edited

Loading

zonyitoo commented Dec 24, 2017 •

edited

Loading

KiChjang commented Dec 24, 2017

Xudong-Huang commented Dec 24, 2017 •

edited

Loading

Xudong-Huang commented Dec 24, 2017 •

edited

Loading

Xudong-Huang commented Dec 24, 2017

Xudong-Huang commented Dec 26, 2017 •

edited

Loading

vorner commented Dec 26, 2017

osa1 commented Dec 26, 2017

Xudong-Huang commented Dec 26, 2017 •

edited

Loading

Xudong-Huang commented Dec 26, 2017

vorner commented Dec 26, 2017

Xudong-Huang commented Dec 26, 2017

Restioson commented Dec 26, 2017

Xudong-Huang commented Dec 26, 2017

Undefined behavior invoked by moving stacks between threads #6

Undefined behavior invoked by moving stacks between threads #6

Comments

vorner commented Dec 23, 2017

Xudong-Huang commented Dec 24, 2017

zonyitoo commented Dec 24, 2017

vorner commented Dec 24, 2017 • edited Loading

Xudong-Huang commented Dec 24, 2017 • edited Loading

zonyitoo commented Dec 24, 2017 • edited Loading

KiChjang commented Dec 24, 2017

Xudong-Huang commented Dec 24, 2017 • edited Loading

Xudong-Huang commented Dec 24, 2017 • edited Loading

Xudong-Huang commented Dec 24, 2017

Xudong-Huang commented Dec 26, 2017 • edited Loading

vorner commented Dec 26, 2017

osa1 commented Dec 26, 2017

Xudong-Huang commented Dec 26, 2017 • edited Loading

Xudong-Huang commented Dec 26, 2017

vorner commented Dec 26, 2017

Xudong-Huang commented Dec 26, 2017

Restioson commented Dec 26, 2017

Xudong-Huang commented Dec 26, 2017

vorner commented Dec 24, 2017 •

edited

Loading

Xudong-Huang commented Dec 24, 2017 •

edited

Loading

zonyitoo commented Dec 24, 2017 •

edited

Loading

Xudong-Huang commented Dec 24, 2017 •

edited

Loading

Xudong-Huang commented Dec 24, 2017 •

edited

Loading

Xudong-Huang commented Dec 26, 2017 •

edited

Loading

Xudong-Huang commented Dec 26, 2017 •

edited

Loading