Thread cancellation, asynchronous unwinding exceptions and their interaction with drops #211

nagisa · 2018-03-29T11:10:38Z

So during the all-hands we identified a use-case that would be good to be documented in the unsafe code guidelines.

There are at least three sources of "cancellation" which might end up "removing" (in Taylor’s words) a value without "dropping" it, which in turn results in unsoundness for e.g. rayon, crossbeam, &pin stuff. These sources are:

longjmp/setjmp (used in practice for error handling by rust lua and perhaps many other embedded languages/interpreters);
pthread_cancel: which can run either an asynchronous unwinding exception (might occur at any program point) or raise an unwindingexception at well specified points such as sleep; exact behaviour is specified by pthread_setcancelstate.
pthread_exit, pthread_kill: which will "stop" the thread, potentially executing some arbitrary code and cleaning the thread up (freeing thread’s stack).

There’s no question that these functions may be useful in one scenario or the other, so it would be good if we figured out scenarios in which these functions are sound to use (e.g. if the thread stack contains only Copy types) and encoded this information into our unsafe code guidelines book.

cc @nikomatsakis @cramertj

The text was updated successfully, but these errors were encountered:

Amanieu · 2018-03-29T11:29:42Z

To be clear, pthread_kill sends a signal to a specific thread. If that signal is not handled by a signal handler then the whole process is killed, not just that thread. I think this is outside the scope of this discussion.

Windows has a TerminateThread function which kills a thread without performing any cleanup. Use of this function is strongly discouraged because it can lead to memory leaks and/or deadlocks.

Forking a multi-threaded process also results in something which strongly resembles killing a thread: the child process after forking will only have a single thread, effectively making it look like all other threads have been killed. This has the same downsides as TerminateThread: thread stacks and thread-local storage are leaked, and any held locks are not unlocked.

Since both forking and TerminateThread do not run any cleanup code, there is no unsafety: they are equivalent to simply suspending the threads indefinitely. Memory leaks and deadlocks are not considered to be memory-unsafe.

nagisa · 2018-03-29T11:45:01Z

According to the documentation of TerminateThread "the system frees thread's initial stack" which is exactly what constitutes a violation of "must not remove before drop" and was decided to be unsound. If the thread stack was leaked, such function would have been fineish to use from the soundness perspective.

Amanieu · 2018-03-29T11:51:44Z

pthread_exit is more complicated since it frees the thread stack but does not unwind the stack. This could cause issues if, for example, you were to pass a reference to stack-local data to another thread, and then kill the current thread:

let mut var = 0;
crossbeam::scope(|scope| {
   scope.spawn(|| loop { var += 1; } );
   unsafe { pthread_exit(); }
});

Using pthread_exit can probably be used safely as long as you ensure that exist no pointers back into the stack or TLS of the killed thread (or if such pointers exist then you must make sure to never dereference them).

Note that this includes the standard library and any external crates that you might be using. The case for external crates is simple: you are responsible for auditing them (with no help from Rust's type system). However for the standard library this is a bit more complicated: do we want to stable-guarantee that all of the libstd code involved in spawning a thread (thread::spawn) will also be safe with regards to exiting the thread?

According to the documentation of TerminateThread "the system frees
thread's initial stack"

Ah, I seem to have missed that. In that case TerminateThread should be treated roughly the same way as pthread_exit.

Amanieu · 2018-03-29T12:12:39Z

Asynchronous cancellation using pthread_cancel is a bit tricky since it may involve unwinding from places that may not expect unwinding to occur:

If PTHREAD_CANCEL_DEFERRED is used (this is the default type), then libc will attempt to unwind from syscalls such as sleep, which are currently assumed to be nounwind. Have we decided whether this is UB or a guaranteed abort?
If a thread is already in the process of unwinding due to a panic, I am not exactly sure what will happen but it's probably not good.
Asynchronous unwinding (from an arbitrary instruction in a Rust function) is probably fine from the language point of view. DWARF & SEH64 have unwinding tables that are precise for each instruction, while SEH32 and SJLJ unwinding push/pop frames atomically with regards to signal handlers. However actual Rust code may not be designed with arbitrary unwinding in mind. The same question as pthread_exit applies here: which parts of libstd are we will to guarantee are "async-unwind-safe"?
Have we decide what the behavior of foreign exceptions unwinding through Rust frames is? Is it allowed or UB? Is drop called or not?

nagisa · 2018-03-29T13:13:03Z

Have we decide what the behavior of foreign exceptions unwinding through Rust frames is? Is it allowed or UB? Is drop called or not?

Plan is to have unwinding from FFI be fine if the functions are annotated with #[unwind]. It is still unclear whether the cleanups in the rust code would be run or not, though.

nagisa · 2018-03-29T13:13:39Z

Also, thanks for writing down all these things. Most of those questions have been floated around during the All-hands, but I ended up not remembering any of them :)

Amanieu · 2018-03-29T14:56:13Z

Regarding setjmp & longjmp. I think it's fine as long as setjmp is called from C code. setjmp isn't a normal function and there are hacks in C compilers to support it (__attribute__((returns_twice))). There are also special semantics for volatile local variables in a function that calls setjmp. Therefore I propose disallowing calling setjmp from Rust code.

However using longjmp to jump over Rust frames should be fine, as long as the user is aware that destructors will not be called, so they will be responsible for the safety implications at the library level.

nagisa · 2018-03-29T18:11:51Z

@Amanieu longjmp is implemented by unwinding on Windows. See rust-lang/rust#48251. Elsewhere, it happens to do the same "remove but do not run destructors" thing, which is something that was declared to be unsound (at least in presence of Drop types within "removed" stuff).

RalfJung · 2018-03-29T21:00:32Z

I don't think trying to make pthread_exit safe or anything else which kills a thread and its stack is reasonable. This makes e.g. Crossbeams' and Rayon's scoped threads unsound.

So, these will have to be unsafe operations. In terms of when they (and also e.g. setjmp/longjmp) are ~~safe~~ sound to use, I'd say they must not "skip over" any stack frame where any type has a Drop implementation. That means no code actually runs during "normal" unwinding. As a consequence, just deallocating these stackframes is behaviorally perfectly equivalent to a panic being raised (at the longjmp place) and then caught again (at the setjmp place) -- right?

RalfJung · 2018-03-29T21:04:02Z

Asynchronous unwinding (from an arbitrary instruction in a Rust function) is probably fine from the language point of view. DWARF & SEH64 have unwinding tables that are precise for each instruction, while SEH32 and SJLJ unwinding push/pop frames atomically with regards to signal handlers. However actual Rust code may not be designed with arbitrary unwinding in mind. The same question as pthread_exit applies here: which parts of libstd are we will to guarantee are "async-unwind-safe"?

Oh no this cannot possibly work. If you unwind e.g. during the execution of x = y, you could leave behind total garbage in x that is half-way the old value and half-way the new one. Rust is only safe to unwind in places where panics could be raised. I think such a restriction is necessary to write any kind of exception-safe unsafe code.

Amanieu · 2018-03-29T22:03:55Z

We know that none of these functions are safe. What we are trying to determine here is: under which conditions is using them (not) UB? Additionally, we are only look at this from the language point of view: everyone will agree that most Rust code assumes that panics don't come from arbitrary instructions and that destructors are executed while unwinding.

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

RalfJung · 2018-03-30T09:15:26Z

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

Ack.

under which conditions is using them (not) UB?

Right, we'll have to decide that. I said my 2 cents above.

jugglerchris · 2018-04-05T11:10:27Z

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

Could there could be some kind of annotation you could use to make this audit easier, at least locally? The compiler must know whether there are any Drop types on the stack at a particular point.

fn may_longjmp() {
    let foo = make_foo();

    // Fail to compile if any destructors would run on unwind (here for foo)
    #[no_drop_on_stack]
    ffi_function_which_may_longjmp();
}

You'd still need to annotate up your call stack to the point where setjmp or similar was called.

comex · 2019-10-16T17:42:57Z

What happens with -C panic=abort?

RalfJung · 2019-10-16T18:06:00Z

I don't think any of the examples in the OP are affected by that?

gnzlbg · 2019-10-17T10:34:40Z

@comex

What happens with -C panic=abort?

I haven't tried, but IIUC on Itanium-C++ ABI thread cancelation uses Unwind_ForceUnwind which means that the stack frames would be unwound even if -C panic=abort (whether that does consistently run destructors or not on C++, I don't know, and on Rust, I don't know either).

On windows, where a longjmp is implemented using exceptions, it does unwind frames even when -C panic=abort. @kyren tested this for rlua, which works with -C panic=abort, but rlua does not have types with destructors in the frames that get unwound. @kyren added types with destructors to it, and if the type is in the same frame that calls an extern "C" function that unwinds, currently, its destructor is not run, but if it is in a different frame, then its destructor is run. This might have to do with how we currently implement things in Rust, and does not necessary match what C++ does. So maybe this is a current implementation "bug" (*), our current tests for this are here and they do not test that the destructor is actually run - the destructors there do nothing, e.g., as opposed to trying to count how many times they are invoked.

Notice that when -C panic=abort we stamp nounwind everywhere, but LLVM says that it is ok for nounwind functions to unwind in certain specific ways.

(*) Note that this isn't a language bug, because we maybe don't provide any guarantees for what happens here at the language level - it'd be only a bug at the implementation level, were we want this to work in certain consistent ways, even if longjmp or thread cancellation over frames with destructors is UB. However, if we say that -C panic=abort code never unwinds, then this might be a language bug, because I don't see how we could enforce this guarantee properly.

RalfJung · 2019-10-17T10:38:43Z

Also see rust-lang/reference#693 (comment) for an approximate model of stack frame and their dropping by @gnzlbg.

comex · 2019-10-17T22:20:13Z

I checked:

On Darwin, pthread_cancel doesn't use unwinding at all; it uses a separate linked list to track pthread_cleanup handlers. As such, it will never run Rust destructors.

On Linux/glibc, pthread_cancel does indeed use _Unwind_ForcedUnwind.

On both OSes, pthread_exit uses the same code path as pthread_cancel, so the behavior is equivalent (unwinding on Linux, not on Darwin).

Here are some test results with both panic=unwind and panic=abort on Linux and macOS – various types of exotic crashes:

https://gist.github.com/comex/401f5c02f066fdea8e06c75a28267247

gnzlbg · 2019-10-17T23:34:36Z

@comex thanks!

On Darwin, pthread_cancel doesn't use unwinding at all; it uses a separate linked list to track pthread_cleanup handlers. As such, it will never run Rust destructors.

This is interesting, because it doesn't work exactly like a longjmp since it runs some pthread_cleanup handlers :D

I tried to visualize all the info and came up with this table:

name	kind	runs destructors?	stoppable?	pausable?	aborts on aborts-on-panic-shims
Rust panic	native	yes	catch	catch+resume	yes
C++ throw	foreign	yes	catch could	catch+resume could	yes
pthread_cancel (linux)	foreign	yes	no	catch+resume could	no
pthread_cancel (osx)	foreign	no	?	?	no
longjmp (linux)	foreign	no	no (setjmp only)	no	no
longjmp (windows-SEH)	foreign	yes	catch could	catch+resume could	no

Instead of specifying the behavior of each of these on each platform, it might be better to specify the behavior of Rust depending on the columns.

If the kind is native, the behavior is well-defined and does ...
If the kind is foreign, the behavior might be undefined, depending on other factors
If the unwind does not run destructors, the behavior is undefined if unwinding deallocates drop types with safety invariants that require running destructors like Pin
If the unwind is not stoppable or pausable, the behavior is undefined if the unwind reaches a thread boundary
If the unwind does not abort on abort-on-panic shims and runs destructors and there are types with destructors in the frames, the behavior is undefined (e.g. calling a nounwind function that ends up unwinding such that destructors run might lead to only some destructors being run, depending on optimizations)
Some guarantees for catch_unwind, for example:
- if the unwind is foreign, the behavior is UB
- or, if the unwind is foreign, catch unwind never catches the unwind

So if you have a platform and want to call some function that panics, you'd just fill in a row in the table, and answer the questions to see when the behavior is defined, and to what.

Something like that might end up reducing the amount of UB to few special cases, e.g.:

deallocating a Pin without calling its destructor
unwinding through a thread boundary with an unwind that cannot be stopped or paused (e.g. a longjmp that's not caught by its setjmp on the current thread)
unwind is not caught by abort-on-panic-shims but runs destructors and there are types with drop glue on frames (e.g. longjmp on windows-seh if the frames have types with destructors)

leaving a lot of useful cases as "well-defined".

JakobDegen · 2023-06-06T16:09:12Z

Closing in favor of #404 which is more specific than this one

nagisa changed the title ~~Thread cancellation, unwinding and their interaction with drops~~ Thread cancellation, asynchronous unwinding exceptions and their interaction with drops Mar 29, 2018

RalfJung transferred this issue from rust-lang/rust-memory-model Oct 16, 2019

RalfJung added the C-open-question Category: An open question that we should revisit label Oct 16, 2019

RalfJung mentioned this issue Oct 16, 2019

mem::forget on stack frames #210

Closed

RalfJung added the A-drop label Oct 16, 2019

RalfJung added A-unwind Topic: related to unwinding and removed T-drop labels Dec 14, 2019

RalfJung mentioned this issue Jan 26, 2020

Document when we *do* guarantee that drop runs rust-lang/nomicon#135

Open

RalfJung added the A-drop Topic: related to dropping label Feb 19, 2020

RalfJung mentioned this issue Feb 19, 2020

Clean shutdown? rwf2/Rocket#180

Closed

matu3ba mentioned this issue Jul 20, 2020

Add Drop::poll_drop_ready for asynchronous destructors rust-lang/rfcs#2958

Closed

JakobDegen closed this as completed Jun 6, 2023

RalfJung mentioned this issue Jun 1, 2023

Is longjmp through rust code UB #404

Open

bjorn3 mentioned this issue Jun 12, 2023

std::thread::JoinHandle::join panics with seccomp-killed threads rust-lang/rust#112521

Open

jneem mentioned this issue Feb 20, 2024

LSP evaluation in the background tweag/nickel#1814

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread cancellation, asynchronous unwinding exceptions and their interaction with drops #211

Thread cancellation, asynchronous unwinding exceptions and their interaction with drops #211

nagisa commented Mar 29, 2018

Amanieu commented Mar 29, 2018

nagisa commented Mar 29, 2018 via email •

edited

Loading

Amanieu commented Mar 29, 2018

Amanieu commented Mar 29, 2018

nagisa commented Mar 29, 2018

nagisa commented Mar 29, 2018 •

edited

Loading

Amanieu commented Mar 29, 2018

nagisa commented Mar 29, 2018 •

edited

Loading

RalfJung commented Mar 29, 2018 •

edited

Loading

RalfJung commented Mar 29, 2018

Amanieu commented Mar 29, 2018

RalfJung commented Mar 30, 2018 •

edited

Loading

jugglerchris commented Apr 5, 2018

comex commented Oct 16, 2019

RalfJung commented Oct 16, 2019

gnzlbg commented Oct 17, 2019 •

edited

Loading

RalfJung commented Oct 17, 2019

comex commented Oct 17, 2019

gnzlbg commented Oct 17, 2019

JakobDegen commented Jun 6, 2023

Thread cancellation, asynchronous unwinding exceptions and their interaction with drops #211

Thread cancellation, asynchronous unwinding exceptions and their interaction with drops #211

Comments

nagisa commented Mar 29, 2018

Amanieu commented Mar 29, 2018

nagisa commented Mar 29, 2018 via email • edited Loading

Amanieu commented Mar 29, 2018

Amanieu commented Mar 29, 2018

nagisa commented Mar 29, 2018

nagisa commented Mar 29, 2018 • edited Loading

Amanieu commented Mar 29, 2018

nagisa commented Mar 29, 2018 • edited Loading

RalfJung commented Mar 29, 2018 • edited Loading

RalfJung commented Mar 29, 2018

Amanieu commented Mar 29, 2018

RalfJung commented Mar 30, 2018 • edited Loading

jugglerchris commented Apr 5, 2018

comex commented Oct 16, 2019

RalfJung commented Oct 16, 2019

gnzlbg commented Oct 17, 2019 • edited Loading

RalfJung commented Oct 17, 2019

comex commented Oct 17, 2019

gnzlbg commented Oct 17, 2019

JakobDegen commented Jun 6, 2023

nagisa commented Mar 29, 2018 via email •

edited

Loading

nagisa commented Mar 29, 2018 •

edited

Loading

nagisa commented Mar 29, 2018 •

edited

Loading

RalfJung commented Mar 29, 2018 •

edited

Loading

RalfJung commented Mar 30, 2018 •

edited

Loading

gnzlbg commented Oct 17, 2019 •

edited

Loading