
thread::park causes undefined behaviour on panic #102398

Closed
joboet opened this issue Sep 28, 2022 · 7 comments · Fixed by #102412
Labels
A-atomic Area: Atomics, barriers, and sync primitives I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

joboet (Member) commented Sep 28, 2022

A lot of code dealing with synchronization assumes that thread::park does not panic. However, this assumption is not correct: a panic occurs if the TLS key used to store the thread info cannot be created, if a new thread id cannot be created, if the creation of a parking primitive fails, if the current time is not available, or if the parking operation itself fails.

The code examples above all use intrusive lists and have to wait on a condition before destroying the list nodes. If thread::park panics, the nodes will be dropped during stack unwinding without checking that condition, leading to undefined behaviour (some of these cases are in std, so I'm marking this as a soundness problem with std).

Fixing this would either require guarding against panics in each of these cases or ensuring that thread::park never panics.
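The first option, guarding against panics, could look something like the following sketch. `park_no_unwind` is a hypothetical helper, not a std API; it converts any panic from park into an abort so callers with intrusive stack nodes never see an unwind:

```rust
use std::panic;
use std::process;
use std::thread;

// Hypothetical wrapper: a park() that can never unwind into the caller.
// If the underlying park panics, the process aborts instead, so stack
// nodes in an intrusive list are never dropped early during unwinding.
fn park_no_unwind() {
    if panic::catch_unwind(|| thread::park()).is_err() {
        // Unwinding past this point would be unsound for callers that
        // rely on park not panicking, so give up entirely.
        process::abort();
    }
}

fn main() {
    // A park with a pre-delivered unpark token returns immediately.
    let t = thread::spawn(park_no_unwind);
    t.thread().unpark();
    t.join().unwrap();
    println!("ok");
}
```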

@rustbot label +I-unsound +A-atomic +T-libs +T-libs-api

@rustbot rustbot added A-atomic Area: Atomics, barriers, and sync primitives I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 28, 2022
@m-ou-se m-ou-se added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 28, 2022
m-ou-se (Member) commented Sep 28, 2022

Thanks for bringing this up!

I agree this is a problem. In general, the pattern with parking is:

while !some_condition {
    thread::park();
}

Unwinding would mean leaving that while loop even though the condition isn't true. While code right after the while loop isn't executed, there are many cases where leaving the current frame before the condition is true is problematic, as all your examples show.

I think it's not reasonable to expect every user of park() to take this subtle pitfall into account. Instead, I believe we should guarantee that park() never unwinds.
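For context, here is a complete, runnable version of the pattern described above (the condition flag and names are illustrative, not from std internals):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let ready = Arc::new(AtomicBool::new(false));
    let ready2 = Arc::clone(&ready);

    let waiter = thread::spawn(move || {
        // park() may wake spuriously, so it must sit in a loop that
        // re-checks the condition. Unwinding out of park() would leave
        // this loop with `ready` still false.
        while !ready2.load(Ordering::Acquire) {
            thread::park();
        }
        println!("condition observed");
    });

    ready.store(true, Ordering::Release);
    waiter.thread().unpark();
    waiter.join().unwrap();
}
```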

m-ou-se (Member) commented Sep 28, 2022

All the situations in which thread::park() can currently panic are very unusual things that are most likely unrecoverable errors. So I think it'd be fine to just never let park() unwind, and instead always abort() on panic.

DemiMarie (Contributor) commented:

> All the situations in which thread::park() can currently panic are very unusual things that are most likely unrecoverable errors. So I think it'd be fine to just never let park() unwind, and instead always abort() on panic.

I think Rust should allocate the needed resources eagerly when creating a new thread, to avoid this problem.

joboet (Member, Author) commented Oct 7, 2022

> All the situations in which thread::park() can currently panic are very unusual things that are most likely unrecoverable errors. So I think it'd be fine to just never let park() unwind, and instead always abort() on panic.

> I think Rust should allocate the needed resources eagerly when creating a new thread, to avoid this problem.

It actually does most of the time, as thread::spawn always creates a Thread. However, if the thread is created through other means, the Thread struct needs to be lazily allocated, which can of course fail. Either way, we also need to protect against the case where clock_gettime fails - it should not, but who knows what some obscure system might do (the UNIX parking implementation is used on systems like L4RE and 3DS/Horizon).

In general, I think it is easier for maintainability if we just always convert panics to aborts in park/park_timeout rather than going through every system and abstraction to ensure no panics are caused. Still, this is possible to do in the future (as is returning instantly).
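One way to convert panics to aborts without the overhead of catch_unwind is a forget-based drop guard. This is a hypothetical sketch of the idea, not the actual change from the fixing PR:

```rust
use std::mem::forget;
use std::process;
use std::thread;

// A guard whose Drop only runs if the protected code unwinds, because
// it is `forget`ed on the normal return path.
struct PanicGuard;

impl Drop for PanicGuard {
    fn drop(&mut self) {
        // Reached only during unwinding; turn the panic into an abort.
        process::abort();
    }
}

// Hypothetical abort-on-panic park wrapper.
fn park_or_abort() {
    let guard = PanicGuard;
    thread::park();
    // Normal exit: defuse the guard so Drop (and the abort) never runs.
    forget(guard);
}

fn main() {
    let t = thread::spawn(park_or_abort);
    t.thread().unpark(); // token may arrive before or after the park call
    t.join().unwrap();
    println!("done");
}
```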

garypen commented Oct 9, 2022

I just tripped over an issue in this area. I'm trying to put together a userspace mutex as a learning exercise, and it works fine except when there is a panic. If a panic has occurred, I get a panic while handling the panic and fall over with this stack trace:

thread '<unnamed>' panicked at 'use of std::thread::current() is not possible after the thread's local data has been destroyed', library/std/src/thread/mod.rs:681:35
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_display
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:72:5
   3: core::panicking::panic_str
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:56:5
   4: core::option::expect_failed
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/option.rs:1874:5
   5: core::option::Option<T>::expect
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/option.rs:738:21
   6: std::thread::current
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/thread/mod.rs:681:5
   7: std::thread::park_timeout
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/thread/mod.rs:997:9
   8: futex::Mutex<T>::lock
             at /Users/garypen/dev/futex/src/lib.rs:58:21
   9: <dhat::Alloc as core::alloc::global::GlobalAlloc>::dealloc
             at ./src/lib.rs:1287:51
  10: alloc::alloc::dealloc
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/alloc/src/alloc.rs:107:14
  11: <alloc::alloc::Global as core::alloc::Allocator>::deallocate
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/alloc/src/alloc.rs:244:22
  12: <alloc::raw_vec::RawVec<T,A> as core::ops::drop::Drop>::drop
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/alloc/src/raw_vec.rs:478:22
  13: core::ptr::drop_in_place<alloc::raw_vec::RawVec<(*mut u8,unsafe extern "C" fn(*mut u8))>>
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ptr/mod.rs:487:1
  14: <<alloc::vec::into_iter::IntoIter<T,A> as core::ops::drop::Drop>::drop::DropGuard<T,A> as core::ops::drop::Drop>::drop
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/alloc/src/vec/into_iter.rs:324:94
  15: core::ptr::drop_in_place<<alloc::vec::into_iter::IntoIter<T,A> as core::ops::drop::Drop>::drop::DropGuard<(*mut u8,unsafe extern "C" fn(*mut u8)),alloc::alloc::Global>>
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ptr/mod.rs:487:1
  16: <alloc::vec::into_iter::IntoIter<T,A> as core::ops::drop::Drop>::drop
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/alloc/src/vec/into_iter.rs:335:5
  17: core::ptr::drop_in_place<alloc::vec::into_iter::IntoIter<(*mut u8,unsafe extern "C" fn(*mut u8))>>
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ptr/mod.rs:487:1
  18: std::sys::unix::thread_local_dtor::register_dtor::run_dtors
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/sys/unix/thread_local_dtor.rs:88:13

Note that at frame 18, some destructors are run that appear to be releasing thread-local storage. By the time I try to run park_timeout(), at frame 7, the thread's local data is already gone (although I have no way of knowing that at that point). I tried:

                    if !std::thread::panicking() {
                        std::thread::park_timeout(SLEEP_DURATION);
                    }

But I get the same problem, presumably because of a race between the check for panicking and the panic actually starting.

m-ou-se (Member) commented Oct 11, 2022

@garypen It looks like you're implementing a global allocator using your thread::park_timeout-based mutex? That's going to be problematic, since the thread parking data is stored in a heap allocation (an Arc) that gets deallocated on thread destruction.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Oct 11, 2022
Never panic in `thread::park` and `thread::park_timeout`

fixes rust-lang#102398

`@rustbot` label +T-libs +T-libs-api
garypen commented Oct 11, 2022

@m-ou-se Thank you. I have an approach to the problem which works as a spinlock, but I was experimenting to see if I could do something more efficient using park_timeout(). It's one of those where I wasn't sure if it was interesting to report, but I thought I'd mention it.
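For reference, a minimal spinlock along the lines garypen describes might look like the sketch below. The names are illustrative; because it never parks, it needs no thread-local data and so keeps working even after TLS destructors have run:

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative spinlock: no poisoning, no parking, no unpark wakeups.
pub struct SpinLock<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// Safety: the lock ensures exclusive access to `data`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(value) }
    }

    // Run `f` with exclusive access to the protected data.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        while self.locked.swap(true, Ordering::Acquire) {
            hint::spin_loop(); // busy-wait; no thread parking involved
        }
        let result = f(unsafe { &mut *self.data.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

fn main() {
    static LOCK: SpinLock<u32> = SpinLock::new(0);
    let handles: Vec<_> = (0..4)
        .map(|_| std::thread::spawn(|| LOCK.with(|n| *n += 1)))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("count = {}", LOCK.with(|n| *n)); // prints "count = 4"
}
```

The trade-off is pure busy-waiting under contention, which is why park_timeout() looked attractive in the first place; but for a global allocator the spin approach avoids the deallocated-Thread problem entirely.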

bors closed this as completed in e0954ca Oct 11, 2022