Synchronize dbghelp with the standard library #230

alexcrichton · 2019-07-31T16:09:29Z

The dbghelp library on MSVC is not safe to use concurrently from multiple threads. There is no synchronization between the standard library's usage of RUST_BACKTRACE=1 and this crate generating a backtrace, meaning that it's not technically possible to 100% safely use this crate on MSVC.

Fixing this will likely require some form of global lock exported by the standard library specifically for backtraces and specifically for MSVC (it can be a noop on all other platforms). This library would then, when compiled on crates.io, need to coordinate with the standard library in terms of locking to ensure that only one implementation is using dbghelp at a time.

The text was updated successfully, but these errors were encountered:

arsing · 2019-08-13T22:43:22Z

This also causes heap corruption crashes currently if both libstd and a user crate take backtraces at approximately the same time. This happens with one of our crates' tests very reliably, where one test happens to run an assert!() (which triggers libstd backtrace collection) at about the same time as another test creates a failure error (which collect its own backtrace).

I happened to spot StackWalkEx in the crashing thread's backtrace, but I assume it would happen even with the older backtrace crate that solely used StackWalk64 ? So just downgrading the backtrace crate would not be useful unless we also used an older Rust that didn't use the backtrace crate?

The original code would create a TcpListener and throw it away, which meant that multiple calls to `get_unused_port` could end up with the same port. Also, enable all the edgelet-docker tests that had been disabled on Windows due to the original issue (even though it was not a Windows-specific problem). However this means that the number of tests in the edgelet-docker crate running at the same time is much higher, so it's more likely for the tests to hit the issue described in rust-lang/backtrace-rs#230 (comment) Specifically, any test that panics at the same time as another test is using `failure` to collect a backtrace has a chance of crashing the entire test binary with STATUS_HEAP_CORRUPTION on Windows, if the `RUST_BACKTRACE` env var is set. To avoid this, I've modified the tests that use `#[should_panic]` + `Result::unwrap` to assert that an operation must fail to instead use `Result::unwrap_err`.

alexcrichton · 2019-08-13T23:15:21Z

Yes this unfortunately has always been a bug with the backtrace crate, no version has ever had this bug fixed. This is definitely a case where binaries can segfault because the documentation of Windows explicitly states that it needs synchronization, and we're not doing that! It "just so happens to work" most of the time because it's relatively rare to get two backtraces at the same time.

The original code would create a TcpListener and throw it away, which meant that multiple calls to `get_unused_port` could end up with the same port. Also, enable all the edgelet-docker tests that had been disabled on Windows due to the original issue (even though it was not a Windows-specific problem). However this means that the number of tests in the edgelet-docker crate running at the same time is much higher, so it's more likely for the tests to hit the issue described in rust-lang/backtrace-rs#230 (comment) Specifically, any test that panics at the same time as another test is using `failure` to collect a backtrace has a chance of crashing the entire test binary with STATUS_HEAP_CORRUPTION on Windows, if the `RUST_BACKTRACE` env var is set. To avoid this, I've modified the tests that assert that an operation must fail via `#[should_panic]` + `Result::unwrap` to instead use `Result::unwrap_err` I've only done this for edgelet-docker since it has a large number of concurrent tests so it has a very high chance of hitting this, in comparison to the other crates that use `#[should_panic]`

The original code would create a `TcpListener` and throw it away, which meant that multiple calls to `get_unused_port` could end up with the same port. Also, enable all the edgelet-docker tests that had been disabled on Windows due to the original issue (even though it was not a Windows-specific problem). However this means that the number of tests in the edgelet-docker crate running at the same time is much higher, so it's more likely for the tests to hit the issue described in rust-lang/backtrace-rs#230 (comment) Specifically, any test that panics at the same time as another test is using `failure` to collect a backtrace has a chance of crashing the entire test binary with STATUS_HEAP_CORRUPTION on Windows, if the `RUST_BACKTRACE` env var is set. To avoid this, I've modified the tests that assert that an operation must fail via `#[should_panic]` + `Result::unwrap` to instead use `Result::unwrap_err`

alexcrichton · 2019-08-15T15:23:36Z

I've posted a possible mitigation for what I'd hope is 90%-ish of cases at #241

On Windows `dbghelp.dll` is required to be used from only one thread at a time, and while this crate provides that guarantee this crate does not synchronize with itself in the standard library. This commit solves this issue by using a Windows-specific trick by creating a named mutex for synchronization. When this crate is updated in the standard library it means that the named mutex here will be shared with the standard library, so the standard library and this crate should be sharing the same synchronization primitive and should be able to coordinate calls to `dbghelp.dll`. More details are included in the commit itself, but this should... Closes #230

This commit updates the `backtrace` crate from 0.3.34 to 0.3.35. The [included set of changes][changes] for this update mostly includes some gimli-related improvements (not relevant for the standard library) but critically includes a fix for rust-lang/backtrace-rs#230. The standard library will not aqcuire a session-local lock whenever a backtrace is generated on Windows to allow external synchronization with the `backtrace` crate itself, allowing `backtrace` to be safely used while other threads may be panicking. [changes]: rust-lang/backtrace-rs@0.3.34...0.3.35

std: Update `backtrace` crate dependency This commit updates the `backtrace` crate from 0.3.34 to 0.3.35. The [included set of changes][changes] for this update mostly includes some gimli-related improvements (not relevant for the standard library) but critically includes a fix for rust-lang/backtrace-rs#230. The standard library will not aqcuire a session-local lock whenever a backtrace is generated on Windows to allow external synchronization with the `backtrace` crate itself, allowing `backtrace` to be safely used while other threads may be panicking. [changes]: rust-lang/backtrace-rs@0.3.34...0.3.35

arsing mentioned this issue Aug 13, 2019

Fix race between tests that try to bind to a port Azure/iotedge#1566

Merged

alexcrichton mentioned this issue Aug 15, 2019

Fix some cases of concurrent panics and backtraces #241

Closed

alexcrichton mentioned this issue Aug 16, 2019

Fix concurrent panics and backtraces #242

Merged

alexcrichton closed this as completed in #242 Aug 19, 2019

alexcrichton mentioned this issue Aug 19, 2019

std: Update backtrace crate dependency rust-lang/rust#63700

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Synchronize dbghelp with the standard library #230

Synchronize dbghelp with the standard library #230

alexcrichton commented Jul 31, 2019

arsing commented Aug 13, 2019

alexcrichton commented Aug 13, 2019

alexcrichton commented Aug 15, 2019

Synchronize dbghelp with the standard library #230

Synchronize dbghelp with the standard library #230

Comments

alexcrichton commented Jul 31, 2019

arsing commented Aug 13, 2019

alexcrichton commented Aug 13, 2019

alexcrichton commented Aug 15, 2019