Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Enforce a deadlock-free locking order for Mutexes. #5539

Merged
merged 1 commit into from
Apr 23, 2024

Conversation

jimblandy
Copy link
Member

If debug_assertions or the "validate-locks" feature are enabled, change wgpu-core to use a wrapper around parking_lot::Mutex that checks for potential deadlocks.

@jimblandy
Copy link
Member Author

jimblandy commented Apr 15, 2024

This has already found a cycle, although it might take a lot of threads to hit it. If you uncomment out the indicated code in wgpu-core/src/lock/rank.rs, then you'll get this error mesage:

error[E0391]: cycle detected when const checking `lock::rank::COMMAND_BUFFER_DATA`
  --> wgpu-core/src/lock/rank.rs:28:35
   |
28 |                   followers: 0 $( | $follower.bit )*,
   |                                     ^^^^^^^^^
...
39 | / define_lock_ranks! {
40 | |     rank COMMAND_BUFFER_DATA 0 followed by {
41 | |         DEVICE_USAGE_SCOPES,
42 | |         SHARED_TRACKER_INDEX_ALLOCATOR_INNER,
...  |
83 | |     rank SURFACE_PRESENTATION 22 followed by { };
84 | | }
   | |_- in this macro invocation
   |
note: ...which requires const checking `lock::rank::BUFFER_MAP_STATE`...
  --> wgpu-core/src/lock/rank.rs:28:35
   |
28 |                   followers: 0 $( | $follower.bit )*,
   |                                     ^^^^^^^^^
...
39 | / define_lock_ranks! {
40 | |     rank COMMAND_BUFFER_DATA 0 followed by {
41 | |         DEVICE_USAGE_SCOPES,
42 | |         SHARED_TRACKER_INDEX_ALLOCATOR_INNER,
...  |
83 | |     rank SURFACE_PRESENTATION 22 followed by { };
84 | | }
   | |_- in this macro invocation
note: ...which requires const checking `lock::rank::DEVICE_PENDING_WRITES`...
  --> wgpu-core/src/lock/rank.rs:28:35
   |
28 |                   followers: 0 $( | $follower.bit )*,
   |                                     ^^^^^^^^^
...
39 | / define_lock_ranks! {
40 | |     rank COMMAND_BUFFER_DATA 0 followed by {
41 | |         DEVICE_USAGE_SCOPES,
42 | |         SHARED_TRACKER_INDEX_ALLOCATOR_INNER,
...  |
83 | |     rank SURFACE_PRESENTATION 22 followed by { };
84 | | }
   | |_- in this macro invocation
note: ...which requires const checking `lock::rank::DEVICE_LIFE_TRACKER`...
  --> wgpu-core/src/lock/rank.rs:28:35
   |
28 |                   followers: 0 $( | $follower.bit )*,
   |                                     ^^^^^^^^^
...
39 | / define_lock_ranks! {
40 | |     rank COMMAND_BUFFER_DATA 0 followed by {
41 | |         DEVICE_USAGE_SCOPES,
42 | |         SHARED_TRACKER_INDEX_ALLOCATOR_INNER,
...  |
83 | |     rank SURFACE_PRESENTATION 22 followed by { };
84 | | }
   | |_- in this macro invocation
note: ...which requires const checking `lock::rank::DEVICE_TEMP_SUSPECTED`...
  --> wgpu-core/src/lock/rank.rs:28:35
   |
28 |                   followers: 0 $( | $follower.bit )*,
   |                                     ^^^^^^^^^
...
39 | / define_lock_ranks! {
40 | |     rank COMMAND_BUFFER_DATA 0 followed by {
41 | |         DEVICE_USAGE_SCOPES,
42 | |         SHARED_TRACKER_INDEX_ALLOCATOR_INNER,
...  |
83 | |     rank SURFACE_PRESENTATION 22 followed by { };
84 | | }
   | |_- in this macro invocation
   = note: ...which again requires const checking `lock::rank::COMMAND_BUFFER_DATA`, completing the cycle
   = note: cycle used when running analysis passes on this crate
   = note: see https://rustc-dev-guide.rust-lang.org/overview.html#queries and https://rustc-dev-guide.rust-lang.org/query.html for more information
   = note: this error originates in the macro `define_lock_ranks` (in Nightly builds, run with -Z macro-backtrace for more info)

@jimblandy jimblandy force-pushed the core-lock-validation branch from 2999c50 to c4bceef Compare April 15, 2024 05:07
@jimblandy jimblandy changed the title [core] Enforce a dead-lock free locking order for Mutexes. [core] Enforce a deadlock-free locking order for Mutexes. Apr 15, 2024
@jimblandy jimblandy force-pushed the core-lock-validation branch from c4bceef to 0f1f20b Compare April 15, 2024 05:16
@ErichDonGubler ErichDonGubler self-assigned this Apr 15, 2024
@ErichDonGubler ErichDonGubler added the area: correctness We're behaving incorrectly label Apr 15, 2024
@ErichDonGubler
Copy link
Member

I'll take on review for this one. I've already done quite a bit of review with Jim as he's explained how the locking mechanism works. We'll both work to polish this up, and document and test properly. 🫡

@jimblandy jimblandy force-pushed the core-lock-validation branch from b7de28c to 9e99c27 Compare April 15, 2024 18:25
@jimblandy
Copy link
Member Author

jimblandy commented Apr 16, 2024

One wrinkle: This has already detected a deadlock, and I haven't pushed very far yet, so I'd guess there'd be more to come. We'll have to fix them before we can actually land this.

[Edit: no, we can just make it only enabled when the feature is turned on, leaving both debug and release builds unaffected. This means we're not getting test coverage for the instrumented lock types, but at least we can have it in tree, rather than sitting on a branch as an indefinitely long list of deadlocks gets addressed.]

Copy link
Member

@ErichDonGubler ErichDonGubler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not gonna commit to a review outcome yet, since this isn't out of draft yet. :^)

wgpu-core/src/lock/validating.rs Outdated Show resolved Hide resolved
wgpu-core/src/lock/rank.rs Outdated Show resolved Hide resolved
wgpu-core/src/lock/validating.rs Outdated Show resolved Hide resolved
wgpu-core/src/lock/rank.rs Show resolved Hide resolved
wgpu-core/src/lock/mod.rs Outdated Show resolved Hide resolved
wgpu-core/src/lock/mod.rs Outdated Show resolved Hide resolved
wgpu-core/src/lock/mod.rs Outdated Show resolved Hide resolved
wgpu-core/src/lock/non_validating.rs Outdated Show resolved Hide resolved
wgpu-core/src/lock/validating.rs Outdated Show resolved Hide resolved
@jimblandy jimblandy force-pushed the core-lock-validation branch 2 times, most recently from 5bd0658 to f30f47b Compare April 18, 2024 04:57
@jimblandy jimblandy marked this pull request as ready for review April 18, 2024 04:57
@jimblandy jimblandy requested a review from a team as a code owner April 18, 2024 04:57
@jimblandy jimblandy force-pushed the core-lock-validation branch from f30f47b to 1e2f85f Compare April 18, 2024 05:02
Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely amazing stuff!

Have a docs nit, but I love the docs, they are very clear

wgpu-core/src/lock/ranked.rs Outdated Show resolved Hide resolved
@jimblandy
Copy link
Member Author

@cwfitzgerald Thanks for the review! @ErichDonGubler and I agreed this should have unit tests, so I'll work on that tomorrow and then ask Erich to take another look.

@cwfitzgerald
Copy link
Member

Sounds good, could @ErichDonGubler just mark it as request changes so I don't accidentally merge something I shouldn't :)

Copy link
Member

@ErichDonGubler ErichDonGubler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just waiting on feedback I've already left. 😊

@jimblandy jimblandy force-pushed the core-lock-validation branch 2 times, most recently from 7d22dd6 to 70aaf44 Compare April 20, 2024 03:58
@jimblandy jimblandy mentioned this pull request Apr 20, 2024
9 tasks
@jimblandy jimblandy force-pushed the core-lock-validation branch 2 times, most recently from 0097585 to d723d15 Compare April 20, 2024 23:53
@jimblandy
Copy link
Member Author

jimblandy commented Apr 20, 2024

@ErichDonGubler, I took most of your suggestions but you had one that I didn't want to take. I've explained and left it unresolved; let me know what you think.

@jimblandy jimblandy force-pushed the core-lock-validation branch 2 times, most recently from 88840af to 510f139 Compare April 22, 2024 04:15
Copy link
Member

@ErichDonGubler ErichDonGubler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only nitpick feedback left. Approving to enable progress, since I trust @jimblandy to duly consider the feedback I've left.

@jimblandy jimblandy force-pushed the core-lock-validation branch 2 times, most recently from 57f184a to 2aa7798 Compare April 22, 2024 18:15
@jimblandy
Copy link
Member Author

jimblandy commented Apr 22, 2024

@ErichDonGubler Hmm. Plot twist.

  1. Just today I remarked on Rust's wisdom in flagging the use of feature = "foo" in the code when there's no "foo" feature in Cargo.toml. This is completely untrue: rustc just treats them as if they're not enabled.

  2. We want our CI to run tests with --all-features, which would enable "validate-locks", which causes a panic in ordinary operation.

I'm not sure what the best way forward is.

  • The easiest is to omit the feature from Cargo.toml. If people want to experiment it, they'll have to add it back themselves.

  • We could just comment it out, but that has the disadvantage that it's not clear what you need to uncomment to make things work. The nice thing about putting feature = "validate-locks", even without a Cargo.toml entry, is that its use marks everything that needs to be adjusted, and it's pretty obvious what to do if you want to try it out.

Other ideas??

@jimblandy jimblandy force-pushed the core-lock-validation branch from 2aa7798 to dae6efd Compare April 22, 2024 20:41
@jimblandy
Copy link
Member Author

jimblandy commented Apr 22, 2024

In chat we decided to go with cfg(validate_locks).

Edit: Err, cfgs are compilation-wide, so actually it's cfg(wgpu_validate_locks).

If `debug_assertions` or the `"validate-locks"` feature are enabled,
change `wgpu-core` to use a wrapper around `parking_lot::Mutex` that
checks for potential deadlocks.

At the moment, `wgpu-core` does contain deadlocks, so the ranking in
the `lock::rank` module is incomplete, in the interests of keeping it
acyclic. gfx-rs#5572 tracks the work needed to complete the ranking.
@jimblandy jimblandy force-pushed the core-lock-validation branch from dae6efd to d8a6f55 Compare April 22, 2024 20:43
@jimblandy jimblandy merged commit b3c5a6f into gfx-rs:trunk Apr 23, 2024
25 checks passed
@jimblandy jimblandy deleted the core-lock-validation branch April 23, 2024 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: correctness We're behaving incorrectly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants