-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't suspend for debugger while holding the slot backpatching lock #40060
Conversation
f496527
to
1336c15
Compare
Was the previous fix incomplete which led to the issue re-occurring due to timing? |
The underlying issue was introduced when we started using cooperative GC mode for the slot backpatching info data structure. It was not obvious that such an issue would arise. The workaround used in the previous fix was a partial fix to the most problematic occurrences of the issue, we did not come up with a complete fix at the time after a fair bit of discussion, and the workaround was never intended to be a complete fix for the issue. The issue appears to be more problematic in 5.0 due to differences in the ways in which the lock is used. This change should be a complete fix for the issue. |
- Mostly reverted the previous workaround for the issue (commit fc06054) - Added a forbid-suspend-for-debugger region in preemptive GC mode - Added a crst holder type that acquires a lock and enters the forbid region - Where the slot backpatching lock would be taken where cooperative GC mode may be entered inside that lock, the new crst holder is used - When a suspend for debugger is requested, a thread in preemptive GC mode that is in the forbid region is considered not yet suspended and is synched later similarly to threads in cooperative GC mode Fixes dotnet#37278
cc: @gregg-miskelly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kount! This looks good to me. Given the subtleties in the area I don't have high confidence that I would spot an issue if there was one, but I am OK accepting that risk.
Long term this still feels a bit "princess and the pea" to me. The pea is that we designed a low-level data structure that requires co-op mode to access it, and then we've layered on a bunch of complex locking/suspend logic to resolve the undesirable effects (the mattresses in the analogy). As I recall the original purpose of the co-op mode datastructure was to simplify some lifetime management for state attached to collectible assemblies. My suspicion is that the amount of complexity and performance overhead we have added working around co-op mode outweighs the value we are getting from it. Of course I make no claim that I have carefully evaluated the alternatives, so it is only my hunch at this point.
Thanks @noahfalk! Longer-term, avoiding coop mode would avoid some complication, though there are other complications it may not avoid. I haven't seen a startup perf hit from using the coop mode data structure initially or from this change, so there may not be any improvement from using a preemptive mode data structure. The alternative data structure may have to be a separate data structure and along with other changes that would go along with using it, it would be a bit too large/risky for .NET 5 at the moment. In the interest of simplifying some things it would be good longer-term to see if we can switch to a mostly preemptive mode solution. |
…otnet#40060) - Mostly reverted the previous workaround for the issue (commit fc06054) - Added a forbid-suspend-for-debugger region in preemptive GC mode - Added a crst holder type that acquires a lock and enters the forbid region - Where the slot backpatching lock would be taken where cooperative GC mode may be entered inside that lock, the new crst holder is used - When a suspend for debugger is requested, a thread in preemptive GC mode that is in the forbid region is considered not yet suspended and is synched later similarly to threads in cooperative GC mode Fixes dotnet#37278
…bid-suspend-for-debugger region - Followup to dotnet#40060 - In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension. - Fixed to break the spin-wait by checking whether the thread is in the forbid region Fix for dotnet#42375 in master
… a forbid-suspend-for-debugger region - Port of dotnet#42587 to 5.0 - Followup to dotnet#40060 - In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension. - Fixed to break the spin-wait by checking whether the thread is in the forbid region Fixes dotnet#42375
…s in a forbid-suspend-for-debugger region - Port of dotnet#42587 to 5.0 RC2 - Followup to dotnet#40060 - In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension. - Fixed to break the spin-wait by checking whether the thread is in the forbid region Fix for dotnet#42375
…bid-suspend-for-debugger region (#42587) - Followup to #40060 - In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension. - Fixed to break the spin-wait by checking whether the thread is in the forbid region Fix for #42375 in master
…s in a forbid-suspend-for-debugger region (#42630) - Port of #42587 to 5.0 RC2 - Followup to #40060 - In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension. - Fixed to break the spin-wait by checking whether the thread is in the forbid region Fix for #42375
…ding in some cases 1. When suspending for the debugger is in progress (the debugger is waiting for some threads to reach a safe point for suspension), a thread that is not yet suspended may trigger another runtime suspension. This is currently not allowed because the order of operations conflicts with requirements to send GC events for managed data breakpoints to work correctly when suspending for a GC. Instead, the thread suspends for the debugger first, and after the runtime is resumed, continues suspending for GC. 2. At the same time, if the thread that is not suspended yet is in a forbid-suspend-for-debugger region, it cannot suspend for the debugger, which conflicts with the above scenario, but is currently necessary for the issue fixed by dotnet#40060 3. The current plan is to change managed data breakpoints implementation to pin objects instead of using GC events to track object relocation, and to deprecate the GC events APIs 4. With that, the requirement in #1 goes away, so this change conditions the check to avoid suspending the runtime during a pending suspension for the debugger when GC events are not enabled - Verified that the latest deadlock seen in dotnet#42375 manifests only when a data breakpoint set and not otherwise - Combined with dotnet#44471 and a VS update to use that to switch to the pinning mechanism, the deadlock issue seen above should disappear completely
Fixes #37278