Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't suspend for debugger while holding the slot backpatching lock #40060

Merged
merged 1 commit into from
Aug 4, 2020

Conversation

kouvel
Copy link
Member

@kouvel kouvel commented Jul 29, 2020

  • Mostly reverted the previous workaround for the issue (commit fc06054)
  • Added a forbid-suspend-for-debugger region in preemptive GC mode
  • Added a crst holder type that acquires a lock and enters the forbid region
  • Where the slot backpatching lock would be taken where cooperative GC mode may be entered inside that lock, the new crst holder is used
  • When a suspend for debugger is requested, a thread in preemptive GC mode that is in the forbid region is considered not yet suspended and is synched later similarly to threads in cooperative GC mode

Fixes #37278

@kouvel kouvel added this to the 5.0.0 milestone Jul 29, 2020
@kouvel kouvel self-assigned this Jul 29, 2020
@kouvel
Copy link
Member Author

kouvel commented Jul 29, 2020

CC @janvorli @mangod9 @noahfalk

@kouvel kouvel force-pushed the TierDebugFix branch 3 times, most recently from f496527 to 1336c15 Compare July 29, 2020 05:04
@mangod9
Copy link
Member

mangod9 commented Jul 29, 2020

Was the previous fix incomplete which led to the issue re-occurring due to timing?

@kouvel
Copy link
Member Author

kouvel commented Jul 29, 2020

Was the previous fix incomplete which led to the issue re-occurring due to timing?

The underlying issue was introduced when we started using cooperative GC mode for the slot backpatching info data structure. It was not obvious that such an issue would arise. The workaround used in the previous fix was a partial fix to the most problematic occurrences of the issue, we did not come up with a complete fix at the time after a fair bit of discussion, and the workaround was never intended to be a complete fix for the issue. The issue appears to be more problematic in 5.0 due to differences in the ways in which the lock is used. This change should be a complete fix for the issue.

- Mostly reverted the previous workaround for the issue (commit fc06054)
- Added a forbid-suspend-for-debugger region in preemptive GC mode
- Added a crst holder type that acquires a lock and enters the forbid region
- Where the slot backpatching lock would be taken where cooperative GC mode may be entered inside that lock, the new crst holder is used
- When a suspend for debugger is requested, a thread in preemptive GC mode that is in the forbid region is considered not yet suspended and is synched later similarly to threads in cooperative GC mode

Fixes dotnet#37278
@hoyosjs
Copy link
Member

hoyosjs commented Jul 29, 2020

cc: @gregg-miskelly

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kount! This looks good to me. Given the subtleties in the area I don't have high confidence that I would spot an issue if there was one, but I am OK accepting that risk.

Long term this still feels a bit "princess and the pea" to me. The pea is that we designed a low-level data structure that requires co-op mode to access it, and then we've layered on a bunch of complex locking/suspend logic to resolve the undesirable effects (the mattresses in the analogy). As I recall the original purpose of the co-op mode datastructure was to simplify some lifetime management for state attached to collectible assemblies. My suspicion is that the amount of complexity and performance overhead we have added working around co-op mode outweighs the value we are getting from it. Of course I make no claim that I have carefully evaluated the alternatives, so it is only my hunch at this point.

@kouvel
Copy link
Member Author

kouvel commented Aug 4, 2020

Thanks @noahfalk! Longer-term, avoiding coop mode would avoid some complication, though there are other complications it may not avoid. I haven't seen a startup perf hit from using the coop mode data structure initially or from this change, so there may not be any improvement from using a preemptive mode data structure. The alternative data structure may have to be a separate data structure and along with other changes that would go along with using it, it would be a bit too large/risky for .NET 5 at the moment. In the interest of simplifying some things it would be good longer-term to see if we can switch to a mostly preemptive mode solution.

@kouvel kouvel closed this Aug 4, 2020
@kouvel kouvel reopened this Aug 4, 2020
@kouvel kouvel merged commit 6d98f6d into dotnet:master Aug 4, 2020
@kouvel kouvel deleted the TierDebugFix branch August 4, 2020 22:02
Jacksondr5 pushed a commit to Jacksondr5/runtime that referenced this pull request Aug 10, 2020
…otnet#40060)

- Mostly reverted the previous workaround for the issue (commit fc06054)
- Added a forbid-suspend-for-debugger region in preemptive GC mode
- Added a crst holder type that acquires a lock and enters the forbid region
- Where the slot backpatching lock would be taken where cooperative GC mode may be entered inside that lock, the new crst holder is used
- When a suspend for debugger is requested, a thread in preemptive GC mode that is in the forbid region is considered not yet suspended and is synched later similarly to threads in cooperative GC mode

Fixes dotnet#37278
kouvel added a commit to kouvel/runtime that referenced this pull request Sep 22, 2020
…bid-suspend-for-debugger region

- Followup to dotnet#40060
- In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension.
- Fixed to break the spin-wait by checking whether the thread is in the forbid region

Fix for dotnet#42375 in master
kouvel added a commit to kouvel/runtime that referenced this pull request Sep 22, 2020
… a forbid-suspend-for-debugger region

- Port of dotnet#42587 to 5.0
- Followup to dotnet#40060
- In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension.
- Fixed to break the spin-wait by checking whether the thread is in the forbid region

Fixes dotnet#42375
kouvel added a commit to kouvel/runtime that referenced this pull request Sep 23, 2020
…s in a forbid-suspend-for-debugger region

- Port of dotnet#42587 to 5.0 RC2
- Followup to dotnet#40060
- In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension.
- Fixed to break the spin-wait by checking whether the thread is in the forbid region

Fix for dotnet#42375
kouvel added a commit that referenced this pull request Sep 23, 2020
…bid-suspend-for-debugger region (#42587)

- Followup to #40060
- In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension.
- Fixed to break the spin-wait by checking whether the thread is in the forbid region

Fix for #42375 in master
kouvel added a commit that referenced this pull request Sep 23, 2020
…s in a forbid-suspend-for-debugger region (#42630)

- Port of #42587 to 5.0 RC2
- Followup to #40060
- In short timing windows if a thread is placed in a pending-suspend-for-debugger state while in a forbid-suspend-for-debugger region, from the above PR it would not suspend for the debugger but it was missed that it can also get stuck in a perpetual spin-wait while entering cooperative GC mode. This causes the thread to not suspend for the debugger (VS times out after waiting) and the thread can only progress by unmarking it for debugger suspension.
- Fixed to break the spin-wait by checking whether the thread is in the forbid region

Fix for #42375
kouvel added a commit to kouvel/runtime that referenced this pull request Nov 11, 2020
…ding in some cases

1. When suspending for the debugger is in progress (the debugger is waiting for some threads to reach a safe point for suspension), a thread that is not yet suspended may trigger another runtime suspension. This is currently not allowed because the order of operations conflicts with requirements to send GC events for managed data breakpoints to work correctly when suspending for a GC. Instead, the thread suspends for the debugger first, and after the runtime is resumed, continues suspending for GC.
2. At the same time, if the thread that is not suspended yet is in a forbid-suspend-for-debugger region, it cannot suspend for the debugger, which conflicts with the above scenario, but is currently necessary for the issue fixed by dotnet#40060
3. The current plan is to change managed data breakpoints implementation to pin objects instead of using GC events to track object relocation, and to deprecate the GC events APIs
4. With that, the requirement in #1 goes away, so this change conditions the check to avoid suspending the runtime during a pending suspension for the debugger when GC events are not enabled

- Verified that the latest deadlock seen in dotnet#42375 manifests only when a data breakpoint set and not otherwise
- Combined with dotnet#44471 and a VS update to use that to switch to the pinning mechanism, the deadlock issue seen above should disappear completely
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Func eval unreliable in .NET 5 due to Tiered JIT
5 participants