This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
[3.1] Protect against a rare invalid lock acquisition attempt during ETW processing during shutdown #27241
Merged
…essing during abrupt shutdown

Targeted and partial fix for https://github.com/dotnet/coreclr/issues/27129

- This is not a generic fix for the issue above; it is a narrowly targeted fix for one observed problem (a new issue introduced in 3.x). For a generic fix and more details, see the fix in 5.0: dotnet#27238.
- This change avoids taking a lock during process detach, a point in time when all other threads have already been abruptly shut down by the OS and locks may have been orphaned.
- The issue leads to a hang during shutdown when ETW tracing is enabled and the .NET process being traced begins the shutdown sequence at an unfortunate time. This is probably a rare timing issue: the shutdown sequence would have to begin just when a thread holds a particular lock and is terminated by the OS while holding it. The OS then sends the process detach event to the CLR, and the work done in response tries to acquire the lock and cannot, because it is orphaned.
- The generic fix has broader consequences and is unlikely to be a reasonable change to make so late in the cycle; such a change needs some bake time and feedback. Hence this targeted fix for 3.x.
noahfalk
approved these changes
Oct 18, 2019
LGTM
Applied feedback from #27242
FYI, I don't think you will need this PR. If you submit a fix to 3.0 it should be auto-ported to 3.1.
Ah ok, I'll check the port and will close it
brianrob
approved these changes
Oct 20, 2019
LGTM
kouvel
changed the title
Protect against a rare invalid lock acquisition attempt during ETW processing during abrupt shutdown
Protect against a rare invalid lock acquisition attempt during ETW processing during shutdown
Oct 21, 2019
kouvel
changed the title
Protect against a rare invalid lock acquisition attempt during ETW processing during shutdown
[3.1] Protect against a rare invalid lock acquisition attempt during ETW processing during shutdown
Oct 21, 2019
approved for 3.1
Targeted and partial fix for https://github.com/dotnet/coreclr/issues/27129
Issue
https://github.com/dotnet/coreclr/issues/27129
This is a recent regression from adding optimization tier info to method JIT events: a lock is now taken to get the optimization tier. On shutdown, ETW rundown has always been performed after all threads except one have already been abruptly terminated by the OS. At that point, the lock may be orphaned (probably a rare case), leading to a hang during shutdown.
Customer impact
We have not yet seen this issue occur frequently in customer scenarios, but it has shown up relatively frequently when using the JIT diff tool while profiling (see the issue above). It is a narrow-window timing issue.
Fix description
The fix is to set global state indicating process detach before doing ETW rundown. That state indicates that all other threads have already been abruptly terminated and that the state of the system is unreliable, so the offending lock is not taken in that case. Instead, during ETW rundown, method JIT events would indicate Unknown for the optimization tier. Indicating Unknown is not ideal, but that is the best we can do for now.

The fix for 5.0 (#27238) is a more complete long-term fix and has wider implications (the PR has more details), so a more targeted fix is being done for 3.0.x and 3.1.
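For reviewers who want a picture of the shape of the targeted fix, here is a minimal sketch of the "check the detach flag before taking the lock" pattern from the fix description. It is not the actual CoreCLR code: every name in it (g_processDetaching, g_tierLock, GetTierForEvent, OnProcessDetach, and the OptimizationTier values) is hypothetical, and a std::mutex stands in for whatever lock the runtime really uses.

```cpp
#include <atomic>
#include <mutex>

// All identifiers below are hypothetical stand-ins, not CoreCLR names.
enum class OptimizationTier { Unknown, MinOptJitted, Optimized };

// Global state set once process detach begins, before ETW rundown runs.
static std::atomic<bool> g_processDetaching{false};

// Stands in for the lock that guards optimization tier info.
static std::mutex g_tierLock;
static OptimizationTier g_methodTier = OptimizationTier::Optimized;

// Returns the tier to report in a method JIT event.
OptimizationTier GetTierForEvent()
{
    if (g_processDetaching.load(std::memory_order_acquire))
    {
        // Other threads may have been terminated while holding g_tierLock,
        // so do not try to acquire it; report Unknown instead.
        return OptimizationTier::Unknown;
    }

    std::lock_guard<std::mutex> hold(g_tierLock);
    return g_methodTier;
}

// Called when the process detach notification arrives.
void OnProcessDetach()
{
    // Publish the flag first, so that everything rundown does afterwards
    // sees it and skips the lock.
    g_processDetaching.store(true, std::memory_order_release);
    // ... ETW rundown would run here and call GetTierForEvent() ...
}
```

The point of the sketch is the ordering: the flag is set before rundown starts, so the event path returns Unknown without ever touching a lock that may be orphaned.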
Risk
There is currently no risk from sending Unknown optimization tiers for method events, as rundown information sent by the runtime, and optimization tiers in general, are not currently used by PerfView. Once a PR and issue are fixed and the information starts getting used by PerfView in a noticeable way, then methods that were already jitted prior to starting profiling would not show an optimization tier. In my opinion that is relatively low risk.