Runtime hangs on exit on a spinlock #13564
cc @kouvel
kouvel referenced this issue in kouvel/coreclr on Oct 16, 2019

… during process detach (a form of abrupt shutdown)

Longer-term fix for https://github.com/dotnet/coreclr/issues/27129:
- ETW rundown events sent during process shutdown are currently sent during process detach, and have been for a long time. By that time, all other threads have been abruptly terminated by the OS, and as a result the state of the system is fundamentally unpredictable.
  - In this particular case, locks have been orphaned by threads that were abruptly terminated, so taking locks is not feasible while processing rundown events. If lock acquisition were avoided based on such knowledge (not recommended, this would get messy), the events would have to carry information that does not accurately reflect the state.
  - I consider it abrupt shutdown whenever process detach occurs before there is an opportunity to handle graceful shutdown, that is, the runtime is unaware that a shutdown is about to happen and cannot handle it before process detach (by which point the OS has already shut some things down). In that scenario all bets are off; with this change, ETW rundown events would not be sent.
- This change has the following effects:
  - Graceful shutdown, such as returning from `Main` or calling `Environment.Exit()`, will send rundown events very slightly earlier than before. Background threads will still be running, and other ETW events may be interspersed among the rundown events or sent after them.
  - On Windows, Ctrl+C and Ctrl+Break are not handled by the runtime and by default result in abrupt termination. The only indication the runtime gets is the process detach event, by which time the OS has already terminated all other threads.
    - When these events are not handled (by the runtime or by the app), this is an abrupt shutdown scenario and rundown events will not be sent.
    - When the app handles these events and cancels them along with calling `Environment.Exit()`, they are converted into graceful shutdown (see above). If an app handles these events and chooses not to cancel them, they remain unhandled and lead to abrupt shutdown (see immediately above).
  - On Unixes, there is no significant change. SIGTERM is graceful shutdown as described above and there are no similar issues of abrupt shutdown.
- There is an option of sending rundown events upon process detach (when there is no opportunity to do so gracefully), but as described above that will get messy and is not a path we should head down.
kouvel referenced this issue in kouvel/coreclr on Oct 16, 2019

…essing during abrupt shutdown

Targeted and partial fix for https://github.com/dotnet/coreclr/issues/27129
- This is not a generic fix for the issue above; it is only a very targeted fix for an issue seen (a new issue introduced in 3.x). For a generic fix and more details, see the fix in 5.0: dotnet#27238.
- This change avoids taking a lock during process detach, a point in time when all other threads have already been abruptly shut down by the OS and locks may have been orphaned.
- The issue leads to a hang during shutdown when ETW tracing is enabled and the .NET process being traced begins the shutdown sequence at an unfortunate time. This is probably a rare timing issue: the shutdown sequence would have to begin at just the point when a thread holds a particular lock and is terminated by the OS while holding it. The OS then sends the process detach event to the CLR, and the work done in response tries to acquire the lock and cannot because it is orphaned.
- The generic fix has broader consequences and is unlikely to be a reasonable change to make so late in the cycle; such a change needs some bake time and feedback. Hence this targeted fix for 3.x.
kouvel referenced this issue in dotnet/coreclr on Oct 18, 2019

… during process detach (a form of abrupt shutdown) (#27238)

Longer-term fix for https://github.com/dotnet/coreclr/issues/27129:
- ETW rundown events sent during process shutdown are currently sent during process detach, and have been for a long time. By that time, all other threads have been abruptly terminated by the OS, and as a result the state of the system is fundamentally unpredictable.
  - In this particular case, locks have been orphaned by threads that were abruptly terminated, so taking locks is not feasible while processing rundown events. If lock acquisition were avoided based on such knowledge (not recommended, this would get messy), the events would have to carry information that does not accurately reflect the state.
  - I consider it abrupt shutdown whenever process detach occurs before there is an opportunity to handle graceful shutdown, that is, the runtime is unaware that a shutdown is about to happen and cannot handle it before process detach (by which point the OS has already shut some things down). In that scenario all bets are off; with this change, ETW rundown events would not be sent.
- This change has the following effects:
  - Graceful shutdown, such as returning from `Main` or calling `Environment.Exit()`, will send rundown events very slightly earlier than before. Background threads will still be running, and other ETW events may be interspersed among the rundown events or sent after them.
  - On Windows, Ctrl+C and Ctrl+Break are not handled by the runtime and by default result in abrupt termination. The only indication the runtime gets is the process detach event, by which time the OS has already terminated all other threads.
    - When these events are not handled (by the runtime or by the app), this is an abrupt shutdown scenario and rundown events will not be sent.
    - When the app handles these events and cancels them along with calling `Environment.Exit()`, they are converted into graceful shutdown (see above). If an app handles these events and chooses not to cancel them, they remain unhandled and lead to abrupt shutdown (see immediately above).
  - On Unixes, there is no significant change. SIGTERM is graceful shutdown as described above and there are no similar issues of abrupt shutdown.
- There is an option of sending rundown events upon process detach (when there is no opportunity to do so gracefully), but as described above that will get messy and is not a path we should head down.
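The policy described in this commit message could be sketched roughly as follows. This is only an illustration of the described behavior, not the actual coreclr change: `RundownPolicy`, `OnGracefulShutdown`, and `OnProcessDetach` are hypothetical names, and "sending" rundown events is reduced to bumping a counter.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's shutdown hooks (not coreclr code).
class RundownPolicy {
public:
    // Graceful paths (returning from Main, Environment.Exit()): other threads
    // are still alive, so it is safe to do the rundown work here, once.
    void OnGracefulShutdown() {
        if (!rundownSent_) {
            ++rundownCount_;   // placeholder for actually emitting the events
            rundownSent_ = true;
        }
    }

    // Process detach: other threads may already have been terminated while
    // holding locks, so no rundown work is attempted here at all.
    void OnProcessDetach() {
        // Intentionally empty: abrupt shutdown sends no rundown events.
    }

    bool RundownSent() const { return rundownSent_; }
    int RundownCount() const { return rundownCount_; }

private:
    bool rundownSent_ = false;
    int rundownCount_ = 0;
};
```

With this shape, a process that only ever sees `OnProcessDetach()` (for example an unhandled Ctrl+C on Windows) emits nothing, while a graceful exit emits the rundown exactly once before detach.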
kouvel referenced this issue in dotnet/coreclr on Oct 24, 2019

…w processing during shutdown (#27241)

* Protect against a rare invalid lock acquisition attempt during ETW processing during abrupt shutdown

  Targeted and partial fix for https://github.com/dotnet/coreclr/issues/27129
  - This is not a generic fix for the issue above; it is only a very targeted fix for an issue seen (a new issue introduced in 3.x). For a generic fix and more details, see the fix in 5.0: #27238.
  - This change avoids taking a lock during process detach, a point in time when all other threads have already been abruptly shut down by the OS and locks may have been orphaned.
  - The issue leads to a hang during shutdown when ETW tracing is enabled and the .NET process being traced begins the shutdown sequence at an unfortunate time. This is probably a rare timing issue: the shutdown sequence would have to begin at just the point when a thread holds a particular lock and is terminated by the OS while holding it. The OS then sends the process detach event to the CLR, and the work done in response tries to acquire the lock and cannot because it is orphaned.
  - The generic fix has broader consequences and is unlikely to be a reasonable change to make so late in the cycle; such a change needs some bake time and feedback. Hence this targeted fix for 3.x.
* Report tier as unknown when it cannot be determined
* Return unknown only on process detach
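As a rough illustration of the targeted fix described above (skip the lock on process detach and report the tier as unknown), here is a minimal sketch. All names in it (`Tier`, `g_processDetaching`, `g_tierLock`, `g_currentTier`, `GetTierForRundownEvent`) are hypothetical and chosen for this example; the real change in #27241 may look quite different.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical tier values; "Unknown" is reported when the tier cannot be
// determined safely.
enum class Tier { Unknown, Tier0, Tier1 };

std::atomic<bool> g_processDetaching{false};  // assumed to be set when process detach begins
std::mutex g_tierLock;                        // assumed lock guarding g_currentTier
Tier g_currentTier = Tier::Tier0;

// Returns the tier to put in a rundown event.
Tier GetTierForRundownEvent() {
    // On process detach the lock may have been orphaned by a terminated
    // thread, so do not touch it; report Unknown instead of risking a hang.
    if (g_processDetaching.load(std::memory_order_acquire)) {
        return Tier::Unknown;
    }
    // Normal path: other threads are alive, so taking the lock is safe.
    std::lock_guard<std::mutex> guard(g_tierLock);
    return g_currentTier;
}
```

The trade-off is the one the commit message names: the event carries less precise information during abrupt shutdown, in exchange for never blocking on a lock that nobody can release.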
From time to time the jit diff tool hangs on exit:
This is the only remaining thread in the process and all it does is spin there. Presumably the spinlock was abandoned by a terminated thread.
Known issue? Perhaps already fixed? The jit utils are using the released 3.0, not the current coreclr build.
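For illustration, the suspected failure mode (a spinlock abandoned by a thread that went away while holding it) can be reproduced in miniature. This is a sketch under stated assumptions, not the runtime's lock: `SpinLock` and `SurvivorCanAcquireAfterOrphaning` are hypothetical names, and a thread that simply exits without releasing stands in for one terminated by the OS.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal spinlock; not the runtime's actual lock type.
class SpinLock {
public:
    // Tries to take the lock, giving up after maxSpins failed attempts.
    bool TryAcquire(int maxSpins) {
        for (int i = 0; i < maxSpins; ++i) {
            if (!held_.exchange(true, std::memory_order_acquire)) {
                return true;  // the lock was free and is now ours
            }
            std::this_thread::yield();
        }
        return false;
    }

    void Release() {
        assert(held_.load(std::memory_order_relaxed));
        held_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> held_{false};
};

// A thread acquires the lock and ends without releasing it, standing in for
// a thread terminated while holding the lock. Returns whether the surviving
// thread can acquire the lock afterwards.
bool SurvivorCanAcquireAfterOrphaning(SpinLock& lock) {
    std::thread doomed([&lock] {
        bool acquired = lock.TryAcquire(1);
        (void)acquired;
        // No Release(): the lock is now orphaned.
    });
    doomed.join();
    return lock.TryAcquire(1000);
}
```

The bounded spin here only exists so the example terminates; an unbounded spin on the same lock would loop forever, which matches the single spinning thread described above.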