Runtime hangs on exit on a spinlock #13564

mikedn · 2019-10-10T18:56:40Z

From time to time the jit diff tool hangs on exit:

ntdll.dll!NtDelayExecution()
KernelBase.dll!SleepEx()
coreclr.dll!SpinLock::SpinToAcquire() Line 270
coreclr.dll!CallCounter::IsCallCountingEnabled(MethodDesc * pMethodDesc) Line 56
coreclr.dll!TieredCompilationManager::GetInitialOptimizationTier(MethodDesc * pMethodDesc) Line 115
coreclr.dll!PrepareCodeConfig::GetJitOptimizationTier(PrepareCodeConfig * config, MethodDesc * methodDesc) Line 1215
coreclr.dll!ETW::MethodLog::SendMethodEvent(MethodDesc * pMethodDesc, unsigned long dwEventOptions, int bIsJit, SString * namespaceOrClassName, SString * methodName, SString * methodSignature, unsigned __int64 pNativeCodeStartAddress, PrepareCodeConfig * pConfig) Line 6402
coreclr.dll!ETW::MethodLog::SendEventsForJitMethodsHelper(LoaderAllocator * pLoaderAllocatorFilter, unsigned long dwEventOptions, int fLoadOrDCStart, int fUnloadOrDCEnd, int fSendMethodEvent, int fSendILToNativeMapEvent, int fGetCodeIds) Line 6889
coreclr.dll!ETW::MethodLog::SendEventsForJitMethods(BaseDomain * pDomainFilter, LoaderAllocator * pLoaderAllocatorFilter, unsigned long dwEventOptions) Line 7046
coreclr.dll!ETW::EnumerationLog::IterateDomain(BaseDomain * pDomain, unsigned long enumerationOptions) Line 7139
coreclr.dll!ETW::EnumerationLog::IterateAppDomain(AppDomain * pAppDomain, unsigned long enumerationOptions) Line 7096
coreclr.dll!ETW::EnumerationLog::EnumerationHelper(Module * moduleFilter, BaseDomain * enumerationOptions, unsigned long) Line 7412
coreclr.dll!ETW::EnumerationLog::ProcessShutdown() Line 5674
coreclr.dll!EEShutDownHelper(int fIsDllUnloading) Line 1327
coreclr.dll!EEShutDown(int fIsDllUnloading) Line 1803
coreclr.dll!EEDllMain(HINSTANCE__ * hInst, unsigned long dwReason, void * lpReserved) Line 1973
coreclr.dll!DllMain(void * hInstance, unsigned long dwReason, void * lpReserved) Line 156
coreclr.dll!CoreDllMain(void * hInstance, unsigned long dwReason, void * lpReserved) Line 107
ntdll.dll!LdrpCallInitRoutine()

This is the only remaining thread in the process, and all it does is spin there. Presumably the spinlock was abandoned by a terminated thread.
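To illustrate why the thread can never make progress, here is a minimal sketch of a test-and-set spin lock whose owner disappears without releasing it. `ToySpinLock` and the helper functions are hypothetical and are not coreclr's `SpinLock`. The real lock evidently backs off with `SleepEx` between attempts (hence the `NtDelayExecution` frame at the top of the stack), but the outcome is the same: nothing will ever clear the lock. A bounded `TryAcquireFor` is used so the demonstration terminates.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical test-and-set spin lock used only to illustrate the hang;
// it is not coreclr's SpinLock.
class ToySpinLock {
public:
    // Spins until the lock is free. If the owner was terminated while
    // holding the lock, nothing ever stores false, so this never returns.
    void Acquire() {
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
    }

    // Bounded variant so the demonstration below terminates.
    bool TryAcquireFor(int maxSpins) {
        for (int i = 0; i < maxSpins; ++i) {
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return true;
            }
        }
        return false;
    }

    void Release() {
        assert(locked_.load(std::memory_order_relaxed) && "releasing an unheld lock");
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Models the scenario in the stack trace: a worker takes the lock and is then
// killed by the OS before releasing it; later the shutdown thread tries to
// take the same lock.
bool ShutdownThreadCanAcquireOrphanedLock(int maxSpins) {
    ToySpinLock lock;
    lock.Acquire();  // the "terminated" owner; Release() is never called
    return lock.TryAcquireFor(maxSpins);
}

// Control case: the owner releases normally, so the lock is available.
bool ShutdownThreadCanAcquireReleasedLock(int maxSpins) {
    ToySpinLock lock;
    lock.Acquire();
    lock.Release();
    return lock.TryAcquireFor(maxSpins);
}
```

`ShutdownThreadCanAcquireOrphanedLock` returns `false` however large the spin budget is, while the control case succeeds on its first attempt. An unbounded `Acquire()` in the orphaned case would loop forever, which matches the hang in the stack trace.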

Known issue? Perhaps already fixed? The jit utils are using the released 3.0, not the current coreclr build.


jkotas · 2019-10-10T20:59:20Z

cc @kouvel

… during process detach (a form of abrupt shutdown) (#27238)

Longer-term fix for https://github.com/dotnet/coreclr/issues/27129:

- ETW rundown events sent during process shutdown are currently (and have been for a long time) sent during process detach. By that time, all other threads have been abruptly terminated by the OS, and as a result the state of the system is fundamentally unpredictable.
- In this particular case, locks have been orphaned by threads that were abruptly terminated, so taking locks is not feasible while processing rundown events. If acquiring locks were avoided based on that knowledge (not recommended; it would get messy), the events would have to carry information that does not accurately reflect the state.
- I consider any situation where process detach occurs before the runtime has had an opportunity to handle graceful shutdown (that is, the runtime is unaware that a shutdown is about to happen and cannot handle it before process detach, by which point the OS has already shut some things down) to be abrupt shutdown, and in that scenario all bets are off. With this change, ETW rundown events are not sent in that case.
- This change has the following effects:
  - Graceful shutdown, such as returning from `Main` or calling `Environment.Exit()`, will send rundown events very slightly earlier than before. Background threads will still be running, and other ETW events may be interspersed among the rundown events or sent after them.
  - On Windows, Ctrl+C and Ctrl+Break are not handled by the runtime and by default result in abrupt termination. The only indication the runtime gets is the process detach event, by which time the OS has already terminated all other threads.
    - When these events are not handled (by the runtime or by the app), this is an abrupt shutdown scenario and rundown events will not be sent.
    - When the app handles these events, cancels them, and calls `Environment.Exit()`, that converts them into graceful shutdown (see above). If an app handles these events but does not cancel them, they remain unhandled and lead to abrupt shutdown (see immediately above).
  - On Unixes, there is no significant change. SIGTERM is a graceful shutdown as described above, and there are no similar abrupt-shutdown issues.
- There is an option of sending rundown events upon process detach (when there is no opportunity to do so gracefully), but as described above that would get messy and is not a path we should go down.
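The shutdown cases listed in that commit message can be summarized as a small decision function. This is a hypothetical model with invented names (`ShutdownTrigger`, `Classify`, `SendsRundownEvents`); it is not coreclr code. It only encodes the cases described above, assuming the Windows console events are the only case where the app's own handling changes the outcome.

```cpp
// Hypothetical model of the policy described in coreclr#27238.
// None of these names exist in coreclr.
enum class ShutdownTrigger {
    ReturnFromMain,
    EnvironmentExit,
    WindowsCtrlC,  // also stands in for Ctrl+Break
    UnixSigterm,
};

enum class ShutdownKind { Graceful, Abrupt };

// appCanceledAndCalledExit: the app handled the console event, canceled it,
// and called Environment.Exit(). It only matters for WindowsCtrlC.
constexpr ShutdownKind Classify(ShutdownTrigger trigger, bool appCanceledAndCalledExit) {
    switch (trigger) {
        case ShutdownTrigger::ReturnFromMain:
        case ShutdownTrigger::EnvironmentExit:
        case ShutdownTrigger::UnixSigterm:
            return ShutdownKind::Graceful;
        case ShutdownTrigger::WindowsCtrlC:
            // Unhandled, or handled but not canceled: the runtime only sees
            // process detach, after the OS has terminated the other threads.
            return appCanceledAndCalledExit ? ShutdownKind::Graceful : ShutdownKind::Abrupt;
    }
    return ShutdownKind::Abrupt;  // not reached for valid enum values
}

// After the change, rundown events are sent only on graceful shutdown; at
// process detach they are skipped rather than risking an orphaned lock.
constexpr bool SendsRundownEvents(ShutdownKind kind) {
    return kind == ShutdownKind::Graceful;
}
```

Because both functions are `constexpr`, each case can be checked at compile time, for example `static_assert(!SendsRundownEvents(Classify(ShutdownTrigger::WindowsCtrlC, false)), "")`.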

…w processing during shutdown (#27241)

* Protect against a rare invalid lock acquisition attempt during etw processing during abrupt shutdown

  Targeted and partial fix for https://github.com/dotnet/coreclr/issues/27129
  - This is not a generic fix for the issue above; it is only a very targeted fix for an issue that was seen (a new issue introduced in 3.x). For a generic fix and more details, see the fix in 5.0: #27238.
  - This change avoids taking a lock during process detach, a point in time when all other threads have already been abruptly shut down by the OS and locks may have been orphaned.
  - The issue leads to a hang during shutdown when ETW tracing is enabled and the traced .NET process begins the shutdown sequence at an unfortunate time. This is probably a rare timing issue: the shutdown sequence would have to begin just when a thread holds a particular lock, the OS would terminate that thread while it still holds the lock, and then the OS would send the process detach event to the CLR, whose handling tries to acquire the lock and cannot because it is orphaned.
  - The generic fix has broader consequences and is unlikely to be a reasonable change to make so late in the cycle; such a change needs some bake time and feedback. Hence this targeted fix for 3.x.

* Report tier as unknown when it cannot be determined
* Return unknown only on process detach
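The shape of the targeted 3.1 approach (skip the lock during process detach and report the tier as unknown) can be sketched as follows. This is a hypothetical model: `ToyTieringState`, `MarkProcessDetach`, and `GetTierForEtwEvent` are invented names, `std::mutex` stands in for the runtime's spin lock, and the stored tier is a placeholder rather than the real logic in `TieredCompilationManager::GetInitialOptimizationTier`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical names throughout; this only models the shape of the 3.1 fix.
enum class OptimizationTier { Unknown, Tier0, Optimized };

class ToyTieringState {
public:
    // Set once the runtime knows it is running inside process detach.
    void MarkProcessDetach() { inProcessDetach_ = true; }

    OptimizationTier GetTierForEtwEvent() {
        if (inProcessDetach_) {
            // Other threads were terminated by the OS and may have orphaned
            // the lock, so do not touch it; report the tier as unknown.
            return OptimizationTier::Unknown;
        }
        std::lock_guard<std::mutex> hold(lock_);
        // "Return unknown only on process detach": outside detach, a real
        // tier is always available.
        assert(tier_ != OptimizationTier::Unknown);
        return tier_;
    }

    // Exposed so the demo below can simulate a thread that died holding it.
    std::mutex& Lock() { return lock_; }

private:
    std::mutex lock_;
    OptimizationTier tier_ = OptimizationTier::Tier0;  // placeholder value
    bool inProcessDetach_ = false;
};

// Normal run: the lock is free and the stored tier is reported.
OptimizationTier TierDuringNormalRun() {
    ToyTieringState state;
    return state.GetTierForEtwEvent();
}

// Process detach with an orphaned lock: returns instead of blocking forever.
OptimizationTier TierDuringDetachWithOrphanedLock() {
    ToyTieringState state;
    state.Lock().lock();        // the "terminated" owner never unlocks
    state.MarkProcessDetach();
    OptimizationTier tier = state.GetTierForEtwEvent();
    state.Lock().unlock();      // demo cleanup: never destroy a locked mutex
    return tier;
}
```

`TierDuringNormalRun()` yields `OptimizationTier::Tier0`, while `TierDuringDetachWithOrphanedLock()` yields `OptimizationTier::Unknown` instead of blocking. The trade-off is that events emitted at process detach carry less precise information, which is why the 5.0 fix instead stops sending rundown events at that point altogether.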

kouvel · 2019-10-24T22:40:49Z

Fixed by:

master - Process/send etw rundown events only during graceful shutdown and not during process detach (a form of abrupt shutdown) coreclr#27238
3.1 - [3.1] Protect against a rare invalid lock acquisition attempt during etw processing during shutdown coreclr#27241

Did not meet the bar for 3.0.

kouvel closed this as completed Oct 24, 2019

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the 3.1 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
