Runtime hangs on exit on a spinlock #13564

Closed
mikedn opened this issue Oct 10, 2019 · 2 comments
mikedn commented Oct 10, 2019

From time to time the jit diff tool hangs on exit:

```
ntdll.dll!NtDelayExecution()
KernelBase.dll!SleepEx()
coreclr.dll!SpinLock::SpinToAcquire() Line 270
coreclr.dll!CallCounter::IsCallCountingEnabled(MethodDesc * pMethodDesc) Line 56
coreclr.dll!TieredCompilationManager::GetInitialOptimizationTier(MethodDesc * pMethodDesc) Line 115
coreclr.dll!PrepareCodeConfig::GetJitOptimizationTier(PrepareCodeConfig * config, MethodDesc * methodDesc) Line 1215
coreclr.dll!ETW::MethodLog::SendMethodEvent(MethodDesc * pMethodDesc, unsigned long dwEventOptions, int bIsJit, SString * namespaceOrClassName, SString * methodName, SString * methodSignature, unsigned __int64 pNativeCodeStartAddress, PrepareCodeConfig * pConfig) Line 6402
coreclr.dll!ETW::MethodLog::SendEventsForJitMethodsHelper(LoaderAllocator * pLoaderAllocatorFilter, unsigned long dwEventOptions, int fLoadOrDCStart, int fUnloadOrDCEnd, int fSendMethodEvent, int fSendILToNativeMapEvent, int fGetCodeIds) Line 6889
coreclr.dll!ETW::MethodLog::SendEventsForJitMethods(BaseDomain * pDomainFilter, LoaderAllocator * pLoaderAllocatorFilter, unsigned long dwEventOptions) Line 7046
coreclr.dll!ETW::EnumerationLog::IterateDomain(BaseDomain * pDomain, unsigned long enumerationOptions) Line 7139
coreclr.dll!ETW::EnumerationLog::IterateAppDomain(AppDomain * pAppDomain, unsigned long enumerationOptions) Line 7096
coreclr.dll!ETW::EnumerationLog::EnumerationHelper(Module * moduleFilter, BaseDomain * enumerationOptions, unsigned long) Line 7412
coreclr.dll!ETW::EnumerationLog::ProcessShutdown() Line 5674
coreclr.dll!EEShutDownHelper(int fIsDllUnloading) Line 1327
coreclr.dll!EEShutDown(int fIsDllUnloading) Line 1803
coreclr.dll!EEDllMain(HINSTANCE__ * hInst, unsigned long dwReason, void * lpReserved) Line 1973
coreclr.dll!DllMain(void * hInstance, unsigned long dwReason, void * lpReserved) Line 156
coreclr.dll!CoreDllMain(void * hInstance, unsigned long dwReason, void * lpReserved) Line 107
ntdll.dll!LdrpCallInitRoutine()
```

This is the only remaining thread in the process, and all it does is spin there. Presumably the spinlock was abandoned by a terminated thread.

Known issue? Perhaps already fixed? The jit utils are using the released 3.0, not the current coreclr build.
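
To illustrate the presumed failure mode, here is a minimal sketch of an abandoned spin lock. `TinySpinLock`, `AcquireAndVanish`, and `SurvivorCanAcquire` are hypothetical names made up for this example; this is not the runtime's `SpinLock` class, and the spin is bounded only so that the sketch terminates instead of hanging the way the real process does.

```cpp
#include <atomic>
#include <thread>

// Hypothetical minimal spin lock for illustration; not the runtime's SpinLock.
class TinySpinLock {
public:
    // Spin up to maxSpins times; report whether the lock was taken.
    bool TryAcquire(int maxSpins) {
        for (int i = 0; i < maxSpins; ++i) {
            // exchange returns the previous value: false means we got the lock.
            if (!held_.exchange(true, std::memory_order_acquire)) {
                return true;
            }
            std::this_thread::yield();
        }
        return false;
    }

    void Release() { held_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> held_{false};
};

// Stands in for a thread that the OS terminates while it holds the lock:
// the lock is taken and the matching Release() never runs.
void AcquireAndVanish(TinySpinLock& lock) {
    lock.TryAcquire(1);
}

// Returns true if the surviving thread could still take the lock.
bool SurvivorCanAcquire(int maxSpins) {
    TinySpinLock lock;
    AcquireAndVanish(lock);
    return lock.TryAcquire(maxSpins);
}
```

With an unbounded spin, as in the stack above, the surviving thread would never return, because nothing is left in the process to release the lock.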

jkotas commented Oct 10, 2019

cc @kouvel

kouvel referenced this issue in kouvel/coreclr Oct 16, 2019
… during process detach (a form of abrupt shutdown)

Longer-term fix for https://github.com/dotnet/coreclr/issues/27129:
- ETW rundown events for process shutdown are currently sent during process detach, and have been for a long time. By that time, all other threads have been abruptly terminated by the OS, and as a result the state of the system is fundamentally unpredictable.
- In this particular case, locks have been orphaned by threads that were abruptly terminated, so taking locks is not feasible while processing rundown events. If lock acquisition were avoided based on that knowledge (not recommended, this would get messy), the events would have to carry information that does not accurately reflect the state.
- I consider it an abrupt shutdown whenever process detach occurs before the runtime has had an opportunity to handle graceful shutdown, that is, the runtime is unaware that a shutdown is about to happen and cannot handle it before process detach (by which time the OS has already shut some things down). In that scenario all bets are off. With this change, ETW rundown events are not sent in that case.
- This change has the following effects:
  - Graceful shutdown such as returning from `Main` or `Environment.Exit()` will send rundown events very slightly earlier than before. Background threads will still be running, so other ETW events may be interspersed among the rundown events or sent after them.
  - On Windows, Ctrl+C and Ctrl+Break are not handled by the runtime and by default result in abrupt termination. The only indication the runtime gets is the process detach event, by which time the OS has already terminated all other threads
    - When these events are not handled (by the runtime or by the app), this is an abrupt shutdown scenario and rundown events will not be sent
    - When these events are handled by the app and canceled along with `Environment.Exit()`, that converts these events into graceful shutdown (see above). If an app handles these events and chooses to not cancel the event, the event remains unhandled and leads to abrupt shutdown (see immediately above).
  - On Unixes, there is no significant change. SIGTERM is graceful shutdown as described above and there are no similar issues of abrupt shutdown.
- There is an option of sending rundown events upon process detach (when we don't have an opportunity to do so gracefully), but as I described above that will get messy and is not a path that we should be headed down
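
A minimal sketch of the policy described in the list above, assuming a simplified model of the two shutdown paths. `RuntimeShutdownSketch` and its members are hypothetical names for illustration only and do not correspond to the actual coreclr shutdown code.

```cpp
// Hypothetical model of the two shutdown paths; not coreclr code.
struct RuntimeShutdownSketch {
    bool gracefulShutdownStarted = false;
    int rundownEventsSent = 0;

    // Graceful path (for example, returning from Main): other threads are
    // still alive, so it is safe to take locks and send rundown events here.
    void OnGracefulShutdown() {
        if (gracefulShutdownStarted) {
            return;  // rundown events are sent at most once
        }
        gracefulShutdownStarted = true;
        SendRundownEvents();
    }

    // Process detach: the OS has already terminated all other threads, so
    // locks may be orphaned. Rundown events are deliberately not sent here,
    // whether or not a graceful shutdown ran earlier.
    void OnProcessDetach() {
        // Intentionally empty: no lock-taking work during detach.
    }

private:
    void SendRundownEvents() { ++rundownEventsSent; }
};
```

In this model an abrupt shutdown is simply a call to `OnProcessDetach()` with no preceding `OnGracefulShutdown()`, and it produces no rundown events.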
kouvel referenced this issue in kouvel/coreclr Oct 16, 2019
…essing during abrupt shutdown

Targeted and partial fix for https://github.com/dotnet/coreclr/issues/27129
- This is not a generic fix for the issue above. It is only a very targeted fix for an issue seen (a new issue introduced in 3.x). For a generic fix and more details, see the fix in 5.0: dotnet#27238.
- This change avoids taking a lock during process detach - a point in time when all other threads have already been abruptly shut down by the OS and locks may have been orphaned.
- The issue leads to a hang during shutdown when ETW tracing is enabled and the .NET process being traced begins the shutdown sequence at an unfortunate time. This is probably a rare timing issue: the shutdown sequence has to begin at just the point when a thread holds a particular lock, the OS terminates that thread while it holds the lock, and the OS then sends the process detach event to the CLR. Work done during process detach then tries to acquire the lock and cannot, because it is orphaned.
- The generic fix has broader consequences and is unlikely to be a reasonable change to make so late in the cycle. Such a change needs some bake time and feedback, hence this targeted fix for 3.x.
kouvel referenced this issue in dotnet/coreclr Oct 18, 2019
… during process detach (a form of abrupt shutdown) (#27238)
kouvel referenced this issue in dotnet/coreclr Oct 24, 2019
…w processing during shutdown (#27241)

* Protect against a rare invalid lock acquisition attempt during etw processing during abrupt shutdown

Targeted and partial fix for https://github.com/dotnet/coreclr/issues/27129
- This is not a generic fix for the issue above. It is only a very targeted fix for an issue seen (a new issue introduced in 3.x). For a generic fix and more details, see the fix in 5.0: #27238.
- This change avoids taking a lock during process detach - a point in time when all other threads have already been abruptly shut down by the OS and locks may have been orphaned.
- The issue leads to a hang during shutdown when ETW tracing is enabled and the .NET process being traced begins the shutdown sequence at an unfortunate time. This is probably a rare timing issue: the shutdown sequence has to begin at just the point when a thread holds a particular lock, the OS terminates that thread while it holds the lock, and the OS then sends the process detach event to the CLR. Work done during process detach then tries to acquire the lock and cannot, because it is orphaned.
- The generic fix has broader consequences and is unlikely to be a reasonable change to make so late in the cycle. Such a change needs some bake time and feedback, hence this targeted fix for 3.x.

* Report tier as unknown when it cannot be determined

* Return unknown only on process detach
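
The targeted fix can be pictured roughly as follows. This is a sketch under stated assumptions, not the code from #27241: the globals `g_processDetaching`, `g_lockHeld`, and `g_callCountingEnabled`, the enum values, and the function `GetTierForEvent` are all hypothetical, and the mapping from the call-counting flag to a tier is only illustrative.

```cpp
#include <atomic>
#include <thread>

// Hypothetical tier values; Unknown is reported when the tier cannot be determined.
enum class OptimizationTier { Unknown, Tier0, Optimized };

std::atomic<bool> g_processDetaching{false};  // assumed to be set when process detach begins
std::atomic<bool> g_lockHeld{false};          // stands in for the spin lock in the stack trace
bool g_callCountingEnabled = true;            // state guarded by the lock

void AcquireLock() {
    // Unbounded spin: hangs forever if the owner was terminated while holding the lock.
    while (g_lockHeld.exchange(true, std::memory_order_acquire)) {
        std::this_thread::yield();
    }
}

void ReleaseLock() { g_lockHeld.store(false, std::memory_order_release); }

// Determine the tier to report in a method event.
OptimizationTier GetTierForEvent() {
    // During process detach the lock may be orphaned, so do not touch it;
    // report Unknown rather than risk spinning forever.
    if (g_processDetaching.load(std::memory_order_acquire)) {
        return OptimizationTier::Unknown;
    }
    AcquireLock();
    bool enabled = g_callCountingEnabled;
    ReleaseLock();
    return enabled ? OptimizationTier::Tier0 : OptimizationTier::Optimized;
}
```

The trade-off is the one named in the commit message: events emitted during process detach carry less precise tier information, in exchange for never waiting on a lock that nobody can release.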
kouvel commented Oct 24, 2019

kouvel closed this as completed Oct 24, 2019
msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
msftgits added this to the 3.1 milestone Jan 31, 2020
ghost locked as resolved and limited conversation to collaborators Dec 12, 2020