[mono] Deadlock on mono_loader_lock in mono_class_create_from_typedef #93686
Comments
Good catch!
Related to / duplicate of #51864. This is hard.
@uweigand this will be quite tricky to fix, and I think even if we put something together, it is highly unlikely that we will backport the fix to upstream net8.0, as it's bound to be risky (any time we make big changes to …). I would encourage, to whatever extent possible, fixing this in user code by rearranging the calling code to trigger assembly loading early. (I realize in this case I'm proposing that you change MSBuild... sorry.)
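For illustration only, here is a minimal sketch of what "trigger assembly loading early" could look like in the calling code, assuming the affected code path can be edited. The `AssemblyWarmup` helper is made up (it is not the actual MSBuild change), and `System.Linq.Expressions` is used only because that is the assembly that was loaded lazily in the reported deadlock:

```csharp
using System.Runtime.CompilerServices;

static class AssemblyWarmup
{
    // Hypothetical helper: call this early (e.g. at the top of Main), before any
    // code path that holds a managed lock while types may still be resolved lazily.
    public static void PreloadDependencies()
    {
        // Referencing a type forces its assembly to be loaded now, on this thread,
        // with no user-level locks held, so later class-creation work under
        // mono_loader_lock no longer has to trigger the load (and its hooks).
        _ = typeof(System.Linq.Expressions.Expression).Assembly;

        // Optionally also run its static initialization up front.
        RuntimeHelpers.RunClassConstructor(typeof(System.Linq.Expressions.Expression).TypeHandle);
    }
}
```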
I reproduced the hang on .NET 8 on different Red Hat Beaker machines with different RHEL distributions. I was able to reproduce around 10 or 12 hangs, and in each case, when I attached gdb and looked at the backtraces, I saw the same backtrace as in the attached file. I worked with Ulrich and the local IBM team, and Ulrich helped us figure out that it's the exact same issue as this one. I have attached a logfile with the backtraces of all processes and threads.
@Vishwanatha-HD has now found an instance of this deadlock in the .NET 6 source-build. Unfortunately, while one of the involved locks is also the … Again, we have a deadlock between two of msbuild's threads, this time a dedicated event logging thread and a thread pool thread.

The event logging thread …

The thread pool thread …
Hi All,
@uweigand overall, how frequent are these deadlocks? I tried to prototype one idea for a fix (essentially avoiding loader callbacks into managed code while holding the global loader lock by pre-loading any assemblies mentioned by a MonoClass before taking the lock), but it was very invasive, was likely to result in an overall performance degradation, and never quite converged on a correct solution (I could still see the loader lock held when loading assemblies, and I ran out of time on the experiment to fix them all). I'm going to think about some alternate approaches during .NET 9, but I'm not certain there is any reasonable approach here.
```c
// FIXME Locking here is somewhat historical due to mono_register_jit_icall_wrapper taking loader lock.
// atomic_compare_exchange should suffice.
mono_loader_lock ();
mono_jit_lock ();
if (!callinfo->wrapper) {
	callinfo->wrapper = p;
}
mono_jit_unlock ();
mono_loader_unlock ();
```

This will likely just shift around the specific circumstances of the deadlock, however. It won't fix the fundamental problem.
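The set-once pattern that the FIXME alludes to (and that the change referenced below applies in the runtime's C code) can be sketched as follows. This is a C# illustration of the compare-exchange idea only; the `CallInfo` type and its members are made up, not Mono's actual data structures:

```csharp
using System.Threading;

// Hypothetical stand-in for the runtime's callinfo, purely to illustrate the pattern.
sealed class CallInfo
{
    // The wrapper only ever transitions from null to a single non-null value,
    // so one compare-exchange can replace the double-checked locking above.
    object _wrapper;

    public object GetOrSetWrapper(object candidate)
    {
        // Install 'candidate' only if nothing has been published yet; whichever
        // thread wins the race, every caller observes the same final value.
        return Interlocked.CompareExchange(ref _wrapper, candidate, null) ?? candidate;
    }
}
```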
I've now done a bit of code-flow analysis to try to figure out the places where Mono invokes a C# routine while holding the loader lock. The following list contains the functions that take the loader lock, and for each of them, the calls to subroutines made while holding the lock that may invoke a C# routine on some path. This may not be 100% accurate (there may be false positives, since in many cases the invocation of the C# routine is conditional and the condition may never be true at a particular call site; and there may be false negatives where I missed part of the CFG due to indirect calls or macro magic), but it does show that this problem may be more widespread than I originally thought. I've done the same analysis on the code after applying the "ready levels" patch #95927, but unfortunately the result doesn't look significantly different: even accounting for the fact that some of the recursive class-loader calls under the lock no longer load the parent classes, most of these functions still have other code paths where C# is invoked.

…

As an aside, it seems unclear what exactly the loader lock is protecting; many of these uses of the lock should probably use some other serialization mechanism (if any is needed at all).
…lines (#104038): [mini] Use atomics, instead of loader lock, for JIT wrappers. Related to #93686. While this doesn't eliminate all deadlocks between the global loader lock and managed locks, it removes one unneeded use of the loader lock: the wrapper (and trampoline) of a JIT icall are only ever set from NULL to non-NULL, so atomics can deal with the races instead of double-checked locking. This was not the case historically, because the JIT info was dynamically allocated, so the loader lock was used to protect the integrity of the hash table.
I have encountered a similar issue. In my case, two different threads are JIT-compiling the same method; one thread's call chain goes through mono_class_create_from_typedef and causes a deadlock. May I ask whether a fix for this issue is still in progress? 😣
We've seen a deadlock in msbuild when performing a source-build of rc2 natively on s390x (using the Mono runtime). In a nutshell, the deadlock occurs due to a lock-order inversion between the Mono JIT's `mono_loader_lock` and the managed `Microsoft.Build.BackEnd.Logging.LoggingService._lockObject`. Specifically, two threads are involved: the main thread and a thread pool thread.
The main thread runs
- `NuGet.Build.Tasks.Console.Program:Main`,
- which calls `NuGet.Build.Tasks.Console.MSBuildStaticGraphRestore:LoadProjects`,
- which calls `Microsoft.Build.Evaluation.ProjectCollection:.ctor`,
- which calls `Microsoft.Build.BackEnd.Logging.LoggingService:RegisterLogger`,
- which TAKES `Microsoft.Build.BackEnd.Logging.LoggingService._lockObject`
- and calls `Microsoft.Build.Evaluation.ProjectCollection/ReusableLogger:RegisterForEvents`;
- as `Microsoft.Build.BackEnd.Logging.EventSourceSink:add_BuildStarted` gets invoked, `mono_class_create_from_typedef` is entered,
- which TAKES `mono_loader_lock`.
The thread pool thread runs
- `NuGet.Build.Tasks.Console.ConsoleLoggingQueue:Process`,
- which calls `NuGet.Build.Tasks.ConsoleOutLogMessage:ToJson`,
- which calls `System.Collections.Concurrent.ConcurrentDictionary'2<TKey_REF, TValue_REF>:GetOrAdd`;
- as `Newtonsoft.Json.Serialization.DefaultContractResolver:CreateContract` gets invoked, `mono_class_create_from_typedef` is entered,
- which TAKES `mono_loader_lock`
- and calls `mono_metadata_interfaces_from_typedef_full`, which loads `System.Linq.Expressions.dll` for the first time;
- the assembly load triggers `Microsoft.Build.BackEnd.Components.RequestBuilder.AssemblyLoadsTracker:CurrentDomainOnAssemblyLoad`,
- which calls `Microsoft.Build.BackEnd.Logging.LoggingService:ProcessLoggingEvent`,
- which TAKES `Microsoft.Build.BackEnd.Logging.LoggingService._lockObject`.
It seems to me the root cause of the deadlock is that `mono_class_create_from_typedef` holds `mono_loader_lock` across function calls that may end up invoking managed code (the assembly load hooks), which might do anything; that doesn't seem like a good idea.

CC - @directhex @lambdageek @vargaz @akoeplinger
FYI - @giritrivedi @alhad-deshpande @janani66 @omajid @tmds
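To make the inversion concrete, here is a minimal, self-contained C# illustration (not the MSBuild/Mono code itself). The class name, lock names, and sleep timing are all made up: `lockA` stands in for the managed `LoggingService._lockObject`, and `lockB` stands in for the runtime's `mono_loader_lock`, which in the real scenario is reached implicitly through class creation and assembly-load callbacks rather than an explicit `lock` statement:

```csharp
using System.Threading;

class LockOrderInversionDemo
{
    static readonly object lockA = new object(); // stands in for LoggingService._lockObject
    static readonly object lockB = new object(); // stands in for mono_loader_lock

    static void Main()
    {
        var t1 = new Thread(() =>
        {
            lock (lockA)               // RegisterLogger takes _lockObject...
            {
                Thread.Sleep(100);     // widen the race window for the demo
                lock (lockB) { }       // ...then class creation needs the loader lock
            }
        });

        var t2 = new Thread(() =>
        {
            lock (lockB)               // class creation takes the loader lock...
            {
                Thread.Sleep(100);
                lock (lockA) { }       // ...then the assembly-load hook logs, needing _lockObject
            }
        });

        t1.Start(); t2.Start();
        t1.Join(); t2.Join();          // with this timing both threads block forever: deadlock
    }
}
```

In the real report the two acquisitions happen in different layers (managed MSBuild code versus the Mono runtime), which is why neither side can see the full lock order on its own.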