Fix Unix named mutex crash during some race conditions #36268

kouvel · 2020-05-12T07:16:27Z

Below, when I refer to "mutex", I mean the underlying mutex object, not an instance of the `Mutex` class.

- When the last reference to a mutex was closed while some thread held the lock and a pthread mutex was in use, the code attempted to destroy the mutex, but destroying a locked pthread mutex has undefined behavior.
- There doesn't seem to be a way to behave exactly like Windows in this corner case. On Windows, the mutex is destroyed when the last reference to it is released, regardless of which process has the mutex locked and which process releases the last reference (they could be two different processes), including in cases of abrupt shutdown.
- For this corner case I settled on what seems like a decent solution that is compatible with older runtimes:
  - When a process releases its last reference to the mutex:
    - If the mutex is locked by the same thread, the lock is abandoned and the process no longer references the mutex.
    - If the mutex is locked by a different thread, the lifetime of the mutex is extended with an implicit ref. The implicit ref prevents this or other processes from attempting to destroy the mutex while it is locked. The implicit ref is removed in either of these cases:
      - The mutex gets another reference from within the same process.
      - The thread that owns the lock exits and abandons the mutex, at which point that would be the last reference to the mutex and the process would no longer reference it.
  - The implementation based on file locks is less restricted, but for consistency it follows the same behavior.
- There was also a race between an exiting thread abandoning one of its locked named mutexes and another thread releasing the last reference to it. This is fixed by synchronizing both paths with the creation/deletion process lock.
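
The rules above can be sketched as a small single-process model. This is only an illustration of the described behavior, not the PAL implementation: `NamedMutexModel`, `ThreadId`, `AddRef`, `Acquire`, `ReleaseRef`, `OnThreadExit`, and `g_creationDeletionLock` are all hypothetical names, and a plain `std::mutex` stands in for the creation/deletion process lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical model of one process's view of a named mutex (not the PAL types).
using ThreadId = std::uint64_t;
constexpr ThreadId kNoOwner = 0;

struct NamedMutexModel {
    int refCount = 0;               // explicit references held by this process
    bool implicitRef = false;       // lifetime extension while another thread holds the lock
    ThreadId lockOwner = kNoOwner;  // thread currently holding the lock, if any
    bool abandoned = false;         // set when the lock was dropped without a release

    bool ProcessReferencesMutex() const { return refCount > 0 || implicitRef; }
};

// Assumed stand-in for the creation/deletion process lock.
static std::mutex g_creationDeletionLock;

void AddRef(NamedMutexModel& m) {
    std::lock_guard<std::mutex> guard(g_creationDeletionLock);
    ++m.refCount;
    m.implicitRef = false;  // a new reference from the same process replaces the implicit ref
}

void Acquire(NamedMutexModel& m, ThreadId self) {
    assert(m.lockOwner == kNoOwner);
    m.lockOwner = self;
}

// Called by thread `self` when it closes a reference to the mutex.
void ReleaseRef(NamedMutexModel& m, ThreadId self) {
    std::lock_guard<std::mutex> guard(g_creationDeletionLock);
    assert(m.refCount > 0);
    if (--m.refCount > 0 || m.lockOwner == kNoOwner) {
        return;  // not the last reference, or nobody holds the lock
    }
    if (m.lockOwner == self) {
        m.lockOwner = kNoOwner;  // same thread: the lock is abandoned
        m.abandoned = true;
    } else {
        m.implicitRef = true;    // different thread: keep the mutex alive while locked
    }
}

// Called when a thread exits; takes the same lock, so it cannot race with ReleaseRef.
void OnThreadExit(NamedMutexModel& m, ThreadId exiting) {
    std::lock_guard<std::mutex> guard(g_creationDeletionLock);
    if (m.lockOwner != exiting) {
        return;
    }
    m.lockOwner = kNoOwner;
    m.abandoned = true;
    m.implicitRef = false;  // the implicit ref was the last reference; drop it
}
```

In this model, the only point at which destroying the underlying object would be safe is when `ProcessReferencesMutex()` returns false, which by construction never happens while `lockOwner` is set.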

Fix for #34271 in master
Closes #28449 - probably doesn't fix the issue, but trying to enable it to see if it continues to fail

janvorli · 2020-05-12T10:23:41Z

@kouvel the System.Threading.Tests.MutexTests.CrossProcess_NamedMutex_ProtectedFileAccessAtomic test is failing in the CI on all platforms with:

System.AggregateException : One or more errors occurred. (Remote process failed with an unhandled exception.)

Child exception:
  System.Threading.WaitHandleCannotBeOpenedException: No handle of the given name exists.
   at System.Threading.Mutex.OpenExisting(String name) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Mutex.cs:line 47
   at System.Threading.Tests.MutexTests.<>c.<CrossProcess_NamedMutex_ProtectedFileAccessAtomic>b__15_1(String m, String f) in /_/src/libraries/System.Threading/tests/MutexTests.cs:line 417
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) in /_/src/mono/netcore/System.Private.CoreLib/src/System/Reflection/RuntimeMethodInfo.cs:line 359

janvorli

LGTM, thank you!

janvorli · 2020-05-12T10:56:41Z

src/coreclr/src/pal/src/synchobj/mutex.cpp

    _ASSERTE(m_lockCount != 0);

+    bool hasRefFromLockOwnerThread = m_hasRefFromLockOwnerThread;
+    if (hasRefFromLockOwnerThread)


A nit: this condition is not necessary, since m_hasRefFromLockOwnerThread ends up being set to false in all cases.

I have moved the check and clearing to below. Accesses to the field are synchronized by the creation/deletion process lock, so it doesn't need to be read before releasing the mutex's lock.
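
A minimal sketch of the ordering described in this reply, assuming the flag is only ever touched under the process lock: the mutex's own lock is released first, and the flag is read and cleared afterwards in one step. `MutexStateSketch`, `ReleaseLockAndOwnerRef`, and `g_processLock` are hypothetical names, and the fields are simplified stand-ins for the real members.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical, simplified state; not the actual PAL class layout.
struct MutexStateSketch {
    bool locked = false;                     // stand-in for the mutex's own lock
    bool hasRefFromLockOwnerThread = false;  // guarded by g_processLock
    int refCount = 0;                        // guarded by g_processLock
};

// Assumed stand-in for the creation/deletion process lock.
static std::mutex g_processLock;

// Releases the lock, then drops the owner thread's ref if one was held.
// Returns true when a ref was dropped.
bool ReleaseLockAndOwnerRef(MutexStateSketch& s) {
    assert(s.locked);
    s.locked = false;  // release the mutex's lock first

    // The flag does not need to be sampled before the release above,
    // because every access to it happens under the process lock.
    std::lock_guard<std::mutex> guard(g_processLock);
    const bool hadRef = std::exchange(s.hasRefFromLockOwnerThread, false);
    if (hadRef) {
        --s.refCount;
    }
    return hadRef;
}
```

Using `std::exchange` makes the read and the clear a single expression, so the flag ends up false on every path without a separate leading condition.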

kouvel added the area-PAL-coreclr label May 12, 2020

kouvel added this to the 5.0 milestone May 12, 2020

kouvel requested a review from janvorli May 12, 2020 07:16

kouvel self-assigned this May 12, 2020

This was referenced May 12, 2020

[3.1] Fix Unix named mutex crash during some race conditions dotnet/coreclr#28039

Closed

[3.1] Test-only followup for "Fix Unix named mutex crash during some race conditions" dotnet/corefx#42917

Closed

janvorli approved these changes May 12, 2020

kouvel mentioned this pull request May 12, 2020

[Mono] Named mutexes are not working cross-process #36307

Open

Address feedback, disable a test on Mono, slightly simplify same test…

c67b108

… (avoid two guids in name)

safern mentioned this pull request May 13, 2020

System.Diagnostics.Tests.EventLogSourceCreationTests failing on PRs #36135

Open

kouvel merged commit f3a3e64 into dotnet:master May 13, 2020

kouvel deleted the NamedMutexFix branch May 13, 2020 16:37

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
