Fix Unix named mutex crash during some race conditions #36268

Merged
merged 2 commits into dotnet:master from NamedMutexFix
May 13, 2020

@kouvel kouvel commented May 12, 2020

Below when I refer to "mutex" I'm referring to the underlying mutex object, not an instance of the Mutex class.

  • When the last reference to a mutex was closed while some thread held the lock and a pthread mutex was in use, the runtime attempted to destroy the mutex, which is undefined behavior for a locked pthread mutex
  • There doesn't seem to be a way to behave exactly like on Windows for this corner case, where the mutex is destroyed when the last reference to it is released, regardless of which process has the mutex locked and which process releases the last reference to it (they could be two different processes), including in cases of abrupt shutdown
  • For this corner case I settled on what seems like a decent solution that is compatible with older runtimes:
    • When a process releases its last reference to the mutex
      • If that mutex is locked by the same thread, the lock is abandoned and the process no longer references the mutex
      • If that mutex is locked by a different thread, the lifetime of the mutex is extended with an implicit ref. The implicit ref prevents this or other processes from attempting to destroy the mutex while it is locked. The implicit ref is removed in either of these cases:
        • The mutex gets another reference from within the same process
        • The thread that owns the lock exits and abandons the mutex, at which point that would be the last reference to the mutex and the process would not reference the mutex anymore
    • The implementation based on file locks is less restricted, but for consistency that implementation also follows the same behavior
  • There was also a race between an exiting thread abandoning one of its locked named mutexes and another thread releasing the last reference to it, fixed by using the creation/deletion process lock to synchronize
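
The rules in the list above can be sketched as a small state model. This is only an illustration of the described behavior for a single process, not the runtime's implementation: `ProcessMutexState`, `ReleaseOutcome`, `addRef`, `releaseRef`, and `onThreadExit` are hypothetical names, the `Closed` outcome for an unlocked mutex is an assumption, and cross-process state and real locking are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one process's view of a named mutex.
// None of these names come from the runtime; they only mirror the description.
using ThreadId = std::uint32_t;

enum class ReleaseOutcome {
    StillReferenced,     // other explicit references remain in this process
    Closed,              // assumption: the mutex was not locked, so the process just drops it
    AbandonedAndClosed,  // the releasing thread held the lock: abandon it and drop the mutex
    ImplicitRefAdded     // a different thread holds the lock: keep the mutex alive
};

struct ProcessMutexState {
    int explicitRefs = 0;      // references held through open handles
    bool implicitRef = false;  // lifetime extension while another thread holds the lock
    bool locked = false;       // whether a thread of this process holds the lock
    ThreadId lockOwner = 0;    // meaningful only when locked is true
    bool abandoned = false;

    bool referencedByProcess() const { return explicitRefs > 0 || implicitRef; }
};

// A new reference from within the same process removes the implicit ref.
inline void addRef(ProcessMutexState& s) {
    ++s.explicitRefs;
    s.implicitRef = false;
}

// Releases one explicit reference on behalf of thread `caller`.
inline ReleaseOutcome releaseRef(ProcessMutexState& s, ThreadId caller) {
    assert(s.explicitRefs > 0);
    if (--s.explicitRefs > 0) {
        return ReleaseOutcome::StillReferenced;
    }
    if (!s.locked) {
        return ReleaseOutcome::Closed;
    }
    if (s.lockOwner == caller) {
        // Locked by the same thread: the lock is abandoned.
        s.locked = false;
        s.abandoned = true;
        return ReleaseOutcome::AbandonedAndClosed;
    }
    // Locked by a different thread: extend the lifetime with an implicit ref
    // so nothing tries to destroy the mutex while it is locked.
    s.implicitRef = true;
    return ReleaseOutcome::ImplicitRefAdded;
}

// The lock owner exits: it abandons the mutex, which also removes the implicit ref.
inline void onThreadExit(ProcessMutexState& s, ThreadId exiting) {
    if (s.locked && s.lockOwner == exiting) {
        s.locked = false;
        s.abandoned = true;
        s.implicitRef = false;
    }
}
```

In this model, a process whose only reference is released by a non-owner thread keeps `referencedByProcess()` true until either `addRef` or `onThreadExit` runs, which matches the two removal cases listed for the implicit ref.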

Fix for #34271 in master
Closes #28449 - probably doesn't fix the issue, but trying to enable it to see if it continues to fail

@janvorli

@kouvel the System.Threading.Tests.MutexTests.CrossProcess_NamedMutex_ProtectedFileAccessAtomic test is failing in the CI on all platforms with:

System.AggregateException : One or more errors occurred. (Remote process failed with an unhandled exception.)

Child exception:
  System.Threading.WaitHandleCannotBeOpenedException: No handle of the given name exists.
   at System.Threading.Mutex.OpenExisting(String name) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Mutex.cs:line 47
   at System.Threading.Tests.MutexTests.<>c.<CrossProcess_NamedMutex_ProtectedFileAccessAtomic>b__15_1(String m, String f) in /_/src/libraries/System.Threading/tests/MutexTests.cs:line 417
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) in /_/src/mono/netcore/System.Private.CoreLib/src/System/Reflection/RuntimeMethodInfo.cs:line 359


@janvorli janvorli left a comment

LGTM, thank you!

_ASSERTE(m_lockCount != 0);

bool hasRefFromLockOwnerThread = m_hasRefFromLockOwnerThread;
if (hasRefFromLockOwnerThread)

A nit: this condition is not necessary, since m_hasRefFromLockOwnerThread ends up being set to false in all cases.

Member Author

I have moved the check and clearing to below. Accesses to the field are synchronized by the creation/deletion process lock, so it doesn't need to be read before releasing the mutex's lock.
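
A rough sketch of the ordering described in this reply, under stated assumptions: the lock is released first, and the flag is read and cleared afterwards while holding the process lock. Only `m_lockCount` and `m_hasRefFromLockOwnerThread` come from the review excerpt; `MutexModel`, `g_processLock`, `sharedLock`, `refCount`, `acquire`, and `abandonOnThreadExit` are hypothetical stand-ins, and `std::mutex` merely substitutes for the real cross-process primitives.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the creation/deletion process lock.
std::mutex g_processLock;

struct MutexModel {
    std::mutex sharedLock;                     // stand-in for the underlying named mutex lock
    int m_lockCount = 0;                       // touched only by the lock owner
    bool m_hasRefFromLockOwnerThread = false;  // guarded by g_processLock
    int refCount = 0;                          // guarded by g_processLock
};

inline void acquire(MutexModel& m) {
    m.sharedLock.lock();
    m.m_lockCount = 1;
}

// Called on the owning thread. Returns true if an implicit reference was dropped.
inline bool abandonOnThreadExit(MutexModel& m) {
    assert(m.m_lockCount != 0);
    m.m_lockCount = 0;
    m.sharedLock.unlock();  // release the mutex's lock first

    // The flag is only accessed under the process lock, so it can be read
    // after the mutex's lock has been released rather than before.
    std::lock_guard<std::mutex> guard(g_processLock);
    bool hadRef = m.m_hasRefFromLockOwnerThread;
    m.m_hasRefFromLockOwnerThread = false;  // cleared in all cases
    if (hadRef) {
        --m.refCount;
    }
    return hadRef;
}
```

Because the flag is cleared unconditionally inside the guarded section, no separate pre-check is needed before the release, which is the point of the nit above.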

@kouvel kouvel merged commit f3a3e64 into dotnet:master May 13, 2020
@kouvel kouvel deleted the NamedMutexFix branch May 13, 2020 16:37
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

Successfully merging this pull request may close these issues.

System.Threading.Tests.MutexTests tests failing on Linux