
AtomicFu QoS Problems on iOS #462

Open
stefanhaustein opened this issue Jul 31, 2024 · 1 comment


stefanhaustein commented Jul 31, 2024

Summary

On Darwin (iOS), a lock must be backed by the kernel for QoS to be handled properly, and a custom lock built from atomics – such as the one in Synchronized.kt – doesn't fit that model (atomics carry no ownership information the kernel could use for QoS). The problem is that priorities are strictly enforced, so a priority inversion can easily lead to severe problems (see the Background section below).

While the implementation does quickly fall back to a POSIX mutex under contention, it's still problematic. Consider the following scenario (a minimal sketch of such a CAS-based lock follows after the list):

  • T1 (low priority) takes the lock with a single CAS. At this point no POSIX mutex exists at all.
  • T2 (high priority) tries to acquire the lock. It's already taken, so T2 promotes the internal locking mechanism to a POSIX mutex and waits on its condition variable. (Note: T1 has never observed the POSIX mutex yet! It will only observe it, and notify waiters, during unlock.)
  • T3 (medium priority) does its job, taking CPU time away from T1. Because the kernel has no record that T1 owns the lock T2 is waiting on, T1's QoS is never boosted and T2 makes no progress.
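
For illustration, here is a minimal sketch (not the actual atomicfu code) of a CAS-based thin lock. The ownership information lives purely in user-space memory, so the kernel has no idea which thread to boost when a higher-QoS waiter arrives:

```kotlin
import kotlinx.atomicfu.atomic

// Minimal sketch of a CAS-based "thin" lock, for illustration only.
// The lock state is tracked purely in user space; the kernel never
// learns which thread owns the lock, so it cannot donate QoS to it.
class ThinLockSketch {
    private val locked = atomic(false)

    fun tryLock(): Boolean = locked.compareAndSet(false, true)

    fun lock() {
        // User-space retry loop: exactly the pattern that can starve a
        // low-QoS owner on Darwin, because the waiter's priority is
        // never transferred to the owner.
        while (!tryLock()) { /* spin / yield without kernel ownership info */ }
    }

    fun unlock() {
        locked.value = false
    }
}
```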

Proposed Fix

Dispatch queues and iOS XPC have sophisticated QoS support, but they are not suitable replacements for the locking in Synchronized.kt.

The two documented locks that offer QoS support by recording ownership for the kernel are os_unfair_lock and pthread_mutex. Native iOS code will generally select between these two depending on the particular requirements of the use case (recursion, fairness, etc.).

For a general-purpose lock, pthread_mutex is the reasonable choice; it is already used for Android in Lock.native.kt.
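
For reference, a pthread_mutex-backed lock on Kotlin/Native Apple targets can be sketched roughly like this (a simplified illustration, not the actual Lock.native.kt code):

```kotlin
import kotlinx.cinterop.ExperimentalForeignApi
import kotlinx.cinterop.alloc
import kotlinx.cinterop.free
import kotlinx.cinterop.nativeHeap
import kotlinx.cinterop.ptr
import platform.posix.*

// Simplified sketch of a pthread_mutex-backed lock. Because the kernel
// records the owner of a pthread_mutex, a blocked high-QoS thread can
// donate its QoS to the current owner.
@OptIn(ExperimentalForeignApi::class)
class PosixMutexLock {
    private val mutex = nativeHeap.alloc<pthread_mutex_t>().also {
        pthread_mutex_init(it.ptr, null)
    }

    fun lock() { pthread_mutex_lock(mutex.ptr) }

    fun tryLock(): Boolean = pthread_mutex_trylock(mutex.ptr) == 0

    fun unlock() { pthread_mutex_unlock(mutex.ptr) }

    fun destroy() {
        pthread_mutex_destroy(mutex.ptr)
        nativeHeap.free(mutex)
    }
}
```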

Unfortunately, it doesn't seem feasible to address this in the existing "general" native actual of SynchronizedObject.kt, but it should be feasible to add a macOS / iOS compatible version that respects the actual declaration.

A related concern is that similar locks may exist in the coroutines infrastructure implementation.

Alternatives Considered

Option 1: Use the existing implementation, but skip THIN

This would essentially allocate the native lock only on demand, addressing the "use the lock pool" request on the pull request.
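
A rough sketch of what this could look like, assuming a kernel-visible mutex wrapper like the hypothetical PosixMutexLock sketched above (not an atomicfu API): the first contender installs a real mutex via CAS, and every lock/unlock then goes straight to the kernel-visible lock, so there is no user-space-only ownership state.

```kotlin
import kotlinx.atomicfu.atomic

// Rough sketch of Option 1: skip the user-space THIN state entirely and
// lazily install a kernel-visible mutex on first use. PosixMutexLock is
// the hypothetical pthread_mutex wrapper sketched earlier, not an
// atomicfu API.
class LazyFatLockSketch {
    private val delegate = atomic<PosixMutexLock?>(null)

    private fun mutex(): PosixMutexLock {
        delegate.value?.let { return it }
        val fresh = PosixMutexLock()
        // Only one thread wins the CAS; losers release their instance.
        // Real code would draw from / return to a pool instead.
        return if (delegate.compareAndSet(null, fresh)) {
            fresh
        } else {
            fresh.destroy()
            delegate.value!!
        }
    }

    fun lock() = mutex().lock()
    fun unlock() = mutex().unlock()
}
```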

Risks / Problems

  • We'd need to prove that the while loops in the implementation don't effectively constitute an unbounded spin lock.

Option 2: Add QoS Donation to the existing implementation

At the point where the lock is upgraded from THIN to FAT, we know the thread holding the THIN lock and could donate QoS to it.
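
A sketch of what such donation could look like at the upgrade point, assuming the THIN state records the owner's pthread_t, and assuming the Darwin functions pthread_override_qos_class_start_np / pthread_override_qos_class_end_np from <pthread/qos.h> are reachable from Kotlin (they may need a custom cinterop definition, hypothetically named pthread_qos below, if the default platform libraries don't expose them):

```kotlin
import kotlinx.cinterop.ExperimentalForeignApi
import platform.posix.pthread_t
// Hypothetical cinterop binding for <pthread/qos.h>: these are real Darwin
// APIs, but they may not be exposed by the default Kotlin/Native platform libs.
import pthread_qos.pthread_override_qos_class_end_np
import pthread_qos.pthread_override_qos_class_start_np
import pthread_qos.qos_class_self

// Boost the THIN-lock owner to the waiting thread's own QoS class while the
// waiter blocks on the FAT lock, then remove the override again.
@OptIn(ExperimentalForeignApi::class)
fun <T> withDonatedQos(thinLockOwner: pthread_t, waitForLock: () -> T): T {
    // relative_priority = 0: donate the waiter's QoS class unchanged.
    val token = pthread_override_qos_class_start_np(thinLockOwner, qos_class_self(), 0)
    try {
        return waitForLock()
    } finally {
        pthread_override_qos_class_end_np(token)
    }
}
```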

Risks / Problems

  • We'd need to prove that the while loops in the implementation don't effectively constitute an unbounded spin lock.
  • This seems much more complicated than the existing pull request – we can reconsider it if we run into actual real-world problems.

Background

iOS resolves priority inversions between threads by temporarily promoting the QoS of the lower priority thread to the same level as the higher priority thread being blocked. In order for this promotion to occur, you must use one of the Apple-supplied lock mechanisms that support QoS donation.

Not all locks available in Apple APIs support QoS. For example, spinlocks, semaphores, and condition locks (including NSCondition) do not. Apple has aggressively deprecated their spinlock APIs.

Unfortunately, while the QoS compatibility of locks has become "common knowledge" among iOS developers, there is no authoritative Apple documentation on it.

Some public discussions of locks and QoS:

The takeaway of these discussions is that it is possible for a thread or serial queue (these are not exactly the same thing) to block for too long, or even indefinitely, on a lock that does not perform QoS promotion. The most problematic case is one where a low-priority thread holds the lock but is never scheduled, because there are always higher-QoS threads ready to run – some of them busy-waiting on that very lock. Spinlocking is apparently particularly prone to triggering this case.

While deadlocks are probably rare in practice, we have evidence from Apple and other sources that some of our apps exhibit priority inversions when using inappropriate locks.

For KMP to interoperate smoothly with native iOS code executing at multiple QoS levels, it will need to use locking mechanisms compatible with QoS promotion.

stefanhaustein (Author) commented:

Update:

I have updated the implementation to be cheap when there is no contention (by switching to QoS donation) and also changed it to use a pool for native objects: https://gist.github.com/stefanhaustein/883b07e0e6c0ef17eabede8ba85e220a
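
To illustrate what a pool of native lock objects means here (this is not the code from the gist): a small fixed set of pre-allocated pthread mutexes shared by all lock instances, selected by hashing the lock object, so that individual lock objects don't each carry a kernel object.

```kotlin
import kotlinx.cinterop.ExperimentalForeignApi
import kotlinx.cinterop.alloc
import kotlinx.cinterop.nativeHeap
import kotlinx.cinterop.ptr
import platform.posix.pthread_mutex_init
import platform.posix.pthread_mutex_lock
import platform.posix.pthread_mutex_t
import platform.posix.pthread_mutex_unlock

// Illustration of a fixed pool of kernel-visible mutexes (not the actual
// code from the gist): lock objects share a small set of pre-allocated
// pthread mutexes instead of each allocating their own.
@OptIn(ExperimentalForeignApi::class)
object MutexPoolSketch {
    private const val POOL_SIZE = 64
    private val pool = Array(POOL_SIZE) {
        nativeHeap.alloc<pthread_mutex_t>().also { pthread_mutex_init(it.ptr, null) }
    }

    // Real code would use an identity hash; hashCode() keeps the sketch simple.
    private fun slotFor(owner: Any) = pool[(owner.hashCode() and 0x7FFFFFFF) % POOL_SIZE]

    fun <T> withLock(owner: Any, block: () -> T): T {
        val mutex = slotFor(owner)
        pthread_mutex_lock(mutex.ptr)
        try {
            return block()
        } finally {
            pthread_mutex_unlock(mutex.ptr)
        }
    }
}
```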

This work was done in the context of compose-multiplatform-core, in order to be able to run its benchmarks. There have also been some build setup changes in the meantime that broke the old PR.

So the next (hopefully quick) steps will be

  • Figure out how to integrate this into the build again
  • Port the small adjustments from compose-multiplatform-core back to atomicfu
  • Open a new PR
