-
Hi Itamar -- the required memory orderings ought to be better explained. I'm planning on doing this in a separate document. My plan is to:
I like this approach because I think it's hard to reason correctly about memory ordering. Exhaustive verifiers like CDSChecker don't scale up to real-world programs. Non-exhaustive verifiers (C11Checker, ThreadSanitizer) are okay at catching data races, but don't do a good job of catching memory orderings that are too weak. This is only relevant for the Python internals -- not for Python C extensions. The collection thread-safety scheme (the one that avoids locks for most read operations) is only intended for some Python built-in collections. I expect Python C extensions to use standard techniques (like locks) to protect shared mutable data, or to move data to thread-local state.
-
Makes sense re: the core. So the issue is that, for Python C extensions, this is a source of potential small bottlenecks that get smeared across lots of calls. Consider a random object like the transaction object of a database adapter. The vast majority of the time it's used single-threaded, but it can't assume that's how it'll be used. So now it needs a lock on every API call, whereas before it didn't. And that can add up: every call into a C function is potentially a lock acquire. It would help if there were a super-optimized lock implementation available, like https://webkit.org/blog/6161/locking-in-webkit/ or the Rust equivalent.

As for the thread-local case: you have to use the slow kind of thread-local storage on Linux, since the faster kinds don't work with dlopen() (which extensions need). And that overhead does add up.
-
Also -- IIUC, the tricky optimization to avoid locking on reads isn't really a generic "go fast" optimization. It's mostly only useful for objects that see heavy read traffic from multiple threads simultaneously, and few writes. So like -- module and class dicts. Does that sound right? If an object is mostly accessed from a single thread, then efficient locks like futexes or
Huh, can you elaborate? I think of
-
Keeping in mind I'm just learning about lower-level synchronization, so all of this may be wrong—
The current design doc states that collections deal with synchronization requirements by having only the writer hold a lock. But that's not quite what's going on.
Locks have two purposes: prevent concurrent access, and ensure happened-before semantics across threads. Locking only on writes addresses the first concern but not the second: there is no guarantee that a reading thread will actually see the writes, or a consistent view of them, if it's running on another CPU.
The actual implementation has added enough use of atomics that I assume you've addressed this (i.e. you have readers rely on an atomic to establish the happened-before relationship). But that's not what the design doc says, so it seems like the design doc significantly understates the work needed to make Python C extensions thread-safe: you can't just do writes-only locking; you need either full-on locking of reads+writes (with corresponding costs in single-threaded mode) or a lock-free design. And beyond the work itself, there's the specialized knowledge you'd have to acquire to do this correctly.
Which is not to say this project isn't worth doing -- it's super-exciting! It's just probably worth expanding the current doc to make the costs clearer (and perhaps to motivate thinking about ways to solve this that reduce the burden for C extension maintainers).