gh-105766: Add Locking Around Custom Allocators #105619
Conversation
@vstinner, any objections? I tried to do this in a way that would not penalize the main interpreter or interpreters that share the GIL. Likewise, allocators that only wrap the current allocator should not be affected.
The rule was (is?) that you need to hold the GIL to call the allocators. Presumably, the presence of per-interpreter GILs means that allocators need to use per-interpreter state rather than process-wide state, but I don't see how this enforces that.
The allocators are process-global, so a per-interpreter GIL wouldn't guard against races between interpreters. Requiring that they be per-interpreter would mean we'd have to change the allocator API.
If the allocators are process-wide, there would need to be a lock on every call.
Yeah, we lock around every allocation when a custom, non-wrapper allocator is used in a subinterpreter that has its own GIL. I don't see what else we can do, aside from doing nothing (allowing races).
Maybe we just ask custom allocators to make sure they are thread-safe?
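For illustration, the locking-wrapper idea discussed here can be sketched as a standalone C example: a process-wide mutex serializing calls into an underlying allocator. This is not CPython code; the names (`locked_malloc`, `raw_allocator_t`) are hypothetical, and the real PR wraps CPython's `PyMemAllocatorEx` instead.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the allocator being wrapped. */
typedef struct {
    void *(*malloc)(size_t);
    void (*free)(void *);
} raw_allocator_t;

static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static raw_allocator_t wrapped = { malloc, free };

/* Serialize every allocation with one process-wide lock, analogous to
 * what the PR does when subinterpreters have their own GILs. */
static void *
locked_malloc(size_t size)
{
    pthread_mutex_lock(&alloc_lock);
    void *ptr = wrapped.malloc(size);
    pthread_mutex_unlock(&alloc_lock);
    return ptr;
}

static void
locked_free(void *ptr)
{
    pthread_mutex_lock(&alloc_lock);
    wrapped.free(ptr);
    pthread_mutex_unlock(&alloc_lock);
}
```

The wrapper keeps the underlying allocator unchanged; only the call sites pay the locking cost.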
Overall, yes, I think something like this PR is needed for the current state of affairs. Reusing the main runtime's GIL as the allocator lock is also what I'd assumed would be a natural first implementation.
Python/pylifecycle.c (Outdated)

@@ -603,6 +604,9 @@ init_interp_create_gil(PyThreadState *tstate, int own_gil)
     if (_PyStatus_EXCEPTION(status)) {
         return status;
     }
     HEAD_LOCK(runtime);
     runtime->allocators.num_gils++;
Check for overflow: if the value is already INT_MAX pre-increment, we need to bail, even if that means a SystemError.
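The suggested guard could look like the following. This is a standalone sketch, not the PR's code; `incr_num_gils` is a hypothetical helper, and in CPython the failure path would raise a SystemError rather than return -1.

```c
#include <limits.h>

/* Refuse to increment the count past INT_MAX. Returns 0 on success,
 * -1 if the counter is saturated (CPython would raise SystemError). */
static int
incr_num_gils(int *num_gils)
{
    if (*num_gils == INT_MAX) {
        return -1;
    }
    (*num_gils)++;
    return 0;
}
```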
Python/pylifecycle.c (Outdated)

@@ -1730,6 +1734,11 @@ finalize_interp_delete(PyInterpreterState *interp)
     /* Cleanup auto-thread-state */
     _PyGILState_Fini(interp);

     _PyRuntimeState *runtime = interp->runtime;
     HEAD_LOCK(runtime);
     runtime->allocators.num_gils--;
Add an assert(runtime->allocators.num_gils > 0); before this.
Include/internal/pycore_pymem.h (Outdated)

@@ -30,6 +30,11 @@ struct _pymem_allocators {
         debug_alloc_api_t obj;
     } debug;
     PyObjectArenaAllocator obj_arena;
     int num_gils;
I suggest unsigned int
_PyMem_MallocLocked(void *ctx, size_t size)
{
    PyMemAllocatorEx *wrapped = (PyMemAllocatorEx *)ctx;
    if (_PyRuntime.allocators.num_gils > 1) {
You lock updates (writes) to this value. Is it safe to use without atomic access or a lock? (requiring an atomic read here would presumably be a performance hit?).
Rather than always checking... could the logic that does num_gils++ with the (main runtime) lock held also do:

    ... num_gils++;
    if (num_gils == 2 && !has_locking_wrapper(...)) {
        maybe_add_locking_wrapper(...);
    }
such that the wrapped pointer switch happens upon first creation of an additional gil while the main runtime gil (the only gil prior to the current code running) is still held.
I'm not sure it is worth doing the opposite during finalization. Once wrapped due to per subinterpreter GILs, just stay wrapped. At least the wrappers won't have this conditional anymore.
I suspect this thought is either overoptimization... or actually necessary to avoid locking/atomic access to num_gils.
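On the atomic-access question above: if the counter were declared `_Atomic`, the hot-path read in the wrapper could be a cheap relaxed load while increments still happen under the runtime lock. A standalone sketch under those assumptions (the field name `num_gils` mirrors the PR, but this helper is illustrative, not CPython code):

```c
#include <stdatomic.h>

/* One GIL exists at startup; increments happen under the runtime lock,
 * reads may happen from any thread in the allocator hot path. */
static _Atomic int num_gils = 1;

/* Relaxed load: the wrapper only needs a hint that more than one GIL
 * exists, not a synchronizing read. */
static int
need_alloc_lock(void)
{
    return atomic_load_explicit(&num_gils, memory_order_relaxed) > 1;
}
```

Whether a relaxed load is actually sufficient here depends on the ordering guarantees the surrounding code needs; the point of the sketch is only that an atomic read need not be a full barrier.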
That's a great idea. I'll try it out.
Note the discussion Eric started: https://discuss.python.org/t/what-to-do-about-custom-allocators-and-the-gil/30327
I'm closing this. Instead, we'll specify that allocators must be thread-safe (at least when isolated subinterpreters are in play).
The "mem" and "object" allocators are documented as dependent on the GIL. However, after per-interpreter GIL landed, we weren't enforcing that. This fixes that by wrapping all allocations/frees with a dedicated runtime-global lock, but only if necessary. Note that we actually use the main interpreter's GIL as that lock, but only for interpreters that have their own GIL. Doing so closely matches the original behavior.