
gh-105766: Add Locking Around Custom Allocators #105619

Conversation

ericsnowcurrently (Member) commented Jun 10, 2023

The "mem" and "object" allocators are documented as dependent on the GIL. However, after per-interpreter GIL landed we weren't enforcing that. This fixes that by using wrapping all allocations/frees with a dedicated runtime-global lock, but only if necessary. Note that we actually use the main interpreter's GIL as that lock, but only for interpreter's that have their own GIL. Doing so closely matches the original behavior.

ericsnowcurrently (Member, Author) commented:

@vstinner, any objections? I tried to do this in a way that would not penalize the main interpreter or interpreters that share the GIL. Likewise, allocators that only wrap the current allocator should not be affected.

markshannon (Member) commented:

The rule was (is?) that you need to hold the GIL to call PyObject_Malloc, etc. So why do we now need any locks (apart from the GIL)?

Presumably the presence of per-interpreter GILs means that allocators need to use per-interpreter state rather than process-wide state, but I don't see how this PR enforces that.

ericsnowcurrently (Member, Author) commented:

The allocators are process-global, so a per-interpreter GIL wouldn't guard against races between interpreters. Requiring that they be per-interpreter would mean we'd have to change the allocator API.
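
For context, the public allocator-hook API is process-wide; there is no per-interpreter variant. Roughly, from the documented C API:

```c
/* The documented allocator hook (see the CPython "Memory Management"
 * C API docs).  Installing one replaces the allocator for the entire
 * process, across all interpreters. */
typedef struct {
    void *ctx;                                       /* user context */
    void *(*malloc)(void *ctx, size_t size);
    void *(*calloc)(void *ctx, size_t nelem, size_t elsize);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void (*free)(void *ctx, void *ptr);
} PyMemAllocatorEx;

/* e.g. PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &my_allocator); */
```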

ericsnowcurrently marked this pull request as ready for review on June 12, 2023 15:50
markshannon (Member) commented:

If the allocators are process-wide, there would need to be a lock on every call to ob_malloc, which would be disastrous for performance. So that doesn't sound right.

ericsnowcurrently (Member, Author) commented:

Yeah, we lock around every allocation when a custom, non-wrapper allocator is used in a subinterpreter that has its own GIL. I don't see what else we can do, aside from doing nothing (allowing races).

ericsnowcurrently (Member, Author) commented:

Maybe we just ask custom allocators to make sure they are thread-safe?
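
For illustration, "thread-safe" on the embedder's side could look like this (a hypothetical example, not part of this PR):

```c
#include <stdlib.h>
#include <pthread.h>

/* Hypothetical custom allocator that protects its own state with a
 * mutex, so it is safe even when called concurrently from
 * interpreters that do not share a GIL. */
static pthread_mutex_t my_alloc_lock = PTHREAD_MUTEX_INITIALIZER;

static void *
my_malloc(void *ctx, size_t size)
{
    pthread_mutex_lock(&my_alloc_lock);
    void *ptr = malloc(size);  /* stand-in for allocator-specific logic */
    pthread_mutex_unlock(&my_alloc_lock);
    return ptr;
}
```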

ericsnowcurrently changed the title from "gh-100227: Add Locking Around Custom Allocators" to "gh-105766: Add Locking Around Custom Allocators" on Jun 14, 2023
gpshead (Member) left a review comment:

Overall, yes, I think something like this PR is needed for the current state of affairs. Reusing the main runtime's GIL as the allocator lock is also what I'd assumed would be the natural first implementation.

```diff
@@ -603,6 +604,9 @@ init_interp_create_gil(PyThreadState *tstate, int own_gil)
     if (_PyStatus_EXCEPTION(status)) {
         return status;
     }
+    HEAD_LOCK(runtime);
+    runtime->allocators.num_gils++;
```
gpshead (Member):

Check for overflow: if the value is already INT_MAX pre-increment, we need to bail, even if that means a SystemError.
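
Something along these lines, perhaps (a sketch; the exact error path and message are illustrative, not from the PR):

```c
/* INT_MAX requires <limits.h>. */
HEAD_LOCK(runtime);
if (runtime->allocators.num_gils == INT_MAX) {
    HEAD_UNLOCK(runtime);
    /* Illustrative error path; the real status/message may differ. */
    return _PyStatus_ERR("too many per-interpreter GILs");
}
runtime->allocators.num_gils++;
```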

```diff
@@ -1730,6 +1734,11 @@ finalize_interp_delete(PyInterpreterState *interp)
     /* Cleanup auto-thread-state */
     _PyGILState_Fini(interp);

+    _PyRuntimeState *runtime = interp->runtime;
+    HEAD_LOCK(runtime);
+    runtime->allocators.num_gils--;
```
gpshead (Member):

Add an `assert(runtime->allocators.num_gils > 0);` before this.

```diff
@@ -30,6 +30,11 @@ struct _pymem_allocators {
         debug_alloc_api_t obj;
     } debug;
     PyObjectArenaAllocator obj_arena;
+    int num_gils;
```
gpshead (Member):

I suggest `unsigned int`.

```c
_PyMem_MallocLocked(void *ctx, size_t size)
{
    PyMemAllocatorEx *wrapped = (PyMemAllocatorEx *)ctx;
    if (_PyRuntime.allocators.num_gils > 1) {
```
gpshead (Member):

You lock updates (writes) to this value. Is it safe to read without atomic access or a lock? (Requiring an atomic read here would presumably be a performance hit.)

Rather than always checking, could the logic that does num_gils++ with the (main runtime) lock held also do:

```c
...
num_gils++;
if (num_gils == 2 && !has_locking_wrapper(...)) {
    maybe_add_locking_wrapper(...);
}
```

such that the wrapped-pointer switch happens upon first creation of an additional GIL, while the main runtime's GIL (the only GIL prior to this code running) is still held.

I'm not sure it is worth doing the opposite during finalization. Once wrapped due to per-subinterpreter GILs, just stay wrapped. At least the wrappers won't have this conditional anymore.

I suspect this thought is either overoptimization... or actually necessary to avoid locking/atomic access to num_gils.
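
Spelled out, the suggestion might look like this (a hypothetical sketch; `install_locking_wrapper()` stands in for whatever swaps the allocator function pointers):

```c
/* Called while the main runtime's GIL is still held, so plain
 * reads/writes of num_gils are safe here. */
static void
on_new_gil_created(_PyRuntimeState *runtime)
{
    runtime->allocators.num_gils++;
    if (runtime->allocators.num_gils == 2) {
        /* First additional GIL: swap in locked wrappers once.  After
         * this the wrappers never need to test num_gils per call, and
         * we never swap back during finalization. */
        install_locking_wrapper(&runtime->allocators);  /* hypothetical */
    }
}
```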

ericsnowcurrently (Member, Author) replied:

That's a great idea. I'll try it out.

bedevere-bot commented:

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

gpshead (Member) commented Jul 27, 2023

ericsnowcurrently (Member, Author) commented:

I'm closing this. Instead, we'll specify that allocators must be thread-safe (at least when isolated subinterpreters are in play).
