[subinterpreters] Make the PyGILState API compatible with subinterpreters #59956

ncoghlan · 2012-08-21T05:18:01Z

BPO	15751
Nosy	@mhammond, @jcea, @ronaldoussoren, @ncoghlan, @pitrou, @vstinner, @phsilva, @asvetlov, @ericsnowcurrently, @soltysh, @seberg, @erlend-aasland, @Felk, @h-vetinari

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-08-21.05:18:00.735>
labels = ['expert-subinterpreters', 'expert-C-API', 'type-feature', '3.10']
title = '[subinterpreters] Make the PyGILState API compatible with subinterpreters'
updated_at = <Date 2022-01-07.18:32:22.492>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2022-01-07.18:32:22.492>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['C API', 'Subinterpreters']
creation = <Date 2012-08-21.05:18:00.735>
creator = 'ncoghlan'
dependencies = []
files = []
hgrepos = []
issue_num = 15751
keywords = []
message_count = 49.0
messages = ['168742', '168759', '168764', '168766', '168768', '168769', '168771', '168772', '168776', '168779', '168780', '168787', '168789', '168790', '168792', '168812', '168815', '168816', '168818', '168832', '169008', '169010', '169013', '169014', '169278', '169280', '169284', '169286', '169310', '169329', '169330', '169332', '169333', '169334', '169352', '169354', '169359', '198115', '206016', '206017', '206042', '206048', '261779', '368885', '368904', '368906', '375134', '380325', '409995']
nosy_count = 17.0
nosy_names = ['mhammond', 'jcea', 'ronaldoussoren', 'ncoghlan', 'pitrou', 'vstinner', 'phsilva', 'grahamd', 'asvetlov', 'python-dev', 'eric.snow', 'maciej.szulik', 'seberg', 'erlendaasland', 'felk', 'h-vetinari', 'gabrielhae']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue15751'
versions = ['Python 3.10']

Linked PRs

ncoghlan · 2012-08-21T05:18:00Z

Currently, modules that use the PyGILState* APIs cannot be used with mod_wsgi, as mod_wsgi uses the subinterpreter support.

Graham Dumpleton and I spent some time discussing this at PyCon AU 2012, and we believe that the incompatibility can be resolved with a single API in the core: a function that mod_wsgi can call to set the interpreter used by the GIL state APIs to implicitly create new thread states.

This is still only a seed of an idea (and it's entirely possible we've missed something), but it's a place to start in resolving this longstanding incompatibility.

grahamd · 2012-08-21T11:56:35Z

Just to clarify. One can still tell WSGI applications under mod_wsgi to run in the main interpreter and in that case modules using PyGILState* do work. By default though, sub interpreters are used as noted.

The mechanism for forcing use of main interpreter is the directive:

WSGIApplicationGroup %{GLOBAL}

Some details about this issue can be found in:

http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Python_Simplified_GIL_State_API

pitrou · 2012-08-21T12:56:22Z

a function that mod_wsgi can call to set the interpreter used by the
GIL state APIs to implicitly create new thread states.

How would it work?

ncoghlan · 2012-08-21T13:21:28Z

It would twiddle the autoInterpreterState and autoTLSkey entries in the pystate.c global variables to point to a different subinterpreter.

As I understand the situation, mod_wsgi doesn't need arbitrary externally created threads to be able to call back into arbitrary subinterpreters, it just needs to be able to direct externally created threads in a process to a subinterpreter other than the main one.

Graham, looking at the current impl - have you experimented with just calling _PyGILState_Init() with the interpreter state and current thread state for the desired subinterpreter to see what happens?

I think the new method could just be a cleaner combination of _PyGILState_ReInit and _PyGILState_Init. If I'm right, then calling _PyGILState_Init should convert the current crashes and deadlocks into a relatively less harmful memory leak (since the old entry in the TLS won't get deleted properly).

ncoghlan · 2012-08-21T13:25:51Z

Graham, even better would be if you could try the following combination:

_PyGILState_Fini();
_PyGILState_Init(si, st);

(where si and st are the interpreter state and thread state for the target subinterpreter)

If a new PyGILState_SwitchInterpreter API is going to be able to solve this in 3.4, then I believe those private APIs should be enough to make it possible in *current* versions.

If those private APIs *aren't* enough, then I'm missing something and this isn't going to be as easy as I thought.

pitrou · 2012-08-21T13:28:07Z

As I understand the situation, mod_wsgi doesn't need arbitrary
externally created threads to be able to call back into arbitrary
subinterpreters, it just needs to be able to direct externally created
threads in a process to a subinterpreter other than the main one.

But how does it know that the right externally created thread will point
to the right subinterpreter? The OS is free to schedule threads as it
desires.

ncoghlan · 2012-08-21T14:06:07Z

Just as they do today, all externally created threads will still go to *one* interpreter when they hit PyGILState_Ensure(). It's just that interpreter won't be the main one.

Since the whole point of the PyGILState API is to support threads that don't have a previously created thread state, there's no getting around the requirement to have a single blessed interpreter that handles all externally created threads in a given process.

It will be up to mod_wsgi (and any other embedding application that uses the new function) to make sure it calls this at a time when there aren't any existing calls to PyGILState that would be disrupted. (Assuming we can't figure out a locking scheme that *ensures* no such threads are running when the switch occurs)

pitrou · 2012-08-21T14:10:23Z

Just as they do today, all externally created threads will still go to
*one* interpreter when they hit PyGILState_Ensure(). It's just that
interpreter won't be the main one.

Uh but how does it solve the issue? (unless you create a mod_wsgi app
with only a single worker thread)

ncoghlan · 2012-08-21T14:33:43Z

My understanding of the mod_wsgi architecture is that it uses subinterpreters to maintain a persistent process, while still providing a relatively pristine interpreter state to handle each new request. This means even when you're using multiple processes with a single request handling thread per process, you're running a subinterpreter unless you explicitly configure mod_wsgi to always run in the main interpreter (which, I believe, will result in additional state persistence across requests).

The proposed API change can only fix scenarios where *at a given point in time*, *all* PyGILState_Ensure calls should be directed to a particular subinterpreter. The target subinterpreter may change *later*, but there still cannot be two desired targets within the same process at the same time.

However, at the moment, PyGILState doesn't even allow that - all externally created threads are handled in the *main* interpreter even if that isn't what the embedding application wants.

ncoghlan · 2012-08-21T14:44:03Z

Sorry, I mischaracterised the way mod_wsgi works slightly. However, my understanding is still that the scope of this particular fix is merely to allow all external threads to be redirected to a different subinterpreter at various times over the life of a process. It does not need to allow different external threads to be redirected to different subinterpreters.

(Note: I am assuming that any hooks Apache/mod_wsgi has into external thread creation could already just create an appropriate thread state if that was the desired behaviour. It may be I'm incorrect on this, and what Graham really wants is the ability to change the target interpreter state just for the current thread. However, if that's what he wants, then there's additional background info I need on mod_wsgi and its ability to influence thread creation within a process, because I didn't get the impression on the weekend that that is what he was after)

pitrou · 2012-08-21T14:47:26Z

My understanding of the mod_wsgi architecture is that it uses
subinterpreters to maintain a persistent process, while still
providing a relatively pristine interpreter state to handle each new
request.

I don't think that's true. On hg.python.org, the hglookup application
keeps a cached mapping of changeset ids to repo URLs, which wouldn't be
possible if its interpreter was re-bootstrapped for each new request.

ncoghlan · 2012-08-21T15:23:56Z

s/slightly/completely/ (I believe my description of the core problem was right, but my description of why that problem exists was wrong - it actually has to do with the way mod_wsgi handles virtual hosts and endpoints)

If we expose an official way to switch the destination of the PyGILState APIs, then what it means is that mod_wsgi can, over the lifecycle of a single process in the pool, *switch* the virtual host and WSGI endpoint that process is handling by changing the active interpreter. There are still some extension modules where that won't work (because they create persistent threads that periodically call back into Python, so they may still end up calling back in to the wrong interpreter), but it will allow those that just do callbacks from external worker threads (or other operations that are similarly bound by the lifecycle of a request) to start working properly.

pitrou · 2012-08-21T15:51:24Z

If we expose an official way to switch the destination of the
PyGILState APIs, then what it means is that mod_wsgi can, over the
lifecycle of a single process in the pool, *switch* the virtual host
and WSGI endpoint that process is handling by changing the active
interpreter.

I don't understand what that means. It's the OS which switches threads,
including interpreter threads.

(by the way, the real fix to the GILState vs. subinterpreters issue
would be to create new API functions which take an interpreter as
argument, so that you have a distinct TLS key per interpreter)

ncoghlan · 2012-08-21T16:08:01Z

Umm, no. The whole point of the GILState API is that you can call it from a thread which knows *nothing* about Python.

It will then use the *process global* state in the pystate.c statics to initialise that thread with a Python thread state. Currently, that thread state will always be for the main Python interpreter for the process.

All Graham wants is an officially supported way to change which interpreter the pystate.c globals reference *without* shutting down and reinitialising Python completely.

There are going to be limitations on how effective this will be - it still won't support *all* extension modules that use the PyGILState APIs. It will, however, support many more of them than the status quo (which is zero, unless you force your WSGI app to use the main interpreter, which has its own problems).

And you absolutely can control when the OS switches threads - controlling that is what synchronisation primitives are *for*.

pitrou · 2012-08-21T16:27:32Z

Umm, no. The whole point of the GILState API is that you can call it
from a thread which knows *nothing* about Python.

No to what? Any sane callback API allows to pass it some user data, that
user data can just as well include the pointer to the desired
interpreter. Really, that's the right way to do it. Since a new API is
required, we can indeed add tweaks to the current API until the
migration is done, but I'm quite sure any proper solution to the problem
requires explicitly passing the interpreter.

There are going to be limitations on how effective this will be - it
still won't support *all* extension modules that use the PyGILState
APIs. It will, however, support many more of them than the status quo

I still don't understand how that allows to "support some extension
modules" (which ones?).
A typical mod_wsgi setup uses several threads per daemon process, and
each thread (AFAIU) uses a separate subinterpreter for its WSGI
application instance (Graham, is that false?). Therefore, an
externally-launched thread can relate to *any* of these subintepreters,
which are themselves scheduled by the OS.

And you absolutely can control when the OS switches threads -
controlling that is what synchronisation primitives are *for*.

I don't think mod_wsgi has access to enough hooks or information to do
that satisfyingly (like the OS would do). Besides, I don't understand
how mod_wsgi could have control over the scheduling of
externally-launched threads. Therefore, an externally-launched thread
could be scheduled at any time, and not only after the "right"
interpreter thread was interrupted.

grahamd · 2012-08-21T22:14:31Z

In both embedded mode and daemon mode of mod_wsgi, albeit how thread pool is managed is different, there is a fixed number of threads with those being dedicated to handling web requests.

On a request arriving next available thread from the thread pool handles accepting of request at C code level, that thread may then map to any WSGI application and so any sub interpreter, or even the main interpreter.

Thus there is no one to one mapping between thread and (sub)interpreter.

The way the mod_wsgi code works now is that when it knows it will be calling into the main interpreter, it uses PyGILState_Ensure(). If not, it will use a thread state for that thread specific to the sub interpreter it is calling in to. At the end of the request, the thread state is remembered and not thrown away so that thread locals still work for that thread across requests for that sub interpreter.

Thus, there can be more than one thread state per thread, but this is fine so long as it is only used against the sub interpreter it was created for.

This is actually an enforced requirement of Python, because if you create more than one thread state for a thread for the same sub interpreter, or even an additional one for the main interpreter when there is also the auto TLS, then Python will die if you compile and run it is in debug mode.

Now, since mod_wsgi always knows which interpreter it is calling into, the intent was that there was this single API call so that mod_wsgi could say that at this time, this thread is going to be calling into that interpreter. It could then just call PyGILState_Ensure().

Any third party module then which uses the simplistic calling sequence of calling PyGILState_Release() on exiting Python code and thence within the same thread calling PyGILState_Ensure() when coming back into Python with a callback will work, as mod_wsgi has specified the interpreter context for that thread at that time.

As pointed out, if a third party module was creating its own background threads at C level and calling PyGILState_Ensure() when calling back into Python code, this could pose a problem. This could also be an issue for Python created background threads.

In the case of the latter, if a Python thread is created in a specific sub interpreter, it should automatically designate for that thread that that is its interpreter context, so if it calls out and does the Release/Ensure dance, that it goes back into the same sub interpreter.

The C initiated thread is a bit more complicated though and may not be solvable, but a lot of the main third party modules which don't work in sub interpreters, such as lxml, don't use background threads, so the simplistic approach means that will work at least.

So, in summary, saw a single API call which allowed designation of which interpreter a thread is operating against, overriding the implicit default of the main interpreter. PyGILState API will need to manage a set of interpreter states for each interpreter, with potential for more than one thread state for a thread due to a thread being able to call into multiple interpreters at different times.

pitrou · 2012-08-21T22:31:53Z

Le mardi 21 août 2012 à 22:14 +0000, Graham Dumpleton a écrit :

Any third party module then which uses the simplistic calling sequence
of calling PyGILState_Release() on exiting Python code and thence
within the same thread calling PyGILState_Ensure() when coming back
into Python with a callback will work, as mod_wsgi has specified the
interpreter context for that thread at that time.

Why would a module do that, instead of using
Py_{BEGIN,END}_ALLOW_THREADS?

grahamd · 2012-08-21T22:36:40Z

Those macros only work for general GIL releasing and pop straight away, not for the case where released, calls into some non Python C library, which then calls back into Python.

My recollection is, and so unless they have changed it, SWIG generated calls use the GILState calls. See:

https://issues.apache.org/jira/browse/MODPYTHON-217

pitrou · 2012-08-21T22:56:45Z

Those macros only work for general GIL releasing and pop straight
away, not for the case where released, calls into some non Python C
library, which then calls back into Python.

I see, so you are right that this new API could be useful. However, we
should also add a new GIL state API that fixes the issue for good (by
passing an interpreter), otherwise we will still have such discussions
in five years and it will be very silly.

My recollection is, and so unless they have changed it, SWIG generated
calls use the GILState calls. See:

Well, if SWIG isn't fixed, people should stop using an unmaintained and
buggy tool.

grahamd · 2012-08-22T00:46:30Z

If you have a Ex version of Ensure, then if the interpreter argument is NULL, then it should assume main interpreter. That way the normal version of Ensure can just call PyGILState_EnsureEx(NULL).

ncoghlan · 2012-08-24T12:35:55Z

I'm not sure it makes sense to call this new API "PyGILState_EnsureEx". My concern is that the behaviour is quite different in the presence of an existing thread state:

Ensure:

if a thread state exists, use that interpreter
otherwise, use the default interpreter configured in the pystate.c globals

New API:

if a thread state exists, and the interpreter doesn't match the requested one, fail with an error
otherwise, use the requested interpreter

I guess it makes sense if we treat the NULL pointer as the degenerate case meaning "use the interpreter of this thread, or the default interpreter if no interpreter has been declared for this thread". PyGILState_Ensure would then simply call PyGILState_EnsureEx(NULL) internally.

So, my question for Graham would be, given this ability, would mod_wsgi still need the ability to change the default interpreter? Or would it be enough for you to be able to register the threads *you* create with a specific interpreter?

pitrou · 2012-08-24T12:45:06Z

New API:

if a thread state exists, and the interpreter doesn't match the
requested one, fail with an error

otherwise, use the requested interpreter

That's not what I'm proposing. What I'm proposing is that the new API
uses a per-interpreter TLS key (so you can have several thread states
per OS thread).

So basically:

Ensure:

look up global TLS key, which returns the thread state
if no thread state (TLS lookup failed), create a new one for the main
interpreter and register it on the global TLS key

New API:

look up the interpreter's TLS key, which returns the thread state
if no thread state (TLS lookup failed), create a new one for the
interpreter and register it on the interpreter's TLS key

Graham is merely suggesting for simplification that "global TLS key" ==
"main interpreter's TLS key", so Ensure(...) ==
EnsureEx(main_interpreter, ...).

ncoghlan · 2012-08-24T12:55:43Z

And the current "autoTLSkey" could move into the interpreter state object? I like it - that's a lot more flexible than the approach I was thinking of.

grahamd · 2012-08-24T13:00:29Z

It is past my bed time and not thinking straight, but I believe Antoine is aligned with what I had in mind, as need multiple thread states per OS thread where each is associated with separate interpreter.

My main reason for allowing NULL to EnsureEX rather than requiring main_interpreter to be explicitly passed, is that way way back in time, my recollection is that getting access to the main interpreter pointer was a pain as you had to iterate over the list of interpreters and assume it was the last one due to it being created first. I don't remember there being a special global variable or function for getting a pointer to the main interpreter. This may well have changed since and there is an easier way do let me know. So saw it as a convenience.

mhammond · 2012-08-28T13:21:28Z

The GIL state api was mainly interested in the case of a thread which has (possibly) never been seen before calling into Python. IIUC, the proposal here is so that a thread that *has* been seen before can be associated with a thread-state specified by the embedding application (and the degenerate case would be to assume the thread hasn't been seen, and as such should get the default interpreter)

If that isn't too wide of the mark, I agree it sounds workable and worthwhile.

mhammond · 2012-08-28T13:36:24Z

To clarify, I wrote:

can be associated with a thread-state specified by the
embedding application

Where I meant to say:

Can be associated with an interpreter state and corresponding thread-state ...

Or something like that - it's been a while since I've looked at that code.

ncoghlan · 2012-08-28T14:12:09Z

Thinking about it, I believe there still needs to be a concept of an "active thread state" TLS key in order to deal with Graham's original problem. Specifically, if PyGILState_EnsureEx is used to associate the thread with a particular interpreter, then subsequent calls to PyGILState_Ensure from *that thread* should get the explicitly associated interpreter, rather than the main interpreter.

Then, the lookup sequence for "interpreter=NULL" would be:

Check the active TLS key, if set, use that thread state
If not set, look up the main interpreter's TLS key. If set, also set that on the active TLS key and use that thread state.
If still not set, create a new thread state on the main interpreter, set it for both the active TLS and the main interpreter's TLS key and use that thread state

A similar approach almost works when requesting a specific interpreter, but where that goes wrong is when the active TLS key is *already set*. You can't just overwrite it, because that will cause problems for subsequent PyGIL_Release calls. You could just make it an error, but I think Graham's original idea makes it possible to do better than that.

Specifically, a PyGILState_SwitchInterpreter API could focus solely on the manipulation of the "active thread state" TLS key entry. The sequence of commands in mod_wsgi would then look like:

old_interp = PyGILState_SwitchInterpreter(target_interp);
old_gil = PyGILState_Ensure();
/* Call into Python using target_interp */
PyGILState_Release(old_gil);
PyGILState_SwitchInterpreter(old_interp); /* May not be needed in the mod_wsgi case, since it knows it is making the outermost call into the PyGILState_* APIs */

All of the other elements of Antoine's proposal (i.e. the per-interpreter TLS key entries) would still be needed, it's just that the API for associating a thread with an interpreter would remain separated from that of associating the thread with a particular thread state.

The big advantage of doing it this way is that it will nest properly, whereas PyGILState_EnsureEx would need a more complicated API to correctly report both the old interpreter state and the old GIL state within that interpreter.

pitrou · 2012-08-28T16:47:57Z

Le mardi 28 août 2012 à 14:12 +0000, Nick Coghlan a écrit :

old_interp = PyGILState_SwitchInterpreter(target_interp);
old_gil = PyGILState_Ensure();
/* Call into Python using target_interp */
PyGILState_Release(old_gil);
PyGILState_SwitchInterpreter(old_interp); /* May not be needed in the mod_wsgi case, since it knows it is making the outermost call into the PyGILState_* APIs */

Why wouldn't it be simply written:

old_gil = PyGILState_EnsureEx(target_interp);
/* Call into Python using target_interp */
PyGILState_Release(old_gil);

seberg · 2020-08-10T18:27:00Z

In NumPy ufuncs and datatype casting (iteration) we have the following setup:

user-code (releasing GIL) -> NumPy API -> 3rd-party C-function

(in the ufunc code, numpy is the one releasing the GIL, although others, such as numba probably hook in and replace that.)
Now, I want the 3rd-party C-function to be able to report errors in a reasonable way. Which, in my opinion means it must be able to grab the GIL, set an error (potentially release the GIL) and return an error result.
In theory, setting the error could be deferred to some later "cleanup" when the GIL is held, but that just does not seem practical to me.

One thing which may make this not as problematic is that the all of this can be expected to run within a Python created thread, so I somewhat think the initial proposal here would effectively fix the current NumPy usage (I am not sure).

The reason I ask/post this is, that this is all part of public-API which requires a complete re-implementation very soon. While extensions to that new API may be possible, that is always a bit painful, so it would be good to know how that API should be designed right now.

At this point, it seems that I should design the 3rd-party C-function API so that it is passed a void *reserved = NULL (always NULL) to be able to pass a PyThreadState * or PyInterpreterState * at some point safely. (Or a PyThreadState **/ PyInterpreterState **?)

In the majority of cases, I could already pass this right now, but I have currently no clear idea of what is the best practice. I also need to take such an argument (requiring new API) in at least one place. If we know how this will pan out, adding it sooner rather than later would be good, since it will make future transition/adoption faster or at least easier.

As far as I understand, right now PyGILState_Ensure()` not working is probably the single most blocking issue for correct subinterpreter support in NumPy, since this is baked into public API it may take years for true support in the above way, but we may be able to go a large part of the way now. My problem is that I can only do that if I am clear on what is needed.

vstinner · 2020-11-04T13:58:42Z

See also bpo-40522: Get the current Python interpreter state from Thread Local Storage (autoTSSkey).

vstinner · 2022-01-07T18:32:22Z

The bpo-46295 was marked as a duplicate of this issue.

StonehengeLabs · 2022-04-13T07:35:11Z

This one still has the 3.10 label attached. Is there any chance to see any improvement scheduled for 3.11?

The objective of this change is to help make the GILState-related code easier to understand. This mostly involves moving code around and some semantically equivalent refactors. However, there are a also a small number of slight changes in structure and behavior: * tstate_current is moved out of _PyRuntimeState.gilstate * autoTSSkey is moved out of _PyRuntimeState.gilstate * autoTSSkey is initialized earlier * autoTSSkey is re-initialized (after fork) earlier #59956

This is a follow-up to gh-101161. The objective is to make it easier to read Python/pystate.c by grouping the functions there in a consistent way. This exclusively involves moving code around and adding various kinds of comments. #59956

…te (gh-101209) We've factored out a struct from the two PyThreadState fields. This accomplishes two things: * make it clear that the trashcan-related code doesn't need any other parts of PyThreadState * allows us to use the trashcan mechanism even when there isn't a "current" thread state We still expect the caller to hold the GIL. #59956

A PyThreadState can be in one of many states in its lifecycle, represented by some status value. Those statuses haven't been particularly clear, so we're addressing that here. Specifically: * made the distinct lifecycle statuses clear on PyThreadState * identified expectations of how various lifecycle-related functions relate to status * noted the various places where those expectations don't match the actual behavior At some point we'll need to address the mismatches. (This change also includes some cleanup.) #59956

ericsnowcurrently · 2023-01-30T19:12:37Z

FYI, I have a PR up that partially fixes the incompatibility: gh-101431.

It makes it so the GILState thread state and the "current" thread state (from PyThreadState_Swap()) don't get out of sync. So now PyThreadState_GET() == PyGILState_GetThisThreadState() will always be true (while the GIL is held).

The PR does not address autoInterpreterState, which determines which interpreter is used to create a new thread state.

…01308) A PyThreadState can be in one of many states in its lifecycle, represented by some status value. Those statuses haven't been particularly clear, so we're addressing that here. Specifically: * made the distinct lifecycle statuses clear on PyThreadState * identified expectations of how various lifecycle-related functions relate to status * noted the various places where those expectations don't match the actual behavior At some point we'll need to address the mismatches. (This change also includes some cleanup.) python#59956

…ters (gh-101431) The GILState API (PEP 311) implementation from 2003 made the assumption that only one thread state would ever be used for any given OS thread, explicitly disregarding the case of subinterpreters. However, PyThreadState_Swap() still facilitated switching between subinterpreters, meaning the "current" thread state (holding the GIL), and the GILState thread state could end up out of sync, causing problems (including crashes). This change addresses the issue by keeping the two in sync in PyThreadState_Swap(). I verified the fix against gh-99040. Note that the other GILState-subinterpreter incompatibility (with autoInterpreterState) is not resolved here. #59956

…State (gh-101625) This test should have been in gh-101431. #59956

yqs112358 · 2023-10-05T12:48:20Z

It is known that Python3.12 has supported per-interpreter GIL. Considering that running different sub-interpreters in parallel in multiple threads becomes a reality, it is a necessity to make the PyGILState API compatible.

Hope this will be fixed later in 3.12.x. ❤
Or, is there any instructions to teach us how to safely call PyGILState_Ensure() and PyGILState_Release() when using sub-interpreters in Non-Python created threads?

ericsnowcurrently · 2023-10-05T15:37:36Z

Considering that running different sub-interpreters in parallel in multiple threads becomes a reality, it is a necessity to make the PyGILState API compatible.

Just to be clear, this issue describes two distinct but related concerns:

[bug] crashes (or other inconsistent behavior) when using the GILState API with subinterpreters
[feature] there is no way to select when interpreter is used by PyGILState_Ensure() when it creates a new thread state¹

Presumably you are talking about the bug part.

Hope this will be fixed later in 3.12.x.

The bug part should be fixed in 3.12, via gh-101431² (and mostly tested via gh-101625). There should no longer be any crashes or inconsistent state when using the GILState API with subinterpreters. (The feature part of this issue is still unresolved.)

If the PR didn't actually solve the problem then we'll see if we can fix it in 3.12. However, a fix would have to wait until 3.13 if it requires adding new API.

Is there a specific case that isn't working for you on 3.12? If so, please provide an example that reproduces the failure.

Or, is there any instructions to teach us how to safely call PyGILState_Ensure() and PyGILState_Release() when using sub-interpreters in Non-Python created threads?

It shouldn't need any special treatment. Use PyGILState_Ensure() like normal. You don't get to pick which interpreter state any new thread state targets though. That's the feature part of this issue that still needs to be solved (in 3.13).

it always uses the main interpreter ↩
Note that the main objective of gh-59956: Partial Fix for GILState API Compatibility with Subinterpreters #101431 was to preserve the existing behavior of the GILState API, which was impacted by various runtime changes needed for a per-interpreter GIL. However, my understanding was that it also solved the bug part of this issue. ↩

pitrou · 2023-10-05T15:43:43Z

Not speaking for the OP, but if PyGILState_Ensure() always selects the main interpreter it means that any extension using it will not be compatible with subinterpreters (unless the piece of code that's guarded by PyGILState_Ensure is stateless enough that it doesn't matter which interpreter it executes in).

For example, if an extension gives access to a database and allows writing user-defined functions (UDFs) in Python, UDFs would then be always executed in the main interpreter which is not necessarily expected.

What would be needed is an additional API such as:

PyGILState_STATE_EX PyGILState_EnsureWithInterpreter(PyInterpreterState*);
void PyGILState_ReleaseWithInterpreter(PyGILState_STATE_EX, PyInterpreterState*);

(PyGILState_STATE_EX might or might not be the same as PyGILState_STATE)

... with one open question being: what happens if the given interpreter was destroyed in the meantime?
(should there instead be some kind of interpreter id pointing to a hash table of interpreters?)

ncoghlan added the type-feature A feature request or enhancement label Aug 21, 2012

ncoghlan changed the title ~~Add PyGILState_SwitchInterpreter~~ Support subinterpreters in the GIL state API Aug 24, 2012

vstinner added 3.10 only security fixes and removed 3.9 only security fixes labels Nov 4, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

kumaraditya303 removed the 3.10 only security fixes label Sep 6, 2022

bedevere-bot mentioned this issue Jan 19, 2023

gh-59956: Clarify GILState-related Code #101161

Merged

bedevere-bot mentioned this issue Jan 19, 2023

gh-59956: Fix Function Groupings in pystate.c #101172

Merged

bedevere-bot mentioned this issue Jan 21, 2023

gh-59956: Allow the "Trashcan" Mechanism to Work Without a Thread State #101209

Merged

bedevere-bot mentioned this issue Jan 25, 2023

gh-59956: Clarify Runtime State Status Expectations #101308

Merged

bedevere-bot mentioned this issue Jan 30, 2023

gh-59956: Partial Fix for GILState API Compatibility with Subinterpreters #101431

Merged

ericsnowcurrently self-assigned this Jan 30, 2023

bedevere-bot mentioned this issue Feb 6, 2023

gh-59956: Add a Test to Verify GILState Matches the "Current" Thread State #101625

Merged

ericsnowcurrently added a commit that referenced this issue Feb 6, 2023

gh-59956: Add a Test to Verify GILState Matches the "Current" Thread …

914f8fd

…State (gh-101625) This test should have been in gh-101431. #59956

GrahamDumpleton mentioned this issue Feb 10, 2023

sqlite callback functions can deadlock in Python subinterpreters #101767

Open

vstinner mentioned this issue May 4, 2023

gh-103323: Get the "Current" Thread State from a Thread-Local Variable #103324

Merged

neonene mentioned this issue Apr 18, 2024

Convert _ctypes extension module to multi-phase init #117142

Closed

2 tasks

TBBle mentioned this issue Aug 26, 2024

Creating an ungodly amount of sub interpreters in a short amount of time causes memory debug assertions. #123134

Open

[subinterpreters] Make the PyGILState API compatible with subinterpreters #59956

[subinterpreters] Make the PyGILState API compatible with subinterpreters #59956

Comments

ncoghlan commented Aug 21, 2012 • edited by bedevere-bot Loading

Linked PRs

ncoghlan commented Aug 21, 2012

grahamd mannequin commented Aug 21, 2012

pitrou commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

pitrou commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

pitrou commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

pitrou commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

pitrou commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

pitrou commented Aug 21, 2012

grahamd mannequin commented Aug 21, 2012

pitrou commented Aug 21, 2012

grahamd mannequin commented Aug 21, 2012

pitrou commented Aug 21, 2012

grahamd mannequin commented Aug 22, 2012

ncoghlan commented Aug 24, 2012

pitrou commented Aug 24, 2012

ncoghlan commented Aug 24, 2012

grahamd mannequin commented Aug 24, 2012

mhammond commented Aug 28, 2012

mhammond commented Aug 28, 2012

ncoghlan commented Aug 28, 2012

pitrou commented Aug 28, 2012

seberg mannequin commented Aug 10, 2020

vstinner commented Nov 4, 2020

vstinner commented Jan 7, 2022

StonehengeLabs commented Apr 13, 2022

ericsnowcurrently commented Jan 30, 2023

yqs112358 commented Oct 5, 2023 • edited Loading

ericsnowcurrently commented Oct 5, 2023

Footnotes

pitrou commented Oct 5, 2023 • edited Loading

ncoghlan commented Aug 21, 2012 •

edited by bedevere-bot

Loading

yqs112358 commented Oct 5, 2023 •

edited

Loading

pitrou commented Oct 5, 2023 •

edited

Loading