gh-103323: Get the "Current" Thread State from a Thread-Local Variable #103324
Per the benchmarks, this change is a little faster (less than 1%) on Linux/GCC.
🤖 New build scheduled with the buildbot fleet by @ericsnowcurrently for commit feb8ef5 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.
It might be worth trying to see what is the performance impact of storing the interpreter state in TLS as well.
FTR, on Windows this introduced a ~2% performance regression, and on macOS there's a ~3% regression. Note that these penalties may be partially mitigated by passing the current thread state as an argument throughout the internal C-API (where currently we only do so in some places). The implementation here is also relatively naïve. There are likely opportunities to improve performance via compiler-specific directives.
static inline PyThreadState*
_PyRuntimeState_GetThreadState(_PyRuntimeState *Py_UNUSED(runtime))
{
    return _PyThreadState_GET();
This function no longer makes sense: I wrote PR #104171 to remove it.
I prepared this change for years in advance:
In the Python 3.9 and 3.10 era, I moved multiple global states to the "interpreter state" (interp): https://pythondev.readthedocs.io/subinterpreters.html#done These changes caused various crashes in third party C extensions which use the C API with the GIL released (!). For example, calling
It might be interesting to check the hot code calling _PyThreadState_GET() and see if tstate could be passed to only call _PyThreadState_GET() once. I'm not sure if it's worth it. Also, in stdlib C extensions, I would prefer to use the internal C API less rather than more :-)
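To make the "look it up once and pass tstate along" idea concrete, here is a minimal sketch. It is not CPython code: demo_tstate, demo_current, demo_get() and the enter/leave helpers are hypothetical stand-ins, and it assumes a C11 compiler that accepts _Thread_local. The lookup counter exists only to make the difference visible.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified thread state (not PyThreadState). */
typedef struct demo_tstate {
    int recursion_depth;
} demo_tstate;

/* Stand-in for the thread-local "current thread state" slot. */
static _Thread_local demo_tstate *demo_current = NULL;
/* Counts thread-local reads, purely for illustration. */
static _Thread_local int demo_lookups = 0;

/* Stand-in for a _PyThreadState_GET()-style lookup. */
static demo_tstate *
demo_get(void)
{
    demo_lookups++;
    return demo_current;
}

/* Variant A: every helper performs its own lookup. */
static void demo_enter_lookup(void) { demo_get()->recursion_depth++; }
static void demo_leave_lookup(void) { demo_get()->recursion_depth--; }

/* Variant B: the caller looks up once and passes the state down. */
static void demo_enter_with(demo_tstate *ts) { ts->recursion_depth++; }
static void demo_leave_with(demo_tstate *ts) { ts->recursion_depth--; }

/* Returns the number of lookups performed by n enter/leave pairs. */
static int
demo_run_lookup_each_time(int n)
{
    int before = demo_lookups;
    for (int i = 0; i < n; i++) {
        demo_enter_lookup();
        demo_leave_lookup();
    }
    return demo_lookups - before;
}

/* Same work, but with a single lookup hoisted out of the loop. */
static int
demo_run_pass_tstate(int n)
{
    int before = demo_lookups;
    demo_tstate *ts = demo_get();
    for (int i = 0; i < n; i++) {
        demo_enter_with(ts);
        demo_leave_with(ts);
    }
    return demo_lookups - before;
}
```

Whether hoisting the lookup pays off in real hot paths depends on how cheap thread-local access is on a given platform, which this sketch does not measure.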
@@ -663,6 +663,27 @@ extern char * _getpty(int *, int, mode_t, int);
# define WITH_THREAD
#endif

#ifdef WITH_THREAD
This test is useless. This macro is now always defined. It's only kept for backward compatibility.
Doesn't it affect WASM builds?
See the code 3 lines above:
#ifndef WITH_THREAD
# define WITH_THREAD
#endif
#if defined(HAVE_THREAD_LOCAL) && !defined(Py_BUILD_CORE_MODULE)
extern _Py_thread_local PyThreadState *_Py_tss_tstate;
#endif
PyAPI_DATA(PyThreadState *) _PyThreadState_GetCurrent(void);
What is the use case for this new _PyThreadState_GetCurrent() function? There is already PyThreadState_Get(). How is it different?
The API to get the current thread state is already complicated and has a complicated history: https://pythondev.readthedocs.io/pystate.html
It's there because I couldn't find a way to mix PyAPI_DATA with _Py_thread_local. Looks like that's the same issue you ran into in 2020.
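The workaround described here (keep the thread-local variable private and export a function instead) can be sketched roughly as follows. Every name is hypothetical: DEMO_BUILD_CORE loosely mirrors the role of a "core build" guard, and demo_tstate_get_current() only stands in for the idea behind _PyThreadState_GetCurrent(). A C11 compiler with _Thread_local is assumed.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified thread state. */
typedef struct demo_tstate {
    int id;
} demo_tstate;

/* The only symbols that code outside the "core" would link against. */
demo_tstate *demo_tstate_get_current(void);
void demo_tstate_bind(demo_tstate *ts);

/* The thread-local slot itself stays private to this translation unit,
   so no export annotation ever has to be combined with thread-local
   storage. */
static _Thread_local demo_tstate *demo_tss_tstate = NULL;

/* Exported accessor: returns the calling thread's current state. */
demo_tstate *
demo_tstate_get_current(void)
{
    return demo_tss_tstate;
}

/* Associate a state with the calling thread. */
void
demo_tstate_bind(demo_tstate *ts)
{
    demo_tss_tstate = ts;
}

/* "Core" code may read the variable directly; everything else goes
   through the exported function. */
#ifdef DEMO_BUILD_CORE
#  define DEMO_CURRENT() (demo_tss_tstate)
#else
#  define DEMO_CURRENT() (demo_tstate_get_current())
#endif
```

The function call adds a small cost for callers outside the core, in exchange for not exposing a thread-local data symbol across a shared-library boundary.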
Does this change indirectly fix the PyGILState API for subinterpreters? See: #59956
Thanks @ericsnowcurrently for taking care of this very old project! It seems like the
We replace _PyRuntime.tstate_current with a thread-local variable. As part of this change, we add a _Py_thread_local macro in pyport.h (only for the core runtime) to smooth out the compiler differences. The main motivation here is in support of a per-interpreter GIL, but this change also provides some performance improvement opportunities.

Note that we do not provide a fallback to the thread-local, either falling back to the old tstate_current or to thread-specific storage (PyThread_tss_*()). If that proves problematic then we can circle back. I consider it unlikely, but will run the buildbots to double-check.

Also note that this does not change any of the code related to the GILState API, where it uses a thread state stored in thread-specific storage. I suspect we can combine that with _Py_tss_tstate (from here). However, that can be addressed separately and is not urgent (nor critical).

My only remaining uncertainty is with the existing "GIL is held" constraint. With _PyRuntime.tstate_current, it was only guaranteed valid in the thread currently holding the GIL, if any. With this change, it is valid even when the GIL isn't held. I don't see how that would be a problem, but I'm going to double-check anyway.

(While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)
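For readers unfamiliar with the pattern, the following is a rough sketch of what a compiler-smoothing thread-local macro and a per-thread "current state" slot might look like. It is an illustration under stated assumptions, not the code from this PR: demo_thread_local, demo_tstate, demo_tstate_get() and demo_tstate_swap() are hypothetical names, and the compiler checks are a simplified guess at the kind of differences such a macro has to cover.

```c
#include <stddef.h>

/* Hypothetical stand-in for a _Py_thread_local-style macro: pick
   whichever thread-local storage keyword the compiler offers. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define demo_thread_local _Thread_local
#elif defined(_MSC_VER)
#  define demo_thread_local __declspec(thread)
#elif defined(__GNUC__)
#  define demo_thread_local __thread
#else
#  error "no thread-local storage keyword available"
#endif

/* Hypothetical, heavily simplified thread state. */
typedef struct demo_tstate {
    int id;
} demo_tstate;

/* One slot per thread, instead of a single pointer shared through a
   global runtime struct. */
static demo_thread_local demo_tstate *demo_current_tstate = NULL;

/* Read the calling thread's current state (NULL if none is set). */
static demo_tstate *
demo_tstate_get(void)
{
    return demo_current_tstate;
}

/* Install a new state for the calling thread and return the old one. */
static demo_tstate *
demo_tstate_swap(demo_tstate *newts)
{
    demo_tstate *oldts = demo_current_tstate;
    demo_current_tstate = newts;
    return oldts;
}
```

Because each thread reads its own slot, the value stays meaningful for that thread regardless of which thread holds a lock such as the GIL, which is the behavioral difference the description above calls out.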