Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-40522: Store tstate in a Thread Local Storage #23976

Closed
wants to merge 1 commit into from
Closed

bpo-40522: Store tstate in a Thread Local Storage #23976

wants to merge 1 commit into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Dec 28, 2020

If Python is built with GCC or clang, the current interpreter and the
current Python thread state are now stored in a Thread Local Storage.

Changes:

  • GCC and clang use the __thread keyword to declare the TLS
    variables.
  • Add set_current_tstate() sub-function which sets these two new TLS
    variables (if available).
  • _PyThreadState_Swap() and _PyThreadState_DeleteCurrent() now call
    set_current_tstate().
  • _PyThreadState_GET() and _PyInterpreterState_GET() now use the TLS
    variable if available.

https://bugs.python.org/issue40522

@vstinner
Copy link
Member Author

Build fails on macOS with Clang 12.0.0 (clang-1200.0.32.27): "illegal thread local variable reference"

building 'math' extension
gcc -Wno-unused-result -Wsign-compare -g -O0 -Wall -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden -I./Include/internal -I./Include -I. -I/usr/local/include -I/Users/runner/work/cpython/cpython/Include -I/Users/runner/work/cpython/cpython -c /Users/runner/work/cpython/cpython/Modules/mathmodule.c -o build/temp.macosx-10.15-x86_64-3.10-pydebug/Users/runner/work/cpython/cpython/Modules/mathmodule.o -DPy_BUILD_CORE_MODULE
gcc -bundle -undefined dynamic_lookup build/temp.macosx-10.15-x86_64-3.10-pydebug/Users/runner/work/cpython/cpython/Modules/mathmodule.o Modules/_math.o -L/usr/local/lib -lm -o build/lib.macosx-10.15-x86_64-3.10-pydebug/math.cpython-310d-darwin.so

ld: illegal thread local variable reference to regular symbol __Py_current_tstate for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

@vstinner
Copy link
Member Author

On MSC, we can try to use __declspec(thread). I tried but I got an error about DLL export. Maybe it should not be exported, and so the variables should not be used when building an extension module (not built as a builtin module).

@vstinner
Copy link
Member Author

cc @markshannon who loves TLS :-)

@vstinner
Copy link
Member Author

See https://bugs.python.org/issue40522#msg383899 for the emitted assembly code.

@vstinner
Copy link
Member Author

@vstinner
Copy link
Member Author

"In C11, the keyword _Thread_local is used to define thread-local variables. The header <threads.h>, if supported, defines thread_local as a synonym for that keyword."
https://en.wikipedia.org/wiki/Thread-local_storage#C.2B.2B

If Python is built with GCC or clang, the current interpreter and the
current Python thread state are now stored in a Thread Local Storage.

Changes:

* configure checks for C11 _Thread_local keyword.
* Use _Thread_local keyword, GCC and clang __thread extension.
* Add set_current_tstate() sub-function which sets these two new TLS
  variables (if available).
* _PyThreadState_Swap() and _PyThreadState_DeleteCurrent() now call
  set_current_tstate().
* _PyThreadState_GET() and _PyInterpreterState_GET() now use the TLS
  variable if available.
@markshannon
Copy link
Member

Would it be possible to keep all the portability macros in one place by putting something like #define Py_ThreadLocal(type, varname) ... in pyport.h?

@markshannon
Copy link
Member

One other remark (not for this PR, but for future work):
The HotSpot JVM uses aligned stacks to be able to access its thread-local information as fast as possible by zeroing the low bits of the machine stack pointer.
It is possible to get very close to this in (mostly) portable C https://godbolt.org/z/eG64zz

@vstinner
Copy link
Member Author

vstinner commented Jan 4, 2021

I need to test -femulated-tls clang flag. See also: http://llvm.org/docs/LangRef.html#thread-local-storage-models

@python python deleted a comment from leandrogmuller Jan 13, 2021
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Feb 13, 2021
@markshannon
Copy link
Member

@vstinner Did you ever get anywhere with this?

@vstinner
Copy link
Member Author

vstinner commented Sep 7, 2021

@vstinner Did you ever get anywhere with this?

I got compiler issues and then I stopped working on this PR.

A compromise would be to find a way to use a C99 keyword internally, but use a function call externally.

"externally" includes stdlib modules built as shared libraries (Linux .so, Windows .dll).

@h-vetinari
Copy link

A compromise would be to find a way to use a C99 keyword internally, but use a function call externally.

Or simply mandate C99? It's been >20 years...

@vstinner
Copy link
Member Author

vstinner commented Sep 7, 2021

Or simply mandate C99? It's been >20 years...

See previous comments. I got compiler errors to use this "recent feature". Sorry, the feature comes from C11, not C99. I don't call.

@vstinner
Copy link
Member Author

I failed finding time to fix compiler errors. I prefer to abandon this PR for now.

@vstinner vstinner closed this Sep 21, 2021
@vstinner vstinner deleted the thread_tstate branch September 21, 2021 22:09
ericsnowcurrently added a commit that referenced this pull request Apr 24, 2023
gh-103324)

We replace _PyRuntime.tstate_current with a thread-local variable. As part of this change, we add a _Py_thread_local macro in pyport.h (only for the core runtime) to smooth out the compiler differences. The main motivation here is in support of a per-interpreter GIL, but this change also provides some performance improvement opportunities.

Note that we do not provide a fallback to the thread-local, either falling back to the old tstate_current or to thread-specific storage (PyThread_tss_*()). If that proves problematic then we can circle back. I consider it unlikely, but will run the buildbots to double-check.

Also note that this does not change any of the code related to the GILState API, where it uses a thread state stored in thread-specific storage. I suspect we can combine that with _Py_tss_tstate (from here). However, that can be addressed separately and is not urgent (nor critical).

(While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants