Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization #105805

Merged
merged 17 commits into from
Oct 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 60 additions & 16 deletions Doc/c-api/init.rst
Original file line number Diff line number Diff line change
Expand Up @@ -430,7 +430,11 @@
Some memory allocated by extension modules may not be freed. Some extensions may not
work properly if their initialization routine is called more than once; this can
happen if an application calls :c:func:`Py_Initialize` and :c:func:`Py_FinalizeEx`
more than once.
more than once. :c:func:`Py_FinalizeEx` must not be called recursively from
within itself. Therefore, it must not be called by any code that may be run
as part of the interpreter shutdown process, such as :py:mod:`atexit`
handlers, object finalizers, or any code that may be run while flushing the
stdout and stderr files.

.. audit-event:: cpython._PySys_ClearAuditHooks "" c.Py_FinalizeEx

Expand Down Expand Up @@ -960,6 +964,37 @@
The only exception is if :c:func:`exec` will be called immediately
after.

.. _cautions-regarding-runtime-finalization:

Cautions regarding runtime finalization
---------------------------------------

In the late stage of :term:`interpreter shutdown`, after attempting to wait for

Check warning on line 972 in Doc/c-api/init.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

c:func reference target not found: _Py_IsFinalizing [ref.func]
non-daemon threads to exit (though this can be interrupted by
:class:`KeyboardInterrupt`) and running the :mod:`atexit` functions, the runtime
is marked as *finalizing*: :c:func:`_Py_IsFinalizing` and
:func:`sys.is_finalizing` return true. At this point, only the *finalization
thread* that initiated finalization (typically the main thread) is allowed to
acquire the :term:`GIL`.

If any thread, other than the finalization thread, attempts to acquire the GIL

Check warning on line 980 in Doc/c-api/init.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

c:func reference target not found: PyEval_AcquireLock [ref.func]
during finalization, either explicitly via :c:func:`PyGILState_Ensure`,
:c:macro:`Py_END_ALLOW_THREADS`, :c:func:`PyEval_AcquireThread`, or
:c:func:`PyEval_AcquireLock`, or implicitly when the interpreter attempts to
reacquire it after having yielded it, the thread enters **a permanently blocked
state** where it remains until the program exits. In most cases this is
harmless, but this can result in deadlock if a later stage of finalization
attempts to acquire a lock owned by the blocked thread, or otherwise waits on
the blocked thread.

Gross? Yes. This prevents random crashes and/or unexpectedly skipped C++
finalizations further up the call stack when such threads were forcibly exited
here in CPython 3.13 and earlier. The CPython runtime GIL acquiring C APIs
have never had any error reporting or handling expectations at GIL acquisition
time that would've allowed for graceful exit from this situation. Changing that
would require new stable C APIs and rewriting the majority of C code in the
CPython ecosystem to use those with error handling.


High-level API
--------------
Expand Down Expand Up @@ -1033,11 +1068,14 @@
ensues.

.. note::
Calling this function from a thread when the runtime is finalizing
will terminate the thread, even if the thread was not created by Python.
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
check if the interpreter is in process of being finalized before calling
this function to avoid unwanted termination.
Calling this function from a thread when the runtime is finalizing will
hang the thread until the program exits, even if the thread was not
created by Python. Refer to
:ref:`cautions-regarding-runtime-finalization` for more details.

.. versionchanged:: next
Hangs the current thread, rather than terminating it, if called while the
interpreter is finalizing.

.. c:function:: PyThreadState* PyThreadState_Get()

Expand Down Expand Up @@ -1092,11 +1130,14 @@
to call arbitrary Python code. Failure is a fatal error.

.. note::
Calling this function from a thread when the runtime is finalizing
will terminate the thread, even if the thread was not created by Python.
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
check if the interpreter is in process of being finalized before calling
this function to avoid unwanted termination.
Calling this function from a thread when the runtime is finalizing will
hang the thread until the program exits, even if the thread was not
created by Python. Refer to
:ref:`cautions-regarding-runtime-finalization` for more details.

.. versionchanged:: next
Hangs the current thread, rather than terminating it, if called while the
interpreter is finalizing.

.. c:function:: void PyGILState_Release(PyGILState_STATE)

Expand Down Expand Up @@ -1374,17 +1415,20 @@
If this thread already has the lock, deadlock ensues.

.. note::
Calling this function from a thread when the runtime is finalizing
will terminate the thread, even if the thread was not created by Python.
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
check if the interpreter is in process of being finalized before calling
this function to avoid unwanted termination.
Calling this function from a thread when the runtime is finalizing will
hang the thread until the program exits, even if the thread was not
created by Python. Refer to
:ref:`cautions-regarding-runtime-finalization` for more details.

.. versionchanged:: 3.8
Updated to be consistent with :c:func:`PyEval_RestoreThread`,
:c:func:`Py_END_ALLOW_THREADS`, and :c:func:`PyGILState_Ensure`,
and terminate the current thread if called while the interpreter is finalizing.

.. versionchanged:: next
Hangs the current thread, rather than terminating it, if called while the
interpreter is finalizing.

:c:func:`PyEval_RestoreThread` is a higher-level function which is always
available (even when threads have not been initialized).

Expand Down
13 changes: 13 additions & 0 deletions Include/internal/pycore_pythread.h
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,19 @@ PyAPI_FUNC(int) PyThread_join_thread(PyThread_handle_t);
* a non-zero value on failure.
*/
PyAPI_FUNC(int) PyThread_detach_thread(PyThread_handle_t);
/*
* Hangs the thread indefinitely without exiting it.
*
* gh-87135: There is no safe way to exit a thread other than returning
* normally from its start function. This is used during finalization in lieu
* of actually exiting the thread. Since the program is expected to terminate
* soon anyway, it does not matter if the thread stack stays around until then.
*
* This is unfortunate for embedders who may not be terminating their process
* when they're done with the interpreter, but our C API design does not allow
* for safely exiting threads attempting to re-enter Python post finalization.
*/
void _Py_NO_RETURN PyThread_hang_thread(void);

#ifdef __cplusplus
}
Expand Down
21 changes: 20 additions & 1 deletion Include/pythread.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,26 @@ typedef enum PyLockStatus {

PyAPI_FUNC(void) PyThread_init_thread(void);
PyAPI_FUNC(unsigned long) PyThread_start_new_thread(void (*)(void *), void *);
PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
/* Terminates the current thread. Considered unsafe.
*
* WARNING: This function is only safe to call if all functions in the full call
* stack are written to safely allow it. Additionally, the behavior is
* platform-dependent. This function should be avoided, and is no longer called
* by Python itself. It is retained only for compatibility with existing C
* extension code.
*
* With pthreads, calls `pthread_exit` causes some libcs (glibc?) to attempt to
* unwind the stack and call C++ destructors; if a `noexcept` function is
* reached, they may terminate the process. Others (macOS) do unwinding.
*
* On Windows, calls `_endthreadex` which kills the thread without calling C++
* destructors.
*
* In either case there is a risk of invalid references remaining to data on the
* thread stack.
*/
Py_DEPRECATED(3.14) PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);

PyAPI_FUNC(unsigned long) PyThread_get_thread_ident(void);

#if (defined(__APPLE__) || defined(__linux__) || defined(_WIN32) \
Expand Down
70 changes: 70 additions & 0 deletions Lib/test/test_threading.py
Original file line number Diff line number Diff line change
Expand Up @@ -1171,6 +1171,76 @@ def __del__(self):
self.assertEqual(out.strip(), b"OK")
self.assertIn(b"can't create new thread at interpreter shutdown", err)

@cpython_only
def test_finalize_daemon_thread_hang(self):
if support.check_sanitizer(thread=True, memory=True):
# the thread running `time.sleep(100)` below will still be alive
# at process exit
self.skipTest(
"https://github.com/python/cpython/issues/124878 - Known"
" race condition that TSAN identifies.")
# gh-87135: tests that daemon threads hang during finalization
script = textwrap.dedent('''
import os
import sys
import threading
import time
import _testcapi

lock = threading.Lock()
lock.acquire()
thread_started_event = threading.Event()
def thread_func():
try:
thread_started_event.set()
_testcapi.finalize_thread_hang(lock.acquire)
finally:
# Control must not reach here.
os._exit(2)

t = threading.Thread(target=thread_func)
t.daemon = True
t.start()
thread_started_event.wait()
# Sleep to ensure daemon thread is blocked on `lock.acquire`
#
# Note: This test is designed so that in the unlikely case that
# `0.1` seconds is not sufficient time for the thread to become
# blocked on `lock.acquire`, the test will still pass, it just
# won't be properly testing the thread behavior during
# finalization.
time.sleep(0.1)

def run_during_finalization():
# Wake up daemon thread
lock.release()
# Sleep to give the daemon thread time to crash if it is going
# to.
#
# Note: If due to an exceptionally slow execution this delay is
# insufficient, the test will still pass but will simply be
# ineffective as a test.
time.sleep(0.1)
# If control reaches here, the test succeeded.
os._exit(0)

# Replace sys.stderr.flush as a way to run code during finalization
orig_flush = sys.stderr.flush
def do_flush(*args, **kwargs):
orig_flush(*args, **kwargs)
if not sys.is_finalizing:
return
sys.stderr.flush = orig_flush
run_during_finalization()

sys.stderr.flush = do_flush

# If the follow exit code is retained, `run_during_finalization`
# did not run.
sys.exit(1)
''')
assert_python_ok("-c", script)

class ThreadJoinOnShutdown(BaseTestCase):

def _run_and_join(self, script):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
Attempting to acquire the GIL after runtime finalization has begun in a
different thread now causes the thread to hang rather than terminate, which
avoids potential crashes or memory corruption caused by attempting to
terminate a thread that is running code not specifically designed to support
termination. In most cases this hanging is harmless since the process will
soon exit anyway.

The ``PyThread_exit_thread`` function is now deprecated. Its behavior is
inconsistent across platforms, and it can only be used safely in the
unlikely case that every function in the entire call stack has been designed
to support the platform-dependent termination mechanism. It is recommended
that users of this function change their design to not require thread
termination. In the unlikely case that thread termination is needed and can
be done safely, users may migrate to calling platform-specific APIs such as
``pthread_exit`` (POSIX) or ``_endthreadex`` (Windows) directly.
30 changes: 30 additions & 0 deletions Modules/_testcapimodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -3310,6 +3310,35 @@ test_critical_sections(PyObject *module, PyObject *Py_UNUSED(args))
Py_RETURN_NONE;
}

// Used by `finalize_thread_hang`.
#ifdef _POSIX_THREADS
static void finalize_thread_hang_cleanup_callback(void *Py_UNUSED(arg)) {
// Should not reach here.
Py_FatalError("pthread thread termination was triggered unexpectedly");
}
#endif

// Tests that finalization does not trigger pthread cleanup.
//
// Must be called with a single nullary callable function that should block
// (with GIL released) until finalization is in progress.
static PyObject *
finalize_thread_hang(PyObject *self, PyObject *callback)
{
// WASI builds some pthread stuff but doesn't have these APIs today?
#if defined(_POSIX_THREADS) && !defined(__wasi__)
pthread_cleanup_push(finalize_thread_hang_cleanup_callback, NULL);
#endif
PyObject_CallNoArgs(callback);
// Should not reach here.
Py_FatalError("thread unexpectedly did not hang");
#if defined(_POSIX_THREADS) && !defined(__wasi__)
pthread_cleanup_pop(0);
#endif
Py_RETURN_NONE;
}


static PyMethodDef TestMethods[] = {
{"set_errno", set_errno, METH_VARARGS},
{"test_config", test_config, METH_NOARGS},
Expand Down Expand Up @@ -3449,6 +3478,7 @@ static PyMethodDef TestMethods[] = {
{"test_weakref_capi", test_weakref_capi, METH_NOARGS},
{"function_set_warning", function_set_warning, METH_NOARGS},
{"test_critical_sections", test_critical_sections, METH_NOARGS},
{"finalize_thread_hang", finalize_thread_hang, METH_O, NULL},
{NULL, NULL} /* sentinel */
};

Expand Down
26 changes: 17 additions & 9 deletions Python/ceval_gil.c
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
#include "pycore_pylifecycle.h" // _PyErr_Print()
#include "pycore_pymem.h" // _PyMem_IsPtrFreed()
#include "pycore_pystats.h" // _Py_PrintSpecializationStats()
#include "pycore_pythread.h" // PyThread_hang_thread()

/*
Notes about the implementation:
Expand Down Expand Up @@ -277,10 +278,9 @@ drop_gil(PyInterpreterState *interp, PyThreadState *tstate, int final_release)
/* Take the GIL.

The function saves errno at entry and restores its value at exit.
It may hang rather than return if the interpreter has been finalized.

tstate must be non-NULL.

Returns 1 if the GIL was acquired, or 0 if not. */
tstate must be non-NULL. */
static void
take_gil(PyThreadState *tstate)
{
Expand All @@ -293,12 +293,18 @@ take_gil(PyThreadState *tstate)

if (_PyThreadState_MustExit(tstate)) {
/* bpo-39877: If Py_Finalize() has been called and tstate is not the
thread which called Py_Finalize(), exit immediately the thread.
thread which called Py_Finalize(), this thread cannot continue.

This code path can be reached by a daemon thread after Py_Finalize()
completes. In this case, tstate is a dangling pointer: points to
PyThreadState freed memory. */
PyThread_exit_thread();
PyThreadState freed memory.

This used to call a *thread_exit API, but that was not safe as it
lacks stack unwinding and local variable destruction important to
C++. gh-87135: The best that can be done is to hang the thread as
the public APIs calling this have no error reporting mechanism (!).
*/
PyThread_hang_thread();
}

assert(_PyThreadState_CheckConsistency(tstate));
Expand Down Expand Up @@ -342,7 +348,9 @@ take_gil(PyThreadState *tstate)
if (drop_requested) {
_Py_unset_eval_breaker_bit(holder_tstate, _PY_GIL_DROP_REQUEST_BIT);
}
PyThread_exit_thread();
// gh-87135: hang the thread as *thread_exit() is not a safe
// API. It lacks stack unwind and local variable destruction.
PyThread_hang_thread();
}
assert(_PyThreadState_CheckConsistency(tstate));

Expand Down Expand Up @@ -383,7 +391,7 @@ take_gil(PyThreadState *tstate)

if (_PyThreadState_MustExit(tstate)) {
/* bpo-36475: If Py_Finalize() has been called and tstate is not
the thread which called Py_Finalize(), exit immediately the
the thread which called Py_Finalize(), gh-87135: hang the
thread.

This code path can be reached by a daemon thread which was waiting
Expand All @@ -393,7 +401,7 @@ take_gil(PyThreadState *tstate)
/* tstate could be a dangling pointer, so don't pass it to
drop_gil(). */
drop_gil(interp, NULL, 1);
PyThread_exit_thread();
PyThread_hang_thread();
}
assert(_PyThreadState_CheckConsistency(tstate));

Expand Down
2 changes: 1 addition & 1 deletion Python/pylifecycle.c
Original file line number Diff line number Diff line change
Expand Up @@ -2020,7 +2020,7 @@ _Py_Finalize(_PyRuntimeState *runtime)
/* Ensure that remaining threads are detached */
_PyEval_StopTheWorldAll(runtime);

/* Remaining daemon threads will automatically exit
/* Remaining daemon threads will be trapped in PyThread_hang_thread
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
_PyInterpreterState_SetFinalizing(tstate->interp, tstate);
_PyRuntimeState_SetFinalizing(runtime, tstate);
Expand Down
8 changes: 8 additions & 0 deletions Python/thread_nt.h
Original file line number Diff line number Diff line change
Expand Up @@ -291,6 +291,14 @@ PyThread_exit_thread(void)
_endthreadex(0);
}

void _Py_NO_RETURN
PyThread_hang_thread(void)
{
while (1) {
SleepEx(INFINITE, TRUE);
}
}

/*
* Lock support. It has to be implemented as semaphores.
* I [Dag] tried to implement it with mutex but I could find a way to
Expand Down
Loading
Loading