Daemonic threads not killed in some circumstances in python 3.13 #123940

rhpvorderman · 2024-09-11T09:58:04Z

Bug report

Bug description:

I got a bug report on python-zlib-ng where using the threaded opening the program would hang if an exception occurred and a context manager was not used.

I fixed it by changing the threads to be daemonic: pycompression/python-zlib-ng#54.

This fixes the issue on python 3.9, 3.10, 3.11 and 3.12, but not 3.13-rc2. Also the latest main branch is affected. Unfortunately I could not create a minimal reproducer, other than running the test directly. Daemonic threads seem not to work in this particular case, where there are some BufferedIO streams and queues involved.

git clone -b issue53 https://github.com/pycompression/python-zlib-ng
# Build Cpython
./python -m ensurepip
./python -m pip install pytest pytest-timeout <path to python-zlib-ng repo>
./python -m pytest <path to python-zlib-ng repo>/tests/test_gzip_ng_threaded.py::test_threaded_program_can_exit_on_error

Since there are many code changes between 3.12 in 3.13 the underlying cause is surely there, but it is hard to pinpoint.

CPython versions tested on:

3.13, CPython main branch

Operating systems tested on:

Linux

Linked PRs

gh-123940: Ensure force-terminated daemon threads can be joined #124150

The text was updated successfully, but these errors were encountered:

y5c4l3 · 2024-09-12T18:57:36Z

Joining daemon threads at finalizer of _ThreadedGzipReader and _ThreadedGzipWriter (i.e. close()) gives a deadlock in recent versions of Python. I am still investigating on this and trying to give an MRE. The following patch makes it pass the test, but I am not sure whether it will break other versions (tested on Python 3.12 and latest).

diff --git a/src/zlib_ng/gzip_ng_threaded.py b/src/zlib_ng/gzip_ng_threaded.py
index 7fa7249..65bd95c 100644
--- a/src/zlib_ng/gzip_ng_threaded.py
+++ b/src/zlib_ng/gzip_ng_threaded.py
@@ -156,7 +156,6 @@ class _ThreadedGzipReader(io.RawIOBase):
         if self._closed:
             return
         self.running = False
-        self.worker.join()
         self.fileobj.close()
         if self.closefd:
             self.raw.close()
@@ -329,7 +328,7 @@ class _ThreadedGzipWriter(io.RawIOBase):
         if self._closed:
             return
         self.flush()
-        self.stop()
+        self.running = False
         if self.exception:
             self.raw.close()
             self._closed = True

Is it possible to workaround it by starting the worker in first call to read() / write() and using non-daemon threads?

rhpvorderman · 2024-09-13T09:59:27Z

@y5c4l3 Thanks! I will investigate and report back.

y5c4l3 · 2024-09-14T13:51:46Z

Reproducible:

# mwe.py
import io
import threading
from spawner import ModuleSpawnerIO

class SpawnerIO(io.RawIOBase):
    def __init__(self):
        self.task = threading.Thread(target=self._process, daemon=True)
        self.task.start()
    def _process(self):
        while True:
            pass
    def readable(self):
        return True
    def close(self):
        print('finalizing...')
        self.task.join()

# exit as expected, not calling finalizer
# r = io.BufferedReader(SpawnerIO())

# finalized and stuck
r = io.BufferedReader(ModuleSpawnerIO())

# spawner.py
import io
import threading

class ModuleSpawnerIO(io.RawIOBase):
    def __init__(self):
        self.task = threading.Thread(target=self._process, daemon=True)
        self.task.start()
    def _process(self):
        while True:
            pass
    def readable(self):
        return True
    def close(self):
        print('finalizing...')
        self.task.join()

Wrapping ModuleSpawnerIO from another module inside BufferedReader ensures that the finalizer will be called. The spawned thread dies without notifying the main thread of thread_is_exiting:

cpython/Python/ceval_gil.c

Lines 294 to 302 in 1de4613

    
               if (_PyThreadState_MustExit(tstate)) { 
        
                   /* bpo-39877: If Py_Finalize() has been called and tstate is not the 
        
                      thread which called Py_Finalize(), exit immediately the thread. 
        
                      This code path can be reached by a daemon thread after Py_Finalize() 
        
                      completes. In this case, tstate is a dangling pointer: points to 
        
                      PyThreadState freed memory. */ 
        
                   PyThread_exit_thread(); 
        
               }

Simpler Reproducible with `atexit`

import io
import threading
import atexit

def process():
    while True:
        pass

t1 = threading.Thread(target=process, daemon=True)
t1.start()
atexit.register(lambda: t1.join())

hauntsaninja · 2024-09-15T00:47:51Z

Thanks for the repro! Bisects to #114839
cc @mpage in case you have time to take a look :-)

mpage · 2024-09-15T04:48:36Z

Thanks for the minimal repo! I can take a look on Monday.

mpage · 2024-09-15T05:38:56Z

@hauntsaninja - What were you using as the test case for bisection? The simpler reproducer using atexit has the same behavior on 3.12.4 and main. Both hang until interrupted, which is what I would expect to happen.

hauntsaninja · 2024-09-15T06:16:36Z

I was using the mwe.py + spawner.py repro (which exits in 3.12.5). I think y5c4l3 edited in the atexit example after I posted.

During finalization, daemon threads are force to exit immediately (without returning through the call-stack normally) upon acquiring the GIL. Finalizers that run after this must be able to join the forcefully terminated threads. The current implemenation notified of thread completetion before returning from `thread_run`. This code will never execute if the thread is forced to exit during finalization. Any code that attempts to join such a thread will block indefinitely. To fix this, use the old approach of notifying of thread completion when the thread state is cleared. This happens both when `thread_run` exits normally and when thread states are destroyed as part of finalization (which happens immediately after forcing daemon threads to exit, before any python code can run).

During finalization, daemon threads are force to exit immediately (without returning through the call-stack normally) upon acquiring the GIL. Finalizers that run after this must be able to join the forcefully terminated threads. The current implementation notified of thread completion before returning from `thread_run`. This code will never execute if the thread is forced to exit during finalization. Any code that attempts to join such a thread will block indefinitely. To fix this, use the old approach of notifying of thread completion when the thread state is cleared. This happens both when `thread_run` exits normally and when thread states are destroyed as part of finalization (which happens immediately after forcing daemon threads to exit, before any python code can run).

colesbury · 2024-09-18T15:56:09Z

Matt is working on a fix, but I don't think the approach used in https://github.com/pycompression/python-zlib-ng is a good idea:

In particular:

Don't join threads in destructors. That seems likely to cause problems because now you have blocking synchronization potentially at arbitrary places. In general, you can't be sure which thread will call destructors due to GC.
Don't rely on deamon threads being joinable at shutdown.

On the second point: both the join and shutdown behavior has changed in recent releases and is likely to continue to change even with Matt's fix. For example, join() now calls the OS join API, so if something keeps the daemon thread alive (like blocking I/O), the join() will block forever in 3.13 (even with Matt's fix). Additionally, it relies on daemon threads being force killed, which started in 3.9 and has been a continual source of problems especially with interoperability with C++ code and C++ destructors. In other words, I think it's something we will want to move away from.

Finally, slight variants of the reproducer will hard crash in 3.11 and 3.12 (and possibly earlier versions), which makes me think it's not a robust approach.

rhpvorderman · 2024-09-20T07:04:34Z

Don't join threads in destructors. That seems likely to cause problems because now you have blocking synchronization potentially at arbitrary places. In general, you can't be sure which thread will call destructors due to GC.

There is no custom destructors in python-zlib-ng that join threads. Where did you see this code? In fact, for the classes that cause the issue, there are no custom destructors.

Don't rely on deamon threads being joinable at shutdown [...] not a robust approach.

I really appreciate your explanation. What patterns do you recommend to avoid this sort of behaviour? I have a thread that does the IO. Are there any well documented patterns to prevent this from blocking?

There should be at least some documentation on this in the release notes if it is going to break currently used code. Python-isal uses the same code and is marked as an essential package for the python ecosystem. I'd like it to be able to work with python 3.13 when it comes out.

colesbury · 2024-09-20T18:46:37Z

There is no custom destructors in python-zlib-ng that join threads. Where did you see this code?

The __del__ for RawIOBase calls close() and you have a thread.join() called from your close() (via stop()).

What patterns do you recommend to avoid this sort of behaviour?

Don't call thread.join() in close() if it's being called from __del__(). Something like:

class _ThreadedGzipReader(io.RawIOBase):
    def stop(self):
        if self.running:
            self.running = False
            if not self._finalizing:
                for worker in self.compression_workers:
                    worker.join()
                self.output_worker.join()

    def __del__(self):
        self._finalizing = True
        super().__del__()

rhpvorderman · 2024-09-23T06:15:56Z

@colesbury Thank you so much for this additional information. Now I understand the problem and can fix it appropriately.

hauntsaninja · 2024-09-23T20:21:01Z

People are leaning towards not treating this as blocking for 3.13.0 (x-ref colesbury's message on internal Discord)

colesbury also mentioned #116514 as a reference for hard crashes on 3.11/3.12 (I wasn't able to reproduce #124150 (comment) )

rhpvorderman added the type-bug An unexpected behavior, bug, or error label Sep 11, 2024

rhpvorderman changed the title ~~Daemonic threads not killed in python 3.13~~ Daemonic threads not killed in some circumstances in python 3.13 Sep 11, 2024

hauntsaninja added 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Sep 15, 2024

mpage self-assigned this Sep 15, 2024

hauntsaninja added the release-blocker label Sep 16, 2024

ezio-melotti added this to Release and Deferred blockers 🚫 Sep 16, 2024

github-project-automation bot moved this to Todo in Release and Deferred blockers 🚫 Sep 16, 2024

bedevere-app bot mentioned this issue Sep 17, 2024

gh-123940: Ensure force-terminated daemon threads can be joined #124150

Open

mpage added a commit to mpage/cpython that referenced this issue Sep 17, 2024

Merge branch 'main' into pythongh-123940

4d02a0b

mpage added a commit to mpage/cpython that referenced this issue Sep 18, 2024

Merge branch 'main' into pythongh-123940

d18c872

mpage added a commit to mpage/cpython that referenced this issue Sep 18, 2024

Merge branch 'main' into pythongh-123940

57a1c65

Yhg1s added deferred-blocker and removed release-blocker labels Sep 23, 2024

gpshead mentioned this issue Sep 26, 2024

pthread_exit & PyThread_exit_thread from PyEval_RestoreThread etc. are harmful #87135

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Daemonic threads not killed in some circumstances in python 3.13 #123940

Daemonic threads not killed in some circumstances in python 3.13 #123940

rhpvorderman commented Sep 11, 2024 •

edited by bedevere-app bot

Loading

y5c4l3 commented Sep 12, 2024 •

edited

Loading

rhpvorderman commented Sep 13, 2024

y5c4l3 commented Sep 14, 2024 •

edited

Loading

hauntsaninja commented Sep 15, 2024 •

edited

Loading

mpage commented Sep 15, 2024

mpage commented Sep 15, 2024 •

edited

Loading

hauntsaninja commented Sep 15, 2024

colesbury commented Sep 18, 2024

rhpvorderman commented Sep 20, 2024

colesbury commented Sep 20, 2024

rhpvorderman commented Sep 23, 2024

hauntsaninja commented Sep 23, 2024

Daemonic threads not killed in some circumstances in python 3.13 #123940

Daemonic threads not killed in some circumstances in python 3.13 #123940

Comments

rhpvorderman commented Sep 11, 2024 • edited by bedevere-app bot Loading

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

y5c4l3 commented Sep 12, 2024 • edited Loading

rhpvorderman commented Sep 13, 2024

y5c4l3 commented Sep 14, 2024 • edited Loading

Simpler Reproducible with atexit

hauntsaninja commented Sep 15, 2024 • edited Loading

mpage commented Sep 15, 2024

mpage commented Sep 15, 2024 • edited Loading

hauntsaninja commented Sep 15, 2024

colesbury commented Sep 18, 2024

rhpvorderman commented Sep 20, 2024

colesbury commented Sep 20, 2024

rhpvorderman commented Sep 23, 2024

hauntsaninja commented Sep 23, 2024

rhpvorderman commented Sep 11, 2024 •

edited by bedevere-app bot

Loading

y5c4l3 commented Sep 12, 2024 •

edited

Loading

y5c4l3 commented Sep 14, 2024 •

edited

Loading

Simpler Reproducible with `atexit`

hauntsaninja commented Sep 15, 2024 •

edited

Loading

mpage commented Sep 15, 2024 •

edited

Loading