
gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization #105805

Merged: 17 commits into python:main on Oct 2, 2024

Conversation

@jbms (Contributor) commented Jun 14, 2023

This splits off the change from #28525 to hang threads that attempt to acquire the GIL during interpreter shutdown, but does not introduce any new public APIs.


📚 Documentation preview 📚: https://cpython-previews--105805.org.readthedocs.build/
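
To make the intended behavior concrete, here is a minimal, self-contained model of it. This is a sketch only, with invented names (`finalizing`, `hang_thread`, `daemon_worker`); the actual change touches CPython internals such as Python/thread_pthread.h:

```c
/* park_demo.c - a minimal model of "hang instead of exit", not the actual
 * CPython code.  All names here are invented for illustration.
 * Build: cc -std=c11 -pthread park_demo.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool finalizing;  /* stand-in for "interpreter is finalizing" */

/* Park the calling thread forever.  pause() returns only after a signal
 * handler runs, so loop around it. */
static void hang_thread(void)
{
    for (;;) {
        pause();
    }
}

static void *daemon_worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Stand-in for "try to (re)acquire the GIL". */
        if (atomic_load(&finalizing)) {
            /* Old behavior: exit the thread via pthread_exit(), which
             * could crash or skip cleanup while other threads still ran.
             * New behavior: park here; the OS reclaims the thread when
             * the process exits. */
            hang_thread();
        }
        usleep(1000);  /* pretend to do some work */
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, daemon_worker, NULL);
    sleep(1);
    atomic_store(&finalizing, true);  /* "finalization" begins */
    sleep(1);                         /* the worker is now parked, not crashed */
    puts("main exiting; the parked worker is reclaimed by the OS");
    return 0;  /* process exit ends all threads, parked or not */
}
```

The key property is that a parked thread can never observe freed runtime state; it simply stops making progress until the process exits.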

@vstinner (Member) left a review comment:

Well, as I wrote in issue #87135, I dislike switching to this behavior by default :-(

I would prefer to keep the current behavior by default, but give the ability to hang for people impacted by pthread_exit() issues.

[Review thread on Python/thread_pthread.h (resolved)]
@gpshead (Member) commented Sep 26, 2024

> I would prefer to keep the current behavior by default, but give the ability to hang for people impacted by pthread_exit() issues.

My problem with that, and the reason why we need this PR, is that the answer to "who is going to use that ability" is unclear. End users of Python applications don't have a way to debug random thread crashes and discover that they need it. Extension module authors do not know that they need it. A few big application owners will simply turn it on blindly for everything, because they happen to have people aware of the consequences of not doing it. But there is no good point at which to turn on such a conditional behavior of "hang threads rather than allow them to randomly crash, or refuse to run C++ finalizers while still releasing the associated memory, so that other threads can crash because multiple things are using the previously allocated and then freed memory". Those concurrency scribbling messes are far worse things to attempt to debug than a simple hung thread, hung within a clear C symbol saying "hi there, I'm going to hang this thread".

So I'm all for undoing our mistaken thread-exiting API and our use of it, in favor of this least bad option: hanging threads that would otherwise have failed to clean up and left ambiguous process state behind, as we have had in the past.

I do still think we want APIs that allow error handling upon re-entering an interpreter or acquiring the GIL, where the call can actually fail instead of this... but we don't have those. And most of the world's Python C API-using code won't be using such new APIs for a looong time, while the random crashing problem would persist. So this remains the least bad option that makes a viable improvement.
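
For illustration, here is a sketch of what such a failing-acquisition API could look like. Everything in it is hypothetical; no API of this shape exists in CPython today, and all names (`try_acquire_runtime`, `ACQ_*`) are invented:

```c
/* try_acquire.c - hypothetical API shape only, not an existing CPython API.
 * Build: cc -std=c11 try_acquire.c */
#include <stdatomic.h>
#include <stdio.h>

typedef enum {
    ACQ_OK,          /* acquired; safe to call into the interpreter */
    ACQ_FINALIZING,  /* interpreter is shutting down; do not touch it */
} acquire_result;

static atomic_bool runtime_finalizing;  /* stub for real runtime state */

/* Stub: a real version would also take the lock when it returns ACQ_OK. */
static acquire_result try_acquire_runtime(void)
{
    return atomic_load(&runtime_finalizing) ? ACQ_FINALIZING : ACQ_OK;
}

/* Because acquisition can fail, the thread chooses its own cleanup path,
 * so C/C++ unwinding and destructors can still run before it returns. */
static int worker_step(void)
{
    switch (try_acquire_runtime()) {
    case ACQ_OK:
        /* ... run interpreter work, then release ... */
        return 0;
    case ACQ_FINALIZING:
        /* Do teardown that does not touch the interpreter, then return
         * from the thread function normally instead of hanging. */
        return -1;
    }
    return -1;
}

int main(void)
{
    atomic_store(&runtime_finalizing, true);
    printf("worker_step() -> %d\n", worker_step());  /* prints -1 */
    return 0;
}
```

The design point is that failure is reported to the thread that asked, so it can run its own teardown rather than being parked or killed by the runtime.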

@gpshead (Member) commented Sep 26, 2024

@colesbury and/or @mpage, and @ericsnowcurrently - can you take a look at this PR? (FWIW, Eric has already gone over it with me on screen at the core dev sprint here.)

@mpage (Contributor) commented Sep 27, 2024

Blocking daemon threads after the VM has been finalized seems less error-prone than forcing the threads to exit. I believe this is also how the JVM handles the same situation, so there's some precedent for this approach.

While we're here, I think this might be a good time to also fix the issues with threads referencing their PyThreadStates after they have been freed. This would allow us to avoid having to introduce new APIs to put threads to sleep. Instead, threads could block on a lock in their PyThreadState that is never released (also what the JVM does). I believe @colesbury suggested refcounting PyThreadStates as a solution to the PyThreadState lifetime problem.
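
For illustration, here is a sketch of that parking-lock idea with invented names (`thread_state`, `park_lock`); the real field would live inside `PyThreadState`:

```c
/* park_lock.c - sketch of a per-thread-state parking lock, with invented
 * names.  Compiles as a standalone translation unit:
 *   cc -c -std=c11 -pthread park_lock.c */
#include <pthread.h>

typedef struct {
    pthread_mutex_t park_lock;  /* hypothetical per-thread-state lock */
} thread_state;

static void thread_state_init(thread_state *ts)
{
    pthread_mutex_init(&ts->park_lock, NULL);
}

/* Run once by the finalizing thread for each remaining thread state:
 * take the lock and deliberately never release it. */
static void finalize_thread_state(thread_state *ts)
{
    pthread_mutex_lock(&ts->park_lock);
}

/* Run by a daemon thread that notices finalization has begun: blocking
 * on its own state's lock parks it forever. */
static void park_on_own_state(thread_state *ts)
{
    pthread_mutex_lock(&ts->park_lock);  /* never returns */
}
```

Making this safe is exactly the PyThreadState lifetime problem noted above: something (such as the suggested refcount) has to keep the state, and thus `park_lock`, alive until the parked thread can no longer touch it.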

[Two review threads on Modules/_testcapimodule.c (outdated, resolved)]
@colesbury (Contributor) left a review comment:

I agree with what @gpshead and @mpage wrote. I think this is the right choice for default behavior.

[Review thread on Doc/c-api/init.rst (outdated, resolved)]
[Review thread on Include/pythread.h (outdated, resolved)]
Commit: "Linking to the filed pre-existing known issue."
@gpshead (Member) commented Oct 2, 2024

> While we're here, I think this might be a good time to also fix the issues with threads referencing their PyThreadStates after they have been freed. This would allow us to avoid having to introduce new APIs to put threads to sleep. Instead, threads could block on a lock in their PyThreadState that is never released (also what the JVM does). I believe @colesbury suggested refcounting PyThreadStates as a solution to the PyThreadState lifetime problem.

That goes beyond the scope of what I want to accomplish with this PR - which I believe might mostly be a candidate for backporting as a bug fix (I'll see what the RMs think).

I think it is a good followup issue though - #124878 for starters but that might turn into a couple of different things to chase down as well.

@gpshead merged commit 8cc5aa4 into python:main on Oct 2, 2024 (37 checks passed)
@gpshead (Member) commented Oct 2, 2024

@Yhg1s what's your opinion on the viability of backporting this as a bugfix to the 3.13 and 3.12 patch releases? People who actually encounter this, in the form of strange crashes from threads during shutdown, have been wanting it for years. I wouldn't call it common, just a thorn in the side of some classes of applications.

(In terms of impact I'd elide marking the deprecated C API as deprecated in the backports - otherwise I expect most of this to apply relatively easily given that it was originally authored on a 3.12 or earlier base and I only merged it forward to current main last week)

@gpshead added the "needs backport to 3.12" (bug and security fixes) and "needs backport to 3.13" (bug and security fixes) labels on Oct 28, 2024
@miss-islington-app

Thanks @jbms for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

@miss-islington-app

Thanks @jbms for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @jbms and @gpshead, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 8cc5aa47ee464ddfd8da5461edecf4a5c72df2ff 3.12

@miss-islington-app

Sorry, @jbms and @gpshead, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 8cc5aa47ee464ddfd8da5461edecf4a5c72df2ff 3.13

@gpshead (Member) commented Oct 28, 2024

I expected the backports to need edits regardless, even if the automation hadn't failed. I'll be taking care of them. :)
