-
-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-123321: Fix Parser/myreadline.c to prevent a segfault during a multi-threaded race #123323
Conversation
By the way, I believe there's an issue with the pdb tests, I'm not sure if the doctests actually run or not. |
Lib/test/test_pdb.py
Outdated
@@ -2731,6 +2731,46 @@ def test_pdb_issue_gh_103225(): | |||
(Pdb) continue | |||
""" | |||
|
|||
if not SKIP_ASYNCIO_TESTS: | |||
def test_pdb_issue_gh_123321(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is failing in the free-threaded build, probably because the initialization of the _PyOS_ReadlineLock
lock isn't thread-safe without the GIL (and a few other related issues).
That shouldn't be too hard to fix, but we should do it after this PR has landed because the bug you are fixing exists in 3.12, and the free-threading fixes should only go back to 3.13.
In the meantime, you can add a @support.requires_gil_enabled()
decorator to skip this test in the free-threaded build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thanks, I'll do that.
I'm changing the tests now and moving them to the readline or builtins module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was fixed when I took the lock inside the GIL.
I was worried about a potential deadlock, but I don't see any other uses for the lock except this function so it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it does deadlock:
First thread:
GIL.acq() -> lock.acq() -> GIL.rls() -> |GIL.acq()| -> lock.rls()
Second thread:
GIL.acq() -> |lock.acq()| -> GIL.rls() -> GIL.acq() -> lock.rls()
Taking the lock before releasing the GIL will require releasing the lock without the GIL, which is very dangerous if the internal function acquires the GIL in any way. i.e. this will work, but dangerous if GIL is taken again after the release:
GIL.acq() -> lock.acq() -> GIL.rls() -> |dangerous if GIL taken here| -> lock.rls() -> GIL.acq()
I will have to take the lock outside the GIL and we need to ensure that taking the lock is threadsafe outside the GIL.
PyThread_acquire_lock(_PyOS_ReadlineLock, 1); | ||
Py_BEGIN_ALLOW_THREADS | ||
_PyOS_ReadlineTState = tstate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe set _PyOS_ReadlineTState while you are still holding the GIL, before Py_BEGIN_ALLOW_THREADS. It should not make a big difference, since other threads should be blocked at PyThread_acquire_lock(_PyOS_ReadlineLock, 1) :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alrighty :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have to release the lock outside the GIL and preferably take the lock too, otherwise it deadlocks.
It fails on the free-threaded as according to @colesbury the taking of the lock isn't safe outside the GIL.
I'm a little confused as theoretically if it isn't safe it should fail on the normal build too but I'm not entirely sure how that works.
…ks. Shifted tests to the readline module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
If I understood @colesbury correctly, the @requires_gil_enabled()
decorator avoids a test failure on Free Threaded build for now, and so the fix is good enough for "Python with GIL (enabled)".
So here's the thing - if it requires the GIL in order to take the lock then it shouldn't work in the normal build after I fear that it isn't safe outside the GIL and then we create a new issue that we aren't aware of. If it is then the Free Threaded should work too. Either way, how would one solve it in the free threaded build? Shall I open a new issue for that or will there be a large undertaking to take care of everything containing that decorator at a later time? |
|
||
// gh-123321: Must set the variable and then release the lock before | ||
// taking the GIL. Otherwise a deadlock or segfault may occur. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can a deadlock occur due to this variable not being NULL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release lock then take GIL.
If you take GIL and then release lock there's a deadlock.
If you set the variable before releasing the lock you segfault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah the link you posted is exactly what I wanted to know. Thanks!
Yeah, the fix for the free-threaded build will require some code that's not available in 3.12. So my thinking is that we fix it first of for the default build like this and backport it all the way to 3.12, and then I can put up a PR that fixes it for the free-threaded build (and removes the skip annotation). That one won't get backported to 3.12. |
@colesbury @pablogsal: Are you ok with this change? |
… a multi-threaded race (pythonGH-123323) (cherry picked from commit a4562fe) Co-authored-by: Bar Harel <[email protected]>
… a multi-threaded race (pythonGH-123323) (cherry picked from commit a4562fe) Co-authored-by: Bar Harel <[email protected]>
GH-123676 is a backport of this pull request to the 3.13 branch. |
GH-123677 is a backport of this pull request to the 3.12 branch. |
Sweet. I'd appreciate a mention and a link to this issue in the free threaded build. Thanks everyone for the quick review and suggestions, amazing work 🥂 And to @amit-w for helping me debug it ❤️ |
…g a multi-threaded race (GH-123323) (#123677) * gh-123321: Fix Parser/myreadline.c to prevent a segfault during a multi-threaded race (GH-123323) (cherry picked from commit a4562fe) Co-authored-by: Bar Harel <[email protected]> * Remove @requires_gil_enabled for 3.12 --------- Co-authored-by: Bar Harel <[email protected]> Co-authored-by: Sam Gross <[email protected]>
The variable is now set and reset only within the lock.
Previously, it was a global variable that was reset outside of the lock and caused an access violation at line 265:
Happened with higher chance during thread closure.
tstate
at line 260 turned null, which caused theconfig
to be null.I'd appreciate help with the tests, I chose to test it on PDB but it's a difficult catch.
I also do not have a Windows machine readily available which is required to trigger.
PyOS_Readline
crashes in a multi-threaded race #123321