Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-127049: Fix asyncio.subprocess.Process race condition killing an unrelated process on Unix #127051

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

gschaffner
Copy link

Proposal that fixes GH-127049. The idea is to revert GH-121126 and re-fix GH-87744 in a way that doesn't introduce GH-127049. The proposed approach here is to change child watchers to behave closer to their analog on Windows (_WindowsSubprocessTransport), i.e. let subprocess handle the PID lifetime management instead of having subprocess do the allocation and asyncio do the deallocation, i.e. follow the guidance of #86724 (comment).

In the _ThreadedChildWatcher case this borrows the waitid + WNOWAIT idea from Trio. (See also #82811 (comment).) Note that in the _ThreadedChildWatcher case, while it would ideally be safe to just call Popen.wait in the thread instead of involving waitid + WNOWAIT, Popen currently has some thread-unsafeties, so running Popen.wait or Popen.poll in the thread in practice resulted in unsafe kill calls and broken transport._process_exited(returncode=None) calls. See the longer commit message.

I have marked this as a draft because it does not yet have regression tests. Both cases (_PidfdChildWatcher and _ThreadedChildWatcher) can have (nearly-)deterministic (but consistent regardless) regression tests, but I am not sure how folks would want them implemented, as the reproducers I put together for the report involve monkeypatching os.waitpid and os.kill (which is nontrivial in part because subprocess holds strong references to os.waitpid, and which will miss any os.waitid calls made without WNOWAIT unless we patch that too).

Anyway, if this PR is pursued, I'd appreciate some help with adding tests. This is also my first PR proposed to CPython so any feedback/pointers are appreciated. :)

…_signal in asyncio (python#121126)"

This reverts the non-test changes of commit bd473aa.
Note that we call Popen.poll/wait in the event loop thread to avoid
Popen's thread-unsafety. Without this workaround, when testing
_ThreadedChildWatcher with the reproducer shared in the linked issue on my
machine:
* Case 1 of the known race condition ignored in pythonGH-20010 is met (and an
  unsafe kill call is issued) about 1 in 10 times.
* The race condition pythonGH-127050 is met (causing _process_exited's assert
  returncode is not None to raise) about 1 in 30 times.
Copy link

cpython-cla-bot bot commented Nov 20, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant