ProcessPoolExecutor shutdown hangs after future cancel was requested #94440
Comments
Submitted a PR with a test and a fix. It can be easily backported, and I'll be happy to work on that too once we merge on the main line :) It took some time to pinpoint the issue, but once found, the fix is pretty surgical. The hang happens when the shutdown sequence begins while the pending work items contain futures, and they all get canceled before the next wait. The addition makes sure there are always some running futures when we wait after shutdown, or none at all, in which case stopping the child processes kicks off.
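To make the failure mode concrete, here is a toy model of one iteration of the executor's management loop after shutdown has been requested. It is not CPython's code: `WorkItem`, `next_action`, and the `drain_cancelled` flag are hypothetical names, and the drain step only mirrors the idea described in the comment above (leave either runnable work or nothing at all before waiting).

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    """Hypothetical stand-in for a queued task; only its cancel state matters."""
    cancelled: bool


def next_action(pending: dict[int, WorkItem], drain_cancelled: bool) -> str:
    """Toy model of one management-loop iteration after shutdown was requested."""
    if drain_cancelled:
        # Idea of the fix: discard items whose futures were cancelled, so what
        # remains is either runnable work or nothing at all.
        for work_id in [w for w, item in pending.items() if item.cancelled]:
            del pending[work_id]
    if not pending:
        return "stop workers"     # nothing left to wait for: shutdown proceeds
    if all(item.cancelled for item in pending.values()):
        return "wait forever"     # no running future will ever post a result
    return "wait for result"      # some real work is still in flight


# Only cancelled work is pending when shutdown begins.
print(next_action({1: WorkItem(cancelled=True)}, drain_cancelled=False))  # wait forever
print(next_action({1: WorkItem(cancelled=True)}, drain_cancelled=True))   # stop workers
```

Without the drain step, the loop ends up waiting for a result that no worker will ever produce; with it, the "nothing pending" branch is reached and worker shutdown can begin.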
@AlexWaygood do you know which expert in multiprocessing I could tag here? I'm eager to see this resolved in the next versions :)
Hi @brianquinlan,
I'm experiencing the same issue (Python 3.8.16). It would be great if a fix (presumably #94468) got merged and backported! @brianquinlan, with the understanding that maintainers are busy and appreciated, would you have time to consider the PR?
FYI, 3.8 (like 3.9) is past its bugfix phase: it now only receives fixes for security-related issues, and this isn't security-related. If you want bugfixes, you'll have to upgrade to at least Python 3.10, which is the oldest Python version that still has bugfixes backported to it.
I'm not active on Python right now, so I would not use my commit powers in any case :-(
That’s good to know, thanks anyway! What is the way forward to get this merged? @AlexWaygood, do you know?
I'm sorry you've had to wait such a long time for a review :( Unfortunately, we don't really have any multiprocessing experts on the core dev team at the moment, and it's a complex module, so few core devs feel confident reviewing PRs relating to multiprocessing. I'll ask around the core dev team and see if anybody could take a look.
Thanks! It would be great to get this moving. The bug has been hiding there for a very long time, but when it gets triggered, it is quite painful.
That's a fair point. Our packages are intended to work with Python 3.7-3.11, so the PR still solves part of the problem. I did not experience issues with Python >= 3.9 (where I can call
Yeah, I’ve been doing exactly the same. |
Fix an issue of concurrent.futures ProcessPoolExecutor shutdown hanging. Co-authored-by: Alex Waygood
…thonGH-94468) Fix an issue of concurrent.futures ProcessPoolExecutor shutdown hanging. (cherry picked from commit 2dc9463) Co-authored-by: yonatanp Co-authored-by: Alex Waygood
Thanks for the fix! 3.9 and earlier only receive security fixes at this point.
Fix an issue of concurrent.futures ProcessPoolExecutor shutdown hanging. (cherry picked from commit 2dc9463) Co-authored-by: yonatanp Co-authored-by: Alex Waygood
* main: (34 commits)
  - pythongh-102701: Fix overflow in dictobject.c (pythonGH-102750)
  - pythonGH-78530: add support for generators in `asyncio.wait` (python#102761)
  - Increase stack reserve size for Windows debug builds to avoid test crashes (pythonGH-102764)
  - pythongh-102755: Add PyErr_DisplayException(exc) (python#102756)
  - Fix outdated note about 'int' rounding or truncating (python#102736)
  - pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102760)
  - pythongh-99726: Improves correctness of stat results for Windows, and uses faster API when available (pythonGH-102149)
  - pythongh-102192: remove redundant exception fields from ssl module socket (python#102466)
  - pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102743)
  - pythongh-102737: Un-ignore ceval.c in the CI globals check (pythongh-102745)
  - pythonGH-102748: remove legacy support for generator based coroutines from `asyncio.iscoroutine` (python#102749)
  - pythongh-102721: Improve coverage of `_collections_abc._CallableGenericAlias` (python#102722)
  - pythonGH-102653: Make recipe docstring show the correct distribution (python#102742)
  - Add comments to `{typing,_collections_abc}._type_repr` about each other (python#102752)
  - pythongh-102594: PyErr_SetObject adds note to exception raised on normalization error (python#102675)
  - pythongh-94440: Fix issue of ProcessPoolExecutor shutdown hanging (python#94468)
  - pythonGH-100112: avoid using iterable coroutines in asyncio internally (python#100128)
  - pythongh-102690: Use Edge as fallback in webbrowser instead of IE (python#102691)
  - pythongh-102660: Fix Refleaks in import.c (python#102744)
  - pythongh-102738: remove from cases generator the code related to register instructions (python#102739)
  - ...
…thon#94468) Fix an issue of concurrent.futures ProcessPoolExecutor shutdown hanging. Co-authored-by: Alex Waygood
I think it is still not working on Python 3.10.11.
Bug report
With a `ProcessPoolExecutor`, after submitting and quickly canceling a future, a call to `shutdown(wait=True)` would hang indefinitely. This happens on pretty much all platforms and all recent Python versions.
Here is a minimal reproduction:
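A sketch of such a reproduction, reconstructed from the description that follows (not necessarily the reporter's exact script): it submits one task, submits and immediately cancels a second, then calls `shutdown(wait=True)` in a daemon thread so that a hang shows up as a timeout instead of blocking forever. `noop` and `shutdown_completes` are hypothetical names, and the `"fork"` start method is an assumption (POSIX only) so the sketch runs as a plain script; on Windows, drop `mp_context` and keep the `__main__` guard.

```python
import concurrent.futures
import multiprocessing
import threading


def noop():
    """Trivial task; its result is irrelevant to the bug."""
    return None


def shutdown_completes(timeout: float = 10.0) -> bool:
    """Return True if shutdown(wait=True) finishes within `timeout` seconds."""
    # Assumption: "fork" start method (POSIX only), so workers do not
    # re-import __main__ when the sketch is run as a plain script.
    ctx = multiprocessing.get_context("fork")
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=1, mp_context=ctx)

    executor.submit(noop)           # first submission starts the management thread
    executor.submit(noop).cancel()  # second submission is cancelled right away

    # Run the blocking shutdown in a daemon thread so a hang cannot block us.
    waiter = threading.Thread(
        target=executor.shutdown, kwargs={"wait": True}, daemon=True
    )
    waiter.start()
    waiter.join(timeout)
    return not waiter.is_alive()    # False means shutdown appears to hang


if __name__ == "__main__":
    print("shutdown completed:", shutdown_completes())
```

Because the bug depends on timing, a single run may not trigger the hang even on an affected version.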
The first submission gets the executor going and creates its internal `queue_management_thread`. The second submission appears to get that thread to loop, enter a wait state, and never receive a wakeup event.
Introducing a tiny sleep between the second submit and its cancel request makes the issue disappear. From my initial observation, it looks like something in the way the `queue_management_worker` internal loop is structured doesn't handle this edge case well.

Shutting down with `wait=False` returns immediately as expected, but the `queue_management_thread` then dies with an unhandled `OSError: handle is closed` exception.

Environment
Additional info
When tested with `pytest-timeout` under Ubuntu and CPython 3.8.13, these are the tracebacks at the moment of timing out:

Tracebacks in PyPy are similar at the `concurrent.futures.process` level. Tracebacks on Windows differ in the lower-level frames, but again are similar at the `concurrent.futures.process` level.
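If `pytest-timeout` is not available, a similar all-threads stack dump can be obtained with the standard library's `faulthandler` module. This is a sketch under assumptions: `shutdown_with_watchdog` and `noop` are hypothetical names, the 30-second timeout is arbitrary, and the `"fork"` start method (POSIX only) is chosen so the script runs without re-importing `__main__` in the workers.

```python
import concurrent.futures
import faulthandler
import multiprocessing
import sys


def noop():
    """Trivial task; its result is irrelevant to the bug."""
    return None


def shutdown_with_watchdog(timeout: float = 30.0) -> None:
    # If the block below has not finished within `timeout` seconds, dump the
    # traceback of every thread to the real stderr and exit the process.
    faulthandler.dump_traceback_later(timeout, exit=True, file=sys.__stderr__)
    try:
        ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
        with concurrent.futures.ProcessPoolExecutor(
            max_workers=1, mp_context=ctx
        ) as executor:
            executor.submit(noop)
            executor.submit(noop).cancel()
        # Leaving the with-block calls executor.shutdown(wait=True).
    finally:
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    shutdown_with_watchdog()
    print("shutdown completed")
```

With `exit=True`, a hang turns into a failing run whose output includes the stacks of the main thread and of the executor's management thread.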
Linked PRs