-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test_concurrent_futures.test_deadlock: test_crash_big_data() hangs randomly on Windows #107219
Comments
By the way, test.regrtest doesn't work with test_concurrent_futures: when test_concurrent_futures is re-run in verbose mode, no tests is ran!
This second bug maybe hides the first bug (test_concurrent_futures hangs sometimes). |
Windows x64 job, also blocked on Similar threads state.
|
Azure Pipelines: Windows PR Tests win32 hangs on
In short, the main thread is waiting until the process completes and the process is killed after 20 minutes. On the same CI, the win64 job ran test_concurrent_futures in 1 min 33 sec.
|
I reported a similar issue in May 2022, but closed it as it seemed the issue stopped occurring in CI: |
many lines..
0:00:05 [ 9] test_concurrent_futures
test_crash_big_data (test.test_concurrent_futures.ProcessPoolForkExecutorDeadlockTest.test_crash_big_data) ... skipped 'require un
ix system'
test_crash_big_data (test.test_concurrent_futures.ProcessPoolForkserverExecutorDeadlockTest.test_crash_big_data) ... skipped 'requ
ire unix system'
test_crash_big_data (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data) ... Warning -- Uncaugh
t thread exception: InvalidStateError
Exception in thread Thread-9:
Traceback (most recent call last):
File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\threading.py", line 1059, in _bootstrap_inner
self.run()
File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\process.py", line 344, in run
self.terminate_broken(cause)
File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\process.py", line 492, in terminate_broken
work_item.future.set_exception(bpe)
File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\_base.py", line 559, in set_exception
raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x2cf561cf260 state=cancelled>
0.55s Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 11880)>
Warning -- Dangling thread: <Thread(QueueFeederThread, started daemon 4112)>
ok
Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 11880)>
Warning -- Dangling thread: <Thread(QueueFeederThread, started daemon 4112)> Sadly, but it's hard to reproduce. |
You can stress the system to make the issue more likely. For example, open a second terminal and run:
You can use |
Oh, that's right! With -j8 (in a separate terminal) I can reproduce bug more easily. |
(I'm also seeing this hang in the mingw fork after updating from 3.11.4 to 3.11.5) |
Error on Linux, not sure if it's related. aarch64 Fedora Stable LTO + PGO 3.x buildbot: https://buildbot.python.org/all/#/builders/524/builds/4310 Log (reformatted for readability):
|
Similar error on another Linux machine. ARM Raspbian 3.x: https://buildbot.python.org/all/#/builders/424/builds/4736 Logs (reformatted):
|
See python#107219 Once that is fixed this commit can be removed. This is a commit and not just an addition to the skip list, since we still run the skipped tests in CI and in this case everything would hang.
See python#107219 Once that is fixed this commit can be removed. This is a commit and not just an addition to the skip list, since we still run the skipped tests in CI and in this case everything would hang.
See also issue #105829. |
Would you mind to create a separated issue? |
@vstinner - you created this issue so probably makes more sense for you to do so? |
Fix a race condition in _ExecutorManagerThread.terminate_broken(): ignore the InvalidStateError on future.set_exception(). It can happen if the future is cancelled before the caller. Moreover, test_crash_big_data() now waits explicitly until the executor completes.
To reproduce the
I wrote PR #108974 to fix one of the bugs, InvalidStateError in terminate_broken(). |
By the way, if I interrupt this command with CTRL+C, sometimes... it hangs as well!
|
Fix a race condition in _ExecutorManagerThread.terminate_broken(): ignore the InvalidStateError on future.set_exception(). It can happen if the future is cancelled before the caller. Moreover, test_crash_big_data() now waits explicitly until the executor completes.
…utures Follow-up of pythongh-107219. * Only close the connection writer on Windows. * Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of WSA_OPERATION_ABORTED.
…GH-109780) Follow-up of gh-107219. * Only close the connection writer on Windows. * Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of WSA_OPERATION_ABORTED.
…utures (pythonGH-109780) Follow-up of pythongh-107219. * Only close the connection writer on Windows. * Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of WSA_OPERATION_ABORTED. (cherry picked from commit 0b4e090) Co-authored-by: Serhiy Storchaka <[email protected]>
…futures (GH-109780) (GH-109882) Follow-up of gh-107219. * Only close the connection writer on Windows. * Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of WSA_OPERATION_ABORTED. (cherry picked from commit 0b4e090) Co-authored-by: Serhiy Storchaka <[email protected]>
…utures (pythonGH-109780) Follow-up of pythongh-107219. * Only close the connection writer on Windows. * Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of WSA_OPERATION_ABORTED.
… (#109254) gh-107219: Fix concurrent.futures terminate_broken() (GH-109244) Fix a race condition in concurrent.futures. When a process in the process pool was terminated abruptly (while the future was running or pending), close the connection write end. If the call queue is blocked on sending bytes to a worker process, closing the connection write end interrupts the send, so the queue can be closed. Changes: * _ExecutorManagerThread.terminate_broken() now closes call_queue._writer. * multiprocessing PipeConnection.close() now interrupts WaitForMultipleObjects() in _send_bytes() by cancelling the overlapped operation. (cherry picked from commit a9b1f84) Co-authored-by: Victor Stinner <[email protected]>
…rrent_futures (pythonGH-109780) Follow-up of pythongh-107219. * Only close the connection writer on Windows. * Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of WSA_OPERATION_ABORTED. (cherry picked from commit 0b4e090) Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…current.futures (GH-114489) This was left out of the 3.12 backport for three related issues: - gh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - gh-109370 (which changes this to be only called on Windows) - gh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…utures (pythonGH-109780) Follow-up of pythongh-107219. * Only close the connection writer on Windows. * Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of WSA_OPERATION_ABORTED.
…ot concurrent.futures (pythonGH-114489) This was left out of the 3.12 backport for three related issues: - pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`) - pythongh-109370 (which changes this to be only called on Windows) - pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`) Without this change, ProcessPoolExecutor sometimes hangs on Windows when a worker process is terminated. Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
GHA Windows x86 job, test_crash_big_data() hangs on
ProcessPoolExecutor.shutdown()
: https://github.com/python/cpython/actions/runs/5651960914/job/15310873235?pr=107217ProcessPoolExecutor.shutdown()
Threading.join()
_feed()
=> connectionsend_bytes()
Linked PRs
The text was updated successfully, but these errors were encountered: