test_concurrent_futures.test_deadlock: test_crash_big_data() hangs randomly on Windows #107219

Closed
vstinner opened this issue Jul 25, 2023 · 24 comments
Labels
OS-windows tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

vstinner commented Jul 25, 2023

GHA Windows x86 job, test_crash_big_data() hangs on ProcessPoolExecutor.shutdown(): https://github.com/python/cpython/actions/runs/5651960914/job/15310873235?pr=107217

  • Main thread: ProcessPoolExecutor.shutdown()
  • Thread 2: threading.Thread.join()
  • Thread 3: queue _feed() => connection send_bytes()
(...)
0:39:52 load avg: 0.06 running: test_concurrent_futures (19 min 2 sec)
0:40:22 load avg: 0.05 running: test_concurrent_futures (19 min 32 sec)
0:40:51 load avg: 0.03 [447/447/2] test_concurrent_futures crashed (Exit code 1)
Timeout (0:20:00)!
Thread 0x000007e8 (most recent call first):
  File "D:\a\cpython\cpython\Lib\multiprocessing\connection.py", line 282 in _send_bytes
  File "D:\a\cpython\cpython\Lib\multiprocessing\connection.py", line 199 in send_bytes
  File "D:\a\cpython\cpython\Lib\multiprocessing\queues.py", line 246 in _feed
  (...)
  File "D:\a\cpython\cpython\Lib\threading.py", line 1009 in _bootstrap

Thread 0x00001738 (most recent call first):
  File "D:\a\cpython\cpython\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\cpython\cpython\Lib\threading.py", line 1126 in join
  (...)
  File "D:\a\cpython\cpython\Lib\threading.py", line 1009 in _bootstrap

Thread 0x0000103c (most recent call first):
  File "D:\a\cpython\cpython\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\cpython\cpython\Lib\threading.py", line 1126 in join
  File "D:\a\cpython\cpython\Lib\concurrent\futures\process.py", line 836 in shutdown
  File "D:\a\cpython\cpython\Lib\concurrent\futures\_base.py", line 647 in __exit__
  File "D:\a\cpython\cpython\Lib\test\test_concurrent_futures.py", line 1386 in test_crash_big_data
  (...)
  File "D:\a\cpython\cpython\Lib\test\support\__init__.py", line 1241 in run_unittest
  File "D:\a\cpython\cpython\Lib\test\libregrtest\runtest.py", line 294 in _test_module
  (...)
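
The per-thread dumps above ("Thread 0x... (most recent call first):") are in the format that Python's faulthandler module writes. As a minimal sketch of how to capture a similar report from a process of your own, assuming a hypothetical `dump_all_threads()` helper and a worker thread that is deliberately blocked on an event (neither is part of the test suite):

```python
import faulthandler
import tempfile
import threading


def dump_all_threads():
    """Hypothetical helper: return a faulthandler dump of every thread as text."""
    # faulthandler writes through a file descriptor, so a real file is needed.
    with tempfile.TemporaryFile() as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read().decode("utf-8", "replace")


# A worker that blocks until the event is set, standing in for a stuck thread.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, name="blocked-worker")
worker.start()
try:
    report = dump_all_threads()
finally:
    # Always release the worker so the example itself cannot hang.
    stop.set()
    worker.join()

print(report)
```

The report lists one stack per thread, most recent call first, which is how the blocked `join()` and `send_bytes()` frames were identified here.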


@vstinner vstinner added the type-bug An unexpected behavior, bug, or error label Jul 25, 2023
@vstinner
Member Author

By the way, test.regrtest doesn't work with test_concurrent_futures: when test_concurrent_futures is re-run in verbose mode, no tests are run!

0:40:52 Re-running test_concurrent_futures in verbose mode

----------------------------------------------------------------------
Ran 0 tests in 0.001s

NO TESTS RAN

This second bug may hide the first bug (test_concurrent_futures sometimes hangs).

@vstinner
Member Author

Windows x64 job, also blocked on test_crash_big_data(): https://github.com/python/cpython/actions/runs/5652189009/job/15311447694

Similar thread states.

0:22:04 load avg: 5.60 [447/447/2] test_concurrent_futures crashed (Exit code 1)
Timeout (0:20:00)!
Thread 0x00001be0 (most recent call first):
  File "D:\a\cpython\cpython\Lib\multiprocessing\connection.py", line 282 in _send_bytes
  File "D:\a\cpython\cpython\Lib\multiprocessing\connection.py", line 199 in send_bytes
  File "D:\a\cpython\cpython\Lib\multiprocessing\queues.py", line 246 in _feed
  (...)
  File "D:\a\cpython\cpython\Lib\threading.py", line 1009 in _bootstrap

Thread 0x000002a0 (most recent call first):
  File "D:\a\cpython\cpython\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\cpython\cpython\Lib\threading.py", line 1126 in join
  (...)
  File "D:\a\cpython\cpython\Lib\threading.py", line 1009 in _bootstrap

Thread 0x00000c18 (most recent call first):
  File "D:\a\cpython\cpython\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\cpython\cpython\Lib\threading.py", line 1126 in join
  File "D:\a\cpython\cpython\Lib\concurrent\futures\process.py", line 836 in shutdown
  File "D:\a\cpython\cpython\Lib\concurrent\futures\_base.py", line 647 in __exit__
  File "D:\a\cpython\cpython\Lib\test\test_concurrent_futures.py", line 1386 in test_crash_big_data
  (...)

@vstinner
Member Author

vstinner commented Jul 25, 2023

Azure Pipelines: Windows PR Tests win32 hangs on test_interpreter_shutdown(): https://dev.azure.com/Python/cpython/_build/results?buildId=133201&view=logs&j=d554cd63-f8f4-5b2d-871b-33e4ea76e915&t=5a14d0eb-dbd4-5b80-f5d0-7909f950a1cc

  • Main thread: run_python_until_end() => subprocess.Popen.communicate()
  • Thread 2 (stdout?): subprocess _readerthread()
  • Thread 3 (stderr?): subprocess _readerthread()

In short, the main thread is waiting until the process completes and the process is killed after 20 minutes.

On the same CI, the win64 job ran test_concurrent_futures in 1 min 33 sec.
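
For illustration only, a caller that does not want to wait for the global 20-minute timeout can bound `communicate()` itself. This is a minimal sketch using the documented `timeout` parameter of `subprocess.Popen.communicate()`; the `run_with_deadline()` helper and the child commands are hypothetical and are not what `run_python_until_end()` does:

```python
import subprocess
import sys


def run_with_deadline(args, timeout):
    """Hypothetical helper: run a child process, killing it if it exceeds `timeout` seconds.

    Returns (returncode, stdout, stderr, timed_out).
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        # The child is still running: kill it, then collect whatever it wrote.
        proc.kill()
        out, err = proc.communicate()
        timed_out = True
    return proc.returncode, out, err, timed_out


# A child that would block for a long time, and one that exits immediately.
hung = run_with_deadline(
    [sys.executable, "-c", "import time; time.sleep(60)"], timeout=1.0)
quick = run_with_deadline(
    [sys.executable, "-c", "print('ok')"], timeout=30.0)

print("hung timed out:", hung[3])
print("quick output:", quick[1])
```

Killing the child also ends the stdout and stderr reader threads, so the second `communicate()` call returns promptly.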

(...)
test_hang_gh94440 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest.test_hang_gh94440)
shutdown(wait=True) doesn't hang when a future was submitted and ... skipped 'Tested platform does not support the alarm signal'
test_hang_issue12364 (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolShutdownTest.test_hang_issue12364) ... ok

Timeout (0:20:00)!
Thread 0x00001184 (most recent call first):
  File "D:\a\1\s\Lib\subprocess.py", line 1597 in _readerthread
  File "D:\a\1\s\Lib\threading.py", line 989 in run
  File "D:\a\1\s\Lib\threading.py", line 1052 in _bootstrap_inner
  File "D:\a\1\s\Lib\threading.py", line 1009 in _bootstrap

Thread 0x00001224 (most recent call first):
  File "D:\a\1\s\Lib\subprocess.py", line 1597 in _readerthread
  File "D:\a\1\s\Lib\threading.py", line 989 in run
  File "D:\a\1\s\Lib\threading.py", line 1052 in _bootstrap_inner
  File "D:\a\1\s\Lib\threading.py", line 1009 in _bootstrap

Thread 0x00001560 (most recent call first):
  File "D:\a\1\s\Lib\threading.py", line 1146 in _wait_for_tstate_lock
  File "D:\a\1\s\Lib\threading.py", line 1126 in join
  File "D:\a\1\s\Lib\subprocess.py", line 1626 in _communicate
  File "D:\a\1\s\Lib\subprocess.py", line 1209 in communicate
  File "D:\a\1\s\Lib\test\support\script_helper.py", line 139 in run_python_until_end
  File "D:\a\1\s\Lib\test\support\script_helper.py", line 149 in _assert_python
  File "D:\a\1\s\Lib\test\support\script_helper.py", line 166 in assert_python_ok
  File "D:\a\1\s\Lib\test\test_concurrent_futures.py", line 302 in test_interpreter_shutdown
  (...)
  File "<frozen runpy>", line 198 in _run_module_as_main

@AlexWaygood
Member

I reported a similar issue in May 2022, but closed it as it seemed the issue stopped occurring in CI:

@AlexWaygood AlexWaygood added tests Tests in the Lib/test dir OS-windows labels Jul 26, 2023
@Eclips4
Member

Eclips4 commented Aug 24, 2023

./python -m test -v test_concurrent_futures -m test_crash_big_data --forever gives me this:

many lines..
0:00:05 [  9] test_concurrent_futures
test_crash_big_data (test.test_concurrent_futures.ProcessPoolForkExecutorDeadlockTest.test_crash_big_data) ... skipped 'require unix system'
test_crash_big_data (test.test_concurrent_futures.ProcessPoolForkserverExecutorDeadlockTest.test_crash_big_data) ... skipped 'require unix system'
test_crash_big_data (test.test_concurrent_futures.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data) ... Warning -- Uncaught thread exception: InvalidStateError
Exception in thread Thread-9:
Traceback (most recent call last):
  File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\threading.py", line 1059, in _bootstrap_inner
    self.run()
  File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\process.py", line 344, in run
    self.terminate_broken(cause)
  File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\process.py", line 492, in terminate_broken
    work_item.future.set_exception(bpe)
  File "C:\Users\KIRILL-1\CLionProjects\cpython\Lib\concurrent\futures\_base.py", line 559, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x2cf561cf260 state=cancelled>
0.55s Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 11880)>
Warning -- Dangling thread: <Thread(QueueFeederThread, started daemon 4112)>
ok
Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 11880)>
Warning -- Dangling thread: <Thread(QueueFeederThread, started daemon 4112)>

Sadly, it's hard to reproduce.

@vstinner
Copy link
Member Author

Sadly, it's hard to reproduce.

You can stress the system to make the issue more likely. For example, open a second terminal and run:

python -m test -j2

You can use -j4 or more depending on the number of CPUs and how much you want your machine to be stressed :-)
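
As a small sketch of that advice, the stress command can be built from the CPU count. The `stress_command()` helper and its default of one job per CPU (with a floor of 2) are assumptions for illustration; only `python -m test -jN` comes from the comment above:

```python
import os
import sys


def stress_command(jobs=None):
    """Hypothetical helper: build a 'python -m test -jN' command line.

    When `jobs` is not given, use one job per CPU with a minimum of 2.
    This default is an assumption, not a recommendation from the issue.
    """
    if jobs is None:
        jobs = max(2, os.cpu_count() or 2)
    return [sys.executable, "-m", "test", f"-j{jobs}"]


# Only print the command; run it in a second terminal to load the machine.
command = stress_command()
print(" ".join(command))
```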

@Eclips4
Member

Eclips4 commented Aug 25, 2023

Sadly, it's hard to reproduce.

You can stress the system to make the issue more likely. For example, open a second terminal and run:

python -m test -j2

You can use -j4 or more depending on the number of CPUs and how much you want your machine to be stressed :-)

Oh, that's right! With -j8 (in a separate terminal) I can reproduce the bug more easily.

@lazka
Contributor

lazka commented Aug 25, 2023

(I'm also seeing this hang in the mingw fork after updating from 3.11.4 to 3.11.5)

@vstinner
Member Author

Error on Linux, not sure if it's related.

aarch64 Fedora Stable LTO + PGO 3.x buildbot: https://buildbot.python.org/all/#/builders/524/builds/4310

Log (reformatted for readability):

FAIL: test_interpreter_shutdown (test.test_concurrent_futures.test_shutdown.ProcessPoolForkserverProcessPoolShutdownTest.test_interpreter_shutdown)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/test/test_concurrent_futures/test_shutdown.py", line 49, in test_interpreter_shutdown
    self.assertFalse(err)

AssertionError: b'Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/threading.py", line 1059, in _bootstrap_inner
    self.run()
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/concurrent/futures/process.py", line 339, in run
    self.add_call_item_to_queue()
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/concurrent/futures/process.py", line 394, in add_call_item_to_queue
    self.call_queue.put(_CallItem(work_id,
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/queues.py", line 94, in put
    self._start_thread()
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/queues.py", line 177, in _start_thread
    self._thread.start()
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/threading.py", line 978, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can\'t create new thread at interpreter shutdown
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto-pgo/build/Lib/multiprocessing/synchronize.py", line 115, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory
' is not false

@vstinner
Member Author

Similar error on another Linux machine.

ARM Raspbian 3.x: https://buildbot.python.org/all/#/builders/424/builds/4736

Logs (reformatted):

FAIL: test_interpreter_shutdown (test.test_concurrent_futures.test_shutdown.ProcessPoolForkserverProcessPoolShutdownTest.test_interpreter_shutdown)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_concurrent_futures/test_shutdown.py", line 49, in test_interpreter_shutdown
    self.assertFalse(err)
AssertionError: b'Exception in thread Thread-1:
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/threading.py", line 1059, in _bootstrap_inner
    self.run()
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/process.py", line 339, in run
    self.add_call_item_to_queue()
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/concurrent/futures/process.py", line 394, in add_call_item_to_queue
    self.call_queue.put(_CallItem(work_id,
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/queues.py", line 94, in put
    self._start_thread()
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/queues.py", line 177, in _start_thread
    self._thread.start()
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/threading.py", line 978, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can\'t create new thread at interpreter shutdown
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/multiprocessing/synchronize.py", line 115, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory
' is not false

lazka added a commit to msys2-contrib/cpython-mingw that referenced this issue Aug 26, 2023
See python#107219
Once that is fixed this commit can be removed.
This is a commit and not just an addition to the skip list, since
we still run the skipped tests in CI and in this case everything would hang.
lazka added a commit to msys2-contrib/cpython-mingw that referenced this issue Aug 27, 2023
See python#107219
Once that is fixed this commit can be removed.
This is a commit and not just an addition to the skip list, since
we still run the skipped tests in CI and in this case everything would hang.
@vstinner
Member Author

See also issue #105829.

@cjw296
Contributor

cjw296 commented Aug 30, 2023

@vstinner - test_crash_big_data and test_interpreter_shutdown look like they might be separate problems.
Might be worth splitting into separate issues?

Also, #105829 appears to be a different issue entirely, in which sending many wakeups results in a deadlock.

@vstinner
Member Author

Would you mind creating a separate issue?

@cjw296
Contributor

cjw296 commented Aug 30, 2023

@vstinner - you created this issue, so it probably makes more sense for you to do so?

vstinner added a commit to vstinner/cpython that referenced this issue Sep 6, 2023
Fix a race condition in _ExecutorManagerThread.terminate_broken():
ignore the InvalidStateError on future.set_exception(). It can happen
if the future is cancelled before the caller.

Moreover, test_crash_big_data() now waits explicitly until the
executor completes.
@vstinner
Member Author

vstinner commented Sep 6, 2023

To reproduce the test_crash_big_data() hang, I use this command on Windows:

python -m test test_concurrent_futures.test_deadlock -v -m test.test_concurrent_futures.test_deadlock.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data --timeout=30

I wrote PR #108974 to fix one of the bugs, InvalidStateError in terminate_broken().
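
The race behind that InvalidStateError can be shown in isolation: calling `set_exception()` on a future that was already cancelled raises, as in the traceback earlier in this issue. The following is a minimal sketch of the "ignore the error" approach, not the code from the PR; the `set_exception_if_possible()` helper is hypothetical:

```python
from concurrent.futures import Future, InvalidStateError


def set_exception_if_possible(future, exc):
    """Hypothetical helper: set `exc` on `future` unless it is already done.

    Returns True if the exception was stored, False if the future had
    already been cancelled or completed.
    """
    try:
        future.set_exception(exc)
    except InvalidStateError:
        # The future was cancelled (or finished) first; nothing left to report.
        return False
    return True


# One future is cancelled before the "manager" reports the failure,
# the other is still pending.
cancelled = Future()
cancelled.cancel()
pending = Future()

results = (
    set_exception_if_possible(cancelled, RuntimeError("pool broken")),
    set_exception_if_possible(pending, RuntimeError("pool broken")),
)
print(results)
```

Catching the exception, rather than checking the state first, avoids a second race between the check and the call.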

@vstinner
Member Author

vstinner commented Sep 6, 2023

To reproduce the test_crash_big_data() hang, I use this command on Windows:
python -m test test_concurrent_futures.test_deadlock -v -m test.test_concurrent_futures.test_deadlock.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data --timeout=30

By the way, if I interrupt this command with CTRL+C, sometimes... it hangs as well!

0:04:06 [381] test_concurrent_futures.test_deadlock
test_crash_big_data (test.test_concurrent_futures.test_deadlock.ProcessPoolSpawnExecutorDeadlockTest.test_crash_big_data) ... 

^C

Traceback (most recent call last):
  File "<string>", line 1, in <module>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\victor\python\main\Lib\multiprocessing\__init__.py", line 16, in <module>
  File "C:\victor\python\main\Lib\multiprocessing\__init__.py", line 16, in <module>
    from . import context
    from . import context
  File "C:\victor\python\main\Lib\multiprocessing\context.py", line 6, in <module>
  File "C:\victor\python\main\Lib\multiprocessing\context.py", line 6, in <module>
    from . import reduction
    from . import reduction
  File "C:\victor\python\main\Lib\multiprocessing\reduction.py", line 16, in <module>
  File "C:\victor\python\main\Lib\multiprocessing\reduction.py", line 15, in <module>
    import pickle
    import socket
  File "C:\victor\python\main\Lib\pickle.py", line 34, in <module>
  File "C:\victor\python\main\Lib\socket.py", line 52, in <module>
    import re
    import _socket
KeyboardInterrupt
  File "C:\victor\python\main\Lib\re\__init__.py", line 125, in <module>
    from . import _compiler, _parser
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1000, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1133, in get_code

Timeout (0:00:30)!
Thread 0x000005bc (most recent call first):
  File "C:\victor\python\main\Lib\multiprocessing\connection.py", line 282 in _send_bytes
  File "C:\victor\python\main\Lib\multiprocessing\connection.py", line 199 in send_bytes
  File "C:\victor\python\main\Lib\multiprocessing\queues.py", line 246 in _feed
  File "C:\victor\python\main\Lib\threading.py", line 996 in run
  File "C:\victor\python\main\Lib\threading.py", line 1059 in _bootstrap_inner
  File "C:\victor\python\main\Lib\threading.py", line 1016 in _bootstrap

Thread 0x000012a4 (most recent call first):
  File "C:\victor\python\main\Lib\threading.py", line 1153 in _wait_for_tstate_lock
  File "C:\victor\python\main\Lib\threading.py", line 1133 in join
  File "C:\victor\python\main\Lib\multiprocessing\queues.py", line 199 in _finalize_join
  File "C:\victor\python\main\Lib\multiprocessing\util.py", line 224 in __call__
  File "C:\victor\python\main\Lib\multiprocessing\queues.py", line 151 in join_thread
  File "C:\victor\python\main\Lib\concurrent\futures\process.py", line 560 in join_executor_internals
  File "C:\victor\python\main\Lib\concurrent\futures\process.py", line 514 in terminate_broken
  File "C:\victor\python\main\Lib\concurrent\futures\process.py", line 344 in run
  File "C:\victor\python\main\Lib\threading.py", line 1059 in _bootstrap_inner
  File "C:\victor\python\main\Lib\threading.py", line 1016 in _bootstrap

Thread 0x00001368 (most recent call first):
  File "C:\victor\python\main\Lib\threading.py", line 1153 in _wait_for_tstate_lock
  File "C:\victor\python\main\Lib\threading.py", line 1133 in join
  File "C:\victor\python\main\Lib\concurrent\futures\process.py", line 843 in shutdown
  File "C:\victor\python\main\Lib\concurrent\futures\_base.py", line 647 in __exit__
  File "C:\victor\python\main\Lib\test\test_concurrent_futures\test_deadlock.py", line 236 in test_crash_big_data
  (...)

vstinner added a commit that referenced this issue Sep 6, 2023
Fix a race condition in _ExecutorManagerThread.terminate_broken():
ignore the InvalidStateError on future.set_exception(). It can happen
if the future is cancelled before the caller.

Moreover, test_crash_big_data() now waits explicitly until the
executor completes.
@vstinner
Member Author

vstinner commented Sep 7, 2023

I analyzed the test_interpreter_shutdown() bug and created issue #109047 with my findings. Please continue the discussion on test_interpreter_shutdown() in issue #109047.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Multiprocessing issues Sep 13, 2023
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 23, 2023
…utures

Follow-up of pythongh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
serhiy-storchaka added a commit that referenced this issue Sep 26, 2023
…GH-109780)

Follow-up of gh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 26, 2023
…utures (pythonGH-109780)

Follow-up of pythongh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
(cherry picked from commit 0b4e090)

Co-authored-by: Serhiy Storchaka <[email protected]>
serhiy-storchaka added a commit that referenced this issue Sep 26, 2023
…futures (GH-109780) (GH-109882)

Follow-up of gh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
(cherry picked from commit 0b4e090)

Co-authored-by: Serhiy Storchaka <[email protected]>
csm10495 pushed a commit to csm10495/cpython that referenced this issue Sep 28, 2023
…utures (pythonGH-109780)

Follow-up of pythongh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
Yhg1s pushed a commit that referenced this issue Oct 2, 2023
… (#109254)

gh-107219: Fix concurrent.futures terminate_broken() (GH-109244)

Fix a race condition in concurrent.futures. When a process in the
process pool was terminated abruptly (while the future was running or
pending), close the connection write end. If the call queue is
blocked on sending bytes to a worker process, closing the connection
write end interrupts the send, so the queue can be closed.

Changes:

* _ExecutorManagerThread.terminate_broken() now closes
  call_queue._writer.
* multiprocessing PipeConnection.close() now interrupts
  WaitForMultipleObjects() in _send_bytes() by cancelling the
  overlapped operation.
(cherry picked from commit a9b1f84)

Co-authored-by: Victor Stinner <[email protected]>
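
The blocking that this commit message describes can be illustrated without multiprocessing. The following is a minimal sketch, not the CPython fix: it uses a plain `os.pipe()` and a hypothetical `feed()` helper to show a writer thread that stays blocked while nobody reads, so that `join()` on it would wait forever. Here the writer is released by draining the pipe; the actual change instead closes the connection write end to interrupt the send.

```python
import os
import threading

# Assumed to be larger than the OS pipe buffer, so the writer must block.
PAYLOAD_SIZE = 4 << 20


def feed(write_fd, payload):
    """Hypothetical feeder: write the whole payload, then close the write end."""
    view = memoryview(payload)
    while len(view) > 0:
        written = os.write(write_fd, view)  # blocks once the pipe buffer is full
        view = view[written:]
    os.close(write_fd)


read_fd, write_fd = os.pipe()
feeder = threading.Thread(target=feed, args=(write_fd, bytes(PAYLOAD_SIZE)))
feeder.start()

# Nobody is reading yet, so the feeder cannot finish and join() times out.
feeder.join(timeout=0.5)
blocked_before_drain = feeder.is_alive()

# Draining the pipe lets the blocked write complete.
received = 0
while True:
    chunk = os.read(read_fd, 65536)
    if not chunk:  # empty read: the feeder closed its end
        break
    received += len(chunk)
os.close(read_fd)
feeder.join()

print(blocked_before_drain, received)
```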
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Nov 10, 2023
…rrent_futures (pythonGH-109780)

Follow-up of pythongh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
(cherry picked from commit 0b4e090)

Co-authored-by: Serhiy Storchaka <[email protected]>
serhiy-storchaka added a commit that referenced this issue Nov 10, 2023
…futures (GH-109780) (GH-111934)

Follow-up of gh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
(cherry picked from commit 0b4e090)
encukou added a commit to encukou/cpython that referenced this issue Jan 23, 2024
…ot concurrent.futures

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
encukou added a commit that referenced this issue Jan 24, 2024
…current.futures (GH-114489)

This was left out of the 3.12 backport for three related issues:
- gh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- gh-109370 (which changes this to be only called on Windows)
- gh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
naveen521kk pushed a commit to naveen521kk/cpython that referenced this issue Feb 19, 2024
…ot concurrent.futures (pythonGH-114489)

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
naveen521kk pushed a commit to naveen521kk/cpython that referenced this issue Feb 21, 2024
…ot concurrent.futures (pythonGH-114489)

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
naveen521kk pushed a commit to naveen521kk/cpython that referenced this issue Jul 11, 2024
…ot concurrent.futures (pythonGH-114489)

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
naveen521kk pushed a commit to msys2-contrib/cpython-mingw that referenced this issue Aug 5, 2024
…ot concurrent.futures (pythonGH-114489)

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…utures (pythonGH-109780)

Follow-up of pythongh-107219.

* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
  WSA_OPERATION_ABORTED.
naveen521kk pushed a commit to naveen521kk/cpython that referenced this issue Sep 4, 2024
…ot concurrent.futures (pythonGH-114489)

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>