Another deadlock in the preamble of WorkerState.execute
#6869
Comments
class Worker:
    async def execute(self, ..., result_handler):
        ...
        return result_handler(AlreadyCancelledEvent(...))
cc @graingert
Overall, I believe we should just drop |
Yes, I agree AlreadyCancelledEvent is superfluous |
I'm strongly opposed to this. It would solve only 1 out of the 4 async points of |
Where would that event come from? This is some code executing loop.call_soon during the last
If the threadpool has an idle worker, I'd expect the threadpool to win a race with call_soon. I think doing something like:
    async def await_then_call(async_fn, fn):
        fn(await async_fn())

    create_task(await_then_call(execute, callback))
rather than changing execute would be cleaner, if you do plan on doing that |
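For illustration, here is a minimal, self-contained sketch of the await_then_call idea from the comment above. It is not distributed's actual code: `execute` is a hypothetical stand-in for Worker.execute, and the callback is just `list.append`. The point it tries to show is that the callback runs inside the same task step that receives the result, rather than being queued separately on the event loop.

```python
import asyncio


async def await_then_call(async_fn, fn):
    # Await the coroutine function, then pass its result straight to the
    # callback; no extra trip through the event loop happens in between.
    fn(await async_fn())


async def execute():
    # Hypothetical stand-in for Worker.execute (assumption for this sketch).
    await asyncio.sleep(0)
    return "AlreadyCancelledEvent"


async def main():
    received = []
    task = asyncio.create_task(await_then_call(execute, received.append))
    await task
    return received


received = asyncio.run(main())
```

Whether this ordering guarantee is enough to close the race described in this issue is not something the sketch can establish; it only demonstrates the wrapper pattern itself.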
This is tightly related with #6867 and dask/dask#9330.
There is a deadlock which is triggered by this code path:
distributed/distributed/worker.py
Lines 2152 to 2157 in f43bc47
which in turn triggers:
distributed/distributed/worker_state_machine.py
Lines 2931 to 2939 in f43bc47
The deadlock should be reproducible as follows:
handle_stimulus(ComputeTaskEvent(key="x"))
ts.state=executing; create asyncio task for Worker.execute
handle_stimulus(FreeKeysEvent(keys=["x"]))
ts.state=cancelled
await asyncio.sleep(0)
Worker.execute runs and returns AlreadyCancelledEvent. This causes the _handle_stimulus_from_task callback to be appended to the end of the event loop. However, the test suite is before that in the event loop:
handle_stimulus(ComputeTaskEvent(key="x"))
ts.state=resumed
await ... (anything that releases the event loop)
This runs _handle_stimulus_from_task, which runs _handle_already_cancelled, which returns {ts: "released"}, which triggers the (resumed, released) transition, which sends the task to cancelled state, while the scheduler thinks it's running.
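The ordering problem in the steps above can be reproduced with plain asyncio, without any distributed code. The following is a toy sketch under stated assumptions: `state["x"]` stands in for ts.state, `execute` for Worker.execute, and `handle_stimulus_from_task` for the _handle_stimulus_from_task done callback; none of these are the real implementations. It relies only on the documented behaviour that task done callbacks are scheduled with loop.call_soon.

```python
import asyncio


def simulate():
    state = {"x": "released"}  # toy stand-in for ts.state
    log = []

    async def execute():
        # Toy stand-in for Worker.execute: it finds the task cancelled and
        # returns immediately, without yielding to the event loop.
        log.append(("execute sees", state["x"]))
        return "AlreadyCancelledEvent"

    def handle_stimulus_from_task(task):
        # Done callback: asyncio schedules it via loop.call_soon, so it runs
        # after whatever is already queued on the loop.
        log.append(("callback sees", state["x"], task.result()))

    async def test_body():
        state["x"] = "executing"  # first ComputeTaskEvent
        task = asyncio.create_task(execute())
        task.add_done_callback(handle_stimulus_from_task)
        state["x"] = "cancelled"  # FreeKeysEvent
        await asyncio.sleep(0)    # execute() runs; callback is queued behind us
        state["x"] = "resumed"    # second ComputeTaskEvent
        await asyncio.sleep(0)    # only now does the callback run
        return log

    return asyncio.run(test_body())


log = simulate()
```

In this sketch the event is produced while the state is "cancelled" but is handled only after the state has moved on to "resumed", which mirrors the stale AlreadyCancelledEvent described above.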
@fjetter @gjoseph92 my head is spinning.