Assert that a fetch->cancelled->resumed->fetch cycle is impossible #6460
rerun tests
Note: rerunning gpuCI tests since those errors should be fixed by #6434
Yes, this should probably be asserted/validated as part of the if clause
How about adding

diff --git a/distributed/worker.py b/distributed/worker.py
index c306c5af5..73d519d57 100644
--- a/distributed/worker.py
+++ b/distributed/worker.py
@@ -4250,13 +4250,13 @@ class Worker(ServerNode):
     def validate_task_cancelled(self, ts):
         assert ts.key not in self.data
-        assert ts._previous
+        assert ts._previous in {"long-running", "executing", "flight"}
         assert ts._next is None  # We'll always transition to released after it is done

     def validate_task_resumed(self, ts):
         assert ts.key not in self.data
         assert ts._next
-        assert ts._previous
+        assert ts._previous in {"long-running", "executing", "flight"}

     def validate_task_released(self, ts):
         assert ts.key not in self.data
I'm fine with this assert. Not sure how valuable it will be, but I agree that if it fails, it should fail hard.
I opened #6488 for the other suggestion so that we can merge this faster.
`transition_resumed_fetch` calls `_transition_from_resumed`, which in turn contains this:

This would cause a deadlock in `transition_resumed_fetch`, as nothing would kick off `_ensure_communicating` and the task would potentially stay hanging in fetch state forever.

However, after further inspection of the code, I understand that the only possible values for `ts._previous` are None, `executing`, `long-running`, and `flight`.
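To illustrate why a membership assert is stricter than a plain truthiness assert here, the following is a minimal, self-contained sketch. It does not use the real worker classes: `ToyTaskState`, `validate_resumed_loose`, `validate_resumed_strict`, and `passes` are hypothetical names invented for this example, and the `_next` values are purely illustrative. Only the set of allowed `_previous` values is taken from this conversation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Taken from the discussion in this thread: the non-None values that
# ts._previous is expected to hold for a cancelled or resumed task.
ALLOWED_PREVIOUS = {"long-running", "executing", "flight"}


@dataclass
class ToyTaskState:
    """Hypothetical stand-in for a worker task state; not the real class."""

    key: str
    state: str
    _previous: Optional[str] = None
    _next: Optional[str] = None


def validate_resumed_loose(ts: ToyTaskState) -> None:
    # Truthiness check: any non-empty string passes, including "fetch".
    assert ts._previous
    assert ts._next


def validate_resumed_strict(ts: ToyTaskState) -> None:
    # Membership check: fails hard if _previous holds an unexpected state.
    assert ts._previous in ALLOWED_PREVIOUS, ts._previous
    assert ts._next


def passes(validator: Callable[[ToyTaskState], None], ts: ToyTaskState) -> bool:
    """Return True if the validator raises no AssertionError for ts."""
    try:
        validator(ts)
    except AssertionError:
        return False
    return True


# Illustrative task states; the _next values are placeholders.
ok = ToyTaskState("x", "resumed", _previous="flight", _next="waiting")
bad = ToyTaskState("y", "resumed", _previous="fetch", _next="fetch")

print(passes(validate_resumed_loose, bad))   # loose check accepts "fetch"
print(passes(validate_resumed_strict, bad))  # strict check rejects it
```

In this toy setup the loose validator accepts a task whose `_previous` is `fetch`, while the strict one rejects it, so a state that is believed to be unreachable would surface as a hard failure rather than going unnoticed.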