Race conditions from fetch to compute while AMM requests replica #6248
Conversation
I took a brief look at this. No objection from me, but I didn't dive deeply into the logic. If tests pass and you're feeling confident about the added value @fjetter I think that it's ok to merge. If you can get someone like @crusaderky or @gjoseph92 to take a look that would be better of course.
CI seems to get stuck on
Force-pushed from 1a1b6bd to e1241ea
Unit Test Results: 16 files ±0, 16 suites ±0, 7h 17m 0s ⏱️ (-16m 32s). For more details on these failures, see this check. Results for commit e6e4b50. ± Comparison against base commit 2286896.
Honestly, I don't fully understand what these changes are doing (especially reordering the `transition_memory_released` code). But if the tests you've added don't pass without them, that seems like a good sign?
```python
for dts in ts.waiters:
    if dts.state in ("no-worker", "processing"):
        recommendations[dts.key] = "waiting"
    elif dts.state == "waiting":
        dts.waiting_on.add(ts)
```
To confirm: the reason for moving this code later is because transitions are insertion-ordered, so this way `key` gets transitioned to `forgotten`/`waiting` before its waiters get transitioned to `waiting`?
I wonder if this block should even be in `transition_memory_released` at all? Why shouldn't this part be done in `transition_released_waiting` and `transition_released_forgotten`? The fact that we need to make this transition after we make the `waiting`/`forgotten` transition makes me think we're overstepping our job in this function, and this should be the job of the other transitions.
I guess I didn't know that the recommendations dict was considered ordered. (Obviously dicts are now ordered in Python, but did it used to be an OrderedDict in py<=3.6?) If there's some dependency structure in the recommendations (transition A has to happen before transition B), I'd think transition A should be responsible for recommending transition B, not that they should be mixed together. That seems easier to reason about.
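For reference, a quick check: plain dicts preserve insertion order as a language guarantee since Python 3.7 (and as a CPython implementation detail in 3.6); before that, `OrderedDict` was needed for this behaviour.

```python
from collections import OrderedDict

d = {"a": 1, "b": 2}        # insertion-ordered on Python 3.7+
od = OrderedDict(d)

print(list(d) == list(od))  # True -- same iteration order
print(od.popitem())         # ('b', 2) -- pops the most recently inserted item
```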
> To confirm: the reason for moving this code later is because transitions are insertion-ordered,

Yes, dictionaries are insertion-ordered and `popitem` pops from the end:
```python
In [1]: adict = {}

In [2]: for x in range(10):
   ...:     adict[x] = x
   ...:

In [3]: adict.popitem()
Out[3]: (9, 9)
```
> I wonder if this block should even be in transition_memory_released at all?

Yes, it should. If a task that was in memory is no longer in memory and a waiter, i.e. a task to be executed, exists, we need to ensure that this waiter is released. These few lines allow tasks to be resilient to worker failures.
> Why shouldn't this part be done in `transition_released_waiting`

This transition is not there to reset/forget/release something; it schedules something for compute.
> and `transition_released_forgotten`

This transition should only be triggered after the scheduler deletes the entire graph. This should only ever have any scheduling consequences if there are multiple graphs scheduled that share keys. Otherwise this is simply a sophisticated pop of the task.
> If there's some dependency structure in the recommendations (transition A has to happen before transition B), I'd think transition A should be responsible for recommending transition B, not that they should be mixed together.

It's not about a dependency but about order.
In this specific case (see test), `{'f1': 'waiting', 'f2': 'waiting'}` means:

- Transition f2 from processing back to waiting
- Then transition f1 from released to waiting (f1 was in memory/released before)
I don't see how we could ever infer "please transition f1 to waiting" after we released f1. From a causality perspective, I don't see how we could map this as a dependency.

Edit: In a previous statement I argued that it's about finishing a chain of a task's transitions, but this was false. It's quite the opposite.
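A minimal sketch of this LIFO behaviour (a hypothetical, heavily simplified loop, not the actual scheduler code; `drain` and `transition` are made-up names):

```python
def drain(recommendations, transition):
    """Process recommendations in LIFO order: dict.popitem() removes the
    most recently inserted entry, so later insertions are handled first."""
    order = []
    while recommendations:
        key, finish = recommendations.popitem()
        order.append((key, finish))
        # A handler may recommend further transitions; they land "on top".
        recommendations.update(transition(key, finish))
    return order

# Mirroring the example above: inserting f1 before f2 means f2 is
# processed first (processing->waiting), then f1 (released->waiting).
print(drain({"f1": "waiting", "f2": "waiting"}, lambda k, f: {}))
# [('f2', 'waiting'), ('f1', 'waiting')]
```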
Only two outstanding cosmetic issues for me
I think I'm seeing two genuine regressions in the unit tests
The test regression that's left is fixed with #6297
```diff
@@ -647,6 +648,7 @@ def __init__(
     ("ready", "released"): self.transition_generic_released,
     ("released", "error"): self.transition_generic_error,
     ("released", "fetch"): self.transition_released_fetch,
+    ("released", "missing"): self.transition_released_fetch,
```
Typo: this should be `transition_generic_missing`.
This causes an infinite transition loop (#6305):
1. The scheduler calls `handle_compute_task` with an empty who_has. This is a broken use case to begin with, but it is happening and the worker should either cope with it gracefully or crash loudly in `handle_compute_task` itself.
2. `handle_compute_task` creates the dependencies with `ensure_task_exists` in state=released and requests a transition to fetch.
3. The released->fetch transition, handled by `transition_released_fetch`, notices that who_has is empty, so instead of transitioning to fetch it returns `{ts: missing}` and keeps the task in released state.
4. The released->missing transition, erroneously handled by the same method, repeats point 3 forever.
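A runnable sketch of this loop (a hypothetical, heavily simplified state machine; the function and variable names are made up, not the worker's actual code):

```python
def transition_released_fetch(state, ts, who_has):
    # Simplified: with nobody holding a replica there is nowhere to fetch
    # from, so recommend "missing" and leave the task in "released".
    if not who_has.get(ts):
        return {ts: "missing"}
    state[ts] = "fetch"
    return {}

# The typo under review: ("released", "missing") is wired to the same
# handler as ("released", "fetch").
TRANSITIONS = {
    ("released", "fetch"): transition_released_fetch,
    ("released", "missing"): transition_released_fetch,
}

def run(state, recommendations, who_has, max_steps=10):
    steps = 0
    while recommendations and steps < max_steps:
        ts, finish = recommendations.popitem()
        handler = TRANSITIONS[(state[ts], finish)]
        recommendations.update(handler(state, ts, who_has))
        steps += 1
    return steps

# The task stays "released" and keeps recommending "missing" forever,
# so we bail out at max_steps.
print(run({"x": "released"}, {"x": "fetch"}, who_has={"x": []}))  # 10
```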
Is this "fixed" by #6318?
Yes.
Closes #6244
Two things in here:

1. The order in which we fill the recommendations impacts the order in which we send messages. In this specific case, depending on the order, at the time the `handle-compute T1` signal arrives at B, the state will be either `T1: waiting`, `T2: waiting` (which is strictly speaking false but shouldn't be harmful) or just `T1: waiting`, i.e. T1 has no dependents from the POV of the worker.

2. The problem of #6244 is that a task is requested to be fetched and has no dependents on the worker. Therefore, if the worker is asked to release this task, there is no reason to hold on to it and it is transitioned to forgotten. There is currently no way out of forgotten other than through another `handle_compute` or `acquire_replica` (i.e. another `ensure_task_exists`).
The particular transition fetch->compute will expand to fetch->released->compute, and we do not want to forget the task along the way. There are two options to avoid this