Scheduler.reschedule() works only by accident #6339

Merged: 6 commits merged on Jun 30, 2022


@crusaderky (Collaborator) commented May 13, 2022

Supersedes #6307
Closes #6340

@crusaderky crusaderky self-assigned this May 13, 2022
@crusaderky crusaderky force-pushed the support-stimulus-id-in-reschedule branch from 78b81d8 to 0469601 Compare May 13, 2022 17:02
@crusaderky crusaderky marked this pull request as ready for review May 13, 2022 17:02
@mrocklin (Member)

Looks like a valid failing test

@crusaderky (Collaborator Author)

Hilarious. The test passed while the worker was NOT calling Scheduler.reschedule, due to a mismatched signature. Now that it does, it breaks the state machine, which has suddenly bumped up the importance and scope of this PR.

This now closes #6340.
I'm parking it for the time being.
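A minimal, hypothetical illustration of how a signature mismatch can hide a call entirely. The toy classes below are not distributed's code; they assume that the caller sends a `stimulus_id` keyword (as the branch name support-stimulus-id-in-reschedule suggests) that the old `reschedule(self, key=None, worker=None)` signature does not accept, and that the hypothetical `handle_request` dispatcher swallows the resulting TypeError. Under those assumptions the reschedule never happens and nothing fails loudly.

```python
class OldScheduler:
    """Toy stand-in with the pre-PR signature (no stimulus_id)."""

    def __init__(self):
        self.rescheduled = []

    def reschedule(self, key=None, worker=None):
        self.rescheduled.append(key)


class NewScheduler(OldScheduler):
    """Toy stand-in that also accepts the stimulus_id sent by the caller."""

    def reschedule(self, key=None, worker=None, stimulus_id=None):
        self.rescheduled.append(key)


def handle_request(scheduler, **msg):
    # Hypothetical dispatch: errors are swallowed, so a bad signature
    # turns the call into a silent no-op.
    try:
        scheduler.reschedule(**msg)
        return "OK"
    except TypeError:
        return "error"


msg = {"key": "x-123", "worker": "tcp://w1", "stimulus_id": "reschedule-1"}

old, new = OldScheduler(), NewScheduler()
print(handle_request(old, **msg), old.rescheduled)  # error []
print(handle_request(new, **msg), new.rescheduled)  # OK ['x-123']
```

With the old signature the key never reaches the scheduler's bookkeeping, which is why a test that only checks the final result can stay green.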

@crusaderky crusaderky removed their assignment May 13, 2022
@crusaderky crusaderky changed the title Support receiving stimulus_id's in Scheduler.reschedule Scheduler.reschedule() works only by accident May 13, 2022
@crusaderky crusaderky marked this pull request as draft May 13, 2022 20:41
@crusaderky crusaderky force-pushed the support-stimulus-id-in-reschedule branch from d3dfa55 to 3f20835 Compare May 13, 2022 20:47
@github-actions (Contributor)

Unit Test Results

    15 files  ±0       15 suites  ±0   6h 52m 1s ⏱️ +1m 22s
 2 775 tests  ±0    2 695 ✔️ +1    78 💤 -1   2 ❌ ±0
20 587 runs   ±0   19 672 ✔️ -7   906 💤 ±0   9 ❌ +7

For more details on these failures, see this check.

Results for commit 3f20835. ± Comparison against base commit 50d2911.

@crusaderky crusaderky force-pushed the support-stimulus-id-in-reschedule branch 2 times, most recently from 2e4522e to 611c85d Compare June 27, 2022 22:38
@github-actions bot (Contributor) commented Jun 27, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    15 files  ±0       15 suites  ±0   10h 23m 2s ⏱️ +1m 37s
 2 898 tests  +1    2 812 ✔️ ±0    83 💤 ±0   3 ❌ +1
21 467 runs   +9   20 502 ✔️ +2   962 💤 +6   3 ❌ +1

For more details on these failures, see this check.

Results for commit e7bb31b. ± Comparison against base commit a8eb3b2.

♻️ This comment has been updated with latest results.

@crusaderky crusaderky force-pushed the support-stimulus-id-in-reschedule branch from 611c85d to b3ffa87 Compare June 28, 2022 12:52

for future in x:
    s.reschedule(key=future.key)
Collaborator Author

This test remained green even if you removed these lines.

await f
s.set_restrictions(worker={f.key: a.address})
assert s.tasks[f.key].worker_restrictions == {a.address}
s.reschedule(f)
@crusaderky (Collaborator Author) Jun 28, 2022

This was silently doing nothing, because the future f itself is not a key of s.tasks (only f.key is). The task was executed only once, on b.
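The silent no-op can be reproduced with a toy model. It assumes, as in the test, that the scheduler's task table is a dict keyed by task key and that `reschedule` simply returns when the key is unknown; `Future` and `ToyScheduler` are hypothetical stand-ins, not `distributed.Future` or the real Scheduler. Passing the future object instead of `future.key` misses the lookup, so the state is unchanged and nothing complains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Future:
    """Hypothetical stand-in for a client-side future that wraps a key."""
    key: str


class ToyScheduler:
    def __init__(self):
        # Task table keyed by task key (a string), as assumed above.
        self.tasks = {"inc-abc": "processing"}

    def reschedule(self, key):
        if key not in self.tasks:
            return False  # unknown key: silently ignored
        self.tasks[key] = "released"
        return True


s = ToyScheduler()
f = Future("inc-abc")

print(s.reschedule(f))      # False: a Future is not a key of s.tasks
print(s.reschedule(f.key))  # True: the key itself is found
```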

async def test_reschedule(c, s, a, b):
    await s.extensions["stealing"].stop()
@pytest.mark.slow
@pytest.mark.parametrize("long_running", [False, True])
Collaborator Author

Added test for _transition_long_running_rescheduled

@crusaderky crusaderky self-assigned this Jun 28, 2022
@crusaderky crusaderky marked this pull request as ready for review June 28, 2022 13:55
@hendrikmakait hendrikmakait self-requested a review June 29, 2022 15:31
@hendrikmakait (Member) left a comment

Code looks good to me, feel free to ignore the comments.

@@ -6568,7 +6568,9 @@ async def get_story(self, keys_or_stimuli: Iterable[str]) -> list[tuple]:

transition_story = story

-def reschedule(self, key=None, worker=None):
+def reschedule(
Member

Out-of-scope comment: Is there a better name that highlights that reschedule does not actually reschedule, i.e., that it does not schedule the task somewhere else but merely cancels the previous scheduling decision? For example, deschedule might be better.

Collaborator Author

Well, no. A transition to released will automatically kick the task back to waiting.
The functional description of the method is accurate. The "released" bit is an implementation detail.
I'm adding a comment to explain.
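A minimal sketch of the point above, using a toy recommendation loop rather than the scheduler's real transition code. It assumes that each transition returns follow-up recommendations and that releasing a task which is still wanted recommends moving it back to `waiting`; `transition`, `transitions`, and the task dict fields are hypothetical. Under those assumptions, requesting only `released` is enough to get the task rescheduled.

```python
def transition(tasks, key, finish):
    """Apply one toy transition and return follow-up recommendations."""
    task = tasks[key]
    task["state"] = finish
    task["history"].append(finish)
    if finish == "released" and task["wanted"]:
        # Still needed by a client or a dependent: try to run it again.
        return {key: "waiting"}
    return {}


def transitions(tasks, recommendations):
    """Process recommendations until none are left."""
    recommendations = dict(recommendations)
    while recommendations:
        key, finish = recommendations.popitem()
        recommendations.update(transition(tasks, key, finish))


tasks = {"x": {"state": "processing", "wanted": True, "history": ["processing"]}}
transitions(tasks, {"x": "released"})
print(tasks["x"]["history"])  # ['processing', 'released', 'waiting']
```

Keeping the move back to `waiting` as a recommendation, rather than a second explicit call, is what lets a single request for `released` describe the whole reschedule in this model.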

Member

Makes sense from that perspective. I was thinking that the function doesn't include the rescheduling bit, but in the end you're right: releasing (descheduling) automatically achieves the aim of rescheduling.

distributed/tests/test_scheduler.py (outdated, resolved)
@crusaderky crusaderky force-pushed the support-stimulus-id-in-reschedule branch from e7bb31b to f0bf899 Compare June 30, 2022 11:15
@crusaderky crusaderky force-pushed the support-stimulus-id-in-reschedule branch from f0bf899 to 950e4bc Compare June 30, 2022 11:27
@crusaderky crusaderky force-pushed the support-stimulus-id-in-reschedule branch from 950e4bc to 355cc0f Compare June 30, 2022 12:16
assert any(isinstance(ev, RescheduleEvent) for ev in a.state.stimulus_log)
assert all(f.key in b.data for f in futures)


Collaborator Author

The two tests below are new.

@@ -2201,6 +2196,7 @@ def _transition_released_forgotten(
("cancelled", "missing"): _transition_cancelled_released,
("cancelled", "waiting"): _transition_cancelled_waiting,
("cancelled", "forgotten"): _transition_cancelled_forgotten,
("cancelled", "rescheduled"): _transition_cancelled_released,
Collaborator Author

Without this entry, test_cancelled_reschedule would cause a RecursionError.
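The diff maps `("cancelled", "rescheduled")` to the same handler as `("cancelled", "released")`. The toy dispatcher below is a hypothetical model of how a missing table entry can recurse without bound; it is not the worker's actual transition code. It assumes a fallback that routes any unknown transition through `released`, and that releasing a cancelled task leaves it `cancelled`, so the fallback keeps retrying the same missing pair until Python raises RecursionError.

```python
def cancelled_released(task):
    # Toy assumption: a cancelled task that is still executing cannot be
    # released yet, so it stays "cancelled".
    task["state"] = "cancelled"


def make_dispatcher(table):
    """Build a toy transition function backed by a (start, finish) table."""
    def transition(task, finish):
        start = task["state"]
        func = table.get((start, finish))
        if func is not None:
            func(task)
        elif "released" not in (start, finish):
            # Hypothetical fallback: go start -> "released" -> finish.
            transition(task, "released")
            transition(task, finish)
        else:
            raise ValueError(f"Impossible transition {start} -> {finish}")
    return transition


base = {("cancelled", "released"): cancelled_released}
fixed = {**base, ("cancelled", "rescheduled"): cancelled_released}

task = {"state": "cancelled"}
try:
    make_dispatcher(base)(task, "rescheduled")
except RecursionError:
    print("RecursionError without the explicit entry")

make_dispatcher(fixed)(task, "rescheduled")
print(task["state"])  # cancelled
```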

@crusaderky crusaderky force-pushed the support-stimulus-id-in-reschedule branch from 355cc0f to 9410072 Compare June 30, 2022 12:21
@crusaderky (Collaborator Author)

@hendrikmakait there have been substantial changes since your review; could you give it a second pass, please?

@hendrikmakait (Member) left a comment

Good catch with the rescheduling of cancelled tasks!

distributed/tests/test_reschedule.py (outdated, resolved)
@crusaderky crusaderky merged commit ff250f2 into dask:main Jun 30, 2022
@crusaderky crusaderky deleted the support-stimulus-id-in-reschedule branch June 30, 2022 13:57
Development

Successfully merging this pull request may close these issues.

Scheduler.reschedule() does not work
3 participants