Catching create job failures, updates to job definition #253

3coins · 2022-11-01T21:57:00Z

The current task runner ends the async loop if the scheduler raises any exception while creating a job. Fixed this by catching exceptions during the create_job step.
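A minimal sketch of the idea, not the actual diff: `FlakyScheduler`, `run_tasks`, and the job names are hypothetical stand-ins, and the real task runner is more involved. It only illustrates how a try/except around the create step keeps the loop alive after one job fails.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FlakyScheduler:
    """Hypothetical scheduler whose create_job fails for one input."""

    def __init__(self):
        self.created = []

    def create_job(self, name: str) -> None:
        if name == "bad":
            raise RuntimeError("could not create job")
        self.created.append(name)


async def run_tasks(scheduler, names, poll_interval=0.0):
    """Process every name; a failing create_job is logged, not propagated."""
    for name in names:
        try:
            scheduler.create_job(name)
        except Exception:
            # Log and continue so one bad job cannot end the loop.
            logger.exception("create_job failed for %s", name)
        await asyncio.sleep(poll_interval)


scheduler = FlakyScheduler()
asyncio.run(run_tasks(scheduler, ["a", "bad", "b"]))
```

Without the try/except, the `RuntimeError` raised for the second name would propagate out of `run_tasks` and the third job would never be created.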
Resuming a job definition after pausing did not add the task back to the queue in certain scenarios. Fixed this by ensuring that, on job definition update, the task is added to the queue if not present. To reproduce:
1. Create a scheduled job that runs every 2 mins, let the first job run.
2. Pause the job definition
3. Resume the job definition within the next 2 mins
4. Job will not run at the end of 2 mins

github-actions · 2022-11-01T21:57:15Z

👈 Launch a Binder on branch 3coins/jupyter-scheduler/catch-exceptions

dlqqq · 2022-11-01T22:53:00Z

Ah, I see the issue with the current logic. Let's say a user creates a job definition to run every minute, and suppose the following actions occur:

| t | action |
| --- | --- |
| 0.0 | job definition created |
| 0.3 | job definition paused => `cache.next_run_time = 1` |
| 0.3 + poll_interval | `process_queue()` executes, removes task from heap because `active == False` |
| 0.7 | job definition resumed => `next_run_time == 1 and next_run_time == cache.next_run_time` => task is not added back into the heap |

To fix this, I don't think we need to scan the entire heap when updating a job definition. Instead we just need to change the condition to:

new_next_run_time = cached_next_run_time != next_run_time and active
resumed = model.active and not cache.active

... # update cache

if new_next_run_time or resumed:
    ... # add task to heap
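To make the timeline concrete, here is a rough, self-contained sketch of the proposed condition. `CachedDefinition`, `should_requeue`, the `"def-1"` id, and the single `heappop` standing in for `process_queue()` are all assumptions for illustration, not the scheduler's real types or API.

```python
import heapq
from dataclasses import dataclass


@dataclass
class CachedDefinition:
    """Hypothetical cache entry for one job definition."""
    next_run_time: float
    active: bool


def should_requeue(cache: CachedDefinition, next_run_time: float, model_active: bool) -> bool:
    """Evaluate the proposed condition before the cache is updated."""
    new_next_run_time = cache.next_run_time != next_run_time and model_active
    resumed = model_active and not cache.active
    return new_next_run_time or resumed


heap = []  # entries are (next_run_time, job_definition_id)

# t = 0.0: definition created, first run scheduled at t = 1
cache = CachedDefinition(next_run_time=1.0, active=True)
heapq.heappush(heap, (1.0, "def-1"))

# t = 0.3: definition paused; the cached next_run_time stays at 1
cache.active = False

# t = 0.3 + poll_interval: stand-in for process_queue() dropping the inactive task
heapq.heappop(heap)

# t = 0.7: definition resumed; the recomputed next_run_time is still 1
old_condition = cache.next_run_time != 1.0  # comparing run times alone
requeue = should_requeue(cache, 1.0, model_active=True)

if requeue:
    cache.active = True
    heapq.heappush(heap, (1.0, "def-1"))
```

In this scenario the run-time comparison alone is false, so the task would stay out of the heap; the `resumed` check is what puts it back.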

3coins · 2022-11-02T02:54:54Z

@dlqqq Updated as per our discussion.

3coins added the bug Something isn't working label Nov 1, 2022

3coins self-assigned this Nov 1, 2022

3coins added this to the 1.0 Release milestone Nov 1, 2022

3coins force-pushed the catch-exceptions branch from 49c3a4e to 170ee8e Compare November 1, 2022 22:04

3coins marked this pull request as ready for review November 1, 2022 22:07

3coins requested a review from dlqqq November 1, 2022 22:13

3coins removed this from the 1.0 Release milestone Nov 1, 2022

dlqqq mentioned this pull request Nov 1, 2022

task runner scheduling logic concerns #119

Open

Catching create job failures, updates to job definition

db3b513

3coins force-pushed the catch-exceptions branch from 170ee8e to db3b513 Compare November 2, 2022 02:44

dlqqq approved these changes Nov 2, 2022

3coins merged commit dbe2808 into jupyter-server:main Nov 2, 2022
