Problem: job preemption in fluxion, as discussed in #5739, might take a while to get done and there is a demand for preemptible jobs now.
As a stopgap, we could consider implementing a jobtap plugin that

- tracks preemptibility of running jobs (e.g. `preempt-after` is set and the specified runtime has elapsed)
- tracks pending jobs
- cancels selected preemptible jobs when there is queue pressure
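To make the first bullet concrete, here is a minimal sketch of the preemptibility check in Python. It is only an illustration, not jobtap code: the `RunningJob` record, its field names, and the choice to store `preempt_after` as a number of seconds are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningJob:
    """Hypothetical record the plugin might keep per running job."""
    jobid: int
    t_run: float                           # time the job started running (seconds)
    preempt_after: Optional[float] = None  # assumed: seconds; None means never preemptible


def is_preemptible(job: RunningJob, now: float) -> bool:
    """A job is preemptible once preempt-after is set and that much runtime has elapsed."""
    if job.preempt_after is None:
        return False
    return now - job.t_run >= job.preempt_after
```

A real plugin would presumably update this state from job events rather than polling, but the predicate itself would look much the same.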
Not being a part of the scheduler makes it hard to select the minimum set of jobs to cancel. But maybe heuristics could provide a passable stopgap implementation.
A very dumb version could just cancel all preemptible jobs whenever one or more non-preemptible jobs have been pending longer than some period of time. It seems like there should be plenty of simple ways to improve upon that: considering pending and running job sizes, canceling jobs one by one until the queue pressure disappears, phoning a friend, etc.
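One possible shape for the "cancel one by one, considering job sizes" idea is sketched below in Python. Everything here is hypothetical: the `Pending` and `Preemptible` records, the `select_victims` function, the `max_wait` threshold, and the use of node counts as the only measure of size. Since the plugin is not part of the scheduler, node counts are a rough proxy and freeing enough nodes does not guarantee the pending jobs can actually be placed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pending:
    """Hypothetical record for a pending non-preemptible job."""
    jobid: int
    nnodes: int
    t_submit: float


@dataclass
class Preemptible:
    """Hypothetical record for a running job that is currently preemptible."""
    jobid: int
    nnodes: int
    t_run: float


def select_victims(pending: List[Pending], preemptible: List[Preemptible],
                   free_nodes: int, now: float, max_wait: float) -> List[int]:
    """Return jobids to cancel, or [] if there is no queue pressure."""
    # Queue pressure: some non-preemptible job has waited longer than max_wait.
    starved = [p for p in pending if now - p.t_submit > max_wait]
    if not starved:
        return []
    # Nodes still missing after counting what is already free.
    needed = sum(p.nnodes for p in starved) - free_nodes
    victims = []
    # Assumed policy: cancel the most recently started jobs first,
    # so that the least completed work is thrown away.
    for job in sorted(preemptible, key=lambda j: j.t_run, reverse=True):
        if needed <= 0:
            break
        victims.append(job.jobid)
        needed -= job.nnodes
    return victims
```

The ordering policy is the main knob: sorting by size instead of start time would cancel fewer jobs at the cost of more lost work. The "very dumb" version corresponds to returning every preemptible jobid whenever `starved` is non-empty.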