idea: jobtap plugin as a stopgap for proper scheduler-driven preemption #6524

garlick opened this issue Dec 18, 2024 · 0 comments
garlick commented Dec 18, 2024

Problem: job preemption in fluxion, as discussed in #5739, might take a while to get done, and there is demand for preemptible jobs now.

As a stopgap, we could consider implementing a jobtap plugin that

  • tracks preemptibility of running jobs (e.g. preempt-after is set and the specified runtime has elapsed)
  • tracks pending jobs
  • cancels selected preemptible jobs when there is queue pressure
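
As a rough illustration of the first bullet, here is a minimal Python sketch of the preemptibility check. It is not flux-core or jobtap API: `RunningJob`, `is_preemptible`, and the idea of storing `preempt-after` as a number of seconds are all assumptions made for the example. A real plugin would keep equivalent state in its job callbacks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningJob:
    """Hypothetical per-job record the plugin might keep for running jobs."""
    jobid: int
    start_time: float                      # seconds since some epoch
    preempt_after: Optional[float] = None  # seconds of runtime; None = never preemptible


def is_preemptible(job: RunningJob, now: float) -> bool:
    """True once preempt-after is set and that much runtime has elapsed."""
    if job.preempt_after is None:
        return False
    return now - job.start_time >= job.preempt_after


# Example: only job 1 has both set preempt-after and run long enough.
jobs = [
    RunningJob(jobid=1, start_time=0.0, preempt_after=60.0),
    RunningJob(jobid=2, start_time=0.0),                      # preempt-after not set
    RunningJob(jobid=3, start_time=50.0, preempt_after=60.0), # has run only 50s
]
preemptible_ids = [j.jobid for j in jobs if is_preemptible(j, now=100.0)]
```

Keeping the check as a pure function of the job record and the current time means it can be re-evaluated whenever queue pressure is detected, without timers per job.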

Not being a part of the scheduler makes it hard to select the minimum set of jobs to cancel. But maybe heuristics could provide a passable stopgap implementation.

A very dumb version could just cancel all preemptible jobs whenever one or more non-preemptible jobs have been pending longer than some period of time. It seems like there should be plenty of simple ways to improve upon that: considering pending and running job sizes, canceling jobs one by one until the queue pressure disappears, phoning a friend, etc.
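
To make the two ends of that range concrete, here is a hedged Python sketch of both policies. Everything in it is hypothetical: the `Pending` and `Running` records, the `max_wait` threshold, and the use of node counts as the only measure of job size are assumptions for illustration, not anything flux-core or fluxion provides. The pending list is assumed to contain only non-preemptible jobs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pending:
    jobid: int
    nnodes: int
    submit_time: float


@dataclass
class Running:
    jobid: int
    nnodes: int
    preemptible: bool


def cancel_all(pending: List[Pending], running: List[Running],
               now: float, max_wait: float) -> List[int]:
    """The 'very dumb' policy: any job pending too long cancels every preemptible job."""
    if not any(now - p.submit_time > max_wait for p in pending):
        return []
    return [r.jobid for r in running if r.preemptible]


def cancel_until_fits(pending: List[Pending], running: List[Running],
                      now: float, max_wait: float, free_nodes: int = 0) -> List[int]:
    """Cancel preemptible jobs, largest first, until the oldest starved
    pending job would fit by node count alone. Cancels nothing if even
    canceling every preemptible job would not free enough nodes."""
    starved = [p for p in pending if now - p.submit_time > max_wait]
    if not starved:
        return []
    target = min(starved, key=lambda p: p.submit_time)
    needed = target.nnodes - free_nodes
    victims = []
    candidates = sorted((r for r in running if r.preemptible),
                        key=lambda r: r.nnodes, reverse=True)
    for r in candidates:
        if needed <= 0:
            break
        victims.append(r.jobid)
        needed -= r.nnodes
    return victims if needed <= 0 else []


pending = [Pending(jobid=10, nnodes=4, submit_time=0.0)]
running = [Running(1, 2, True), Running(2, 8, False),
           Running(3, 3, True), Running(4, 1, True)]

dumb = cancel_all(pending, running, now=600.0, max_wait=300.0)
sized = cancel_until_fits(pending, running, now=600.0, max_wait=300.0)
```

In this example the dumb policy cancels all three preemptible jobs, while the size-aware one stops after two. Node counts ignore which nodes are freed and any other constraints, so this is only the kind of passable heuristic the issue describes, not a replacement for the scheduler's own accounting.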
