flux-shutdown: need option to force fast shutdown #5843
To review the current situation, in
When shutdown begins, either due to a SIGTERM from …
Here's a straw man proposal:
Also: I think some relief may come once we get #5818 worked out. In that proposal, jobs transition to INACTIVE before the housekeeping script completes. If housekeeping hangs, it doesn't prevent the instance from stopping, and when the instance restarts, any housekeeping scripts that are still running are ignored. We probably need a way to reacquire running housekeeping tasks on restart and avoid scheduling on those nodes, but the proposed behavior is probably a step in the right direction.
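A minimal sketch of the restart-side bookkeeping mentioned above, assuming the instance can recover a list of housekeeping tasks that were still running when it stopped. `HousekeepingTask`, `schedulable_ranks`, and `release` are hypothetical names for illustration, not Flux APIs. The only idea shown is that ranks held by reacquired tasks stay out of the schedulable set until those tasks finish.

```python
from dataclasses import dataclass


@dataclass
class HousekeepingTask:
    """Hypothetical record of a housekeeping task still running at restart."""
    jobid: int
    ranks: set[int]


def schedulable_ranks(all_ranks: set[int], reacquired: list[HousekeepingTask]) -> set[int]:
    """Return every rank except those still busy with reacquired housekeeping."""
    busy: set[int] = set()
    for task in reacquired:
        busy |= task.ranks
    return all_ranks - busy


def release(task: HousekeepingTask, reacquired: list[HousekeepingTask]) -> None:
    """Drop a finished task so its ranks become schedulable again."""
    reacquired.remove(task)
```

For example, with tasks holding ranks `{0, 1}` and `{3}` on a five-rank instance, only ranks 2 and 4 are schedulable until the first task is released.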
FYI: I think this particular issue was fixed by flux-framework/flux-coral2#141
Several things currently block an orderly Flux shutdown, including slow epilogs that hold jobs in CLEANUP and bugs in jobtap plugins that leave jobs needing manual cleanup.
It would be nice to have an option to bypass waiting for jobs in CLEANUP, at least until Flux supports restarting with jobs that are still running or in cleanup.
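One way such a bypass could behave, sketched under the assumption that shutdown can poll a count of jobs still in CLEANUP: by default it waits for that count to reach zero, an optional timeout bounds the wait, and `force=True` skips waiting entirely. `wait_for_cleanup` and its parameters are hypothetical and do not correspond to any existing `flux shutdown` option. The clock and sleep functions are injectable so the logic can be exercised without real delays.

```python
import time
from typing import Callable, Optional


def wait_for_cleanup(
    count_cleanup: Callable[[], int],
    force: bool = False,
    timeout: Optional[float] = None,
    poll: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait for jobs in CLEANUP to drain before shutdown continues (hypothetical).

    Returns True if no jobs remain in CLEANUP, or False if shutdown should
    proceed anyway because force was requested or the timeout expired.
    """
    if force:
        # Fast path: report the current state but never block.
        return count_cleanup() == 0
    deadline = None if timeout is None else clock() + timeout
    while count_cleanup() > 0:
        if deadline is not None and clock() >= deadline:
            return False  # stop waiting; the caller continues shutdown
        sleep(poll)
    return True
```

Returning `False` instead of raising lets the caller log which jobs were abandoned in CLEANUP and keep stopping the instance.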
Perhaps we could also add another shutdown script that automatically "fixes" any jobs in cleanup by forcing missing epilog-finish events.
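A sketch of the detection half of such a fixup script, assuming each job's eventlog is available as newline-delimited JSON objects with `timestamp`, `name`, and `context` fields, where `epilog-start` and `epilog-finish` carry a `description` in their context. These event shapes are assumptions modeled on Flux job eventlogs and should be checked against the eventlog of an actual stuck job. `missing_epilog_finish` and `fixup_events` are hypothetical helpers; how the synthetic events would be posted to the job, and what status a forced finish should report, are deliberately left to the caller.

```python
import json
import time
from collections import Counter
from typing import Callable, Iterable


def missing_epilog_finish(eventlog_lines: Iterable[str]) -> list[str]:
    """Return descriptions of epilog actions that started but never finished."""
    pending = Counter()
    for line in eventlog_lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        event = json.loads(line)
        desc = event.get("context", {}).get("description", "")
        if event["name"] == "epilog-start":
            pending[desc] += 1
        elif event["name"] == "epilog-finish" and pending[desc] > 0:
            pending[desc] -= 1
    # elements() repeats each description once per unmatched epilog-start
    return sorted(pending.elements())


def fixup_events(descriptions: Iterable[str], status: int,
                 now: Callable[[], float] = time.time) -> list[dict]:
    """Build synthetic epilog-finish events for the unmatched descriptions."""
    return [
        {"timestamp": now(), "name": "epilog-finish",
         "context": {"description": d, "status": status}}
        for d in descriptions
    ]
```

Matching on `description` rather than simply counting starts and finishes matters when several epilog actions run concurrently and only some of them hang.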