enable system instance to be restarted without affecting running jobs #3801
Let's use this issue as a tracking issue for getting basic support for restarting brokers without affecting running jobs. If that is OK, I will open the missing individual issues and link those, along with existing issues, in a checklist in the initial post. To get things started, here's a list of items off the top of my head (I'll open issues on some of these and move them to a checklist above). Anyone should feel free to edit and add to this or to a checklist above.
It occurs to me we could develop support for this issue in several phases:
For the job shell, we could first support restart of single-shell jobs, then tackle restart of multi-shell jobs. So as a first step, we should target restarting a leaf broker that is running a job with a single shell rank.
As we work through failure modes in the system instance, it would be helpful if brokers or broker subtrees could be rebooted without affecting workloads.