Problem: if the flux rank 0 broker crashes with running jobs and then restarts, the fluxion scheduler may fail when processing the hello protocol.
2023-01-17T16:10:17.222950Z sched-fluxion-qmanager.err[0]: hello: error loading R for id=202046916802905088: No such file or directory
To get fluxion loaded, those jobs have to be removed from the KVS.
Until we can handle recovering running jobs, we should probably just force these jobs into the inactive state.
Caveat: tasks belonging to the job could still be running.
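The proposed stopgap can be sketched as a toy model. This is a minimal illustration under stated assumptions, not flux-core or fluxion code. The `Job` and `State` types, `force_running_inactive()`, and `hello_job_ids()` are hypothetical names. The model assumes that the hello handshake only asks the scheduler to load R for jobs still in a running state. Under that assumption, marking every running job inactive on restart means hello never has to look up R.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical, simplified job states (not the full flux job state machine)
    RUN = "RUN"
    INACTIVE = "INACTIVE"


@dataclass
class Job:
    id: int
    state: State


def force_running_inactive(jobs):
    """Hypothetical restart policy: force every job that was running when
    rank 0 crashed into the inactive state. Returns the ids that were forced.
    Caveat (as in the issue): tasks of these jobs may still be running."""
    forced = []
    for job in jobs:
        if job.state is State.RUN:
            job.state = State.INACTIVE
            forced.append(job.id)
    return forced


def hello_job_ids(jobs):
    """Hypothetical stand-in for hello: the ids whose R the scheduler
    would need to load, i.e. jobs still in the running state."""
    return [job.id for job in jobs if job.state is State.RUN]


if __name__ == "__main__":
    jobs = [Job(1, State.RUN), Job(2, State.INACTIVE), Job(3, State.RUN)]
    print(force_running_inactive(jobs))  # [1, 3]
    print(hello_job_ids(jobs))           # []
```

The trade-off is the one noted in the caveat. The scheduler loads cleanly because hello has nothing to reconcile, but any tasks that survived the crash are no longer accounted for.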
Edit: the fluxion bug that is triggered here is flux-framework/flux-sched#991
I think #4894 addresses the short term issue here so closing.