rc3: exits immediately on command failure, resulting in many errors #2672
Problem: If any step in `rc3` fails, the shell exits immediately, bypassing the rest of the shutdown. At best this results in a slew of "module was not cleanly shutdown" errors; at worst it could result in data loss, since persistence rules could be skipped.

Remove the `-e` option from the `rc3` shebang line so that `rc3` reports errors but attempts to run to completion.

Fixes flux-framework#2672
Sorry to come into this late. This was my bad: the behavior of …

The benefit of `-e` is that it makes the instance fail if something important, like flushing the content cache to disk, fails. Exiting immediately when anything goes wrong is not a great way to handle that, but ignoring all errors may be going too far. I'm fine with dropping `-e` now and restoring some error handling as part of resolving #2650. Does that make sense?
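One possible shape for that compromise (no `-e`, but the script still reports failure when a step goes wrong) is sketched below in bash. This is an assumption about a generic pattern, not code from flux-core; `run_step`, `shutdown_steps`, and the placeholder steps are hypothetical, and `false` stands in for a failing command.

```shell
#!/bin/bash
# Hypothetical error-handling pattern for a shutdown script run WITHOUT -e:
# every step runs, each failure is logged, and the overall result is still
# non-zero if anything failed. The steps are placeholders, not flux commands.

rc3_status=0

# Run one step; on failure, log it and remember that shutdown was not clean.
run_step() {
    if ! "$@"; then
        echo "rc3: '$*' failed" >&2
        rc3_status=1
    fi
}

# Hypothetical shutdown sequence; the failing middle step does not stop it.
shutdown_steps() {
    rc3_status=0
    run_step echo "unload module A"
    run_step false                  # fails, but the sequence keeps going
    run_step echo "unload module B"
    return "$rc3_status"            # a real script would use: exit "$rc3_status"
}

if shutdown_steps; then
    echo "clean shutdown"
else
    echo "shutdown completed with errors"
fi
```

Both modules are "unloaded", the failure is logged to stderr, and the caller still learns that shutdown was not clean, which is the property `-e` was providing.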
Yes, that seems reasonable!
Since `rc3` is run with `-e`, it exits immediately on the first command failure. When flux-sched is installed along with flux-core, the `qmanager-start` rc1 script removes `sched-simple` in order to load the `qmanager` module, but this causes `flux module remove sched-simple` to fail in `rc3` for every instance, and `rc3` then exits without unloading the remaining modules. For example, on fluke:
I can't recall the rationale for running `rc3` under `bash -e`. If there isn't a good reason, I propose we remove that flag. I think we want `rc3` to try harder to run to completion.

This may be fixed by upcoming initialization changes. However, I would consider the current flux RPMs broken, so perhaps we should throw in a fix for this issue and generate some new packages.
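The failure mode can be reproduced with a toy bash script. This is not the real `rc3`: the step list is hypothetical, and `false` stands in for a failing command such as `flux module remove sched-simple` when the module has already been removed.

```shell
#!/bin/bash
# Toy stand-in for rc3 (NOT the real flux script): three shutdown steps,
# where `false` plays the role of a failing command.
steps='
echo "unload module A"
false
echo "unload module B"
'

echo "--- with -e ---"
bash -e -c "$steps"       # stops right after "unload module A"
echo "exit status: $?"    # prints "exit status: 1"

echo "--- without -e ---"
bash -c "$steps"          # runs every step, including "unload module B"
echo "exit status: $?"    # prints "exit status: 0" (status of the last echo)
```

Without `-e`, the exit status is that of the last command, so a failure in the middle goes unnoticed unless the script tracks it explicitly.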