-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shell: do not fail on oom init failure #6004
Conversation
A couple quick thoughts.
|
Ahhh, lets pass a |
36d6788
to
c853a4b
Compare
re-pushed, tweaking per some comments above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one very minor nit inline.
ERRNO_SAFE_WRAP (free, path); | ||
return -1; | ||
shell_warn ("disabling oom detection: %s", error.text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If (very unlikely) flux_plugin_get_shell()
fails, then error.text
will be used uninitialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh and I guess I didn't set an error message for the potential calloc()
failure. I moved the !shell
check into oom_create()
and set an appropriate error message there.
Also noticed some tabbing was wrong, so added a commit for that as well.
7bd8549
to
d1fda0e
Compare
Problem: Some function parameters in the oom_create() function were not tabbed over correctly. Fix the tabbing.
Problem: When inotify fails for the oom plugin, the job-shell outputs a fairly useless error message of: 1.202s: flux-shell[98]: ERROR: plugin 'oom': shell.init failed 1.202s: flux-shell[98]: FATAL: shell_init Solution: Add extra messages so the user has more of a clue why the job did not run. For the inotify error, the user should now see someting like the following. 7.111s: flux-shell[11]: WARN: oom: max number of user inotify instances has been reached
Problem: The oom detector in the job shell is not critical for job execution. However, if it fails to be initialized, the job shell will exit with a shll.init failure and the job will fail. Solution: Output a warning that oom detection is disabled when initialization fails. Do not fail to execute the job-shell. The user should now see something like the following for an inotify error. 2.636s: flux-shell[113]: WARN: oom: disabling oom detection: max number of user inotify instances has been reached Fixes flux-framework#5920
d1fda0e
to
79b4d5c
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6004 +/- ##
==========================================
- Coverage 83.28% 83.28% -0.01%
==========================================
Files 518 518
Lines 83427 83439 +12
==========================================
+ Hits 69486 69492 +6
- Misses 13941 13947 +6
|
Problem: The oom detector in the job shell is not critical for job execution. However, if it fails to be initialized, the job shell
will exit with a shll.init failure and the job will fail.
Solution: Output a warning that oom detection is disabled when initialization fails. Do not fail to execute the job-shell. The user should now see something like the following for the inotify error.
2.220s: flux-shell[69]: WARN: oom: error setting up inotify: Too many open files
2.220s: flux-shell[69]: WARN: oom: oom detection disabled
I was in the general area of this code b/c of #6002, so decided to try and deal with this issue.
Question, should i log ERROR instead of WARN for the actual inotify error? B/c that is an actual error .. the warning is really that the detection is disabled.