libflux: drop child watchers and the FLUX_REACTOR_SIGCHLD flag #6543
Problem: several subprocess unit tests check for file descriptor leaks but don't account for file descriptors opened by the reactor itself at runtime, such as for signalfd(2). Move fd sampling period to enclose reactor creation/destruction.
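The idea of widening the sampling window can be sketched in plain POSIX C. This is a minimal illustration, not the actual libsubprocess test code: `count_open_fds`, `fd_delta_around`, and the two pipe-based bodies are hypothetical names, and the pipe merely stands in for whatever descriptors a reactor opens and closes internally. The probe is capped at 4096 descriptors, which is an assumption made to keep the loop cheap.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Count open fds by probing each candidate with fcntl(F_GETFD).
 * Assumption: the code under test only uses fds below 4096. */
static int count_open_fds (void)
{
    long max = sysconf (_SC_OPEN_MAX);
    int count = 0;

    if (max < 0 || max > 4096)
        max = 4096;
    for (long fd = 0; fd < max; fd++) {
        if (fcntl ((int)fd, F_GETFD) != -1)
            count++;
    }
    return count;
}

/* Sample the fd count before and after body(), so that anything body()
 * both opens and closes (e.g. a reactor's internal fds) nets to zero. */
static int fd_delta_around (void (*body) (void))
{
    int before = count_open_fds ();

    body ();
    return count_open_fds () - before;
}

/* Hypothetical stand-in for "create reactor, run test, destroy reactor". */
static void balanced_body (void)
{
    int pfd[2];

    if (pipe (pfd) == 0) {
        close (pfd[0]);
        close (pfd[1]);
    }
}

/* Hypothetical body that forgets to close one end of the pipe. */
static void leaky_body (void)
{
    int pfd[2];

    if (pipe (pfd) == 0)
        close (pfd[0]);
}
```

With the sampling points outside the body, `fd_delta_around (balanced_body)` reports no change while `fd_delta_around (leaky_body)` reports one leaked descriptor.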
Nice! This LGTM!
Pushing a fix for this ASAN failure:
33ad02f to 96df522
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6543 +/- ##
==========================================
+ Coverage 83.66% 83.96% +0.30%
==========================================
Files 523 515 -8
Lines 88086 86092 -1994
==========================================
- Hits 73693 72291 -1402
+ Misses 14393 13801 -592
Problem: libsubprocess unit tests occasionally hang when run with a reactor that uses EVFLAG_NOSIGMASK, which may be required. Block SIGCHLD in the test server before spawning threads so that it is not delivered to the client thread.
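The ordering matters because a new thread inherits the signal mask of the thread that creates it. A minimal POSIX sketch of that behavior follows; it is not the libtestutil code, and `check_mask` and `spawn_with_sigchld_blocked` are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Thread body: report whether SIGCHLD is blocked in this thread. */
static void *check_mask (void *arg)
{
    sigset_t cur;
    int *blocked = arg;

    /* With a NULL set, pthread_sigmask() only queries the current mask. */
    if (pthread_sigmask (SIG_BLOCK, NULL, &cur) == 0)
        *blocked = sigismember (&cur, SIGCHLD);
    return NULL;
}

/* Block SIGCHLD first, then spawn a thread.
 * Returns 1 if the new thread has SIGCHLD blocked, 0 if not, -1 on error. */
static int spawn_with_sigchld_blocked (void)
{
    sigset_t set;
    pthread_t t;
    int blocked = -1;

    sigemptyset (&set);
    sigaddset (&set, SIGCHLD);
    if (pthread_sigmask (SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create (&t, NULL, check_mask, &blocked) != 0)
        return -1;
    if (pthread_join (t, NULL) != 0)
        return -1;
    return blocked;
}
```

Because the mask is set before `pthread_create()`, every thread spawned afterwards starts with SIGCHLD blocked, so the signal cannot land on an arbitrary worker thread.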
Problem: child watchers are only used by libsubprocess, and probably do not need to be in the public API since libsubprocess is provided for managing child processes. In addition, we are considering porting Flux to libuv and libuv does not offer similar functionality.
Register a signal watcher for SIGCHLD within libsubprocess that persists as long as there are subprocesses to monitor, and calls waitpid(2) to consume all subprocess state changes. Add a hash by pid and allow subprocess objects to register a callback to receive these changes for a given pid.
Have all subprocess users create reactors without FLUX_REACTOR_SIGCHLD, as the default ev_loop registers a SIGCHLD watcher that conflicts with this one. Add EVFLAG_SIGNALFD to the non-default loop, as that flag was used with reactors created with FLUX_REACTOR_SIGCHLD, and appears to be required to avoid sharness tests hanging randomly.
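The reap-and-dispatch pattern described in this commit message can be sketched in plain POSIX C. This is an illustration under stated assumptions, not the libsubprocess implementation: `sigwait()` stands in for the reactor's signal watcher, a small linear table stands in for the hash by pid, and every name here (`watch_pid`, `reap_children`, `demo_reap_two`, and so on) is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WATCHED 16

/* Hypothetical callback type: receives the raw waitpid(2) status. */
typedef void (*child_cb_f) (pid_t pid, int status, void *arg);

struct watched {
    pid_t pid;                  /* 0 marks an unused slot */
    child_cb_f cb;
    void *arg;
};

/* Stand-in for the hash by pid; a linear table keeps the sketch short. */
static struct watched watched[MAX_WATCHED];

static int watch_pid (pid_t pid, child_cb_f cb, void *arg)
{
    for (int i = 0; i < MAX_WATCHED; i++) {
        if (watched[i].pid == 0) {
            watched[i] = (struct watched){ pid, cb, arg };
            return 0;
        }
    }
    return -1;
}

/* Drain every pending child exit.  SIGCHLD can coalesce, so one signal
 * may stand for several children; loop until waitpid() has nothing left.
 * Returns the number of children reaped. */
static int reap_children (void)
{
    int status;
    int count = 0;
    pid_t pid;

    while ((pid = waitpid (-1, &status, WNOHANG)) > 0) {
        for (int i = 0; i < MAX_WATCHED; i++) {
            if (watched[i].pid == pid) {
                struct watched w = watched[i];
                watched[i].pid = 0;     /* child is gone: free the slot */
                w.cb (pid, status, w.arg);
                break;
            }
        }
        count++;
    }
    return count;
}

static void noop_handler (int sig)
{
    (void)sig;
}

static void record_exit_code (pid_t pid, int status, void *arg)
{
    (void)pid;
    if (WIFEXITED (status))
        *(int *)arg = WEXITSTATUS (status);
}

/* Fork two children that exit with codes 3 and 4, then wait for SIGCHLD
 * and dispatch each exit to its per-pid callback.  Returns 0 on success. */
static int demo_reap_two (int codes[2])
{
    struct sigaction sa;
    sigset_t set;
    int sig;
    int reaped = 0;

    /* A no-op handler keeps SIGCHLD from being discarded by default. */
    sa.sa_handler = noop_handler;
    sa.sa_flags = 0;
    sigemptyset (&sa.sa_mask);
    if (sigaction (SIGCHLD, &sa, NULL) < 0)
        return -1;
    sigemptyset (&set);
    sigaddset (&set, SIGCHLD);
    if (sigprocmask (SIG_BLOCK, &set, NULL) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        pid_t pid = fork ();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit (3 + i);
        if (watch_pid (pid, record_exit_code, &codes[i]) < 0)
            return -1;
    }
    while (reaped < 2) {
        if (sigwait (&set, &sig) != 0)
            return -1;
        reaped += reap_children ();
    }
    return 0;
}
```

Calling `demo_reap_two (codes)` fills `codes[0]` and `codes[1]` with the two children's exit codes. The `WNOHANG` loop is the key design point: a single delivered SIGCHLD does not guarantee a single exited child, so the handler consumes everything `waitpid()` has to offer before returning.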
Problem: child watchers have no users, are likely not required in the public API, and inhibit changing the internal reactor to libuv. Drop them. Fixes flux-framework#6512
Problem: the FLUX_REACTOR_SIGCHLD flag has no users. Drop this flag. Update unit test.
Problem: child watchers are removed from the public API but man pages remain. Drop man pages.
Problem: FLUX_REACTOR_SIGCHLD has been dropped from the public API but it is still mentioned in the man page. Drop it from the man page.
Problem: signals should generally only be handled in the main thread of a multi-threaded program, but this is undocumented. Add a note to the signal watcher man page.
96df522 to 17e1b1f
Just did a bit of cleanup, simplifying the commit message of the main libsubprocess commit, and splitting out the libtestutil change to signal handling which is independent. I also added a note about signal watchers in multi-threaded programs to |
Alright, thanks for the review. I'll set MWP.
This drops child watchers and the FLUX_REACTOR_SIGCHLD flag from the public API, as discussed in #6512. The assumption is that if flux users are managing subprocesses they should be using the subprocess API or making the documented rexec calls to the broker.
Use a SIGCHLD signal watcher in libsubprocess instead of child watchers. This was the only known user.
Besides simplifying the public API, this eliminates a roadblock to swapping out the internal libev for libuv.