Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forward signals to entire process group in flux-imp run #195

Merged
merged 6 commits into from
Nov 6, 2024

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Nov 6, 2024

This PR fixes #194 by allowing the IMP to forward signals to the entire process group with flux-imp run.

Additionally, an issue with blocked signals (specifically in t2002-imp-run.t) was fixed here and a test for SIGINT handling added to the testsuite.

Problem: The current IMP signal interface does not allow a negative
PID to be passed to imp_set_signal_child(), and therefore signals can
only be forwarded to the direct child of the IMP. This works well with
`flux-imp exec`, where the child is a job shell which itself forwards
signals, but doesn't work as well in other situations where the IMP
is managing processes not running as a job.

Allow a negative PID less than -1 to be passed to
imp_set_signal_child().  This will allow the caller to request an
entire process group be signaled, instead of just a single child PID.
Problem: Before the removal of `flux-imp kill`, commands invoked by
`flux-imp run` were signaled by process group instead of an individual
process ID. This worked well because `flux-imp run` replaced itself
with the target command, and thus the command and its children were
in a process group (set up by flux-core's libsubprocess). Now, the
IMP itself will be in that process group, and if it tries to forward
signals to the entire process group it will end up re-signaling itself.

Call `setpgrp(2)` in the child to place it in its own process group.
The IMP now has a process group to which it can forward signals.
Problem: `flux-imp run` only forwards signals to its direct child.
When that child is a shell script running other processes, common
signals like SIGINT and SIGTERM are blocked. Even if the direct
child is not a shell, terminating only the IMP's child may leave
stray procesess which do not immediately exit as desired.

Deliver forwarded signals to the child's process group.

Fixes flux-framework#194
Problem: A test_when_finished() call in t2002-imp-run.t references
an unset variable.

Fix the test_when_finished() call.
Problem: The IMP does not reset signals that are ignored before
executing child processes, but SIG_IGN is inherited across exec(2),
so this may leave the IMP unable to forward certain signals like
SIGINT to its child processes.

Reset all signals from 1 (SIGHUP) to SIGRTMIN to SIG_DFL in
imp_sigunblock_all() so that no signals are ignored before calling
exec(2).
Problem: t2002-imp-run.t tests that `flux-imp run` handles SIGTERM
as a side effect of the "linger" test, but SIGINT isn't tested and
this signal has been shown to specifically have problems (when run
under sudo(1) SIGINT was ignored).

Add a test that ensures SIGINT can be forwarded by the IMP with
`flux-imp run`.
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! and my testing (before the last force push) seems to be fine except for the extra log messages which were removed in the force push.

@grondo
Copy link
Contributor Author

grondo commented Nov 6, 2024

Thanks! I set MWP.

@mergify mergify bot merged commit 1275e69 into flux-framework:master Nov 6, 2024
17 checks passed
@grondo grondo deleted the issue#194 branch November 6, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

imp run doesn't forward (most) signals to all progeny
2 participants