imp run doesn't forward (most) signals to all progeny #194
Comments
A minor point is that the current behavior (introduced by #188) was actually a change in behavior from the previous behavior, which required users of |
Problem: flux run doesn't clean up reliably when aborted by signal.
Change flux run to forward signals to all children rather than just the direct child.
Fixes flux-framework#194
Aw crud, the imp's
diff --git a/src/imp/signals.c b/src/imp/signals.c
index 55b61de..555eb72 100644
--- a/src/imp/signals.c
+++ b/src/imp/signals.c
@@ -59,8 +59,10 @@ static void fwd_signal_all (int signum)
/* If cgroup wasn't used or fails, try with pid_kill_children
*/
- if (count < 0)
+ if (count < 0) {
+ (void)pid_kill_children (imp_child, signum);
count = pid_kill_children (getpid (), signum);
+ }
/* O/w, log an error, not much more to do
*/ |
Hm, I guess I was wrong about Lines 105 to 117 in deef6fe
We could rewrite |
I think I understand what's going on here now. When
if (!(cmd = flux_cmd_create (0, NULL, environ))
|| flux_cmd_argv_append (cmd, imp_path) < 0
|| flux_cmd_argv_append (cmd, "kill") < 0
|| flux_cmd_argv_appendf (cmd, "%d", signum) < 0
|| flux_cmd_argv_appendf (cmd, "-%ld", (long) pid) < 0) {
fprintf (stderr,
"Failed to create flux-imp kill command for rank %d pid %d\n",
rank, pid);
return NULL;
}
Since libsubprocess invokes subprocesses in their own process group, and the IMP called execve directly on the target of
As I think @garlick alluded to above, in the new code, the privileged IMP only sends the signal to its direct child instead of a group of processes. We can correct this by having
@@ -193,12 +193,18 @@ imp_run (struct imp_state *imp,
if ((child = fork ()) < 0)
imp_die (1, "run: fork: %s", strerror (errno));
- imp_set_signal_child (child);
+ imp_set_signal_child (-child);
if (child == 0) {
/* unblock all signals */
imp_sigunblock_all ();
+ /* Place child in its own process group, so that parent IMP
+ * can signal the pgrp as a whole
+ */
+ if (setpgrp () < 0)
+ imp_die (1, "setpgrp: %s", strerror (errno));
+
if (setuid (geteuid()) < 0
|| setgid (getegid()) < 0)
imp_die (1, "setuid: %s", strerror (errno));
diff --git a/src/imp/signals.c b/src/imp/signals.c
index 670398c..4c55042 100644
--- a/src/imp/signals.c
+++ b/src/imp/signals.c
@@ -67,7 +67,7 @@ static void fwd_signal (int signum)
if (count < 0)
imp_warn ("Failed to forward SIGKILL: %s", strerror (errno));
}
- else if (imp_child > 0)
+ else if (imp_child != -1)
kill (imp_child, signum);
}
|
Great! I put this change on my test cluster and my test also works now! |
Problem: `flux-imp run` only forwards signals to its direct child. When that child is a shell script running other processes, common signals like SIGINT and SIGTERM are blocked. Even if the direct child is not a shell, terminating only the IMP's child may leave stray processes which do not immediately exit as desired. Deliver forwarded signals to the child's process group. Fixes flux-framework#194
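For readers unfamiliar with the process-group technique the commit describes, here is a minimal standalone sketch. It is not the IMP code: `signal_child_pgrp` is a hypothetical helper, and the readiness pipe is an assumption added only so the example can confirm that both the child and a grandchild received the signal. The child becomes the leader of a new process group with `setpgid (0, 0)`, and the parent then calls `kill ()` with the negated child pid so that the whole group is signaled.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of flux-security): fork a child that
 * becomes a process group leader and forks a grandchild, then deliver
 * signum to the whole group.  Returns 0 if the child was terminated by
 * signum and every group member has exited, -1 otherwise.
 */
int signal_child_pgrp (int signum)
{
    int pfd[2];
    pid_t child;
    ssize_t n;
    char c;
    int status;

    if (pipe (pfd) < 0)
        return -1;
    if ((child = fork ()) < 0)
        return -1;
    if (child == 0) {
        pid_t gc;
        close (pfd[0]);
        /* Make sure the signal has its default (fatal) disposition */
        (void)signal (signum, SIG_DFL);
        /* Become leader of a new process group */
        if (setpgid (0, 0) < 0)
            _exit (1);
        if ((gc = fork ()) < 0)
            _exit (1);
        if (gc == 0) {
            /* Grandchild: inherits the pgrp and the pipe write end */
            for (;;)
                pause ();
        }
        /* Tell the parent that the grandchild now exists */
        if (write (pfd[1], "r", 1) != 1)
            _exit (1);
        for (;;)
            pause ();
    }
    close (pfd[1]);
    /* Wait until the child has set up its group and forked */
    if (read (pfd[0], &c, 1) != 1) {
        (void)kill (child, SIGKILL);
        (void)waitpid (child, &status, 0);
        close (pfd[0]);
        return -1;
    }
    /* A negative pid addresses the whole process group */
    if (kill (-child, signum) < 0) {
        close (pfd[0]);
        return -1;
    }
    /* EOF arrives only once every holder of the write end has exited */
    while ((n = read (pfd[0], &c, 1)) > 0 || (n < 0 && errno == EINTR))
        ;
    close (pfd[0]);
    if (waitpid (child, &status, 0) < 0)
        return -1;
    return (WIFSIGNALED (status) && WTERMSIG (status) == signum) ? 0 : -1;
}
```

Under these assumptions, `signal_child_pgrp (SIGTERM)` should return 0, because a signal sent to the negated pid reaches the grandchild as well as the direct child. Sending to the positive pid instead would leave the grandchild running, which is the stray-process case described above.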
Problem: `flux run` forwards most signals to its direct child, and SIGUSR1 (SIGKILL) to the whole family, which is appropriate for the `flux shell` but less so for, say, a Bourne shell script, where the Bourne shell may be less diligent about cleaning up its children.

Change `flux run` signal forwarding behavior to try a little harder to clean up.

Note that this likely doesn't affect prolog, epilog, or housekeeping when they are configured to use systemd, since in that case, the shell script executes `systemctl stop` in response to signals. It does affect something like `flux exec --use-imp sh -c "sleep inf"` when the user types ^C to abort.