Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imp run doesn't forward (most) signals to all progeny #194

Closed
garlick opened this issue Nov 5, 2024 · 5 comments · Fixed by #195
Closed

imp run doesn't forward (most) signals to all progeny #194

garlick opened this issue Nov 5, 2024 · 5 comments · Fixed by #195

Comments

@garlick
Copy link
Member

garlick commented Nov 5, 2024

Problem: flux run forwards most signals to its direct child, and SIGUSR1 (SIGKILL) to the whole family, which is appropriate for the flux shell but less so for, say, a bourne shell script, where the bourne shell may be less diligent about cleaning up its children.

Change flux run signal forwarding behavior to try a little harder to clean up.

Note that this likely doesn't affect prolog, epilog, or housekeeping when they are configured to use systemd, since in that case, the shell script executes systemctl stop in response to signals. It does affect something like flux exec --use-imp sh -c "sleep inf" when the user types ^C to abort.

@grondo
Copy link
Contributor

grondo commented Nov 5, 2024

A minor point is that the current behavior (introduced by #188) was actually a change in behavior from the previous behavior, which required users of flux-imp run to signal the process with flux-imp kill (now removed). The behavior of flux-imp kill was to signal the entire process tree (using pid_kill_children()) when the target of kill was an IMP process. Therefore, #188 introduced this bug via a change in behavior -- flux-imp run should forward signals to all its descendants by default to preserve existing behavior.

garlick added a commit to garlick/flux-security that referenced this issue Nov 5, 2024
Problem: flux run doesn't clean up reliably when aborted by signal

Change flux run to forward signals to all children rather than just the
direct child.

Fixes flux-framework#194
@garlick
Copy link
Member Author

garlick commented Nov 6, 2024

Aw crud, the imp's /proc/PID/task/PID/children only includes direct children, so calling pid_kill_children() is the same as killing the one direct child. We might need a change like this too:

diff --git a/src/imp/signals.c b/src/imp/signals.c
index 55b61de..555eb72 100644
--- a/src/imp/signals.c
+++ b/src/imp/signals.c
@@ -59,8 +59,10 @@ static void fwd_signal_all (int signum)
 
     /* If cgroup wasn't used or fails, try with pid_kill_children
      */
-    if (count < 0)
+    if (count < 0) {
+        (void)pid_kill_children (imp_child, signum);
         count = pid_kill_children (getpid (), signum);
+    }
 
     /* O/w, log an error, not much more to do
      */

@grondo
Copy link
Contributor

grondo commented Nov 6, 2024

Hm, I guess I was wrong about flux-imp kill, it only called pid_kill_children() on the IMP itself:

/* If pid is owned by root and is a flux-imp process, then deliver
* signal to child process instead (presumably a job shell)
*/
if (p->pid_owner == 0
&& strcmp (p->command, "flux-imp") == 0) {
int count = pid_kill_children (pid, sig);
if (count < 0)
imp_die (1,
"kill: failed to signal flux-imp children: %s",
strerror (errno));
else if (count == 0)
imp_die (1, "kill: killed 0 flux-imp children");
}

We could rewrite pid_kill_children() to operate recursively, but now I'm confused why the current tests break. Sorry if I led us into the weeds on this one.

@grondo
Copy link
Contributor

grondo commented Nov 6, 2024

I think I understand what's going on here now.

When flux exec invokes (invoked?) flux-imp kill, it does so using the negative of the subprocess pid, so that the signal is delivered to the entire process group (from imp_kill() in flux-rexec.c).

    if (!(cmd = flux_cmd_create (0, NULL, environ))
        || flux_cmd_argv_append (cmd, imp_path) < 0
        || flux_cmd_argv_append (cmd, "kill") < 0
        || flux_cmd_argv_appendf (cmd, "%d", signum) < 0
        || flux_cmd_argv_appendf (cmd, "-%ld", (long) pid) < 0) {
        fprintf (stderr,
                 "Failed to create flux-imp kill command for rank %d pid %d\n",
                 rank, pid);
        return NULL;
    }

Since libsubprocess invokes subprocesses in their own process group, and the IMP called execve directly on the target of flux-imp run, the shell (script) and sleep command were in the same process group and thus the signal is delivered to the script and the sleep (in the tests we're talking about here.

As I think @garlick alluded to above, in the new code, the privileged IMP only sends the signal to its direct child instead of a group of processes.

We can correct this by having flux-imp run call setpgrp(2) in the child and then signal the entire process group instead of a single child PID, more closely mimicking the previous behavior. For instance, I think this patch fixes the failure (at least the version of a reproducer I've been using locally):

@@ -193,12 +193,18 @@ imp_run (struct imp_state *imp,
     if ((child = fork ()) < 0)
         imp_die (1, "run: fork: %s", strerror (errno));
 
-    imp_set_signal_child (child);
+    imp_set_signal_child (-child);
 
     if (child == 0) {
         /* unblock all signals */
         imp_sigunblock_all ();
 
+        /* Place child in its own process group, so that parent IMP
+         * can signal the pgrp as a whole
+         */
+        if (setpgrp () < 0)
+            imp_die (1, "setpgrp: %s", strerror (errno));
+
         if (setuid (geteuid()) < 0
             || setgid (getegid()) < 0)
             imp_die (1, "setuid: %s", strerror (errno));
diff --git a/src/imp/signals.c b/src/imp/signals.c
index 670398c..4c55042 100644
--- a/src/imp/signals.c
+++ b/src/imp/signals.c
@@ -67,7 +67,7 @@ static void fwd_signal (int signum)
         if (count < 0)
             imp_warn ("Failed to forward SIGKILL: %s", strerror (errno));
     }
-    else if (imp_child > 0)
+    else if (imp_child != -1)
         kill (imp_child, signum);
 }
 

@garlick
Copy link
Member Author

garlick commented Nov 6, 2024

Great! I put this change on my test cluster and my test also works now!

grondo added a commit to grondo/flux-security that referenced this issue Nov 6, 2024
Problem: `flux-imp run` only forwards signals to its direct child.
When that child is a shell script running other processes, common
signals like SIGINT and SIGTERM are blocked. Even if the direct
child is not a shell, terminating only the IMP's child may leave
stray procesess which do not immediately exit as desired.

Deliver forwarded signals to the child's process group.

Fixes flux-framework#194
@mergify mergify bot closed this as completed in #195 Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants