Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SIGHUP/SIGCONT sometimes reaching the child due to tty quirks #174

Merged
merged 2 commits into from
Aug 1, 2018

Conversation

chriskuehl
Copy link
Contributor

@chriskuehl chriskuehl commented Aug 1, 2018

Fixes #136

To summarize, the issue in #136 is that when dumb-init calls TIOCNOTTY, the kernel will send a SIGHUP and SIGCONT to the dumb-init process if dumb-init is the session leader. Quoth man tty:

    TIOCNOTTY
       Detach the calling process from its controlling terminal.

       If the process is the session leader, then SIGHUP and  SIGCONT  signals
       are  sent to the foreground process group and all processes in the cur‐
       rent session lose their controlling tty.

When this happens, there's a race with two possible outcomes:

  1. (Happy path) dumb-init pid1 receives the SIGHUP/SIGCONT and forward them to the child (pid2), but the child hasn't yet exec'd the real process. The pid2 dumb-init just ignores the signals, eventually execs the real process, and nobody notices this quirk.
  2. (Sad path) dumb-init pid2 has exec'd the real process by the time the pid1 dumb-init forwards it the SIGHUP/SIGCONT. The real process gets the SIGHUP and usually dies with status code 129 (killed by SIGHUP).

Typically (1) happens, but some environments cause (2) to happen more frequently, probably due to differences in how the scheduler behaves.

This is a fairly non-intrusive fix: it just ignores the first SIGHUP/SIGCONT. At some point I'd still like to see if there's a better fix (can we hand the session to the child when we're the session leader rather than creating a new one?), but this should be fairly safe.

Still working on writing a test. Unfortunately it's hard to test the actual functionality (hard to force the race without patching dumb-init's code) so it may end up just asserting debug messages :(

@asottile
Copy link
Contributor

asottile commented Aug 1, 2018

👍 idea seems good. We talked through some of the scenarios offline including duplicate SIGHUP / SIGCONT signals -- the nice thing is that just eating one of these signals is fine as the kernel / user signals here are indistinguishable (kernel / user send SIGHUP during fork, dumb-init eats one and forwards the other).

@chriskuehl chriskuehl force-pushed the fix-race branch 2 times, most recently from b2203fd to 3bc3d4f Compare August 1, 2018 22:14
@chriskuehl chriskuehl changed the title [WIP] Fix SIGHUP/SIGCONT sometimes reaching the child due to tty quirks Fix SIGHUP/SIGCONT sometimes reaching the child due to tty quirks Aug 1, 2018
@chriskuehl
Copy link
Contributor Author

Should be ready for final review / merge now. Seems like the ppc64le tests are failing though, and not producing a log, so I wonder if something is wrong with the Travis ppc64le infrastructure? :(

@ghatwala any idea why we'd be seeing that with the PPC tests?

Copy link
Contributor

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise looks good!

dumb-init.c Outdated
if (signum == SIGCHLD) {

if (signal_temporary_ignores[signum] == 1) {
DEBUG("Ignoring signal %d during its first receive.\n", signum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe Ignoring tty hand-off signal... so it's more clear why this is happening here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this wording better, changed to that.

@chriskuehl chriskuehl merged commit 9f490af into Yelp:master Aug 1, 2018
@chriskuehl chriskuehl deleted the fix-race branch August 1, 2018 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dumb-init child sometimes receives SIGHUP and/or SIGCONT right after start in setsid mode due to race
2 participants