Don't kill kthreads during bootup #15275

aszlig · 2016-05-06T15:10:08Z

This addresses #15226 and fixes killing of processes before switching from the initrd to the real root.

Right now, the pkill that is issued not only kills user space processes but also sends a SIGKILL to kernel threads as well. Usually these threads ignore signals, but some of these processes do handle signals, like for example the md module, which happened in #15226.

It also adds a small check for the swraid installer test and a standalone test which checks on just that problem, so in the future this shouldn't happen again.

Cc: @edolstra, @viric, @dezgeg, @grahamc, @7c6f434c

@dezgeg

Unfortunately, pkill doesn't distinguish between kernel and user space processes, so we need to make sure we don't accidentally kill kernel threads. Normally, a kernel thread ignores all signals, but there are a few that do. A quick grep on the kernel source tree (as of kernel 4.6.0) shows the following source files which use allow_signal(): drivers/isdn/mISDN/l1oip_core.c drivers/md/md.c drivers/misc/mic/cosm/cosm_scif_server.c drivers/misc/mic/cosm_client/cosm_scif_client.c drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c drivers/staging/rtl8188eu/core/rtw_cmd.c drivers/staging/rtl8712/rtl8712_cmd.c drivers/target/iscsi/iscsi_target.c drivers/target/iscsi/iscsi_target_login.c drivers/target/iscsi/iscsi_target_nego.c drivers/usb/atm/usbatm.c drivers/usb/gadget/function/f_mass_storage.c fs/jffs2/background.c fs/lockd/clntlock.c fs/lockd/svc.c fs/nfs/nfs4state.c fs/nfsd/nfssvc.c While not all of these are necessarily kthreads and some functionality may still be unimpeded, it's still quite harmful and can cause unexpected side-effects, especially because some of these kthreads are storage-related (which we obviously don't want to kill during bootup). During discussion at NixOS#15226, @dezgeg suggested the following implementation: for pid in $(pgrep -v -f '@'); do if [ "$(cat /proc/$pid/cmdline)" != "" ]; then kill -9 "$pid" fi done This has a few downsides: * User space processes which use an empty string in their command line won't be killed. * It results in errors during bootup because some shell-related processes are already terminated (maybe it's pgrep itself, haven't checked). * The @ is searched within the full command line, not just at the beginning of the string. Of course, we already had this until now, so it's not a problem of his implementation. I posted an alternative implementation which doesn't suffer from the first point, but even that one wasn't sufficient: for pid in $(pgrep -v -f '^@'); do readlink "/proc/$pid/exe" &> /dev/null || continue echo "$pid" done | xargs kill -9 This one spawns a subshell, which would be included in the processes to kill and actually kills itself during the process. So what we have now is even checking whether the shell process itself is in the list to kill and avoids killing it just to be sure. Also, we don't spawn a subshell anymore and use /proc/$pid/exe to distinguish between user space and kernel processes like in the comments of the following StackOverflow answer: http://stackoverflow.com/a/12231039 We don't need to take care of terminating processes, because what we actually want IS to terminate the processes. The only point where this (and any previous) approach falls short if we have processes that act like fork bombs, because they might spawn additional processes between the pgrep and the killing. We can only address this with process/control groups and this still won't save us because the root user can escape from that as well. Signed-off-by: aszlig <[email protected]> Fixes: NixOS#15226

This is a regression test for NixOS#15226, so that the test will fail once we accidentally kill one or more of the md kthreads (aka: if safe mode is enabled). Signed-off-by: aszlig <[email protected]>

We already have a small regression test for NixOS#15226 within the swraid installer test. Unfortunately, we only check there whether the md kthread got signalled but not whether other rampaging processes are still alive that *should* have been killed. So in order to do this we provide multiple canary processes which are checked after the system has booted up: * canary1: It's a simple forking daemon which just sleeps until it's going to be killed. Of course we expect this process to not be alive anymore after boot up. * canary2: Similar to canary1, but tries to mimick a kthread to make sure that it's going to be properly killed at the end of stage 1. * canary3: Like canary2, but this time using a @ in front of its command name to actually prevent it from being killed. * kcanary: This one is a real kthread and it runs until killed, which shouldn't be the case. Tested with and without 67223ee and everything works as expected, at least on my machine. Signed-off-by: aszlig <[email protected]>

We don't want to push out a channel update whenever this test fails, because that might have unexpected and confused side effects and it *really* means that stage 1 of our boot up is broken. Signed-off-by: aszlig <[email protected]>

mention-bot · 2016-05-06T15:10:11Z

By analyzing the blame information on this pull request, we identified @edolstra, @wkennington and @dezgeg to be potential reviewers

viric · 2016-05-06T16:16:25Z

In stage 1 we don't have to avoid killing ourselves, because it is PID 1, init, that cannot be killed.

One typical process that would remain, for example, if not killed is the "tee" process that prefixes every log line with "stage-1-init:" string. That completely deluded me, because adding cmdline "boot.trace" gave me the "set -x" sh with lots of stage-1-init lines which I thought to be COMPLETE until switch_to_root, but they were not. They were there until the kill.

If we could get a COMPLETE log of stage1 with "boot.trace" (not only until kill) it would be great, but maybe topic of another issue.

edolstra · 2016-05-06T16:24:58Z

Wow, I didn't know you could kill kernel processes from user space.

An alternative solution woulf be to only kill known processes that we need to get rid of, e.g. udev. (Are there any others?)

edolstra · 2016-05-06T16:26:24Z

The alternative solution would have the advantage of not requiring a special kernel module in the test, which seems painful to maintain.

aszlig · 2016-05-06T16:44:03Z

@viric: I've added an exempt of the init process just to be absolutely sure, as detailed in 67223ee. The kill is shortly before switching to stage 2, so I don't think we need to exempt tee from the killing.

@edolstra: Well, there could be plenty of others, anything that goes into the hooks prior to the killing spree. As for the kernel module: The implementation is quite simple and it's not exclusively useful for just stage 1 but for other things during the boot process. I'd keep it to make sure that regardless of this bug we don't kill kernel threads. If the maintenance burden gets too high, we can still drop it.

aszlig · 2016-05-06T16:49:54Z

@edolstra: Btw. if we switch our initrd to systemd some day, it also kills all of the processes the same way as we have now in this PR (aka: kill everything except kthreads and @-prefixed processes).

viric · 2016-05-06T17:31:54Z

well, you say "killing kthreads", but it's not that they die and disappear. As for the RAID thread case, it's about signaling them.

At some point in history, RAIDs had to be deactivated before poweroff to be properly later activated, but as people started to use rootfs, it was not always possible to deactivate the array, and they thought of the slow safe mode. As most shutdown sequences do a killall, that would hit the kthreads of raid that would put the RAIDs in safemode not requiring the deactivation.

viric · 2016-05-06T17:35:09Z

@aszlig couldn't the tee die just by closing an fd, maybe in stage 2, and we would get the whole log? As for the issue that raised all this, I thought that what put RAID in safe mode was the udev control exit, because that was the last line in the stage1 "set -x" log. I didn't expect that stage1 went further in silence.

aszlig · 2016-05-06T17:37:30Z

@viric: The "killing threads" is not about md only, there are other kthreads which do shut down on kill, for example there is jffs2_gcd_mtd, which is stopped at SIGKILL.

viric · 2016-05-06T17:38:08Z

Ah I didn't know. Good that you mentioned it!

aszlig · 2016-05-06T17:43:02Z

As for tee, there is

nixpkgs/nixos/modules/system/boot/stage-1-init.sh

Lines 435 to 438 in 5ac997b

    
           # Reset the logging file descriptors. 
        
           # Do this just before pkill, which will kill the tee process. 
        
           exec 1>&$logOutFd 2>&$logErrFd 
        
           eval "exec $logOutFd>&- $logErrFd>&-"

, which should close the file descriptors.

viric · 2016-05-06T18:09:38Z

right, why not close the descriptors in stage2, for example? Then we have the full stage1 log

aszlig · 2016-05-06T18:52:36Z

@viric: This would involve a lot more work, because we'd need to pause the tee (plus relocate the pipes) when moving the mountpoints to the new root filesystem. I don't think it's worth the effort, because you only gain a few little log lines (if any) which are only for switching to the new root filesystem, and even if we do pause we need to implement buffering while doing that. As this PR is also meant to be backported to stable, it's out of its scope because if we would do that we'll need more extensive testing plus we risk breaking existing systems that run on stable.

viric · 2016-05-06T19:10:16Z

@aszlig Don't worry then. Probably I don't understand all the consequences of that, and it's not the topic of this PR. Let others review this and get it into master and 16.03 (and 15.09? :)

@edolstra

As @edolstra pointed out that the kernel module might be painful to maintain. I strongly disagree because it's only a small module and it's good to have such a canary in the tests no matter how the bootup process looks like, so I'm going the masochistic route and try to maintain it. If it *really* becomes too much maintenance burden, we can still drop or disable kcanary. Signed-off-by: aszlig <[email protected]>

@edolstra

Merges pull request #15275: This addresses #15226 and fixes killing of processes before switching from the initrd to the real root. Right now, the pkill that is issued not only kills user space processes but also sends a SIGKILL to kernel threads as well. Usually these threads ignore signals, but some of these processes do handle signals, like for example the md module, which happened in #15226. It also adds a small check for the swraid installer test and a standalone test which checks on just that problem, so in the future this shouldn't happen again. This has been acked by @edolstra on IRC.

@edolstra

Merges pull request #15275: This addresses #15226 and fixes killing of processes before switching from the initrd to the real root. Right now, the pkill that is issued not only kills user space processes but also sends a SIGKILL to kernel threads as well. Usually these threads ignore signals, but some of these processes do handle signals, like for example the md module, which happened in #15226. It also adds a small check for the swraid installer test and a standalone test which checks on just that problem, so in the future this shouldn't happen again. This has been acked by @edolstra on IRC. The reason I'm merging this to 16.03 is that this branch fixes #15226 and thus also fixes mdraid setups out there. Tested using the boot-stage1.nix NixOS test against release-16.03.

@edolstra

Merges pull request #15275: This addresses #15226 and fixes killing of processes before switching from the initrd to the real root. Right now, the pkill that is issued not only kills user space processes but also sends a SIGKILL to kernel threads as well. Usually these threads ignore signals, but some of these processes do handle signals, like for example the md module, which happened in #15226. It also adds a small check for the swraid installer test and a standalone test which checks on just that problem, so in the future this shouldn't happen again. This has been acked by @edolstra on IRC. The reason I'm merging this to 15.09 is that this branch fixes #15226 and thus also fixes mdraid setups out there. Tested using the boot-stage1.nix NixOS test against release-15.09.

This test has been introduced by NixOS/nixpkgs@e936f7d and was part of NixOS/nixpkgs#15275. The check attribute is always true for this test, because it has to be run no matter which configuration you're using. It basically makes sure that boot stage 1 is working correctly. Signed-off-by: aszlig <[email protected]>

cstrahan · 2016-05-06T20:58:53Z

Interesting. I wonder if that had anything to do with my occasionally broken LUKS bootup: #14199

@edolstra

Merges pull request NixOS#15275: This addresses NixOS#15226 and fixes killing of processes before switching from the initrd to the real root. Right now, the pkill that is issued not only kills user space processes but also sends a SIGKILL to kernel threads as well. Usually these threads ignore signals, but some of these processes do handle signals, like for example the md module, which happened in NixOS#15226. It also adds a small check for the swraid installer test and a standalone test which checks on just that problem, so in the future this shouldn't happen again. This has been acked by @edolstra on IRC. The reason I'm merging this to 15.09 is that this branch fixes NixOS#15226 and thus also fixes mdraid setups out there. Tested using the boot-stage1.nix NixOS test against release-15.09.

@edolstra

Merges pull request NixOS#15275: This addresses NixOS#15226 and fixes killing of processes before switching from the initrd to the real root. Right now, the pkill that is issued not only kills user space processes but also sends a SIGKILL to kernel threads as well. Usually these threads ignore signals, but some of these processes do handle signals, like for example the md module, which happened in NixOS#15226. It also adds a small check for the swraid installer test and a standalone test which checks on just that problem, so in the future this shouldn't happen again. This has been acked by @edolstra on IRC. The reason I'm merging this to 16.03 is that this branch fixes NixOS#15226 and thus also fixes mdraid setups out there. Tested using the boot-stage1.nix NixOS test against release-16.03.

aszlig added 4 commits May 6, 2016 16:24

nixos/tests/installer/swraid: Check for safemode

dc6d003

This is a regression test for NixOS#15226, so that the test will fail once we accidentally kill one or more of the md kthreads (aka: if safe mode is enabled). Signed-off-by: aszlig <[email protected]>

nixos/release-combined: Add boot-stage1 test

eb6e366

We don't want to push out a channel update whenever this test fails, because that might have unexpected and confused side effects and it *really* means that stage 1 of our boot up is broken. Signed-off-by: aszlig <[email protected]>

aszlig self-assigned this May 6, 2016

aszlig added 0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 9.needs: reporter feedback This issue needs the person who filed it to respond 6.topic: kernel The Linux kernel labels May 6, 2016

aszlig mentioned this pull request May 6, 2016

Broken swraid boot/shutdown sequence #15226

Closed

aszlig added the 9.needs: port to stable A PR needs a backport to the stable release. label May 6, 2016

aszlig merged commit 64ca91c into NixOS:master May 6, 2016

samueldr removed the 9.needs: port to stable A PR needs a backport to the stable release. label Apr 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't kill kthreads during bootup #15275

Don't kill kthreads during bootup #15275

aszlig commented May 6, 2016

mention-bot commented May 6, 2016

viric commented May 6, 2016

edolstra commented May 6, 2016

edolstra commented May 6, 2016

aszlig commented May 6, 2016

aszlig commented May 6, 2016

viric commented May 6, 2016

viric commented May 6, 2016

aszlig commented May 6, 2016

viric commented May 6, 2016

aszlig commented May 6, 2016

viric commented May 6, 2016

aszlig commented May 6, 2016

viric commented May 6, 2016

cstrahan commented May 6, 2016

Don't kill kthreads during bootup #15275

Don't kill kthreads during bootup #15275

Conversation

aszlig commented May 6, 2016

mention-bot commented May 6, 2016

viric commented May 6, 2016

edolstra commented May 6, 2016

edolstra commented May 6, 2016

aszlig commented May 6, 2016

aszlig commented May 6, 2016

viric commented May 6, 2016

viric commented May 6, 2016

aszlig commented May 6, 2016

viric commented May 6, 2016

aszlig commented May 6, 2016

viric commented May 6, 2016

aszlig commented May 6, 2016

viric commented May 6, 2016

cstrahan commented May 6, 2016