Dracut fails to boot with Clevis 20 #456

Open
MrRoy opened this issue Mar 9, 2024 · 12 comments
@MrRoy

MrRoy commented Mar 9, 2024

When I rebuild my initramfs with Clevis 20, my system is unable to boot. Strangely, Dracut is able to unlock my LUKS partition, but the boot fails right after the unlock:

Unlocked /dev/nvme0n1p2 (UUID=...) successfully
/lib/dracut-lib.sh: line 147: CMDLINE_PROC: unbound variable
/lib/dracut-lib.sh: line 198: _newoption: unbound variable

dracut Warning: Signal caught!

/lib/dracut-lib.sh: line 147: CMDLINE_PROC: unbound variable
/lib/dracut-lib.sh: line 198: _newoption: unbound variable
/lib/dracut-lib.sh: line 147: CMDLINE_PROC: unbound variable
/lib/dracut-lib.sh: line 198: _newoption: unbound variable
/lib/dracut-lib.sh: line 913: DRACUT_SYSTEMD: unbound variable
[    4.847995] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[    4.847995] CPU: 3 PID: 1 Comm: init Not tainted 6.6.16 #1
[    4.847991] Hardware name: LENOVO 21D4CT01WW/21D4CT01ww, BIOS N3GET66W (1.66 ) 02/02/2024
[    4.848009] Call Trace:
[    4.848020]  <TASK>
[    4.848030]  dump_stack_lvl+0x32/0x50
[    4.848048]  panic+0x172/0x310
[    4.848067]  do_exit+0x85f/0x9a0
[    4.848080]  ? srso_alias_return_thunk+0x39/0x90
[    4.848095]  ? __count_memcg_events+0x39/0x90
[    4.848111]  do_group_exit+0x28/0x80
[    4.848123]  __x64_sys_exit_group+0xf/0x10

If I recreate my initramfs using Clevis 19, with the same dracut version, parameters, kernel and cmdline, this does not happen and my system boots successfully. Likewise, an initramfs built by dracut without Clevis also boots successfully.

OS: Gentoo (OpenRC)
Kernel: 6.6.16
Dracut: 060 (commit 4980bad34775da715a2639b736cba5e65a8a2604)

N.B. on Clevis 19, I apply the patch from PR #347 in order to get Clevis to work without systemd

sarroutbi added the bug label Mar 10, 2024
@sergio-correia
Collaborator

Would you please provide some steps so I can try reproducing the issue? Thanks in advance.

@BohdanTkachenko

I faced exactly the same issue. It seems to be coming from dracut/modules.d/99base/dracut-lib.sh and triggered by afe91eb.

I initially faced this issue when trying to use ZFSBootMenu + Clevis v20 on Fedora. Then I tried applying afe91eb and cfefdde to v19 on Debian and got exactly the same issue.

I think set -eu is too strict and causes this error. I tried to change set -eu to set -e and it fixed the problem for me.
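The effect of `-u` described above can be reproduced in isolation. The following is a minimal sketch, not taken from dracut or Clevis: `read_optional` and `MAYBE_UNSET` are hypothetical stand-ins for a library function that reads a variable which may never have been assigned, much like the `CMDLINE_PROC` lines in the log.

```shell
#!/bin/sh
# Start from the shell's default, relaxed options so the demo is repeatable.
set +eu

# Hypothetical stand-in for a library function that reads a variable
# which may never have been assigned.
read_optional() {
    if [ -n "$MAYBE_UNSET" ]; then
        echo "set"
    else
        echo "empty"
    fi
}

unset MAYBE_UNSET

# Without -u, the unset variable silently expands to an empty string.
relaxed=$(read_optional)

# With -u, the same expansion is a fatal error. The inner parentheses keep
# the failure inside a subshell, so this script itself survives.
if strict=$( (set -u; read_optional) 2>&1 ); then
    strict_status=0
else
    strict_status=$?
fi

echo "relaxed result: $relaxed"
echo "strict exit status: $strict_status"
echo "strict diagnostic: $strict"
```

The exact wording of the diagnostic and the non-zero status differ between shells, so the sketch only prints them rather than relying on specific values.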

I will try to create some minimal setup to reproduce the issue and share it here.

@sergio-correia
Collaborator

sergio-correia commented Apr 19, 2024

> I will try to create some minimal setup to reproduce the issue and share it here.

Any news on this front, so I can try to investigate further?

We can probably relax that set -eu, but at first sight, it seems like something that should be addressed in dracut.

@MrRoy
Author

MrRoy commented Apr 19, 2024

On my side, I tried to patch Clevis 20 to use only set -e, but I still couldn't boot (with a different error, though).

When I have some time, I will try to set up a minimal Alpine system to see if I can get you reproducible instructions @sergio-correia, or maybe even share a VM with the problem.

@BohdanTkachenko

Sorry for the delay. This gist provides a minimal setup of ZFSBootMenu + Dracut + Clevis to reproduce the issue.

For simplicity, you can use the following steps:

  1. Install QEMU, Docker, OVMF.

  2. Build a Docker image with ZFSBootMenu build environment:

curl https://gist.githubusercontent.com/BohdanTkachenko/98e6c2aa8b923a73948a185af0d3accb/raw/Dockerfile \
  | docker build . -f - -t zbm-fedora

  3. Use this Docker image to build an actual ZFSBootMenu EFI and run it in QEMU. The following command should work on Fedora, but you might need to adjust it for other distros:

bash <(curl -s https://gist.githubusercontent.com/BohdanTkachenko/98e6c2aa8b923a73948a185af0d3accb/raw/build-and-run.sh)

@oldium
Contributor

oldium commented May 5, 2024

Just from checking the dracut modules in /usr/lib/dracut/modules.d, it looks like when you use set -e, you also have to use set +e afterwards (see 30convertfs/convertfs.sh). So if the clevis module calls set -eu, it should probably call set +eu at the end too. Or, better, use a subshell so that the setting is only temporary, like in 98syslog/rsyslogd-start.sh.
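The two options mentioned here can be compared side by side. This is only a sketch of the general shell technique, not code from the dracut modules named above; `hook_with_restore`, `hook_in_subshell` and `has_flag` are hypothetical names.

```shell
#!/bin/sh
# Start from the shell's default, relaxed options so the demo is repeatable.
set +eu

# Helper: succeed if the option string $1 contains the flag letter $2.
has_flag() {
    case $1 in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Approach 1: switch strict mode on, do the work, switch it off again.
# Caveat: this turns the options off even if the caller had them on, and
# the restore never runs if the body fails under -e.
hook_with_restore() {
    set -eu
    : "strict work would go here"
    set +eu
}

# Approach 2: a function whose body is a subshell. Option changes made
# inside the parentheses cannot reach the caller.
hook_in_subshell() (
    set -eu
    printf '%s' "$-"
)

hook_with_restore
flags_after_restore=$-

flags_inside_subshell=$(hook_in_subshell)
flags_after_subshell=$-

for flags in "$flags_after_restore" "$flags_after_subshell"; do
    if has_flag "$flags" e || has_flag "$flags" u; then
        echo "strict options leaked into the caller: $flags"
    else
        echo "caller is not in strict mode: $flags"
    fi
done
echo "options seen inside the subshell: $flags_inside_subshell"
```

The subshell variant is the more robust of the two, since it needs no matching restore call and leaves the caller's options alone even when the body exits early.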

@sergio-correia
Collaborator

> Just from checking the dracut modules in /usr/lib/dracut/modules.d, it looks like when you use set -e, you also have to use set +e afterwards (see 30convertfs/convertfs.sh). So if the clevis module calls set -eu, it should probably call set +eu at the end too. Or, better, use a subshell so that the setting is only temporary, like in 98syslog/rsyslogd-start.sh.

I think this might work. I will do some testing with the reproducer from @BohdanTkachenko (thanks, by the way!)

@oldium
Contributor

oldium commented Jun 16, 2024

Dracut sources all hook files rather than executing them, so any changes made by a hook (including shell options) are visible to all other Dracut scripts. To fix this, it should be sufficient to remove set -eu from the Dracut hook (i.e. clevis-hook.sh). The unlocking script (clevis-luks-unlocker) is then executed in a separate environment, so using set -eu there should be safe.
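The difference between sourcing and executing can be shown with a throwaway file. This is a minimal sketch under stated assumptions: the temporary file is a hypothetical stand-in for a hook, not the real clevis-hook.sh, and `has_flag` is a helper invented for the demo.

```shell
#!/bin/sh
# Start from the shell's default, relaxed options so the demo is repeatable.
set +eu

# Helper: succeed if the option string $1 contains the flag letter $2.
has_flag() {
    case $1 in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical hook file whose only job is to switch on strict mode.
hook=$(mktemp)
printf 'set -eu\n' > "$hook"

# Executed: the hook runs in a child process, so our options stay as they were.
sh "$hook"
flags_after_exec=$-

# Sourced: the hook runs in this very shell, so its options stick.
. "$hook"
flags_after_source=$-

# Undo the leak and clean up.
set +eu
rm -f "$hook"

if has_flag "$flags_after_exec" u; then
    echo "after executing: -u is on"
else
    echo "after executing: -u is off"
fi
if has_flag "$flags_after_source" u; then
    echo "after sourcing: -u is on"
else
    echo "after sourcing: -u is off"
fi
```

Once a sourced hook has enabled `-u`, every later script sharing that shell fails on its first reference to an unassigned variable, which matches the pattern of errors in the boot log.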

@oldium
Contributor

oldium commented Jun 23, 2024

Clevis v20 unlocking with Dracut without systemd completely ignores /etc/crypttab and other options supplied via host-only mode and the kernel command line, so I reworked the unlocking in #462 (the work is done in commits 2 and 3). The unlocking now uses a pipe to send the password to cryptsetup. Feel free to try it.

@oldium
Contributor

oldium commented Jun 23, 2024

I have created a Debian package with the #462 work (new TPM 1.2 feature, Clevis unlock via pipe) for Debian 12 (bookworm) on the amd64 architecture; it can be found here. I just took the Trixie v20 sources, updated them and compiled them on Debian 12.

Edit: rebuilt packages with +tpm1 suffix.

@oldium
Contributor

oldium commented Oct 6, 2024

The latest Debian 11 (bullseye), 12 (bookworm) and Fedora v39, v40 and v41 packages are available here: https://github.com/oldium/clevis/releases/tag/v21_tpm1u2.

@oldium
Contributor

oldium commented Oct 10, 2024

The latest Debian 11 (bullseye), 12 (bookworm) and Fedora v39, v40 and v41 packages are available here: https://github.com/oldium/clevis/releases/tag/v21_tpm1u3.

This version also includes the latest PKCS#11 updates from master.
