Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for TPM- and Tang-based disk encryption #1560

Merged
merged 22 commits into from
Mar 14, 2024

Conversation

simoncampion
Copy link
Contributor

@simoncampion simoncampion commented Jan 10, 2024

Add support for TPM- and Tang-based disk encryption

This PR adds support for TPM- and Tang-based disk encryption, by adding an app-crypt/clevis package and dependencies and an updated version of sys-kernel/bootengine. It is accompanied by PRs in the bootengine and mantle repositories. It includes work by @krishjainx towards supporting these features, plus a few further changes that I needed to make for the features to fully work. I left changes by @krishjainx largely untouched where no modifications were needed to make it work, but I'm happy to address comments about any of the changes in this PR.

How to use

With these changes the luks.clevis.tpm2 and luks.clevis.tang Ignition options should be fully supported. See the automated tests in the mantle pull request for examples.

I have no prior experience making changes to Flatcar and look forward to hearing how to do things better. One specific aspect that I'd like feedback on is the following issue: When encrypting the ROOT partition, it must be decrypted in the initramfs before the root pivot. However, Ignition will add an /etc/crypttab entry for the ROOT partition, leading to the generation of a systemd-cryptsetup service for the encrypted root. Unfortunately, due to how this service is configured, it will fail if the root disk is already decrypted. This is a well-known issue. This does not impact the correct functioning of the system, but is annoying and also leads to the kola tests for root disk encryption to fail since the test system always checks for failed services. I'm unsure how to best fix this issue and would be thankful for suggestions.

Testing done

I tested the luks.clevis.tpm2 and luks.clevis.tang config options for the ROOT partition and non-root disks using the new tests in kola.

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, kernel modules, etc.

Once this is merged, I'd also be happy to update the documentation and create a new Butane spec for the newly supported config options.

@pothos
Copy link
Member

pothos commented Jan 10, 2024

Thanks!
Let's keep the rootfs out of scope for now because it will also be tricky to get the dependencies right.

When a package from Gentoo needs no changes, it should be in portage-stable. The profile file can be used for simple tweaks. If the package needs changes, we have it in coreos-overlay but in two commits: the first "syncs from Gentoo" which is deleting any files in the package folder and brings in new upstream files, and the second commit "applies Flatcar changes" and this is where commented changes are introduced. Only then we can correctly update the package.

Copy link

github-actions bot commented Jan 10, 2024

@simoncampion
Copy link
Contributor Author

Thanks for the speedy comment, @pothos !

When a package from Gentoo needs no changes, it should be in portage-stable. The profile file can be used for simple tweaks. If the package needs changes, we have it in coreos-overlay but in two commits: the first "syncs from Gentoo" which is deleting any files in the package folder and brings in new upstream files, and the second commit "applies Flatcar changes" and this is where commented changes are introduced. Only then we can correctly update the package.

That makes sense 👍 I reworked the Git history accordingly and cleaned things up a bit. Let me know if there's more to change; I did my best to infer how things are done from existing commits, but might have missed some bits.

If I see correctly, Krish found some of the packages we need (app-crypt/clevis, dev-libs/jose, and dev-libs/luksmeta) in GURU rather than the official Gentoo repo. I used the from Gentoo commits for these packages too. If you prefer to treat packages from GURU differently in terms of how they are committed, let me know.

@krishjainx
Copy link
Contributor

krishjainx commented Jan 26, 2024

Hi @simoncampion ,

From my understanding based on my internship with the team, coreos-overlay should be where you put the packages from GURU (to be maintained by the Flatcar team), and portage-stable is where you should put packages (mostly unmodified and synced) from upstream Gentoo's portage tree.

In terms of the commit message, I'm not too sure. But the location as to where they're placed does help.

@simoncampion
Copy link
Contributor Author

Okay, I moved the packages from GURU to coreos-overlay.

@pothos If there is anything else I can do for this PR, let me know.

</maintainer>
<upstream>
<remote-id type="github">latchset/clevis</remote-id>
</upstream>
<use>
<flag name="luks">Enable LUKS support</flag>
<flag name="tpm">Enable TPM support</flag>
<flag name="dracut">Enable dracut support</flag>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a new use flag here but can directly depend on dracut

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I removed the use flag.

@pothos
Copy link
Member

pothos commented Feb 27, 2024

Looks ok but needs some cleanup: All unmodified ebuild folders should go under portage-stable not coreos-overlay, e.g., jose.

(This replaces #909)

@ader1990
Copy link
Contributor

Hello @simoncampion, can you please rebase on top of flatcar/scripts main branch so that we can trigger a CI run too?

Thanks.

@simoncampion
Copy link
Contributor Author

Hello @simoncampion, can you please rebase on top of flatcar/scripts main branch so that we can trigger a CI run too?

Thanks.

@ader1990 Will do! I suggest we wait for the associated changes of the bootengine to be finalized. I'll then update this PR to point to the correct version of bootengine, move unmodified ebuilds from coreos-overlay to portage-stable, as suggested by @pothos , and rebase on master.

@simoncampion
Copy link
Contributor Author

Looks ok but needs some cleanup: All unmodified ebuild folders should go under portage-stable not coreos-overlay, e.g., jose.

I moved all unmodified ebuild folders (all but app-crypt/clevis) to portage-stable.

I also rebased on main. I'll build this tomorrow morning locally and check the initramfs size and make sure all disk encryption tests still pass.

@pothos
Copy link
Member

pothos commented Mar 13, 2024

The CI failed with emerge: there are no ebuilds to satisfy "sys-devel/autoconf-archive". - maybe a fallout from the rebase on main?

@simoncampion
Copy link
Contributor Author

The CI failed with emerge: there are no ebuilds to satisfy "sys-devel/autoconf-archive". - maybe a fallout from the rebase on main?

Yes, packages that the added packages depend on have been moved, and therefore the added packages now fail to install. I'm working on fixing this by redoing all the package-adding commits with the most up-to-date versions of the packages.

@simoncampion
Copy link
Contributor Author

Before we hit merge, I'd like to see the new tests for disk encryption pass on the most recent version of this PR. The QEMU image I built locally hangs in boot, failing to modprobe btrfs ("Too many levels of symbolic links") and therefore failing to mount the OEM partition, which has recently been switched to btrfs. This is probably an issue with my local build / setup. I'll troubleshoot this and report back once I see the tests passing.

@ader1990
Copy link
Contributor

Before we hit merge, I'd like to see the new tests for disk encryption pass on the most recent version of this PR. The QEMU image I built locally hangs in boot, failing to modprobe btrfs ("Too many levels of symbolic links") and therefore failing to mount the OEM partition, which has recently been switched to btrfs. This is probably an issue with my local build / setup. I'll troubleshoot this and report back once I see the tests passing.

@simoncampion I think you also hit this bug flatcar/Flatcar#1389.

Copy link
Contributor

@ader1990 ader1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is quite heavy on the initrd size increase - ~2MB. Is it okay to merge it as is (we will again get very close to the 50% used space of the /boot partition, runway left ~2MB)?

flatcar/Flatcar#1381

@pothos
Copy link
Member

pothos commented Mar 14, 2024

Can you rebase once more? There is still a conflict with the bootengine ebuild

Simon Campion added 20 commits March 14, 2024 12:08
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from Gentoo commit 2f6a333fb9bed9c7ab9b5a49065d157b62e48420
It's from GURU commit 05abdcd720bc767a152082750d9c7a044d638059
It's from GURU commit 05abdcd720bc767a152082750d9c7a044d638059
app-crypt/clevis includes dracut modules that must be installed before the initramfs is built
@simoncampion
Copy link
Contributor Author

@pothos I rebased again.

@ader1990 Many thanks for pointing me to that issue! I checked the logs and, indeed, before the btrfs error I get exactly the error you describe in the issue. I'll see whether I can make some progress fixing it.

@pothos pothos merged commit 2ea7f40 into flatcar:main Mar 14, 2024
@pothos
Copy link
Member

pothos commented Mar 14, 2024

@simoncampion Thanks for working on this (much-requested!) feature. In the beginning we didn't really know how it would look and thanks to the willingness to do some extra explorations we now have something that's quite clean and also integrates well with systemd upstream tooling :)

@pothos
Copy link
Member

pothos commented Mar 15, 2024

I forgot that we could have (temporarily) used the mantle PR container image ref to get things tested already. The arm64 case needs some fixes: flatcar/mantle#509

Note that the systemd 255 update that was done in parallel also created a regression that couldn't be caught before merging because that branch wasn't using the new mantle tests yet even though it for testing it was rebased on this branch here.
Edit: it looks like a race, [email protected] fails on first boot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants