Support UKI #2753

Open
cgwalters opened this issue Oct 31, 2022 · 55 comments
Labels: difficulty/hard (hard complexity/difficulty issue), reward/high (fixing this will result in significant benefit), triaged (this issue has been evaluated and is valid)

Comments

@cgwalters
Member

cgwalters commented Oct 31, 2022

See https://github.com/uapi-group/specifications/blob/main/specs/unified_kernel_image.md
and
https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1

There are two major points here:

UEFI only

We'll need to add a UEFI backend to ostree, which explicitly controls the UEFI boot ordering via e.g. efibootmgr instead of using the /boot/loader/entries stuff.

Kernel cmdline ➡️ rootfs

One goal of the UKI work is to have generic Linux distributions sign the kernel, the initramfs, and a stock kernel cmdline together. However, ostree today embeds the target rootfs in the kernel cmdline, which creates a recursion issue.

Option: ostree=N and symlinks and using systemd-stub credentials

We can change ostree-prepare-root in the initramfs to automatically find the latest symlink in /sysroot/ostree - we effectively do almost this with /ostree/boot.[01] today.

(Something to debate here is whether we require an ostree= karg at all; our initramfs code is conservative today in making ostree opt-in, but for people who are requiring it, we could also just add a flag to default it to on, finding the latest deployment)

The interesting thing here is what it looks like to fetch a userspace only update.

That flow would look like this:

  • Initial system deployment has one UKI in ESP
  • ostree admin upgrade or bootc update or whatever, fetch new rootfs but not a new kernel UKI
  • ostree defaults to enabling rollback today, so for systemd-stub we'd copy the existing UKI, and add a credential that tells the initramfs to look for the previous deployment

Option: Parsing the UKI filename

See #2753 (comment)

@ricardosalveti
Contributor

I think what we'd do instead is have the initramfs automatically find the latest symlink in /sysroot/ostree - we effectively do almost this with /ostree/boot.[01] today.

How would we know from the initramfs when a rollback was performed? Thinking on the use scenario in which the bootloader decides to rollback as the new deployment/update is not good enough (wasn't confirmed), as currently this can be done quite easily by booting the previous deployment (previous initramfs), which also has the previous ostree argument in place.

@dbnicholson
Member

UEFI only

We'll need to add a UEFI backend to ostree, which explicitly controls the UEFI boot ordering via e.g. efibootmgr instead of using the /boot/loader/entries stuff.

What's wrong with using boot loader entries? Wouldn't we expect that the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the boot loader spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me. I'm pretty sure sd-boot deals with it fine as we've been using it with a UKI on a product for a few years. The only changes we have to make are horrifying ones to deal with the lack of symlinks on VFAT (#1719). I guess using UEFI boot variables would side step that issue, though.

@dbnicholson
Member

Also, if you're building a UKI, the initramfs is part of it and there's no need for ostree to find it. Are you suggesting that ostree take the separate kernel and initramfs and generate a UKI?

@ericcurtin
Collaborator

ericcurtin commented Nov 3, 2022

Is the "how to find the rootfs" problem the chicken-and-egg problem, in that you need to populate the "ostree=" karg in the UKI, but you only know what that karg should be after you commit? So that you can only boot the n-1 commit in that case?

I had thoughts on this, you could write an extra value client side, maybe as a new entry "ostree" in the bls. So you could do:

title Fedora Linux 36.20221024.0 (Silverblue) (ostree:0)
version 2
options rhgb quiet root=UUID=7d8417b0-eb2d-4c6d-b0b1-ac72c11104d4 rootflags=subvol=root 
linux /ostree/fedora-3d1ddf0131c05a2bc1ea548f3ad426c25b03dfe672b7b5c0d725ad4417b062dc/vmlinuz-5.19.16-200.fc36.x86_64
initrd /ostree/fedora-3d1ddf0131c05a2bc1ea548f3ad426c25b03dfe672b7b5c0d725ad4417b062dc/initramfs-5.19.16-200.fc36.x86_64.img
ostree /ostree/boot.1/fedora/3d1ddf0131c05a2bc1ea548f3ad426c25b03dfe672b7b5c0d725ad4417b062dc/0

if (ostree_entry_exists) {
  read the rootfs from the "ostree" BLS entry
} else {
  read the rootfs from the ostree= karg, as is done today
}
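
The fallback described in this comment can be sketched in shell. This is only an illustration: the `ostree` key is the proposal made here, not an existing Boot Loader Specification field, and the function names `bls_get`, `karg_ostree` and `resolve_ostree_target` are hypothetical.

```sh
# Sketch only: "ostree" as a BLS key is the proposal from this comment,
# not something the Boot Loader Specification defines today.

# Print the value of the first line in BLS entry file $2 that starts with key $1.
bls_get() {
    sed -n "s/^$1[[:space:]][[:space:]]*//p" "$2" | head -n 1
}

# Print the value of the ostree= argument in kernel command line $1, if any.
karg_ostree() {
    for arg in $1; do
        case "$arg" in
            ostree=*) printf '%s\n' "${arg#ostree=}"; return 0 ;;
        esac
    done
    return 1
}

# Prefer the hypothetical "ostree" BLS key, else fall back to the karg.
resolve_ostree_target() {
    entry_file=$1
    cmdline=$2
    target=$(bls_get ostree "$entry_file")
    if [ -n "$target" ]; then
        printf '%s\n' "$target"
    else
        karg_ostree "$cmdline"
    fi
}
```

In an initramfs this might be called as `resolve_ostree_target /path/to/entry.conf "$(cat /proc/cmdline)"`, assuming the entry file is reachable at that point in boot.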

But I'm happy with whatever works :)

Encountering the same issue in the UKI-like aboot bootloader.

@cgwalters
Member Author

cgwalters commented Nov 15, 2022

I think the main reason to embed the rootfs in the kernel cmdline is basically integration with bootloader menus - e.g. to be able to choose the previous deployment in the GRUB GUI.

However, this is not a requirement. We could instead read a value from the target root, one could imagine something as simple as a symlink /ostree/deploy/fedora-coreos/deploy/current pointing to a deployment root. (And we could also have /ostree/current be a symlink pointing to the default stateroot so we basically have a default given a root filesystem)

Perhaps a strawman here is that specifying a bare ostree value on the kernel command line would mean "use the default". We could extend this to e.g. ostree_root=fedora-coreos to support specifying a stateroot (but I'm not sure how much we really care about the multiple stateroot stuff going forward)

And then to boot the previous deployment, we support an ostree=previous or more generally ostree=n=[0..] that takes an integer value.
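
A rough sketch of how the initramfs could interpret such a strawman karg. It assumes that a bare `ostree` means index 0 (the default), that `ostree=previous` is shorthand for index 1, and that any other integer is passed through; the function name `ostree_deployment_index` and the return codes are hypothetical.

```sh
# Map the kernel command line in $1 to a deployment index.
# Prints the index on success.
# Returns 1 if no ostree karg is present, 2 if the value is not an integer
# (for example today's path-style ostree=/ostree/boot.1/... value).
ostree_deployment_index() {
    for arg in $1; do
        case "$arg" in
            ostree)          echo 0; return 0 ;;   # bare karg: use the default
            ostree=previous) echo 1; return 0 ;;   # one step back
            ostree=*)
                n=${arg#ostree=}
                case "$n" in
                    ''|*[!0-9]*) return 2 ;;
                    *) echo "$n"; return 0 ;;
                esac ;;
        esac
    done
    return 1
}
```

Returning a distinct code for non-integer values would let the caller keep the existing path-based handling as a fallback.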

@cgwalters
Member Author

Is the "how to find the rootfs" problem the chicken-and-egg problem, in that you need to populate the "ostree=" karg in the UKI, but you only know what that karg should be after you commit?

By default, we don't do client side commits. Hence, the digest is actually fully predictable and known in advance on the build server. But certainly there is a circular dependency here for any systems which are doing fully sealed kernel command lines - we'd need to generate the rootfs and compute its digest, then patch the kernel binary (which would in theory invalidate that digest, but OTOH nothing actually reads the kernel from the rootfs; the fallout would just be things like ostree fsck would fail on that file, but we could teach the client to ignore that).

Arguably perhaps, we should have better support on the client for something like "ghosting" the kernel/initramfs from /usr/lib/modules - i.e. we ship them in the ostree commit, but deploy time we actually prune the data from the rootfs to make clear it has migrated into the bootloader state (whether that's a separate /boot partition or UEFI).

@cgwalters
Member Author

What's wrong with using boot loader entries? Wouldn't we expect that the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the boot loader spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me.

I think many users/organizations that want to deploy UKIs will want to do so without involving any bootloader at all. But yes, we should probably also support deployment with a bootloader.

@dbnicholson
Member

What's wrong with using boot loader entries? Wouldn't we expect that the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the boot loader spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me.

I think many users/organizations that want to deploy UKIs will want to do so without involving any bootloader at all. But yes, we should probably also support deployment with a bootloader.

That does make sense. But what's actually unpacking the UKI in that case? Some other UEFI program? I don't believe the linux kernel itself supports booting directly from a combined kernel+initramfs PE program. Ah, sd-stub. I missed that. I guess if you're all in on UEFI and want the minimal boot environment, then even sd-boot is superfluous.

@ljrk0

ljrk0 commented Jan 9, 2023

What's wrong with using boot loader entries? Wouldn't we expect that the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the boot loader spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me.

I think many users/organizations that want to deploy UKIs will want to do so without involving any bootloader at all. But yes, we should probably also support deployment with a bootloader.

That does make sense. But what's actually unpacking the UKI in that case? Some other UEFI program? I don't believe the linux kernel itself supports booting directly from a combined kernel+initramfs PE program. Ah, sd-stub. I missed that. I guess if you're all in on UEFI and want the minimal boot environment, then even sd-boot is superfluous.

It depends mostly on the use case and the hardware. Adding new bootloader entries to the EFI menu works only so-so on some hardware/firmware, and switching between different entries at boot is often cumbersome (not to mention vendor-specific). Thus, having different boot environments with different kernels/deployments would be much easier with sd-boot than with "native" UEFI boot loading. This is the reason why I use sd-boot on all my systems in combination with UKIs.

@ericcurtin
Collaborator

One thing that's not clear to me is how we deliver a UKI (is it its own rpm?), because it would be built on the osbuild side rather than on the end device...

@dbnicholson
Member

Sure, the same way you build the kernel and initramfs on an ostree system. They're just bundled together for a UKI. I think the only thing in there that doesn't fit that model is the kernel command line since ostree currently allows you to manage that locally and it often contains root, which is inherently local. Certainly you can come up with a default command line, but at a minimum you'd have to rely on something like systemd's discoverable partitions setup to not have root in there.

@dbnicholson
Member

I guess the way we do it right now at Endless is that the initramfs is generated in our ostree builder. For our systems that use a unified kernel with sd-boot, that's also generated in our ostree builder.

There's no reason they couldn't be packaged except that generating the initramfs requires installing all the dracut modules that you want in there. We decided that wasn't worth the effort and it was easier to do that in the ostree builder since it would by definition have all the modules installed.

@alexlarsson
Member

Just to be explicit, I didn't propose just shipping the commit in the detached metadata, as that is not trusted. What I proposed is to ship a public key in the initrd, sign the commit ID with the private part, store it in detached metadata, and then throw away the private key.

Then the initrd can validate the commit it reads from somewhere.

@travier
Member

travier commented Mar 29, 2023

Posting here the result of several discussions that we've had recently:

The major change is the need to move the ostree deployment hash out of the kernel command line, as the kernel command line won't be modifiable in the UKI case.

The suggested design is that ostree would take the UKI from the ostree commit, move it to the EFI partition and rename it with the following convention:

<boot entry order>.<name of the UKI>.<ostree deployment hash>

For example: 0.fedora-6.1.11-200.fc37.x86_64.ostree=ac1613dda93a56bfbef…

We would then need to add support in the initramfs to read the ostree deployment hash from the name of the UKI that has been booted instead of reading it from the kernel command line. This could be done either by reading the name from EFI variables or from the TPM event log.

# efibootmgr -v -u
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001
...
Boot0001* redhat HD(2,GPT,0a368044-fab0-914a-9500-218489723cfd,0x2800,0x7e000)/File(\EFI\redhat\shimx64.efi)\EFI\Linux\vmlinuz-5.14.0-282.kpq0.el9.x86_64-virt.efi
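
A minimal sketch of how an initramfs helper might split a UKI file name that follows the naming convention proposed above. The function name `parse_uki_filename`, the optional `.efi` suffix handling and the short placeholder hashes in the usage line are illustrative assumptions, not part of ostree.

```sh
# Split "<order>.<uki name>.ostree=<hash>" (optionally ending in ".efi").
# Prints "order name hash" on one line; returns 1 if the name does not match.
parse_uki_filename() {
    base=${1##*/}        # drop a /-separated directory part
    base=${base##*\\}    # drop a \-separated (EFI style) directory part
    base=${base%.efi}    # tolerate an optional .efi suffix
    case "$base" in
        [0-9]*.*.ostree=?*) ;;
        *) return 1 ;;
    esac
    order=${base%%.*}          # everything before the first dot
    rest=${base#*.}            # everything after the first dot
    name=${rest%.ostree=*}     # strip the trailing ".ostree=<hash>"
    hash=${rest##*.ostree=}    # keep only what follows ".ostree="
    printf '%s %s %s\n' "$order" "$name" "$hash"
}
```

For example, `parse_uki_filename '0.fedora-6.1.11-200.fc37.x86_64.ostree=abc123'` prints `0 fedora-6.1.11-200.fc37.x86_64 abc123`. The sketch does nothing to establish trust in the file name itself.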

We'll need to add a UEFI backend to ostree, which explicitly controls the UEFI boot ordering via e.g. efibootmgr instead of using the /boot/loader/entries stuff.

While this would be nice, I don't think it's strictly needed if we still have a bootloader (systemd-boot preferably) that is capable of booting BLS config entries.

@travier
Member

travier commented Mar 29, 2023

The design above could be combined with the suggestion from #2753 (comment) and the use of composefs to verify the content of the deployment.

@ericcurtin
Collaborator

Sorry I deleted that as I wanted to rewrite a portion, it belongs before @alexlarsson 's comment

So some automotive folk were discussing Android boot images, which are similar to UKIs in that each is a "kernel, initrd, cmdline and signature" bundle that gets generated server-side and delivered to the client via ostree. This leaves the issue of how you deliver and boot the ostree SHA.

It is difficult to boot via karg because of the recursion problem: how do you deliver that SHA without altering the SHA?

@alexlarsson suggested ostree detached metadata, that way you can deliver the SHA without altering the SHA, so we think this should solve the problem.

But this requires booting via an alternate means to booting via karg, and I will explore the symlink techniques @cgwalters suggested above.

Wondering what you guys think of this as a proposal? A similar technique could be used for UKIs and Android Boot Images.

@ericcurtin
Collaborator

ericcurtin commented Mar 29, 2023

@travier thanks for sharing the output of the discussions here, in the case where you don't have an EFI partition available which is the case in Android Boot Images, do you think it's reasonable to move forward with @cgwalters symlink approach suggested above?

#2753 (comment)

@bauen1

bauen1 commented Mar 29, 2023

We would then need to add support in the initramfs to read the ostree deployment hash from the name of the UKI that has been booted instead of reading it from the kernel command line. This could be done either by reading the name from EFI variables or from the TPM event log.

The UEFI variable LoaderImageIdentifier is set by systemd-stub, that might be a much simpler way of reliably obtaining the filename that was booted, since you probably want to support systems without a TPM.
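
A small sketch of reading that variable from efivarfs. It assumes the variable lives under systemd's boot loader interface GUID and that the stored path is ASCII-only, so the UTF-16LE payload can be decoded by simply dropping NUL bytes; the helper names `read_efivar_ascii` and `booted_image_path` are hypothetical.

```sh
# GUID used by systemd's boot loader interface variables.
LOADER_GUID=4a67b082-0a4c-41cf-b6c7-440b29bb8c4f

# Decode an efivarfs file: skip the 4-byte attribute header, then drop the
# NUL bytes of the UTF-16LE payload. This shortcut is only valid for
# ASCII-only content.
read_efivar_ascii() {
    tail -c +5 "$1" | tr -d '\000'
}

# Print the path of the booted image as recorded in LoaderImageIdentifier.
# $1 optionally overrides the efivarfs mount point (useful for testing).
booted_image_path() {
    read_efivar_ascii "${1:-/sys/firmware/efi/efivars}/LoaderImageIdentifier-$LOADER_GUID"
}
```

The printed value is an EFI-style path with backslash separators, so the file name still has to be split off before it can be parsed.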

Reading this thread, I can't help but feel like this is getting over engineered (or rather "complex") ...

I'm personally not interested in building the UKI on the server and losing the ability to specify command line arguments; however, I think that's a requirement if you want the UKI to be signed, e.g. by the distribution itself?

Since that isn't my goal, I'm currently building the UKI on the host, which supports kernel arguments. If I need the image on the build server, e.g. for signing or attestation, I can simply take the kernel arguments and build the (fully reproducible) UKI.

@alexlarsson
Member

For example: 0.fedora-6.1.11-200.fc37.x86_64.ostree=ac1613dda93a56bfbef…

We would then need to add support in the initramfs to read the ostree deployment hash from the name of the UKI that has been booted instead of reading it from the kernel command line. This could be done either by reading the name from EFI variables or from the TPM event log.

The problem with this is that it moves the identifier for the rootfs from a trusted location (in the signed UKI) to a completely untrusted location (the filename). Anyone can just rename the FAT file and make it boot some other rootfs.

This is fine if you don't care about validation, but it is nowhere near enough for a Secure Boot trusted boot into the rootfs.

@cgwalters
Member Author

The problem with this is that it moves the identifier for the rootfs from a trusted location (in the signed UKI) to a completely untrusted location (the filename). Anyone can just rename the FAT file and make it boot some other rootfs.

Not any other rootfs; you'd include the key used to sign the composefs in the initramfs, and validate it from there.

So the problem then turns to rollback protection, and that's a nuanced topic because it's absolutely valid to want to roll back sometimes.

@ericcurtin
Collaborator

I liked the symlink approach over EFI partition. The problem with using EFI features, is you start to depend on fully implemented UEFI, which would be nice, but it's not always the case, especially on non-x86 systems. If we could self-contain the solution as much as we can in the main rootfs partition it would be better (over using EFI partitions).

@cgwalters
Member Author

I liked the symlink approach

To be clear "symlink approach" = #2753 (comment) ?

I edited that earlier comment to elaborate a bit about how rollbacks would work; so the previous bootloader entries would gain ostree=1 to mean the previous deployment (as opposed to ostree=0 being the default).

@ericcurtin
Collaborator

I liked the symlink approach

To be clear "symlink approach" = #2753 (comment) ?

Yes, it removes the hard dependency on EFI.

I edited that earlier comment to elaborate a bit about how rollbacks would work; so the previous bootloader entries would gain ostree=1 to mean the previous deployment (as opposed to ostree=0 being the default).

@travier
Member

travier commented Mar 29, 2023

We need a way to choose which deployment to boot as we need to support rollbacks (rollback protection is another topic that we are not covering here and would be implemented separately). As we can not change the command line, we need a way to pass that info to the initramfs. Using the filename of the UKI is one way of doing that.

Note that this deployment hash isn't particularly trusted data: it only makes sense if the deployment exists in the rootfs. Whether or not it's a valid deployment is thus a question of whether or not we have integrity for the rootfs and that's a composefs / LUKS discussion.

You can not use that to boot an arbitrary deployment that would not be in the rootfs already.

@cgwalters
Member Author

@travier what problems do you see with #2753 (comment) ?

@travier
Member

travier commented Mar 29, 2023

Perhaps a strawman here is that specifying a bare ostree value on the kernel command line would mean "use the default". We could extend this to e.g. ostree_root=fedora-coreos to support specifying a stateroot (but I'm not sure how much we really care about the multiple stateroot stuff going forward)

And then to boot the previous deployment, we support an ostree=previous or more generally ostree=n=[0..] that takes an integer value.

As far as I understand this involves modifying the kernel command line which is not compatible with UKIs.

@travier
Member

travier commented Mar 29, 2023

If we do a mapping boot-entry-number -> deployment-hash in the rootfs then that could work but that would be an additional indirection layer:

  • Read boot entry number from UKI file name
  • Open rootfs, find ostree deployment hash from symlink: /ostree/deploy/fedora-coreos/deploy/0 -> 1234567890...
  • Use the deployment hash
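
The lookup step in this list could look roughly like the following. The symlink layout `<sysroot>/ostree/deploy/<stateroot>/deploy/<index>` is the hypothetical one from the list above, not something ostree creates today, and `deployment_for_index` is an illustrative name.

```sh
# Resolve boot entry index $3 for stateroot $2 under sysroot $1.
# Assumes the hypothetical layout:
#   <sysroot>/ostree/deploy/<stateroot>/deploy/<index> -> <deployment hash>
# Prints the symlink target; returns 1 if no such symlink exists.
deployment_for_index() {
    link="$1/ostree/deploy/$2/deploy/$3"
    [ -L "$link" ] || return 1
    readlink "$link"
}
```

For example, `deployment_for_index /sysroot fedora-coreos 0` would print whatever the `0` symlink points at, with the index itself coming from the UKI file name.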

@travier
Member

travier commented Mar 29, 2023

Not sure how robust this would be in case of power failures as we would need to update two places at the same time every time we do a new deployment: UKI file name + deployment hash symlink.

@cgwalters
Member Author

But we don't change the UKI for every deployment. We don't want to have to touch the kernel config when only userspace changes in general, right?

@travier
Member

travier commented Mar 29, 2023

For systemd/systemd#24539 to work we need BLS Type 1 entries (config files) and bootloader support to extend the UKI kernel command line with the options passed into that config file.

@travier
Member

travier commented Mar 29, 2023

If we have ostree=0 in the kernel command line, use the deployment root to which the symlink /ostree/0 is pointing.

How do you set that in the kernel command line and how do you update that when you change the order of deployments?

@travier
Member

travier commented Mar 29, 2023

We could generate a random hash and include it both in the UKI kernel command line and setup the symlinks in the rootfs but that would be another indirection like I mentioned in #2753 (comment).

@cgwalters
Member Author

You're right, I wasn't covering a detail here. At this point though the thread is unwieldy, so I've amended the initial comment here. I think systemd-stub credentials are already a way to pass this data and it's what it's designed for.

That said, I also do think we can't design solely for systemd-stub. A very interesting case that's entwined with all of this is whether systems using ostree want to explicitly support locally-initiated rollback.

If you don't (and I think that's valid!) then there's no need for a "fallback" UKI that would appear as a separate bootable entry at all. Instead, it'd be up to userspace (whether initramfs or real root) to verify health and locally initiate a change in the default UKI/rootfs pair.

@travier
Member

travier commented Mar 29, 2023

Using credentials is indeed also an option.

Note that this requires systemd-stub and is therefore UEFI-only, which is the same constraint as the filename option. I agree that it is easier to manage, though, and it lets us share the UKIs.

@travier
Member

travier commented Mar 29, 2023

For a kernel binary called foo.efi, it will look for files with the .cred suffix in a directory named foo.efi.extra.d/

Looks like this won't let us share UKI if I understand correctly.
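
To illustrate why sharing looks difficult: going only by the rule quoted above, the credential directory is derived from the UKI's own file name, so two boot entries that need different credentials also need two differently named UKI files. A sketch of that lookup, with `uki_credentials` as a hypothetical helper name:

```sh
# List the *.cred files that sit next to UKI $1 in "<uki>.extra.d/",
# following the lookup rule quoted above. Prints one path per line and
# prints nothing if the directory or the credentials are missing.
uki_credentials() {
    dir="$1.extra.d"
    [ -d "$dir" ] || return 0
    for f in "$dir"/*.cred; do
        # An unmatched glob stays literal, so check that the file exists.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

With `a.efi` and a copy `b.efi` on the ESP, `uki_credentials a.efi` and `uki_credentials b.efi` look in `a.efi.extra.d/` and `b.efi.extra.d/` respectively, which is what forces the copy.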

@bauen1

bauen1 commented Mar 30, 2023

I'll post my current test setup here simply because it might be useful to someone; obviously it won't be usable for the use case discussed here (Firmware SecureBoot, i.e. with Microsoft's keys).

  • It assumes, that the UEFI implementation is properly done, i.e. allows easily choosing between multiple Boot entries, creating/deleting a lot of Boot variables and modifying the BootOrder a lot isn't a problem, ...
  • The UKI image is around 64MiB+, and two copies are placed on the ESP for every BLS entry, if you wanted another UKI without the quiet option, or one for recovery, that can become very costly.
  • It supports local kernel cmdline arguments, which is incompatible with the use cases discussed here
  • The objcopy call appears to be the slowest part, but perhaps this could be optimized, by linking everything except the cmdline during build already.
  • The UKI build is reproducible
  • The actual deployment should be atomic by doing the following:
    1. Remove Boot variables pointing to a leftover /boot/efi/EFI/bauen1-uki.$new_bootnum/UKIs
    2. Remove any leftover /boot/efi/EFI/bauen1-uki.$new_bootnum/UKIs
    3. Copy the UKIs to /boot/efi/EFI/bauen1-uki.$new_bootnum/UKIs
    4. Synchronize the ESP filesystem
    5. Create new Boot variables pointing to /boot/efi/EFI/bauen1-uki.$new_bootnum/*
    6. Remove the now unnecessary Boot entries pointing to /boot/efi/EFI/bauen1-uki.$old_bootnum/*
    7. Remove /boot/efi/EFI/bauen1-uki.$old_bootnum/* (And I've just realized that I forgot to implement this part ...)
    8. Synchronize the ESP filesystem
      This way there should always be a set of UKIs with associated Boot entries to boot from, but I'm not really sure how atomic the update of the EFI variables is, especially the automatic modification of BootOrder by efibootmgr.

Instead of doing the Boot-entry dance, using systemd-boot would probably be easier; however, I don't really like how much "magic" systemd-boot does, and it seems easy to accidentally build an actually insecure system.

Here is the grub-mkconfig script:

#!/bin/sh

set -eu

# With `set -u`, expanding an unset $1 or $2 would abort before the usage
# message is printed, so default both to the empty string.
if [ "${1:-}" != "-o" ] || [ -z "${2:-}" ]; then
    echo "Usage: $0 -o <cfg>"
    exit 1
fi

# FIXME: assert, that _OSTREE_GRUB2_IS_EFI is not set, if it has been set, then
#        ostree will use different logic, which is probably incompatible.
# FIXME: replace by using _OSTREE_GRUB2_BOOTVERSION, which also checks that we have been called by ostree
# We get called like `grub-mkconfig -o /boot/loader.0/grub.cfg`, use $2 to obtain the /boot/loader.$bootnum directory
if [ "$2" = "/boot/loader.0/grub.cfg" ]; then
    OLD_BOOTNUM="1"
    NEW_BOOTNUM="0"
elif [ "$2" = "/boot/loader.1/grub.cfg" ]; then
    OLD_BOOTNUM="0"
    NEW_BOOTNUM="1"
else
    echo "Usage: $0 -o /boot/loader.[01]/grub.cfg"
    exit 3
fi

LOADER_DIR="$(dirname "$2")"

if [ -d "$LOADER_DIR/uki" ]; then
    # Might be a left over from e.g. a failed previous run.
    echo "Removing (old) $LOADER_DIR/uki"
    rm -r "$LOADER_DIR/uki"
fi
mkdir "$LOADER_DIR/uki"

for entry_file in "$LOADER_DIR"/entries/*.conf; do
    echo "Parsing BLS entry file '$entry_file':"

    # 1. Parse the BLS configfile:
    ENTRY_TITLE="$(grep "^title " "$entry_file" | sed 's/^title //')"
    ENTRY_VERSION="$(grep "^version " "$entry_file" | sed 's/^version //')"
    ENTRY_OPTIONS="$(grep "^options " "$entry_file" | sed 's/^options //')"
    ENTRY_LINUX="$(grep "^linux " "$entry_file" | sed 's/^linux //')"
    ENTRY_INITRD="$(grep "^initrd " "$entry_file" | sed 's/^initrd //')"

    # Technically the 'version' is supposed to be sorted using debian version sort style, but we assume
    # that the filenames generated by ostree are enough for ordering, which will probably break once you have 9+ deployments

    ENTRY_FILENAME="${entry_file##*/}"
    UKI_PATH="$LOADER_DIR/uki/${ENTRY_FILENAME%.conf}.efi"
    echo "Resulting UKI will be stored in '$UKI_PATH'"

    echo "$ENTRY_OPTIONS" > "$UKI_PATH.cmdline"

    # Build the actual UKI; note that it is always rebuilt and shouldn't exist yet
    # --preserve-dates: For a reproducible timestamp in the PE header
    objcopy \
        --preserve-dates \
        --add-section .cmdline="$UKI_PATH.cmdline" --change-section-vma .cmdline=0x30000 \
        --add-section .linux="/boot/$ENTRY_LINUX" --change-section-vma .linux=0x2000000 \
        --add-section .initrd="/boot/$ENTRY_INITRD" --change-section-vma .initrd=0x3000000 \
        /usr/lib/systemd/boot/efi/linuxx64.efi.stub \
        "$UKI_PATH"
done

# Sync build images to /boot/efi
# See also <https://bugzilla.gnome.org/show_bug.cgi?id=724246>

ESP_DIR="/boot/efi/EFI/bauen1-uki"

mkdir -p "$ESP_DIR.0" "$ESP_DIR.1"
sync --file-system "/boot/efi/EFI"

echo "OLD_BOOTNUM: $OLD_BOOTNUM"
echo "NEW_BOOTNUM: $NEW_BOOTNUM"

# We assume, that the currently used Boot variables point to "$ESP_DIR.$OLD_BOOTNUM", so we can safely
# remove "$ESP_DIR.$NEW_BOOTNUM"

# Figure out some values for modifying UEFI Boot variables:
ESP_DEVICE="$(df /boot/efi | tail -1 | awk '{ print $1 }')"
ESP_PARTNUM="$(cat /sys/class/block/"$(basename "$ESP_DEVICE")"/partition)"
ESP_PARTUUID="$(blkid "$ESP_DEVICE" -o export | awk -F'=' '/PARTUUID=/ { print $2 }' )"
echo "device=$ESP_DEVICE partnum=$ESP_PARTNUM partuuid=$ESP_PARTUUID"

cleanup_bootvars() {
    # Removes any boot variables referencing a certain $ESP_DIR.$BOOTNUM
    # $1: bootnum

    # Now we know that we are looking for something similar to:
    # HD($ESP_PARTNUM,GPT,$ESP_PARTUUID,somehex,somehex)/File(\EFI\bauen1-uki.$BOOTNUM\.*)

    # efibootmgr outputs like:
    # BootXXXX* title with possible spaces\tActualEntry
    ENTRIES="$(efibootmgr -v | grep -E '^Boot[[:xdigit:]]{4}' | awk -F'\t' '/^[^\t]+\tHD\('"$ESP_PARTNUM,GPT,$ESP_PARTUUID"',.*\)\/File\(\\EFI\\bauen1-uki.'"$1"'\\.*\)$/ { print $0 }')"

    printf "Boot entries that will be removed:\n%s\n" "$ENTRIES"

    for entry in $(echo "$ENTRIES" | grep -E '^Boot[[:xdigit:]]{4}' --only-matching | sed 's/^Boot//'); do
        echo "Removing $entry"
        efibootmgr --delete-bootnum --bootnum "$entry"
    done
}

# 1. Cleanup any left over Boot variables still pointing to $ESP_DIR.$NEW_BOOTNUM
cleanup_bootvars "$NEW_BOOTNUM"

# 2. Cleanup $ESP_DIR.$NEW_BOOTNUM
if [ -e "$ESP_DIR.$NEW_BOOTNUM" ]; then
    echo "Removing $ESP_DIR.$NEW_BOOTNUM"
    rm -r "$ESP_DIR.$NEW_BOOTNUM"
    sync --file-system "/boot/efi/EFI"
else
    echo "Skipping removal of $ESP_DIR.$NEW_BOOTNUM, does not exist"
fi

# 3. Create new $ESP_DIR.$NEW_BOOTNUM
echo "Creating $ESP_DIR.$NEW_BOOTNUM"
mkdir "$ESP_DIR.$NEW_BOOTNUM"
cp -v "$LOADER_DIR/uki"/*.efi "$ESP_DIR.$NEW_BOOTNUM"/
sync --file-system "/boot/efi/EFI"

# 4. Create new Boot variables
for f in "$ESP_DIR.$NEW_BOOTNUM"/*; do
    echo "Creating Boot entry for file '$f':"
    efibootmgr \
        --create \
        --disk="$ESP_DEVICE" \
        --part="$ESP_PARTNUM" \
        --label="${f##*/}" \
        --loader="${f##/boot/efi}"
done

# 5. Set BootOrder (and maybe BootNext ?)
# FIXME: efibootmgr --create adds the entries to the currently defined BootOrder, however I need to verify
#        what order is used, and if that is already what is necessary
#        It appears to already do everything correctly.

# 6. Remove now unused old Boot variables
cleanup_bootvars "$OLD_BOOTNUM"

# Finally actually touch the output file to make ostree happy
echo "Touching empty (fake) output file '$2'"
touch "$2"

@cgwalters cgwalters added difficulty/hard hard complexity/difficutly issue triaged This issue has been evaluated and is valid reward/high Fixing this will result in significant benefit labels May 2, 2023
coiby pushed a commit to coiby/kexec-tools that referenced this issue Aug 21, 2023
UKI are not supported on rpm-ostree based Fedora variants so let's use
recommend for binutils for now to let those not include the package
until needed.

See: coreos/fedora-coreos-tracker#1496
See: ostreedev/ostree#2753
See: https://src.fedoraproject.org/rpms/kexec-tools/c/ea7be0608ed719cc1cb134ecf6ef51a4b7e9f104?branch=rawhide
@ericcurtin
Collaborator

ericcurtin commented Apr 23, 2024

Btw for the Android Boot Image implementation this is what we did (it's high level design is very similar to UKIs).

UKIs aren't designed to have as malleable a cmdline as a BLS file locally client-side, so we set ostree karg to simply:

ostree=true

Then we created symlinks like:

/ostree/root.a
/ostree/root.b

which pointed to two different sysroots (the ostree systemd generator parsed the osname/stateroot from this symlink also).
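
A sketch of how a generator could derive the stateroot from such a symlink. It assumes the link target contains the usual `deploy/<stateroot>/deploy/<checksum>.<serial>` path components; the exact target format used in the Android Boot Image work is not stated here, and `stateroot_from_link` is a hypothetical name.

```sh
# Print the stateroot encoded in the target of symlink $1.
# Assumes the target contains ".../deploy/<stateroot>/deploy/<checksum>.<serial>".
# Returns 1 if $1 is not a symlink or the target has a different shape.
stateroot_from_link() {
    target=$(readlink "$1") || return 1
    case "$target" in
        *deploy/*/deploy/*) ;;
        *) return 1 ;;
    esac
    rest=${target#*deploy/}       # strip through the first "deploy/"
    printf '%s\n' "${rest%%/*}"   # keep the component before the next "/"
}
```

For a link `/ostree/root.a` pointing at `deploy/fedora/deploy/abc123.0`, this prints `fedora`.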

@travier
Member

travier commented Jul 18, 2024

So it looks like we have 3 options:

So it looks like the only option in the end is to write the UKI in /boot/ostree/ and generate a bit of GRUB config in /boot/grub2 to chainload the UKI.

@vittyvk

vittyvk commented Jul 29, 2024

@travier I may have missed something in discussions but why was the easiest "manage UEFI entries directly and avoid the need to have a bootloader at all" option abandoned? This is pretty much what we do for Fedora UKI image (see https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2). The automation is done via kernel-bootcfg tool (part of 'virt-firmware' package) and we do A/B booting upon UKI upgrade. If the new UKI boots, it becomes the default.
I can see two possible drawbacks:

  • Non-UEFI systems have to be supported. Not sure how important this is in 2024 but in case it is, chainloading from grub seems to be the only way to go as even sd-boot is UEFI-only.
  • Buggy UEFI implementations which break when we try managing UEFI entries. I know this was real with some early implementations but is it still a problem in the real world? FWIW, virtualized envs and public clouds seem to be working pretty reliably.

@ericcurtin
Collaborator

@travier I may have missed something in discussions but why was the easiest "manage UEFI entries directly and avoid the need to have a bootloader at all" option abandoned? This is pretty much what we do for Fedora UKI image (see https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2). The automation is done via kernel-bootcfg tool (part of 'virt-firmware' package) and we do A/B booting upon UKI upgrade. If the new UKI boots, it becomes the default. I can see two possible drawbacks:

  • Non-UEFI systems have to be supported. Not sure how important this is in 2024 but in case it is, chainloading from grub seems to be the only way to go as even sd-boot is UEFI-only.

I've had this discussion with systemd people before. In an ideal world everybody would be on UEFI; sadly, we don't live in an ideal world. UEFI on ARM is uncommon, some x86 platforms still don't use UEFI, and there was even a new x86 device being hacked on recently that ran non-UEFI slimboot. Chainloading can be an option sometimes, but not always. Let's say you are in Automotive and need to hit a strict 2-second boot KPI: you simply cannot afford the extra time chainloading takes. Maybe chainloading is the solution....

  • Buggy UEFI implementations which break when we try managing UEFI entries. I know this was real with some early implementations but is it still a problem in the real world? FWIW, virtualized envs and public clouds seem to be working pretty reliably.

@cgwalters
Member Author

may have missed something in discussions but why was the easiest "manage UEFI entries directly and avoid the need to have a bootloader at all" option abandoned?

It's definitely an option. The biggest source of complexity is how it interacts with the rootfs which was touched on above. For ostree we do A/B style of the (kernel, rootfs) pair, not just the kernel/initramfs.

@vittyvk

vittyvk commented Jul 30, 2024

For RHEL/Fedora, the distro-shipped UKI now comes in the 'kernel-uki-virt' RPM, which places the UKI in /lib/modules//vmlinuz-virt.efi. This is part of the rootfs, but it can't be booted from there directly. kernel-install scripts have to copy it to the ESP (in theory anywhere, but BLS suggests /EFI/Linux/). So basically, to switch from A to B for the kernel+rootfs pair you will need to:

  • Switch rootfs
  • Make sure you have a copy of the UKI on the ESP and adjust BootXXXX/BootNext/BootOrder EFI variables accordingly.

Looking at the alternatives to the 'direct UKI boot without a bootloader', I don't think there's going to be a big difference. E.g. if GRUB can chainload it from /boot, someone still needs to place it there; sd-boot can only load binaries from the ESP AFAIU, and so on. So one way or another, we still need to do some extra actions when switching from 'rootfs+UKI A' to 'rootfs+UKI B'. (The ESP can probably be treated as an implementation detail, same as UEFI variables in NVRAM.)
Note that sd-stub now also supports signed cmdline extensions and these can be either global or per-UKI so in case we need rootfs-specific parameters to go from 'A' to 'B' (e.g. ostree=....), this can be a cmdline extension.
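
The ESP and EFI-variable part of the two steps above can be sketched as the commands a tool would need to issue. This is only an illustration built on `efibootmgr` command lines, not how kernel-bootcfg or ostree actually do it; the function names, disk, partition number and label are hypothetical, and nothing is executed here.

```python
import posixpath


def esp_loader_path(uki_name: str) -> str:
    """Loader path relative to the ESP root, using /EFI/Linux/ as BLS suggests."""
    return posixpath.join("/EFI/Linux", uki_name)


def create_entry_cmd(disk: str, part: int, label: str, loader: str) -> list:
    """Build an efibootmgr call that adds a BootXXXX entry without touching BootOrder."""
    return [
        "efibootmgr", "--create-only",
        "--disk", disk, "--part", str(part),
        "--label", label,
        # EFI loader paths are conventionally written with backslashes.
        "--loader", loader.replace("/", "\\"),
    ]


def bootnext_cmd(bootnum: str) -> list:
    """Build an efibootmgr call that boots entry <bootnum> once, on the next boot."""
    return ["efibootmgr", "--bootnext", bootnum]
```

A caller would first copy the UKI from the new rootfs to the ESP, then run the create command, read back the assigned `BootXXXX` number, and pass it to `bootnext_cmd`. Using `--create-only` keeps the old entry as the default until the new one has proven itself.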

@travier
Member

travier commented Jul 30, 2024

UKIs are PE (Windows Portable Executable) files and require either UEFI or a bootloader capable of chainloading EFI (PE) binaries. Android Boot Image is a completely different format and I don't think we should conflate support for it into this issue. It should be its own issue as it's likely going to require a different implementation.

"manage UEFI entries directly and avoid the need to have a bootloader at all"

This is indeed an option. It, however, requires a lot more code changes to ostree, as it would mean integrating EFI boot entry management logic.

@travier
Member

travier commented Jul 30, 2024

The downside of going full firmware is that you no longer get an easily accessible option to roll back on boot. You have to enter the firmware interface and find the previous boot entry.

@vittyvk

vittyvk commented Jul 30, 2024

@travier Yes, and not every firmware will give you the menu. So if switching from A to B manually during boot time is a must, then we will need to inject something like sd-boot in the chain. For RHEL/Fedora CVMs we decided that it's not, and 'kernel-bootcfg' does an automatic A/B switch: the newly installed UKI (would be UKI+rootfs in your case) is set as BootNext and, if it boots successfully, BootOrder is changed. This covers the most important reason someone would want a boot menu: the newly installed UKI does not boot. There are some corner cases of course, e.g. the newly installed UKI boots and pretends to work but networking is broken.
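
The BootNext/BootOrder flow described above can be modelled as a small state machine. This is a simplified sketch of the idea, not kernel-bootcfg's code: `EfiBootState` and the function names are made up for illustration, and the firmware behaviour is reduced to "BootNext is consumed once, otherwise the first BootOrder entry wins".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EfiBootState:
    boot_order: List[str]            # e.g. ["A"]; first entry is the default
    boot_next: Optional[str] = None  # one-shot override for the next boot


def stage_update(state: EfiBootState, new_entry: str) -> None:
    """After installing a new UKI (+rootfs), try it once on the next boot."""
    state.boot_next = new_entry


def firmware_select(state: EfiBootState) -> str:
    """Model the firmware: BootNext is used once and cleared, else BootOrder[0]."""
    if state.boot_next is not None:
        entry, state.boot_next = state.boot_next, None
        return entry
    return state.boot_order[0]


def confirm_boot(state: EfiBootState, booted: str) -> None:
    """Called from the successfully booted system: make it the default."""
    state.boot_order = [booted] + [e for e in state.boot_order if e != booted]
```

If the new entry fails to boot, `confirm_boot` never runs, so the next reset falls back to the old default without any boot menu. The corner case mentioned above (boots but networking is broken) would need a stricter health check before `confirm_boot` is called.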
