Jepio/runtime revert to cgroupv1 #59

Closed
wants to merge 391 commits into from

Conversation

jepio
Member

@jepio jepio commented Feb 7, 2022

[Title: describe the change in one sentence]

To be used with flatcar/bootengine#35

[ describe the change in 1 - 3 paragraphs ]

How to use

[ describe what reviewers need to do in order to validate this PR ]

Testing done

[Describe the testing you have done before submitting this PR. Please include both the commands you issued as well as the output you got.]

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)

Vito Caputo and others added 30 commits August 24, 2015 18:45
Instead of being a dracut hook, we also move it up a bit in the boot
process for better integration with ignition-files.service.
This no longer runs as a sourced dracut hook so it can't rely on
dracut library functions.

We're now running under systemd and the journal will capture the output
so this removes the /tmp/bootengine.out logging as well.
To keep consistent with the normal userspace mechanism, running
coreos-tmpfiles as-is for now.

Home directories for users that have them are an outstanding issue.
At the moment this is limited to the /home/core user, since /root is
created via baselayout.conf.
This allows us to assume all initial coreos users and groups are present
in the initrd as well.
This gets us the home directories side of baselayout from the initrd.
Initialize etc shadowdb and create homes
dracut: use idiomatic method for installing rules
This snuck in from experimentation and should never have been merged.
It's being ignored by systemd-networkd, since .network files are not
actually systemd units.
dracut: remove junk from zz-default.network
This is screwing with EC2's networking.
dracut: remove link-local addr from network config
As of systemd v225, mount is no longer invoked with -n, making the
/etc/mtab symlink necessary from an early stage.
dracut: apply etc.conf in initrd-setup-root
ldconfig does not work on non-native arches. Use the ldconfig
present in the target chroot to properly create the library cache.

Signed-off-by: Andrej Rosano <[email protected]>
update-bootengine: use the native ldconfig
Our arm64 builds do not include SELinux, and if this tmpfiles job fails,
ignition fails, causing the first boot to drop into an emergency shell.
dracut: add mkfs.xfs to ignition
initrd-setup-root: check selinux tmpfiles configs before using them
setup-root: fix ordering between selinux-base.conf and libsemanage.conf
dracut: parse coreos.oem.id for ignition
pothos and others added 21 commits July 23, 2021 16:44
The /usr partition should not be modified during mounting, even if it
has some corruption. Rewriting the filesystem log would cause dm-verity
errors when dm-verity is enabled again later. While the /usr partition
normally is on a dm-verity block device in read-only mode, there is an
option to mount the partition without dm-verity, in which case it would
no longer be a read-only block device.
Add the norecovery mount option which is supported for ext4 and btrfs.
The hash offset is found by looking at the filesystem size. When
e2size can't find the size, it prints "Success" to stderr for whatever
reason but fortunately still returns an error exit code, and stdout stays
empty. This means that the dm-verity device setup won't work because
the hash offset is the empty string. However, the hash offset is
actually fixed because the GPT disk layout has to stay the same in
Flatcar Container Linux as the partition contents are swapped out
when updating. In case another filesystem like btrfs is used, e2size
doesn't work and it makes sense to fall back to the only value which
is supported in general.
Hard code the hash offset value coming from the /usr filesystem size
defined in flatcar-scripts/build_library/disk_layout.json.
dracut: fall back to expected dm-verity hash offset
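The fallback described in this commit message can be sketched in shell roughly as follows. This is only an illustration, not the code from the commit: `resolve_hash_offset` is a hypothetical helper, and the fixed offset is passed in as a parameter because its actual value (derived from disk_layout.json) is not quoted here.

```shell
# Hypothetical helper: choose the dm-verity hash offset.
# $1: stdout of the filesystem size probe (empty when the probe failed)
# $2: fixed fallback offset taken from the disk layout
resolve_hash_offset() {
    probed="$1"
    fallback="$2"
    if [ -n "$probed" ]; then
        printf '%s\n' "$probed"
    else
        # The probe printed nothing on stdout, so use the fixed offset.
        printf '%s\n' "$fallback"
    fi
}
```

With made-up numbers, `resolve_hash_offset "" 4096` would print the fallback, while `resolve_hash_offset 8192 4096` would print the probed value.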
In the default mode dm-verity leads to failed read syscalls when
corruption is detected. This is hard to spot and also not secure
because it allows altering system behavior by introducing failures
to particular parts of the system and thus turning off security
mechanisms. Thus, the value of dm-verity isn't really utilized.

Issue a kernel panic instead of just failing a read syscall when
corruption is detected. This allows users to spot the issue and
take action and fully utilizes the advantages of having dm-verity.
The network-cleanup.service failed when it wasn't fast enough to
complete before the root filesystem is switched:
  network-cleanup.service: Current command vanished from the unit file,
  execution of the command list won't be resumed.
Similar to the sysroot-boot.service we need to be finished with the
network-cleanup.service before the switch is done.

Add the ordering and conflicts directives as done in
sysroot-boot.service.
dracut: issue a kernel panic on dm-verity corruption
dracut: let network-cleanup.service finish before entering the rootfs
Right now, when ignition fails during boot the following happens:

1. ignition service fails, which fails initrd.target and starts emergency.target
2. initrd-parse-etc.service is already starting and launches initrd-fs.target and initrd-cleanup.service
3. initrd-cleanup.service isolates initrd-switch-root.target
4. emergency.service is stopped and starts default.target again
5. initrd.target starts ignition services again
6. loop back to 1.

This is a boot loop that makes it hard for users to figure out what is wrong
with their ignition config. This appears to be unintentional, and something
observed in CoreOS and Fedora CoreOS as well.

Apparently, for a brief while systemd did not allow OnFailure clauses in
targets. This was fixed in systemd, but CoreOS shipped this fix as well;
see coreos/ignition-dracut#188.

So add OnFailure clauses to ignition units to ensure boot stops right there.

Signed-off-by: Jeremi Piotrowski <[email protected]>
dracut: run emergency.target on ignition/torcx service unit failure
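As a rough sketch of what such an OnFailure clause looks like, the following shell helper writes a systemd drop-in for a given unit. This is a swapped-in illustration: the commit may well edit the unit files directly, `write_onfailure_dropin` and the drop-in file name are hypothetical, and `OnFailureJobMode=isolate` is an assumption that the commit message does not mention.

```shell
# Hypothetical helper: write a drop-in that sends the boot to
# emergency.target when the given unit fails.
# $1: directory that holds systemd units
# $2: unit name (for example, one of the ignition units)
write_onfailure_dropin() {
    unit_dir="$1"
    unit="$2"
    mkdir -p "$unit_dir/$unit.d"
    # OnFailureJobMode=isolate is an assumption, not taken from the commit.
    cat > "$unit_dir/$unit.d/10-onfailure.conf" <<EOF
[Unit]
OnFailure=emergency.target
OnFailureJobMode=isolate
EOF
}
```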
It's required to run ignition/v3 since this commit: coreos/ignition@894e2a4

Signed-off-by: Mathieu Tortuyaux <[email protected]>
dracut/ignition: add `wipefs` to the initramfs
The "create" action became "open", and "remove" became "close". Also
reorder the parameters accordingly (it's a bit different for "open" vs
"create"). Also put the options before specifying the action.
…usage

dracut: Stop using deprecated actions in veritysetup
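The reordering described above can be illustrated with a small shell helper that prints the modern command line instead of running it. `verity_open_cmd` and all device names are hypothetical; the argument orders follow the veritysetup documentation as far as I know it, so treat this as a sketch rather than the commit's actual change.

```shell
# Hypothetical helper: print the modern equivalent of a legacy
# "veritysetup create" call instead of executing it.
# Legacy order: create <name> <data_device> <hash_device> <root_hash>
# Modern order: open <data_device> <name> <hash_device> <root_hash>
verity_open_cmd() {
    name="$1"; data_dev="$2"; hash_dev="$3"; root_hash="$4"
    shift 4
    # Remaining arguments are options; they are placed before the action.
    echo veritysetup "$@" open "$data_dev" "$name" "$hash_dev" "$root_hash"
}
```

The matching teardown would use `close <name>` in place of the legacy `remove <name>`.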
Unlike with Kernel 5.10, dracut does not automatically install `loop.ko`
with Kernel 5.15. Explicitly install the loop module from the dracut
command line.
update-bootengine: make dracut install loop driver
The content that was supposed to be in the heredoc got interpreted as
parameters for "cat", causing it to fail.
Move the separator dashes into the next line to make sure it gets
treated as heredoc content.
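The difference can be reproduced with a minimal example. The function name and the notice text are made up for illustration; only the placement of the dashes relative to the heredoc delimiter matters.

```shell
# Broken form (shown as a comment only): the dashes sit on the same line
# as the redirection, so the shell passes them to cat as an argument:
#   cat <<EOF --------------------
#
# Fixed form: the separator starts on the line after the delimiter
# and is therefore part of the heredoc body.
print_notice() {
    cat <<EOF
--------------------
Example notice text
--------------------
EOF
}
```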
The old CoreOS online validator does not have a new Flatcar home yet.
Link to its binary in the repository instead.
dracut/99emergency-timeout/timeout.sh: fix heredoc leak into arguments
network: Enable the RAs to fix IPv6 address assignment
t-lo and others added 2 commits February 18, 2022 17:09
dracut, when copying (installing) files into the initrd image, preserves
xattrs by default. Docker containers use overlayfs, which does not
support some extended attributes.
When creating an initrd in a containerised build, dracut prints a number
of warnings and errors, and the resulting initrd is subtly broken (e.g.
soft-links aren't installed).

This patch sets the DRACUT_NO_XATTR env variable to prevent dracut from
trying to preserve xattrs while copying.
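A minimal sketch of setting the variable for a single command is shown below. `run_without_xattrs` is a hypothetical wrapper, and the value `1` is an assumption; the commit message only names the variable.

```shell
# Hypothetical wrapper: run a command with DRACUT_NO_XATTR set in its
# environment so that dracut does not try to preserve xattrs.
# The value "1" is an assumption; the source only names the variable.
run_without_xattrs() {
    env DRACUT_NO_XATTR=1 "$@"
}
```

It would be used by prefixing the usual dracut invocation with `run_without_xattrs`.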

We have validated in both the containerised SDK as well as in the
cork-based chroot SDK that no xattrs are used in /build/* by means of

    sudo find /build/amd64-usr/ -type f -exec getfattr -d {} \;

which came up empty.

Signed-off-by: Thilo Fromm <[email protected]>
update-bootengine: fix containerised builds
@jepio jepio force-pushed the jepio/runtime-revert-to-cgroupv1 branch 2 times, most recently from a2f9a6e to ae8ebc3 Compare February 21, 2022 12:12
Pass a custom init command (init.wrapper) to 'systemctl switch-root'
when leaving the initramfs. This init.wrapper is responsible for
disabling all cgroup controllers and unmounting cgroup2, and injecting
(by means of a bind mount) custom kernel arguments so that sysroot
systemd sets up cgroupv1. We're adding this so that Flatcar users don't
have to reboot to switch back to cgroupv1, which is disruptive in many
deployments.

The usage of init.wrapper is conditional on two things:
* the initramfs set up cgroupv2.
* the user opted into this behavior by creating the
  `/etc/flatcar-cgroupv1` flag file.

There is a reason not to go through init.wrapper unconditionally and
decide there whether to take action. When systemd runs in both the
initramfs and the sysroot, it serializes some data structures and passes
them from the initramfs to allow some introspection. I believe this is
primarily systemd-analyze timing information. This doesn't work when we
inject a custom init between the initramfs and the sysroot (init.wrapper),
so ensure we only use the custom init when we really need to.

This is accomplished by passing the init.wrapper path through an INIT
environment variable, which is only defined when '/etc/flatcar-cgroupv1'
exists in the sysroot. An alternative would have been to conditionally
symlink the override snippet, which would get applied in the 'systemctl
daemon-reload' in 'initrd-parse-etc.service'. Neither option is
particularly obvious, but this one seems slightly easier to follow.
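The two conditions could be checked along these lines. This is a sketch under assumptions: `select_init` is a hypothetical helper, the wrapper path is a placeholder passed in by the caller, and testing for `cgroup.controllers` is my own way of detecting cgroupv2, not necessarily what the commit does.

```shell
# Hypothetical helper: print the init path to hand to
# 'systemctl switch-root', or nothing to keep the default init.
# $1: sysroot mount point
# $2: cgroup mount point inside the initramfs
# $3: path of the wrapper (placeholder; the real path is not given here)
select_init() {
    sysroot="$1"; cgroup_dir="$2"; wrapper="$3"
    # cgroup.controllers is a cgroup2-only file, so its presence at the
    # mount point indicates that the initramfs is using cgroupv2.
    if [ -e "$sysroot/etc/flatcar-cgroupv1" ] && [ -e "$cgroup_dir/cgroup.controllers" ]; then
        printf '%s\n' "$wrapper"
    fi
}

# INIT stays empty unless both conditions hold.
INIT="$(select_init /sysroot /sys/fs/cgroup /placeholder/init.wrapper)"
```

Keeping the decision outside the wrapper matches the reasoning above: when `INIT` is empty, the normal switch-root path is used and systemd's serialized state is passed along unchanged.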
…fault

Moving controllers between cgroup v1 and v2 is difficult once they have been
used. So create a system.conf.d snippet that sets Default*Accounting=no so
that moving controllers becomes possible.
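Such a snippet might be generated as sketched below. The helper name, the file name, and the exact list of `Default*Accounting` settings are assumptions; the commit message does not say which ones it sets.

```shell
# Hypothetical helper: write a system.conf.d snippet that turns off
# default resource accounting. The file name and the exact list of
# Default*Accounting settings are assumptions, not taken from the commit.
# $1: target system.conf.d directory
write_accounting_snippet() {
    conf_dir="$1"
    mkdir -p "$conf_dir"
    cat > "$conf_dir/50-no-accounting.conf" <<EOF
[Manager]
DefaultCPUAccounting=no
DefaultIOAccounting=no
DefaultBlockIOAccounting=no
DefaultMemoryAccounting=no
DefaultTasksAccounting=no
EOF
}
```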
@jepio jepio closed this Feb 21, 2022
@jepio jepio force-pushed the jepio/runtime-revert-to-cgroupv1 branch from ae8ebc3 to 4182225 Compare February 21, 2022 12:12