Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overlay: add 15copy-installer-network dracut module #346

Merged

Conversation

dustymabe
Copy link
Member

@dustymabe dustymabe commented Apr 15, 2020

This dracut module delivers a coreos-copy-installer-network
systemd service and script that will detect when files have
been placed into /boot/ by coreos-installer install --copy-network
and appropriately copy them in place to be used by the initramfs
networking. If files are detected within /boot/coreos-installer-network/
then they will be considered to be the only source of networking
for that ignition boot (i.e. no networking kargs will be applied).

@dustymabe
Copy link
Member Author

In order to propagate networking from the live ISO installer environment the installer will place files into a directory in /boot/ (coreos/coreos-installer#212) and the systemd service/script from this PR will pick that up and apply it during the initramfs boot process on the ignition boot (firstboot) of the installed system.

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still feels wrong to me that this mechanism takes precedence over karg network options the user might've input. Though I guess we can fix that once we do coreos/coreos-assembler#1298 as discussed in coreos/fedora-coreos-tracker#460.

# location for NetworkManager to use the configuration.
echo "info: copying files from ${installer_network_dir} to ${initramfs_network_dir}"
mkdir -p $initramfs_network_dir
cp -v ${installer_network_dir}/* ${initramfs_network_dir}/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also want to delete installer_network_dir here, no? Otherwise, it'll just hang around forever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep /boot/ mounted read-only here. I don't think it hurts to keep the directory around especially since nothing after firstboot will consume it anyway.

Copy link
Member

@jlebon jlebon Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, forgot about the ro mount.

I don't think it hurts to keep the directory around especially since nothing after firstboot will consume it anyway.

Right yeah, not so much that it conflicts with anything. It's just bad optics IMO to have it stay in the root of /boot forever. For example, back when we used Anaconda, we would actively remove everything in /sysroot except what was needed (coreos/coreos-assembler#492). Nowadays we just have /boot, /ostree, and .coreos-aleph-version.json, which is nice. Similarly an ls /boot right now only has the files we actually need.

WDYT about adding a /run/tmpfiles.d/10-coreos-installer-network.conf file which has:

R /boot/coreos-installer-network - - - - -

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok added a second commit to clean up via tmpfiles.d. I can squash it if desired.

@dustymabe
Copy link
Member Author

It still feels wrong to me that this mechanism takes precedence over karg network options the user might've input.

I don't really feel like it's that wrong considering the user just did an install and explicitly told the installer to bake specific networking configuration in specifically for this first boot. We do have to do it this way for now (ignoring kargs completely) because we do embed default networking args, but as you mention below 👇🏼 we maybe be able to rework it when we no longer do that.

Though I guess we can fix that once we do coreos/coreos-assembler#1298 as discussed in coreos/fedora-coreos-tracker#460.

@dustymabe dustymabe force-pushed the dusty-copy-installer-network branch from 6016e33 to a873d2b Compare April 15, 2020 19:55
# [4] https://github.com/dracutdevs/dracut/blob/master/modules.d/35network-manager/module-setup.sh#L36
#
[Unit]
Description=Copy Live ISO Installer Networking Config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Live ISO is a misnomer here. --copy-network can be done from live PXE, or in principle from a Fedora live image running the coreos-installer container.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically you are right. How about Copy CoreOS Installer Networking Config ?

Copy link
Member

@jlebon jlebon Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also keep it generic as just a mechanism for baking in network configs in the image and coreos-installer just happens to use it. So e.g. it'd be Copy Network Configs and the directory would be ${bootmnt}/coreos-network-configs or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with either of those options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is similar to how we support adding a $boot/ignition/config.ign without the help of coreos-installer, and in fact kola knows how to do this today.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok if we want to change the name of the directory now is the time to do it. I named it something specific because I didn't see other uses for this (or at least not other encouraged uses). What do you all think we should do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshed: /boot/firstboot-network-configs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we decided in IRC to go with: /boot/coreos-firstboot-network

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coreos-installer PR to change the dir: coreos/coreos-installer#216

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated this PR to use /boot/coreos-firstboot-network

dustymabe added a commit to dustymabe/coreos-installer that referenced this pull request Apr 16, 2020
In another code review [1] we agreed to make the name of the directory
more generic.

[1] coreos/fedora-coreos-config#346 (comment)
@dustymabe dustymabe force-pushed the dusty-copy-installer-network branch from a873d2b to 2fe49ad Compare April 16, 2020 16:48
@dustymabe dustymabe marked this pull request as ready for review April 16, 2020 16:49
@dustymabe
Copy link
Member Author

bringing this out of draft mode. testing looks good (combined with coreos/coreos-installer#216)

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, LGTM otherwise.

Comment on lines +30 to +63
echo "R ${realroot_firstboot_network_dir} - - - - -" > \
/run/tmpfiles.d/15-coreos-firstboot-network.conf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this out so we always do this if the directory exists, not just if it's not empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not. Deleting the files (presumably on success) will already make things hard to debug. This would make it worse I think.

@dustymabe
Copy link
Member Author

--- FAIL: ostree.remote (337.27s)

    --- PASS: ostree.remote/add (0.26s)

    --- PASS: ostree.remote/list (0.11s)

    --- PASS: ostree.remote/show-url (0.10s)

    --- PASS: ostree.remote/refs (1.01s)

    --- PASS: ostree.remote/summary (1.83s)

    --- PASS: ostree.remote/delete (0.41s)

        harness.go:731: Found emergency shell on machine 894d3b2a-eaff-4256-93bf-3d442af4d1fe console

This test passes for me when I run it locally so maybe let's retry.

@dustymabe
Copy link
Member Author

/retest

@dustymabe dustymabe force-pushed the dusty-copy-installer-network branch from 2fe49ad to dffc863 Compare April 16, 2020 20:09
# - i.e. Before=dracut-cmdline.service
# - Another is to run *after* nm-config.sh [3] in dracut-cmdline [4]
# and just delete all the files created by nm-initrd-generator.
# - i.e. After=dracut-cmdline.service, but Before=dracut-initqueue.service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clear and obvious downside of the "running NM via dracut initqueue"...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Though when we do finally get NM running via systemd we'll need to make sure we work with the team such that there are two separate services: one for running nm-initrd-generator and one for starting NM. This will enable us to hook in between them OR maybe be able to just say Before=nm-initrd-generator.service

@dustymabe
Copy link
Member Author

The CI failure This appears to be a race condition. The After=...*boot.device in the systemd unit should be enough, but apparently not (kernel race, fun!):

  [    4.045181] systemd[1]: Found device /dev/disk/by-label/boot.                                                                     
  [  OK  ] Found device /dev/disk/by-label/boot                                                                                        
  [    4.051500] systemd[1]: Starting Copy CoreOS Firstboot Networking Config...                                                       
        Starting  Copy CoreOS Firstboot Networking Config                                                                              
  [    4.060573]  vda: vda1 vda2 vda3 vda4                                                                                             
  [    4.063296] coreos-copy-firstboot-network[479]: mount: /mnt/boot_partition: special device /dev/disk/by-label/boot does not exist.

will add a short loop to wait for the device before continuing

@dustymabe dustymabe force-pushed the dusty-copy-installer-network branch 2 times, most recently from add674d to 03390ec Compare April 16, 2020 21:17
@dustymabe
Copy link
Member Author

yay! CI passed now

Copy link
Contributor

@bgilbert bgilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This dracut module delivers a coreos-copy-firstboot-network
systemd service and script that will detect when files have
been placed into /boot/ by `coreos-installer install --copy-network`
and appropriately copy them in place to be used by the initramfs
networking. If files are detected within /boot/coreos-firstboot-network/
then they will be considered to be the only source of networking
for that ignition boot (i.e. no networking kargs will be applied).
…t-network

This will clean up those files if they existed and were used.
@dustymabe dustymabe force-pushed the dusty-copy-installer-network branch 2 times, most recently from 8d81ffa to 1d881ac Compare April 17, 2020 14:00
@dustymabe
Copy link
Member Author

CI re-ran again because I pushed up a comment change for #346 (comment) and it failed. It appears the same fallacy that plagues After=dev-disk-by\x2dlabel-boot.device also plagues [ -e /dev/disk/by-label/boot ] so I switched the loop to just retry the actual mount command instead.

@dustymabe
Copy link
Member Author

if CI does pass I'll look at the logs and hopefully we'll see at least one info: retrying ${bootdev} mount in 1 second... message

Add short while loop to handle a race condition where
After=dev-disk-by\x2dlabel-boot.device doesn't seem to be sufficient
always.
@dustymabe dustymabe force-pushed the dusty-copy-installer-network branch from 1d881ac to ba00a21 Compare April 17, 2020 14:09
dustymabe added a commit to dustymabe/coreos-installer that referenced this pull request Apr 17, 2020
In another code review [1] we agreed to make the name of the directory
more generic.

[1] coreos/fedora-coreos-config#346 (comment)
@dustymabe
Copy link
Member Author

if CI does pass I'll look at the logs and hopefully we'll see at least one info: retrying ${bootdev} mount in 1 second... message

It did pass! and we do see it!

ostree.unlock/c0895ef6-5c96-417c-a6d7-171050a4c29a/journal.txt:Apr 17 14:21:24.260616 coreos-copy-firstboot-network[481]: info: retrying /dev/disk/by-label/boot mount in 1 second...
ostree.unlock/c0895ef6-5c96-417c-a6d7-171050a4c29a/journal.txt:Apr 17 14:21:25.269617 coreos-copy-firstboot-network[481]: info: retrying /dev/disk/by-label/boot mount in 1 second...
ostree.unlock/c0895ef6-5c96-417c-a6d7-171050a4c29a/console.txt:[    3.755254] coreos-copy-firstboot-network[481]: info: retrying /dev/disk/by-label/boot mount in 1 second...
ostree.unlock/c0895ef6-5c96-417c-a6d7-171050a4c29a/console.txt:[    4.764259] coreos-copy-firstboot-network[481]: info: retrying /dev/disk/by-label/boot mount in 1 second...

👏 👏 👏 👏

@dustymabe dustymabe merged commit 49de0c7 into coreos:testing-devel Apr 17, 2020
@dustymabe dustymabe deleted the dusty-copy-installer-network branch April 17, 2020 14:51
@cgwalters
Copy link
Member

I think the races may be due to coreos-gpt-setup@dev-disk-by\x2dlabel-root.service. We need more ordering against that.

jlebon added a commit to jlebon/ignition-dracut that referenced this pull request May 25, 2020
Make use of the new `fetch-offline` stage:

coreos/ignition#979

We run this between the `setup` and `fetch` stages (the latter possibly
being skipped if networking is not required).

We hit the same issue here that `coreos-copy-firstboot-network.service`
hit, which is that we can't run before the `cmdline` hook because that
runs *before* udev, but we want the `by-*` symlinks for
`ignition-setup-user.service`.

The hack we do here is to rerun the NM cmdline hook in case ignition
dropped a snippet in `/etc/cmdline.d`. As mentioned in
coreos/fedora-coreos-config#346, we'll be able
to do this more cleanly once we run NM as a systemd service directly.
@cgwalters
Copy link
Member

@cgwalters
Copy link
Member

Filed as coreos/fedora-coreos-tracker#523

jlebon added a commit to jlebon/ignition-dracut that referenced this pull request Jun 9, 2020
Make use of the new `fetch-offline` stage:

coreos/ignition#979

We run this between the `setup` and `fetch` stages (the latter possibly
being skipped if networking is not required).

We hit the same issue here that `coreos-copy-firstboot-network.service`
hit, which is that we can't run before the `cmdline` hook because that
runs *before* udev, but we want the `by-*` symlinks for
`ignition-setup-user.service`.

The hack we do here is to rerun the NM cmdline hook in case ignition
dropped a snippet in `/etc/cmdline.d`. As mentioned in
coreos/fedora-coreos-config#346, we'll be able
to do this more cleanly once we run NM as a systemd service directly.
jlebon added a commit to jlebon/ignition-dracut that referenced this pull request Jun 11, 2020
Make use of the new `fetch-offline` stage:

coreos/ignition#979

We run this between the `setup` and `fetch` stages (the latter possibly
being skipped if networking is not required).

We hit the same issue here that `coreos-copy-firstboot-network.service`
hit, which is that we can't run before the `cmdline` hook because that
runs *before* udev, but we want the `by-*` symlinks for
`ignition-setup-user.service`.

The hack we do here is to rerun the NM cmdline hook in case ignition
dropped a snippet in `/etc/cmdline.d`. As mentioned in
coreos/fedora-coreos-config#346, we'll be able
to do this more cleanly once we run NM as a systemd service directly.
jlebon added a commit to jlebon/ignition-dracut that referenced this pull request Jun 17, 2020
Make use of the new `fetch-offline` stage:

coreos/ignition#979

We run this between the `setup` and `fetch` stages (the latter possibly
being skipped if networking is not required).

We hit the same issue here that `coreos-copy-firstboot-network.service`
hit, which is that we can't run before the `cmdline` hook because that
runs *before* udev, but we want the `by-*` symlinks for
`ignition-setup-user.service`.

The hack we do here is to rerun the NM cmdline hook in case ignition
dropped a snippet in `/etc/cmdline.d`. As mentioned in
coreos/fedora-coreos-config#346, we'll be able
to do this more cleanly once we run NM as a systemd service directly.
jlebon added a commit to jlebon/ignition-dracut that referenced this pull request Jun 18, 2020
Make use of the new `fetch-offline` stage:

coreos/ignition#979

We run this between the `setup` and `fetch` stages (the latter possibly
being skipped if networking is not required).

We hit the same issue here that `coreos-copy-firstboot-network.service`
hit, which is that we can't run before the `cmdline` hook because that
runs *before* udev, but we want the `by-*` symlinks for
`ignition-setup-user.service`.

The hack we do here is to rerun the NM cmdline hook in case ignition
dropped a snippet in `/etc/cmdline.d`. As mentioned in
coreos/fedora-coreos-config#346, we'll be able
to do this more cleanly once we run NM as a systemd service directly.
jlebon added a commit to jlebon/ignition-dracut that referenced this pull request Jun 18, 2020
Make use of the new `fetch-offline` stage:

coreos/ignition#979

We run this between the `setup` and `fetch` stages (the latter possibly
being skipped if networking is not required).

We hit the same issue here that `coreos-copy-firstboot-network.service`
hit, which is that we can't run before the `cmdline` hook because that
runs *before* udev, but we want the `by-*` symlinks for
`ignition-setup-user.service`.

The hack we do here is to rerun the NM cmdline hook in case ignition
dropped a snippet in `/etc/cmdline.d`. As mentioned in
coreos/fedora-coreos-config#346, we'll be able
to do this more cleanly once we run NM as a systemd service directly.
jlebon added a commit to coreos/ignition-dracut that referenced this pull request Jun 18, 2020
Make use of the new `fetch-offline` stage:

coreos/ignition#979

We run this between the `setup` and `fetch` stages (the latter possibly
being skipped if networking is not required).

We hit the same issue here that `coreos-copy-firstboot-network.service`
hit, which is that we can't run before the `cmdline` hook because that
runs *before* udev, but we want the `by-*` symlinks for
`ignition-setup-user.service`.

The hack we do here is to rerun the NM cmdline hook in case ignition
dropped a snippet in `/etc/cmdline.d`. As mentioned in
coreos/fedora-coreos-config#346, we'll be able
to do this more cleanly once we run NM as a systemd service directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants