Initramfs network configuration #460

Closed
jlebon opened this issue Apr 15, 2020 · 6 comments · Fixed by #691

Comments

@jlebon
Member

jlebon commented Apr 15, 2020

We want to rework networking in the initramfs so that:

  1. we allow conditional networking (#443, "Don't bring up networking in the initramfs on first boot by default")
  2. we allow platform-specific network configs to be injected (e.g. afterburn#379, "[WIP] providers/vmware: add an experimental rd-net-kargs")
  3. we enable the live ISO + coreos-installer network config path (coreos-installer#205, "install: UI for forwarding network information into next boot")

Chatted with @dustymabe and @lucab about this, and the proposal we came up with is the following:

  • We remove the hardcoded kargs from the GRUB config (coreos-assembler#1298, "grub: drop default rd.neednet=1 ip=dhcp,dhcp6 kargs").
  • Instead, we split it into two separate concerns: (1) whether networking is required or not, and (2) what networking configuration to use if required. So e.g. for (1), Ignition can request networking without worrying about what the network config should be. For (2), we move it to Afterburn.
  • We represent these via /etc/cmdline.d dropins.

So the end state would look something like this:

  • Canonical network configuration kargs live in Afterburn. This will be ip=dhcp,dhcp6 on most platforms but may be fed from a platform-specific channel if available (e.g. VMware guestinfo). It writes this to e.g. /etc/cmdline.d/50-afterburn-network-kargs. It defers to any ip= kargs explicitly provided on the kernel cmdline by the user (or any networking config installed via the coreos-installer --copy-network-config path). Note, though, that Afterburn does not write rd.neednet=1.
  • Services like ignition-fetch.service can separately request networking via /etc/cmdline.d/50-ignition-neednet, which only contains rd.neednet=1. And in the future, any other initrd service that may need networking can do the same thing as Ignition.
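
For illustration, the end state above could be modeled roughly as follows. This is a minimal Python sketch, not Afterburn or Ignition code: the function names (`user_provided_ip_kargs`, `plan_dropins`) are hypothetical, the dropin paths are the examples given above, and the coreos-installer --copy-network-config path is not modeled.

```python
# Hypothetical model of the proposal: who writes which /etc/cmdline.d dropin.
DEFAULT_NETWORK_KARGS = "ip=dhcp,dhcp6"  # default named in the proposal

AFTERBURN_DROPIN = "/etc/cmdline.d/50-afterburn-network-kargs"
IGNITION_DROPIN = "/etc/cmdline.d/50-ignition-neednet"


def user_provided_ip_kargs(kernel_cmdline: str) -> bool:
    """Return True if any ip= argument appears on the kernel cmdline."""
    return any(arg.startswith("ip=") for arg in kernel_cmdline.split())


def plan_dropins(kernel_cmdline, platform_kargs=None, ignition_needs_net=False):
    """Return a {path: contents} mapping of the dropins that would be written.

    Concern (2), *what* config to use, belongs to Afterburn and is skipped
    when the user already passed ip= kargs. Concern (1), *whether*
    networking is needed, is requested separately by Ignition.
    """
    dropins = {}
    if not user_provided_ip_kargs(kernel_cmdline):
        # Afterburn never writes rd.neednet=1, only the network kargs.
        dropins[AFTERBURN_DROPIN] = (platform_kargs or DEFAULT_NETWORK_KARGS) + "\n"
    if ignition_needs_net:
        dropins[IGNITION_DROPIN] = "rd.neednet=1\n"
    return dropins
```

With no user-provided ip= kargs and Ignition requesting networking, both dropins would be planned; with an explicit ip= karg on the kernel cmdline, the Afterburn dropin is skipped.
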
@cgwalters
Member

I'm overall fine with this.

However, one thing I'd like to investigate at some point is whether we actually need to default to DHCP on platforms where metadata comes from the link local address - which is almost all the important cloud providers.

If all we need to do in the initramfs is bring up "the" NIC enough to fetch that, that would allow us to uniformly support encoding network config in Ignition.

@dustymabe
Member

> However, one thing I'd like to investigate at some point is whether we actually need to default to DHCP on platforms where metadata comes from the link local address - which is almost all the important cloud providers.

The proposal already includes:

> 2. we allow platform-specific network configs to be injected (e.g. https://github.com/coreos/afterburn/pull/379)

Which means each platform can have its own default, so we could do something clever.

> If all we need to do in the initramfs is bring up "the" NIC enough to fetch that, that would allow us to uniformly support encoding network config in Ignition.

Right now I don't think we differentiate between initramfs networking and real root networking, i.e. link-local may be enough for initramfs networking to grab an Ignition config from the provider, but not for the real root. Another thing to think about is that your Ignition config could have remote references, which would need more than link-local networking.

Overall I think it's just safer to bring it all the way up to the point where you can resolve hostnames and curl.

@jlebon
Member Author

jlebon commented Apr 21, 2020

> Which means each platform can have its own default so we could do something clever.

It's a bit trickier than that though, because e.g. Ignition might be able to fetch the config over link-local, but still needs full networking to fetch remote resources specified in the config. And we only have a single synchronization point for yes/no to full networking.

So I think this is something like: on supported platforms, we always bring up networking enough for link-local. If Ignition needs full networking, it can request it. This is what this bit is about:

> Whether to only attempt fetches which can be performed offline. This currently only includes the "data" scheme. Other schemes will result in ErrNeedNet. In the future, we can improve on this by dropping this and just making sure that we canonicalize all "insufficient network"-related errors to ErrNeedNet. That way, distro integrators could distinguish between "partial" and full network bring-up.
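
To make the offline-fetch behavior described above concrete, here is a rough Python sketch. It is not Ignition's implementation (Ignition is written in Go): `NeedNetError` is a hypothetical stand-in for ErrNeedNet, and `fetch` and `decode_data_url` are illustrative names. Only the "data" scheme is resolved offline; any other scheme signals that networking is needed.

```python
import base64
from urllib.parse import unquote_to_bytes, urlsplit


class NeedNetError(Exception):
    """Hypothetical stand-in for Ignition's ErrNeedNet."""


def decode_data_url(url: str) -> bytes:
    """Decode a data: URL of the form data:[<mediatype>][;base64],<data>."""
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data URL: missing comma")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)


def fetch(url: str, offline: bool) -> bytes:
    """Fetch a resource, refusing anything that needs the network when offline."""
    scheme = urlsplit(url).scheme
    if scheme == "data":
        # The only scheme that can be resolved without networking.
        return decode_data_url(url)
    if offline:
        # Signal the caller to request networking (e.g. via rd.neednet=1)
        # rather than failing outright.
        raise NeedNetError(f"scheme {scheme!r} requires networking")
    raise NotImplementedError("network fetches are out of scope for this sketch")
```

A caller in the offline stage would catch `NeedNetError` and request network bring-up before retrying, instead of treating it as a fatal error.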

@dustymabe
Member

I had a chat with some of the OpenStack provisioning folks today (aka OpenShift IPI). Out of that I had a specific question:

  • Will the work for this issue mean that we won't attempt to bring up the network if an OpenStack config drive is provided that has no remote references?

@jlebon
Member Author

jlebon commented Jun 11, 2020

> Will the work for this issue mean that we won't attempt to bring up the network if an OpenStack config drive is provided that has no remote references?

Short answer: yes.

Long answer: yes, but if possible, I would strongly advise using the metal image instead and the new coreos-installer to inject the Ignition config. While all the images are just one transform step away from each other right now, that may not always be the case (see e.g. coreos/fedora-coreos-config#407). I think we want to reserve the right to change that.

Also, the OpenStack Ignition provider in particular is not great because it has to query for both config drives and the metadata server in parallel, and we've established that in general it's not a good idea to have this sort of timeout (see a lot of discussions around this in coreos/ignition#928).

Your question though made me realize that I need to adapt the Ignition OpenStack provider code in light of the fetch-offline work so that we don't just error out if the metadata server fails, but signal neednet.
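
As a rough illustration of that adaptation, the sketch below shows one way a provider could signal neednet instead of erroring out. It is a simplified Python model under stated assumptions, not the Ignition OpenStack provider: the two sources are tried sequentially rather than in parallel, `NeedNetError` is a hypothetical stand-in for ErrNeedNet, and `read_config_drive` / `query_metadata_server` are hypothetical callables supplied by the caller.

```python
class NeedNetError(Exception):
    """Hypothetical stand-in for Ignition's ErrNeedNet."""


def fetch_openstack_config(read_config_drive, query_metadata_server, offline):
    """Return the config from the config drive, else from the metadata server.

    read_config_drive() returns the config bytes, or None if no drive exists.
    query_metadata_server() returns the config bytes and needs networking.
    """
    config = read_config_drive()
    if config is not None:
        # A config drive needs no networking at all.
        return config
    if offline:
        # Assumption: in the offline stage the metadata server is not
        # reachable, so ask for networking rather than failing outright.
        raise NeedNetError("no config drive found; metadata server needs networking")
    return query_metadata_server()
```

Under this model, a config drive with no remote references never triggers network bring-up, which matches the short answer above.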

jlebon added a commit to jlebon/fedora-coreos-tracker that referenced this issue Dec 2, 2020
This documents the design in coreos#460 with some more implementation details.
This came up in discussions today while talking about coreos#689, so let's
write it down somewhere so it's easier to reference in the future.

Closes: coreos#460
@jlebon
Member Author

jlebon commented Dec 3, 2020

This is done now:

See also #691.

jlebon added a commit that referenced this issue Dec 3, 2020
This documents the design in #460 with some more implementation details.
This came up in discussions today while talking about #689, so let's
write it down somewhere so it's easier to reference in the future.

Closes: #460