-
Notifications
You must be signed in to change notification settings - Fork 27
[spec2x] Persist the first-boot hostname if not otherwise set #156
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this service supposed to be CoreOS specific? If not I'd suggest renaming coreos-persist-hostname.service to ignition-persist-hostname.service..
The We already have another unit with the |
|
||
# Only trigger when the hostname is NOT A default. | ||
ConditionHost=!localhost | ||
ConditionHost=!localhost.localdomain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm why a separate unit versus doing this as part of the existing logic for coreos-teardown-initramfs-network.service
?
Which currently only triggers effectively when the kernel cmdline has a non-default ip=
- i.e. basically a static IP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The separate unit made the logic simplifier, e.g. ConditionHost=localhost
and because this also applies to the DHCP situation. For RHCOS we don't want transient names to be used -- the firstboot hostname should be persisted. This ensures that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we really sure this applies to DHCP too? That scopes this code into running in e.g. AWS/GCP etc. as well as the baremetal environments that use DHCP.
Now, I don't think it's at all supported to change the hostnames returned from DHCP for OpenShift/Kubernetes today. So, it's probably not going to hurt to persist it and thereafter ignore what DHCP says, but...no one has been asking us to do that either - the filed bug was about bonding specified on the kernel cmdline. The bugs in DHCP cases just boiled down to kubelet needing to start after NM had finished DHCP.
One thing to bear in mind here is that Kubernetes also requires the hostname to be routable - which implies it has to be correct in the infrastructure and not just the OS. Hence, source-of-truth is the infrastructure.
For DHCP cases, there's already a node-external source of truth that people know how to manage.
For cases where the IP address (and hostname) are provided on the kernel cmdline...well, source of truth may need to be updated in two places (infra and hosts) if any IP addresses need to change. The core bug we're fixing here is ensuring that we persist that kernel cmdline across reboots (but we could also fix it by parsing the kernel commandline in the real root on subsequent boots).
The way I'd been thinking of this is that the argument for why this is RHCOS specific is that in the future in FCOS (and ideally RHCOS) with NM-in-initrd, it would parse the kernel arguments and handle this itself, correctly transferring state across the initrd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we really sure this applies to DHCP too?
Better question: do we want to allow changing the host via DHCP for OCP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we all agree that doesn't work. The question is more - does fixing the hostname for DHCP help anything either?
Now, I know for sure that if we had shipped this patch from the start it would have completely avoided the need for the kubelet waiting on NM patch. But, we are on track to ship that patch long before this one.
I see the argument here, but the counter I think is:
|
@darkmuggle I think that Ignition requirement is in direct conflict with expectations in the DHCP and DNS cases. Here the source of truth is external to the host, and for OpenShift we must have the host agree with the external source of truth about the hostname. If the hostname is locked on install, and the external source of truth changes but the node doesn't follow that change, we have a problem. For DHCP your hostname may change any time the lease is renewed or re-acquired. In practice this doesn't happen often but is certainly a possibility. Same for DNS; that's entirely out of control of the host. For OpenShift, if a node has a hostname out of sync with DHCP/DNS, kubelet will post that hostname as the node name, and other components in the cluster will try to resolve that hostname and fail. The workaround is to use IP addresses but that requires specific configuration on each node and is pretty ugly. TLDR; my uninformed opinion is that it should work as such:
|
@cgwalters we worked hard to make that happen like 10 years ago, IIRC we needed to make xauth not care about hostname by hardcoding localhost or something like that. |
To be 100% clear I am not opposed to shipping this as is. I just would feel a lot more comfortable scoping things to only fixing the reported bug (static IP on kernel cmdline). It's a lot simpler to explain from an architectural PoV. |
Okay, I've repushed this to deal just with the The rationale for dropping the holistic approach makes sense. Reducing this to scope of asserted identity to The problem that I have with the use of transient names for OCP is simple: it breaks certificates and makes node identity external to the cluster. One of the selling points of OCP is that the cluster is self-driving, but we make the node identity external to the cluster. |
Yeah. I've mentioned a few times I want to support tying identity to TPM2 devices, but that isn't universal and will never be. We also have the core problem that the control plane is provisioned via Terraform and we don't have mechanisms yet to "adopt" that into being owned by machineAPI (where applicable). And further, there will always be non-machineAPI platforms (manual bare metal etc.) where there just aren't APIs to control the infrastructure. So this doesn't seem likely to change really ever, but I am hopeful we can say in some cases we support doing something better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM apart from one comment.
Did you test this?
Okay, re-pushed with handling for all the weird edge-cases. |
dracut/30ignition/persist-ifcfg.sh
Outdated
# defining the IP on the commandline. | ||
hpath="/sysroot/etc/hostname" | ||
hname=$(< /proc/sys/kernel/hostname) | ||
for iface in $(ls /sys/class/net) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to skip this loop if hpath
exists already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed up.
dracut/30ignition/persist-ifcfg.sh
Outdated
if [ "${iface_hostname}" == "${hname}" ] && [ ! -f "${hpath}" ]; then | ||
echo "${hname}" > "${hpath}" | ||
echo -ne '/etc/hostname/\0' >> /sysroot/etc/selinux/ignition.relabel | ||
info "persisting hostname set by kargs for ${iface}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info "persisting hostname set by kargs for ${iface}" | |
info "persisting hostname '${hname}' set by kargs for ${iface}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final suggestion, but LGTM as is too!
dracut/30ignition/persist-ifcfg.sh
Outdated
# Ensure that the hostname used by the kernel is the same one | ||
# used by this interface. This guard is intended to protect against | ||
# mixed `ip=` or where one is set via dhcp. | ||
if [ "${iface_hostname}" == "${hname}" ] && [ ! -f "${hpath}" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, would be cleaner to do the [ ! -f "${hpath}" ]
check upfront so we can avoid entering the loop entirely, but not a blocker!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was trying to avoid the nested if's, and figured the cost of the loop was trivial. I agree, though.
For RHCOS, if an admin uses the legacy kargs to set the hostname, then we need to persist the name into /sysroot. This change persist the hostname ONLY if the hostname originated from `ip=...<HOSTNAME>`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a forward port of 1f8184e (coreos#156). If the admin specified a hostname in the ip= karg static networking config then we'll want to make sure that gets applied to the real root (persistently) as well. Ideally in the future there will be better support for this in NetworkManager itself as the `network-legacy` dracut module did at least provide more support for setting the hostname via ip= kargs than `network-manager` currently does. The discussion about this problem is in [1]. The fix for that will most likely implicate changes to the propagation code introduced here. Fixes: coreos/fedora-coreos-tracker#466 [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419
This is a forward port of 1f8184e (coreos#156). If the admin specified a hostname in the ip= karg static networking config then we'll want to make sure that gets applied to the real root (persistently) as well. Ideally in the future there will be better support for this in NetworkManager itself as the `network-legacy` dracut module did at least provide more support for setting the hostname via ip= kargs than `network-manager` currently does. The discussion about this problem is in [1]. The fix for that will most likely implicate changes to the propagation code introduced here. Fixes: coreos/fedora-coreos-tracker#466 [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419
This is a forward port of 1f8184e (coreos#156). If the admin specified a hostname in the ip= karg static networking config and no hostname was specified via Ignition then we'll make sure the hostname from the ip= karg gets applied to the real root (persistently). Ideally in the future there will be better support for this in NetworkManager itself as the `network-legacy` dracut module did at least provide more support for setting the hostname via ip= kargs than `network-manager` currently does. The discussion about this problem is in [1]. The fix for that will most likely implicate changes to the propagation code introduced here. Fixes: coreos/fedora-coreos-tracker#466 [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419
This is a forward port of 1f8184e (coreos#156). If the admin specified a hostname in the ip= karg static networking config and no hostname was specified via Ignition then we'll make sure the hostname from the ip= karg gets applied to the real root (persistently). Ideally in the future there will be better support for this in NetworkManager itself as the `network-legacy` dracut module did at least provide more support for setting the hostname via ip= kargs than `network-manager` currently does. The discussion about this problem is in [1]. The fix for that will most likely implicate changes to the propagation code introduced here. Fixes: coreos/fedora-coreos-tracker#466 [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419
This is a forward port of 1f8184e (coreos#156). If the admin specified a hostname in the ip= karg static networking config and no hostname was specified via Ignition then we'll make sure the hostname from the ip= karg gets applied to the real root (persistently). Ideally in the future there will be better support for this in NetworkManager itself as the `network-legacy` dracut module did at least provide more support for setting the hostname via ip= kargs than `network-manager` currently does. The discussion about this problem is in [1]. The fix for that will most likely implicate changes to the propagation code introduced here. Fixes: coreos/fedora-coreos-tracker#466 [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419
This is a forward port of 1f8184e (coreos#156). If the admin specified a hostname in the ip= karg static networking config and no hostname was specified via Ignition then we'll make sure the hostname from the ip= karg gets applied to the real root (persistently). Ideally in the future there will be better support for this in NetworkManager itself as the `network-legacy` dracut module did at least provide more support for setting the hostname via ip= kargs than `network-manager` currently does. The discussion about this problem is in [1]. The fix for that will most likely implicate changes to the propagation code introduced here. Fixes: coreos/fedora-coreos-tracker#466 [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419
This resolves the problem where
ip=...
defines a hostname but the hostname is not persisted.