
OKD 4.4 Baremetal UPI Install: FCOS does not persist its hostname after reboot #466

Closed
cgruver opened this issue Apr 18, 2020 · 6 comments

@cgruver

cgruver commented Apr 18, 2020

I opened this in https://github.com/openshift/okd for visibility to folks working on OKD 4.4

okd-project/okd#153

OKD version: 4.4.0-0.okd-2020-04-07-175212
OS Version: Fedora CoreOS 31.20200406.20.0

Cluster installed as Baremetal UPI

FCOS installed with iPXE and fixed IP using kernel params:

ip=10.11.11.71::10.11.11.1:255.255.255.0:okd4-worker-1.my.domain.org:eth0:none

The cluster installed and ran just fine.
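
For readers unfamiliar with the `ip=` karg, here is a minimal sketch that splits it into its colon-separated fields. It assumes the IPv4 form documented for dracut (`ip=<client>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>`); the function name `parse_ip_karg` is made up for illustration, and bracketed IPv6 addresses are not handled.

```shell
#!/usr/bin/env bash
# Illustrative only: split a dracut-style static ip= karg into named fields.
# Assumed layout: ip=<client>:<peer>:<gateway>:<netmask>:<hostname>:<iface>:<autoconf>
parse_ip_karg() {
    local karg="${1#ip=}"   # drop the leading "ip="
    local client peer gateway netmask hostname iface autoconf
    # ":" is a non-whitespace IFS character, so the empty <peer> field is preserved.
    # Any trailing optional fields (mtu, macaddr) would end up in "autoconf".
    IFS=: read -r client peer gateway netmask hostname iface autoconf <<< "$karg"
    printf 'address=%s\ngateway=%s\nnetmask=%s\nhostname=%s\ninterface=%s\nautoconf=%s\n' \
        "$client" "$gateway" "$netmask" "$hostname" "$iface" "$autoconf"
}

parse_ip_karg 'ip=10.11.11.71::10.11.11.1:255.255.255.0:okd4-worker-1.my.domain.org:eth0:none'
```

The fifth field is the hostname, which is the value that was lost across the reboot described below.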

I shut down the cluster to transport the lab machines:

  1. Cordon and drain the workers:
    for i in 0 1 2 ; do oc adm cordon okd4-worker-${i}.my.domain.org ; done
    for i in 0 1 2 ; do oc adm drain okd4-worker-${i}.my.domain.org --ignore-daemonsets --force --grace-period=60 --delete-local-data; done
  2. Shut down the workers
  3. Shut down the masters

When the master nodes were restarted, they all came up with their hostname set to localhost. They could not rejoin the cluster because etcd did not recognize them.

I was able to recover the cluster by correcting the hostname on each master and worker node:

for i in 0 1 2 ; do ssh core@okd4-master-${i}.my.domain.org "sudo hostnamectl set-hostname okd4-master-${i}.my.domain.org && sudo shutdown -r now"; done
for i in 0 1 2 ; do ssh core@okd4-worker-${i}.my.domain.org "sudo hostnamectl set-hostname okd4-worker-${i}.my.domain.org && sudo shutdown -r now"; done
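
After the nodes come back up, a check along these lines could confirm that the static hostname stuck. This is only a sketch: `check_static_hostname` and the `REMOTE_SH` override are hypothetical names introduced here (the override exists so the function can be dry-run without real nodes), and it assumes the same `core@<fqdn>` SSH access used above.

```shell
#!/usr/bin/env bash
# Sketch: report whether a node's static hostname matches the expected FQDN.
# REMOTE_SH is a hypothetical hook for dry runs; it defaults to ssh.
check_static_hostname() {
    local expected="$1" actual
    # "hostnamectl --static" prints only the static hostname.
    actual="$("${REMOTE_SH:-ssh}" "core@${expected}" hostnamectl --static)" || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK       ${expected}"
    else
        echo "MISMATCH ${expected} (static hostname is '${actual}')"
        return 1
    fi
}

# Example usage (commented out because it needs reachable nodes):
# for i in 0 1 2 ; do check_static_hostname "okd4-master-${i}.my.domain.org" ; done
```

A node that still reports a mismatch after a reboot has not persisted its hostname.
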
@lucab
Contributor

lucab commented Apr 22, 2020

It looks like this is a symptom; the underlying root cause is https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419.

dustymabe added the jira label (for syncing to jira) on Apr 22, 2020
dustymabe added a commit to dustymabe/ignition-dracut that referenced this issue Apr 23, 2020
This is a forward port of 1f8184e (coreos#156). If the admin specified
a hostname in the ip= karg static networking config then we'll
want to make sure that gets applied to the real root (persistently)
as well.

Ideally in the future there will be better support for this in
NetworkManager itself as the `network-legacy` dracut module did
at least provide more support for setting the hostname via ip=
kargs than `network-manager` currently does. The discussion
about this problem is in [1]. The fix for that will most likely
implicate changes to the propagation code introduced here.

Fixes: coreos/fedora-coreos-tracker#466

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419
dustymabe added a commit to dustymabe/ignition-dracut that referenced this issue Apr 23, 2020
This is a forward port of 1f8184e (coreos#156). If the admin specified
a hostname in the ip= karg static networking config and no hostname
was specified via Ignition then we'll make sure the hostname from
the ip= karg gets applied to the real root (persistently).

Ideally in the future there will be better support for this in
NetworkManager itself as the `network-legacy` dracut module did
at least provide more support for setting the hostname via ip=
kargs than `network-manager` currently does. The discussion
about this problem is in [1]. The fix for that will most likely
implicate changes to the propagation code introduced here.

Fixes: coreos/fedora-coreos-tracker#466

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/419
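
To illustrate the behavior this commit message describes, here is a rough sketch. It is not the actual ignition-dracut code. The function names and the two path arguments are hypothetical and exist only so the logic can be tried against a scratch directory. It assumes the IPv4 `ip=` layout with the hostname in the fifth field and does not handle bracketed IPv6 addresses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "propagate the ip= hostname to the real root".

# Print the hostname field of a static ip= karg; fail if there is none
# (for example ip=dhcp or ip=eth0:dhcp).
hostname_from_ip_karg() {
    local karg="${1#ip=}" f1 f2 f3 f4 host rest
    IFS=: read -r f1 f2 f3 f4 host rest <<< "$karg"
    [ -n "$host" ] && printf '%s\n' "$host"
}

# propagate_hostname CMDLINE_FILE SYSROOT
propagate_hostname() {
    local cmdline_file="$1" sysroot="$2" arg host
    # A hostname already present in the real root (e.g. written by Ignition) wins.
    if [ -e "${sysroot}/etc/hostname" ]; then
        return 0
    fi
    for arg in $(cat "$cmdline_file"); do
        case "$arg" in
            ip=*)
                if host="$(hostname_from_ip_karg "$arg")"; then
                    mkdir -p "${sysroot}/etc"
                    printf '%s\n' "$host" > "${sysroot}/etc/hostname"
                    return 0
                fi
                ;;
        esac
    done
    return 0
}
```

In an initramfs the inputs would presumably be `/proc/cmdline` and `/sysroot`. Passing them as arguments keeps the sketch safe to run elsewhere, for example `propagate_hostname ./cmdline.txt ./scratch-root`.
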
@dustymabe
Member

Proposed fix in: coreos/ignition-dracut#174

@dustymabe
Member

The fix for this landed upstream. It needs a pull+rebuild of the ignition rpm (open PR). It is now pending a testing stream release.

dustymabe added the status/pending-testing-release label (Fixed upstream. Waiting on a testing release.) on Apr 26, 2020
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Apr 27, 2020
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Apr 28, 2020
cgwalters pushed a commit to coreos/fedora-coreos-config that referenced this issue May 8, 2020
@dustymabe
Member

The fix for this went into testing stream release 31.20200505.2.0. Please try out the new release and report issues.

dustymabe added the status/pending-stable-release label (Fixed upstream and in testing. Waiting on stable release.) and removed the status/pending-testing-release label (Fixed upstream. Waiting on a testing release.) on May 8, 2020
@dustymabe
Member

The fix for this went into stable stream release 31.20200505.3.0.

dustymabe removed the status/pending-stable-release label (Fixed upstream and in testing. Waiting on stable release.) on May 20, 2020
@cgruver
Author

cgruver commented May 24, 2020

I verified this with a new OKD 4.4 build.

Thanks!
