azure updates #1950

Merged
jepio merged 4 commits into main from jepio/azure-fixes on Apr 25, 2024

@jepio jepio commented Apr 22, 2024

azure updates

  • Refresh the waagent patch to fix an error that occurs when deprovisioning.
  • Add a compat symlink at the old OEM waagent.conf path, because a test system of the waagent team depends on it.
  • Fully disable interface restarting. It only causes slowdowns and test flakiness on Flatcar, and we don't need it because the hostname is already configured from the initrd.
  • Add azure-nvme-utils to the base OS for the next instance generation, which has the OS and temp disks on NVMe. It can't live in a sysext because it ships udev rules.

How to use

[ describe what reviewers need to do in order to validate this PR ]

Testing done

Testing here: http://jenkins.infra.kinvolk.io:8080/job/container/job/test/22112/cldsv/

[Describe the testing you have done before submitting this PR. Please include both the commands you issued as well as the output you got.]

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

@jepio jepio requested a review from a team April 22, 2024 16:23
jepio added 4 commits April 24, 2024 16:03
When CoreosCommonUtil was factored out, we missed updating the class
name in a call to super(). This results in an error when executing
`/usr/sbin/waagent -force -deprovision+user`. Fix the class name.
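For illustration, here is a minimal sketch of how a stale class name in a two-argument super() call can fail after a base class is factored out. All class names below are hypothetical stand-ins, not the classes from the patch, and the exact failure seen in waagent may differ.

```python
class BaseUtil:
    def __init__(self):
        self.ready = True


class BrokenCommonUtil(BaseUtil):
    def __init__(self):
        # Stale name left over from a refactor: it names a subclass
        # instead of the class this method is defined in.
        super(BrokenDistroUtil, self).__init__()


class BrokenDistroUtil(BrokenCommonUtil):
    pass


class BrokenOtherUtil(BrokenCommonUtil):
    pass


class FixedCommonUtil(BaseUtil):
    def __init__(self):
        # Correct: the first argument is the class defining this method.
        super(FixedCommonUtil, self).__init__()


class FixedOtherUtil(FixedCommonUtil):
    pass


def try_construct(cls):
    """Return (instance, None) on success or (None, exception name) on failure."""
    try:
        return cls(), None
    except (TypeError, RecursionError) as exc:
        return None, type(exc).__name__
```

With the stale name, constructing BrokenOtherUtil raises TypeError because the instance is not a BrokenDistroUtil, while FixedOtherUtil constructs normally.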

Create a compatibility symlink at the old config file location
(/usr/share/oem/waagent.conf) to handle the case of enabling
auto-updates on the agent. The upstream version of the agent does not
have our downstream patch, so it doesn't know about the updated config
file location. We should upstream our changes.

Signed-off-by: Jeremi Piotrowski <[email protected]>
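As a rough sketch of the compatibility-symlink idea: the helper below links an old path to a new one unless something already exists at the old path. The helper name and the temporary directory layout are assumptions for illustration. The commit message names only the old path, so the new config location here is a placeholder, and the real package presumably creates the link at build time rather than at runtime.

```python
import os
import tempfile


def ensure_compat_symlink(old_path, new_path):
    """Create old_path as a symlink to new_path unless old_path already exists."""
    if os.path.lexists(old_path):
        return False
    os.makedirs(os.path.dirname(old_path), exist_ok=True)
    os.symlink(new_path, old_path)
    return True


def demo():
    # Everything lives under a temporary root; no real system paths are touched.
    with tempfile.TemporaryDirectory() as root:
        new_path = os.path.join(root, "new", "waagent.conf")  # placeholder location
        old_path = os.path.join(root, "usr", "share", "oem", "waagent.conf")
        os.makedirs(os.path.dirname(new_path))
        with open(new_path, "w") as f:
            f.write("# placeholder config\n")
        created = ensure_compat_symlink(old_path, new_path)
        # A reader that only knows the old path still finds the config.
        with open(old_path) as f:
            content = f.read()
        return created, os.path.islink(old_path), content
```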
Flatcar prepares /etc/hostname from the initrd through afterburn. After
switching root, systemd-networkd fetches a DHCP lease that already
carries the correct hostname. This also publishes the hostname to the
vnet DNS server. When WALinuxAgent starts, it tries to repeat the same
steps: configure the hostname, then bounce the link to force a DHCP
lease renewal. This has caused issues in the past with multi-NIC
configurations and with services that are already using the network
(etcd/flanneld).

The link bouncing by WALinuxAgent is not necessary because of Flatcar's
boot design, so return without bouncing the link. Tested that DNS from
other VMs in the same vnet works.

Signed-off-by: Jeremi Piotrowski <[email protected]>
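The shape of this change can be sketched as an override that returns early. The class names, the method name, and the recorded actions below are hypothetical and only model the behaviour described in the commit message; they are not the actual WALinuxAgent code.

```python
class UpstreamLikeUtil:
    """Models an agent that publishes the hostname by bouncing the link."""

    def __init__(self):
        self.actions = []

    def publish_hostname(self, hostname):
        # Record the disruptive steps instead of executing them.
        self.actions.append(("set-dhcp-hostname", hostname))
        self.actions.append(("link-down", "eth0"))
        self.actions.append(("link-up", "eth0"))


class FlatcarLikeUtil(UpstreamLikeUtil):
    """Models the patched behaviour: the hostname was already published at boot."""

    def publish_hostname(self, hostname):
        # The initrd and systemd-networkd already handled this, so do nothing.
        return
```

Calling publish_hostname on the Flatcar-like class leaves its action list empty, which is the point: no link bounce, so services such as etcd keep their connections.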
This is a new package that is being developed to provide symlinks for
NVMe disks (OS, data, temporary) on newer Azure instances. It needs to
be part of the OS, and not oem-azure, because it carries udev rules.

Signed-off-by: Jeremi Piotrowski <[email protected]>

github-actions bot commented Apr 24, 2024

@jepio jepio merged commit 0d40f3c into main Apr 25, 2024
7 checks passed
@jepio jepio deleted the jepio/azure-fixes branch April 25, 2024 13:34

jepio commented May 3, 2024

@krnowak @pothos I don't have time to undo this, but I would revert this: 9556c7f
This would break the vm changing hostname, which is common on Azure.

pothos commented May 3, 2024

> This would break the vm changing hostname, which is common on Azure.

Do you mean live changing through the Azure API without a reboot? Sounds like previously this resulted in a transient hostname being set, i.e., /etc/hostname would still use the old one?

I'm not sure reverting is a good idea, because preventing the network restart was also meant as a bug fix. What we could try instead is triggering a new DHCP request rather than using the if-up-down hack. My naive thinking is that the DHCP request would keep the same IP address and be a no-op except that it transfers the hostname to systemd-networkd. I guess networkctl renew IFNAME or networkctl forcerenew IFNAME should work.

pothos commented May 3, 2024

> Do you mean live changing through the Azure API without a reboot? Sounds like previously this resulted in a transient hostname being set, i.e., /etc/hostname would still use the old one?

Looks like it's the other way round: the hostname is changed inside the VM, waagent notices this, and it issues the if-up-down to propagate the change to the read-only VM properties.

> I guess networkctl renew IFNAME or networkctl forcerenew IFNAME should work.

Apparently they do not, but networkctl reconfigure IFNAME worked. We should change the patch to use this.
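A minimal sketch of what calling this from the agent's Python code could look like. The function names and the injectable runner are assumptions made so the snippet can be exercised without touching a real interface; this is not the code of the follow-up patch.

```python
import subprocess


def reconfigure_command(ifname):
    """Build the argv for re-applying networkd configuration to one interface."""
    return ["networkctl", "reconfigure", ifname]


def reconfigure_link(ifname, runner=subprocess.run):
    # check=True makes subprocess.run raise CalledProcessError on failure.
    return runner(reconfigure_command(ifname), check=True)
```

Passing a fake runner in tests avoids running networkctl; in real use, reconfigure_link("eth0") would need sufficient privileges and a systemd-networkd managed interface.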

pothos commented May 3, 2024

Opened a PR: #1978

pothos commented May 3, 2024

Different topic: for deprovisioning to work despite Upholds, we could use a flag file in /run/ that, if it exists, prevents the service from starting. (It should be created before the systemctl stop.)
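The ordering can be sketched as follows. The flag file name and the helper functions are hypothetical, and the demo uses a temporary directory in place of /run/. In a unit file, the same check could presumably be expressed with systemd's ConditionPathExists=! syntax rather than in code.

```python
import os
import tempfile


def service_may_start(flag_path):
    """The service refuses to start while the flag file exists."""
    return not os.path.exists(flag_path)


def deprovision(flag_path, stop_service):
    # Create the flag first, so a restart triggered right after the stop
    # (for example through Upholds=) is already blocked.
    with open(flag_path, "w"):
        pass
    stop_service()


def demo():
    with tempfile.TemporaryDirectory() as run_dir:
        flag = os.path.join(run_dir, "waagent-deprovision")  # hypothetical name
        before = service_may_start(flag)
        observed = []
        # The stop callback records whether a restart would be allowed at that moment.
        deprovision(flag, lambda: observed.append(service_may_start(flag)))
        return before, observed
```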
