Introduce kubelet-dependencies.target and firstboot-osupdate.target
The primary motivation here is to stop pulling
container images `Before=network-online.target` because it
creates complicated dependency loops.

This aims to fix
https://issues.redhat.com/browse/OCPBUGS-15087

A lot of our services are "explicitly coupled" with ordering
relationships; e.g. some had `Before=kubelet.service` but not
`Before=crio.service`.

systemd .target units are explicitly designed for this situation.

We introduce a new `kubelet-dependencies.target` - both `crio.service`
and `kubelet.service` are `After+Requires=kubelet-dependencies.target`.
And units which are needed for kubelet should now be both
`Before + RequiredBy=kubelet-dependencies.target`.
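
As an illustration of that convention, a prerequisite unit might look like the sketch below. The unit name `example-node-setup.service` and its `ExecStart` path are hypothetical and not part of this change; only the `Before=` and `RequiredBy=` lines reflect the pattern described above.

```ini
# example-node-setup.service (hypothetical unit, for illustration only)
[Unit]
Description=Example prerequisite that must finish before crio and kubelet start
# Order against the target instead of naming kubelet.service or crio.service
Before=kubelet-dependencies.target

[Service]
# oneshot: units ordered after this one wait until ExecStart has exited
Type=oneshot
RemainAfterExit=yes
# Hypothetical script path
ExecStart=/usr/local/bin/example-node-setup

[Install]
# When the unit is enabled, the target gains a Requires= on it
RequiredBy=kubelet-dependencies.target
```

With this shape, the unit never has to list `crio.service` and `kubelet.service` separately, so one of them cannot be forgotten.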

Similarly, we had a lot of entangling of the "node services"
and the firstboot OS updates, with things explicitly ordering
against `machine-config-daemon-pull.service` or poking into
the implementation details of the firstboot process with
`ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json`.

Create a new `firstboot-osupdate.target` that succeeds after the
`machine-config-daemon-firstboot.service` today.  Then most of the
"infrastructure workload" that must run only on the second boot
(such as `gcp-hostname.service`, `openshift-azure-routes.path` etc)
can cleanly order after that.
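
A unit that should only do its work after the firstboot OS update could be ordered as in the following sketch. The unit name and script path are hypothetical; the `After=` line mirrors what this change applies to `gcp-hostname.service`.

```ini
# example-second-boot.service (hypothetical unit, for illustration only)
[Unit]
Description=Example infrastructure workload that runs after the firstboot OS update
# Order after the target instead of after machine-config-daemon-pull.service
After=firstboot-osupdate.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical script path
ExecStart=/usr/local/bin/example-second-boot

[Install]
WantedBy=multi-user.target
```

Ordering after the target is meant to replace the `ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json` check, as this change does for `gcp-hostname.service`.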

This also aids with the coming work for bare metal installs to do
OS updates at install time, because then we will "finalize" the OS
update and continue booting.
cgwalters committed Oct 5, 2023
1 parent 30c28f3 commit 2141f4b
Showing 27 changed files with 79 additions and 66 deletions.
5 changes: 5 additions & 0 deletions templates/common/_base/units/crio.service.yaml
@@ -1,5 +1,10 @@
name: crio.service
dropins:
- name: 05-mco-ordering.conf
contents: |
[Unit]
After=kubelet-dependencies.target
Requires=kubelet-dependencies.target
- name: 10-mco-default-madv.conf
contents: |
[Service]
11 changes: 11 additions & 0 deletions templates/common/_base/units/firstboot-osupdate.target
@@ -0,0 +1,11 @@
name: firstboot-osupdate.target
enabled: true
contents: |
[Unit]
Description=The firstboot OS update has completed
Documentation=https://github.com/openshift/machine-config-operator/
Requires=basic.target

[Install]
WantedBy=default.target

Original file line number Diff line number Diff line change
@@ -4,13 +4,14 @@ contents: |
[Unit]
Description=Dynamically sets the system reserved for the kubelet
Wants=network-online.target
After=network-online.target ignition-firstboot-complete.service
Before=kubelet.service crio.service
After=network-online.target firstboot-osupdate.target
Before=kubelet-dependencies.target
[Service]
# Need oneshot to delay kubelet
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=/etc/node-sizing-enabled.env
ExecStart=/bin/bash /usr/local/sbin/dynamic-system-reserved-calc.sh ${NODE_SIZING_ENABLED} ${SYSTEM_RESERVED_MEMORY} ${SYSTEM_RESERVED_CPU} ${SYSTEM_RESERVED_ES}
[Install]
RequiredBy=kubelet.service
RequiredBy=kubelet-dependencies.target
8 changes: 8 additions & 0 deletions templates/common/_base/units/kubelet-dependencies.target.yaml
@@ -0,0 +1,8 @@
name: kubelet-dependencies.target
contents: |
[Unit]
Description=Dependencies necessary to run kubelet
Documentation=https://github.com/openshift/machine-config-operator/
Requires=basic.target network-online.target
Wants=NetworkManager-wait-online.service crio-wipe.service
Wants=rpc-statd.service
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@ contents: |
# Removal of this file signals firstboot completion
ConditionPathExists=/etc/ignition-machine-config-encapsulated.json
After=machine-config-daemon-pull.service
Before=network-online.target crio.service kubelet.service ovs-configuration.service
Before=kubelet-dependencies.target
[Service]
Type=oneshot
@@ -23,5 +23,4 @@ contents: |
{{end -}}
[Install]
WantedBy=network-online.target
RequiredBy=crio.service kubelet.service ovs-configuration.service
RequiredBy=firstboot-osupdate.target
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ contents: |
ConditionPathExists=/etc/ignition-machine-config-encapsulated.json
# Run after crio-wipe so the pulled MCD image is protected against a corrupted storage from a forced shutdown
Wants=crio-wipe.service NetworkManager-wait-online.service
After=crio-wipe.service NetworkManager-wait-online.service network.service nodeip-configuration.service
After=crio-wipe.service NetworkManager-wait-online.service network.service
[Service]
Type=oneshot
4 changes: 2 additions & 2 deletions templates/common/_base/units/mtu-migration.service.yaml
@@ -7,7 +7,7 @@ contents: |
Requires=openvswitch.service ovs-configuration.service
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service openvswitch.service network.service ovs-configuration.service
Before=network-online.target kubelet.service crio.service node-valid-hostname.service
Before=kubelet-dependencies.target node-valid-hostname.service
[Service]
# Need oneshot to delay kubelet
@@ -27,6 +27,6 @@ contents: |
StandardError=journal+console
[Install]
WantedBy=network-online.target
WantedBy=kubelet-dependencies.target
{{ end }}{{ end }}
7 changes: 3 additions & 4 deletions templates/common/_base/units/node-valid-hostname.service.yaml
@@ -3,7 +3,7 @@ enabled: true
contents: |
[Unit]
Description=Wait for a non-localhost hostname
Before=network-online.target
Before=kubelet-dependencies.target
[Service]
Type=oneshot
@@ -15,7 +15,6 @@ contents: |
TimeoutSec=300
[Install]
WantedBy=multi-user.target
# Ensure that network-online.target will not complete until the node has a non-localhost hostname.
RequiredBy=network-online.target
# TODO: Change this to RequiredBy after we fix https://github.com/openshift/machine-config-operator/pull/3865#issuecomment-1746963115
WantedBy=kubelet-dependencies.target
Original file line number Diff line number Diff line change
@@ -5,9 +5,9 @@ enabled: {{if eq .Infra.Status.PlatformStatus.Type "None"}}true{{else}}false{{en
contents: |
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
Wants=NetworkManager-wait-online.service crio-wipe.service
After=NetworkManager-wait-online.service ignition-firstboot-complete.service crio-wipe.service
Before=kubelet.service crio.service ovs-configuration.service
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service firstboot-osupdate.target
Before=kubelet-dependencies.target ovs-configuration.service
[Service]
# Need oneshot to delay kubelet
@@ -44,4 +44,4 @@ contents: |
EnvironmentFile=-/etc/default/nodeip-configuration
[Install]
RequiredBy=kubelet.service
RequiredBy=kubelet-dependencies.target
7 changes: 3 additions & 4 deletions templates/common/_base/units/ovs-configuration.service.yaml
@@ -3,13 +3,12 @@ enabled: {{if eq .NetworkType "OVNKubernetes" "OpenShiftSDN"}}true{{else}}false{
contents: |
[Unit]
Description=Configures OVS with proper host networking configuration
# Removal of this file signals firstboot completion
ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json
# This service is used to move a physical NIC into OVS and reconfigure OVS to use the host IP
Requires=openvswitch.service
Wants=NetworkManager-wait-online.service
After=firstboot-osupdate.target
After=NetworkManager-wait-online.service openvswitch.service network.service nodeip-configuration.service
Before=network-online.target kubelet.service crio.service node-valid-hostname.service
Before=kubelet-dependencies.target node-valid-hostname.service
[Service]
# Need oneshot to delay kubelet
@@ -19,4 +18,4 @@ contents: |
StandardError=journal+console
[Install]
WantedBy=network-online.target
RequiredBy=kubelet-dependencies.target
Original file line number Diff line number Diff line change
@@ -5,8 +5,7 @@ contents: |
Description=Fetch the region and isntance id from Alibaba Metadata
# Wait for NetworkManager to report it's online
After=NetworkManager-wait-online.service
# Run before kubelet
Before=kubelet.service
Before=kubelet-dependencies.target
[Service]
ExecStart=/usr/local/bin/alibaba-kubelet-nodename
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@ contents: |
# Wait for NetworkManager to report it's online
After=NetworkManager-wait-online.service
# Run before kubelet
Before=kubelet.service
Before=kubelet-dependencies.target
[Service]
# Mark afterburn environment file optional, due to it is possible that afterburn service was not executed
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@ contents: |
# Wait for NetworkManager to report it's online
After=NetworkManager-wait-online.service
# Run before kubelet
Before=kubelet.service
Before=kubelet-dependencies.target
[Service]
# Mark afterburn environment file optional, due to it is possible that afterburn service was not executed
9 changes: 3 additions & 6 deletions templates/common/gcp/units/gcp-hostname.service.yaml
@@ -3,10 +3,8 @@ enabled: true
contents: |
[Unit]
Description=Set GCP Transient Hostname
# Removal of this file signals firstboot completion
ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json
# Block services relying on networking being up.
Before=network-online.target
# We don't need to do this on the firstboot
After=firstboot-osupdate.target
# Wait for NetworkManager to report it's online
After=NetworkManager-wait-online.service
# Run before hostname checks
@@ -18,5 +16,4 @@ contents: |
ExecStart=/usr/local/bin/mco-hostname --gcp
[Install]
WantedBy=multi-user.target
WantedBy=network-online.target
RequiredBy=kubelet-dependencies.target
Original file line number Diff line number Diff line change
@@ -6,9 +6,9 @@ contents: |
# This only applies to VIP managing environments where the kubelet and crio IP
# address picking logic is flawed and may end up selecting an address from a
# different subnet or a deprecated address
Wants=NetworkManager-wait-online.service crio-wipe.service
After=NetworkManager-wait-online.service ignition-firstboot-complete.service crio-wipe.service
Before=kubelet.service crio.service ovs-configuration.service
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service firstboot-osupdate.target
Before=kubelet-dependencies.target ovs-configuration.service
[Service]
# Need oneshot to delay kubelet
@@ -47,4 +47,4 @@ contents: |
ExecStartPost=+/usr/local/bin/configure-ip-forwarding.sh
[Install]
WantedBy=multi-user.target
RequiredBy=kubelet-dependencies.target
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ contents: |
# Wait for NetworkManager to report it's online
After=NetworkManager-wait-online.service
# Run before kubelet
Before=kubelet.service
Before=kubelet-dependencies.target
[Service]
ExecStart=/usr/local/bin/openstack-kubelet-nodename
Original file line number Diff line number Diff line change
@@ -16,9 +16,9 @@ enabled: {{if eq (len (onPremPlatformAPIServerInternalIPs .)) 0}}true{{else}}fal
contents: |
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
Wants=NetworkManager-wait-online.service crio-wipe.service
After=NetworkManager-wait-online.service ignition-firstboot-complete.service crio-wipe.service
Before=kubelet.service crio.service ovs-configuration.service
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service firstboot-osupdate.target
Before=kubelet-dependencies.target ovs-configuration.service
[Service]
# Need oneshot to delay kubelet
@@ -54,4 +54,4 @@ contents: |
EnvironmentFile=-/etc/default/nodeip-configuration
[Install]
RequiredBy=kubelet.service
RequiredBy=kubelet-dependencies.target
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@ contents: |
ConditionVirtualization=vmware
Before=kubelet.service
Before=kubelet-dependencies.target
Before=node-valid-hostname.service
Before=NetworkManager.service
Original file line number Diff line number Diff line change
@@ -3,12 +3,12 @@ enabled: true
contents: |
[Unit]
Description=Watch for downfile changes
Before=kubelet.service
ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json
Before=kubelet-dependencies.service
After=firstboot-osupdate.target
[Path]
PathChanged=/run/cloud-routes/
MakeDirectory=true
[Install]
RequiredBy=kubelet.service
RequiredBy=kubelet-dependencies.service
Original file line number Diff line number Diff line change
@@ -3,12 +3,11 @@ enabled: true
contents: |
[Unit]
Description=Watch for downfile changes
Before=kubelet.service
ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json
Before=kubelet-dependencies.target
[Path]
PathChanged=/run/cloud-routes/
MakeDirectory=true
[Install]
RequiredBy=kubelet.service
RequiredBy=kubelet-dependencies.target
Original file line number Diff line number Diff line change
@@ -3,7 +3,9 @@ enabled: false
contents: |
[Unit]
Description=Work around Azure load balancer hairpin
# We don't need to do this on the firstboot
After=firstboot-osupdate.target
[Service]
Type=simple
ExecStart=/bin/bash /opt/libexec/openshift-azure-routes.sh start
Original file line number Diff line number Diff line change
@@ -3,9 +3,8 @@ enabled: true
contents: |
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
Requires=crio.service kubelet-dependencies.target
After=kubelet-dependencies.target
After=ostree-finalize-staged.service
[Service]
Original file line number Diff line number Diff line change
@@ -3,9 +3,8 @@ enabled: true
contents: |
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
Requires=crio.service kubelet-dependencies.target
After=kubelet-dependencies.target
After=ostree-finalize-staged.service
[Service]
Original file line number Diff line number Diff line change
@@ -3,9 +3,8 @@ enabled: true
contents: |
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
Requires=crio.service kubelet-dependencies.target
After=kubelet-dependencies.target
After=ostree-finalize-staged.service
[Service]
Original file line number Diff line number Diff line change
@@ -3,9 +3,8 @@ enabled: true
contents: |
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
Requires=crio.service kubelet-dependencies.target
After=kubelet-dependencies.target
After=ostree-finalize-staged.service
[Service]
Original file line number Diff line number Diff line change
@@ -3,9 +3,8 @@ enabled: true
contents: |
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
Requires=crio.service kubelet-dependencies.target
After=kubelet-dependencies.target
After=ostree-finalize-staged.service
[Service]
Original file line number Diff line number Diff line change
@@ -3,9 +3,8 @@ enabled: true
contents: |
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
Requires=crio.service kubelet-dependencies.target
After=kubelet-dependencies.target
After=ostree-finalize-staged.service
[Service]
