-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MCO-679: MCO-830: Managed boot images MVP #4083
MCO-679: MCO-830: Managed boot images MVP #4083
Conversation
This vendors in: - informers and listers for the machine api objects - stream-metadata-go for coreos stream objects
Skipping CI for Draft Pull Request. |
ae4735d
to
e6696b6
Compare
@djoshy: This pull request references MCO-679 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
7f9e952
to
3408c6c
Compare
3408c6c
to
bf9193a
Compare
/hold for QE verification |
verification $ oc get infrastructures/cluster -o jsonpath='{.status.platform}'
GCP check boot image info in MCD log $ oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-wsqv5 | grep -A6 'CoreOS aleph version'
I1220 07:26:08.690917 2258 coreos.go:53] CoreOS aleph version: mtime=2023-11-24 16:50:34.214 +0000 UTC
{
"build": "415.92.202311241643-0",
"imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
"ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
"ref": ""
} do pre-upgrade snapshot for configmap/coreos-bootimages oc get cm coreos-bootimages -n openshift-machine-config-operator -o yaml > /tmp/coreos-bootimages_pre_upgrade.yaml upgrade cluster to the image built based on this PR $ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-1byts4t/release:latest --force --allow-explicit-upgrade
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image registry.build05.ci.openshift.org/ci-ln-1byts4t/release:latest
$ oc get clusterversion version -o yaml | yq '.status.history[]|[.version, .state]'
[
"4.15.0-0.ci.test-2023-12-20-055728-ci-ln-1byts4t-latest",
"Completed"
]
[
"4.15.0-0.nightly-2023-12-19-033450",
"Completed"
] do post-upgrade snapshot for configmap/coreos-bootimages and do diff b/w snapshots $ oc get cm coreos-bootimages -n openshift-machine-config-operator -o yaml > /tmp/coreos-bootimages_post_upgrade.yaml
$ diff /tmp/coreos-bootimages_pre_upgrade.yaml /tmp/coreos-bootimages_post_upgrade.yaml
2a3,4
> MCOReleaseImageVersion: 4.15.0-0.ci.test-2023-12-20-055728-ci-ln-1byts4t-latest
> MCOVersionHash: 239869f9c3f7befd579a23f7b453e523a7084dd2
944c946
< resourceVersion: "845"
---
> resourceVersion: "64197" make sure featureGate $ oc get featuregate/cluster -o jsonpath='{.status.featureGates[?(@.version=="4.15.0-0.ci.test-2023-12-20-055728-ci-ln-1byts4t-latest")].disabled[?(@.name=="ManagedBootImages")]}'
{"name":"ManagedBootImages"} enable featureGate $ cat ~/mco_test/mc/featuregate_techpreview.yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
name: cluster
spec:
featureSet: TechPreviewNoUpgrade
$ oc apply -f ~/mco_test/mc/featuregate_techpreview.yaml
Warning: resource featuregates/cluster is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by oc apply. oc apply should only be used on resources created declaratively by either oc create --save-config or oc apply. The missing annotation will be patched automatically.
featuregate.config.openshift.io/cluster configured monitor MCC pod log to grep logs from $ oc logs -f -n openshift-machine-config-operator -c machine-config-controller machine-config-controller-85584647dc-dx5hv | grep machine_set_boot_image_controller.go
I1220 09:15:59.497802 1 machine_set_boot_image_controller.go:143] "FeatureGates changed" enabled=["AdminNetworkPolicy","AlibabaPlatform","AutomatedEtcdBackup","AzureWorkloadIdentity","BuildCSIVolumes","CSIDriverSharedResource","CloudDualStackNodeIPs","DNSNameResolver","DynamicResourceAllocation","ExternalCloudProvider","ExternalCloudProviderAzure","ExternalCloudProviderExternal","ExternalCloudProviderGCP","GCPClusterHostedDNS","GCPLabelsTags","GatewayAPI","InsightsConfigAPI","InstallAlternateInfrastructureAWS","MachineAPIProviderOpenStack","MachineConfigNodes","ManagedBootImages","MaxUnavailableStatefulSet","MetricsServer","MixedCPUsAllocation","NetworkLiveMigration","NodeSwap","OnClusterBuild","OpenShiftPodSecurityAdmission","PrivateHostedZoneAWS","RouteExternalCertificate","SignatureStores","SigstoreImageVerification","VSphereControlPlaneMachineSet","VSphereStaticIPs","ValidatingAdmissionPolicy"] disabled=["ClusterAPIInstall","DisableKubeletCloudCredentialProviders","EventedPLEG","MachineAPIOperatorDisableMachineHealthCheckController"]
I1220 09:15:59.497939 1 machine_set_boot_image_controller.go:156] Trigger a sync as this feature was turned on
I1220 09:15:59.719712 1 machine_set_boot_image_controller.go:272] configMap coreos-bootimages added, reconciling all machine sets
I1220 09:15:59.732187 1 machine_set_boot_image_controller.go:228] MachineSet rioliu-1220c-bz2gp-worker-a added
I1220 09:15:59.732206 1 machine_set_boot_image_controller.go:228] MachineSet rioliu-1220c-bz2gp-worker-b added
I1220 09:15:59.733350 1 machine_set_boot_image_controller.go:228] MachineSet rioliu-1220c-bz2gp-worker-c added
I1220 09:15:59.733379 1 machine_set_boot_image_controller.go:228] MachineSet rioliu-1220c-bz2gp-worker-f added
I1220 09:15:59.739239 1 machine_set_boot_image_controller.go:174] Starting MachineConfigController-MachineSetBootImageController
I1220 09:15:59.739355 1 machine_set_boot_image_controller.go:560] Reconciling machineset rioliu-1220c-bz2gp-worker-a on GCP, with arch x86_64
I1220 09:15:59.741197 1 machine_set_boot_image_controller.go:405] No patching required for machineset rioliu-1220c-bz2gp-worker-a
I1220 09:15:59.741249 1 machine_set_boot_image_controller.go:560] Reconciling machineset rioliu-1220c-bz2gp-worker-b on GCP, with arch x86_64
I1220 09:15:59.742440 1 machine_set_boot_image_controller.go:405] No patching required for machineset rioliu-1220c-bz2gp-worker-b
I1220 09:15:59.742482 1 machine_set_boot_image_controller.go:560] Reconciling machineset rioliu-1220c-bz2gp-worker-c on GCP, with arch x86_64
I1220 09:15:59.743649 1 machine_set_boot_image_controller.go:405] No patching required for machineset rioliu-1220c-bz2gp-worker-c
I1220 09:15:59.743704 1 machine_set_boot_image_controller.go:211] Error syncing machineset openshift-machine-api/rioliu-1220c-bz2gp-worker-f: failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)
I1220 09:15:59.796857 1 machine_set_boot_image_controller.go:211] Error syncing machineset openshift-machine-api/rioliu-1220c-bz2gp-worker-f: failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)
I1220 09:15:59.807413 1 machine_set_boot_image_controller.go:211] Error syncing machineset openshift-machine-api/rioliu-1220c-bz2gp-worker-f: failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)
I1220 09:15:59.827610 1 machine_set_boot_image_controller.go:211] Error syncing machineset openshift-machine-api/rioliu-1220c-bz2gp-worker-f: failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)
I1220 09:15:59.882642 1 machine_set_boot_image_controller.go:211] Error syncing machineset openshift-machine-api/rioliu-1220c-bz2gp-worker-f: failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)
E1220 09:16:00.002887 1 machine_set_boot_image_controller.go:216] failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)
I1220 09:16:00.003003 1 machine_set_boot_image_controller.go:217] Dropping machineset "openshift-machine-api/rioliu-1220c-bz2gp-worker-f" out of the work queue: failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>) checkout the updated boot image info from MCD log $ oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-t6dnd | grep -A6 'CoreOS aleph version'
I1220 09:09:52.338206 1510 coreos.go:53] CoreOS aleph version: mtime=2023-11-24 16:50:34.214 +0000 UTC
{
"build": "415.92.202311241643-0",
"imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
"ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
"ref": ""
} No changes found. what's wrong /cc @sergiordlr @djoshy maybe I need to find a older nightly to test |
Thanks for the testing, Rio! Yes, you are correct, you will need to use an older nightly. There are no confimap changes in this case. The diff on the confimap right now only shows some keys that the MCO set up to avoid race conditions. Essentially, you need a change in this file in the installer repo to be different in the original nightly and new upgrade target: Another interesting thing I noticed is that a machine couldn't be found for the Machineset |
@djoshy I think this error is expected, because no machine provisioned by this machineset $ oc get machineset/rioliu-1220c-bz2gp-worker-f -n openshift-machine-api
NAME DESIRED CURRENT READY AVAILABLE AGE
rioliu-1220c-bz2gp-worker-f 0 0 3h47m Question: when the machineset is modified and scale up works, can we sync the boot image changes at that time. Open a bug to track OCPBUGS-25725 |
It looks like this PR last updated the boot images:
Yes, any time the machineset is modified, the boot images sync will happen again. I do think I can clean up the error to be less annoying though(it is not anything fatal right now) |
Per latest comment from https://issues.redhat.com/browse/OCPBUGS-25725, we have run into the issue of a machineset getting the older boot image when scaled up from 0 replicas for the very first time. Me and Jerry discussed this, and as this is under feature gate, we thought it best to fix this in separate PR against that bug after this merges. Is that ok with you, @sergiordlr @rioliu-rh ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first-pass and the general architecture makes sense, thanks for doing this David!
// Any machine from this slice will be of the same architecture, so grab the first one | ||
// Cycle through nodes, compare to annotations to find node with matching machine | ||
|
||
nodes, err := ctrl.nodeLister.List(labels.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already chatted about this, and I think our plan is to read it off the label first, then either the nodes here or from the control plane to match autoscaler, but I wonder if we can skip the node check altogether and just follow the same steps outlined there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We totally could, and I was debating that as I was reworking this. The labels were present in my regular 3+3 cluster today, so I think it may have been a newer change that I hadn't caught on to yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened a PR which resolves this: #4088
This should also fix https://issues.redhat.com/browse/OCPBUGS-25725
// has completed an upgrade). This "stamp" is used by the machine set controller as a safety before | ||
// it updates boot images. | ||
|
||
func (optr *Operator) stampBootImagesCM(pool *mcfgv1.MachineConfigPool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is basically the same logic as the MCS served config, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the only difference being, here we gate it on 1 master node completing the update, while the MCS wait for at least 1 node in the pool to complete the update before serving it to new nodes.
Here's the actual commit for reference:
i'm ok, we only verify machineset pathing function in this PR |
2nd round testing $ oc logs -n openshift-machine-config-operator -c machine-config-daemon $(oc get pod -l k8s-app=machine-config-daemon -n openshift-machine-config-operator -ojsonpath='{.items[0].metadata.name}') | grep 'CoreOS aleph version'
I1221 04:58:14.425691 1476 coreos.go:54] CoreOS aleph version: mtime=2023-10-31 00:44:27.92 +0000 UTC build=415.92.202310310037-0 imgid=rhcos-415.92.202310310037-0-qemu.x86_64.qcow2 do pre-upgrade snapshot for cm/coreos-bootimages $ oc get cm/coreos-bootimages -n openshift-machine-config-operator -o yaml > /tmp/coreos-bootimages_pre_upgrade.yaml upgrade to CI image $ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-4xnxd1b/release:latest --force --allow-explicit-upgrade
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image registry.build05.ci.openshift.org/ci-ln-4xnxd1b/release:latest
$ oc get clusterversion version -o yaml | yq '.status.history[]|[.version, .state]'
[
"4.15.0-0.ci.test-2023-12-21-012105-ci-ln-4xnxd1b-latest",
"Completed"
]
[
"4.15.0-ec.2",
"Completed"
] do post-upgrade snapshot for cm/coreos-bootimages, and do diff with pre-upgrade snapshot, boot image is updated $ oc get cm/coreos-bootimages -n openshift-machine-config-operator -o yaml > /tmp/coreos-bootimages_post_upgrade.yaml
$ cat /tmp/coreos-bootimages_pre_upgrade.yaml | yq -r '.data.stream' | jq '.architectures.x86_64.artifacts.gcp'
{
"release": "415.92.202310310037-0",
"formats": {
"tar.gz": {
"disk": {
"location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.15-9.2/builds/415.92.202310310037-0/x86_64/rhcos-415.92.202310310037-0-gcp.x86_64.tar.gz",
"sha256": "bb79cc15d8404e464610edd05219edbfcb8ec578f24cbf2a7a62f172203e86ff",
"uncompressed-sha256": "4f73296738173a253821696259ff388e8b16e2947cdce7db2e3b23e97ff14a4a"
}
}
}
}
$ cat /tmp/coreos-bootimages_post_upgrade.yaml | yq -r '.data.stream' | jq '.architectures.x86_64.artifacts.gcp'
{
"release": "415.92.202311241643-0",
"formats": {
"tar.gz": {
"disk": {
"location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.15-9.2/builds/415.92.202311241643-0/x86_64/rhcos-415.92.202311241643-0-gcp.x86_64.tar.gz",
"sha256": "c539cbb353ec8ff1ea7def11f20f5a38f3b1333c0173a810dee80c8ede703463",
"uncompressed-sha256": "4f58bbc44ea99af7047c77cfdd72b1d132f652aac0249b02aafcaeb5460dd27f"
}
}
}
} enable featureSet: $ oc get featuregate/cluster -o jsonpath='{.status.featureGates[?(@.version=="4.15.0-0.ci.test-2023-12-21-012105-ci-ln-4xnxd1b-latest")].enabled[?(@.name=="ManagedBootImages")]}'
{"name":"ManagedBootImages"} monitor MCC pod log I1221 06:41:15.781240 1 machine_set_boot_image_controller.go:156] Trigger a sync as this feature was turned on
I1221 06:41:15.781351 1 machine_set_boot_image_controller.go:560] Reconciling machineset rioliu-1221c-h72r5-worker-a on GCP, with arch x86_64
I1221 06:41:15.783093 1 event.go:298] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-machine-config-operator", Name:"openshift-machine-config-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesModified' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AdminNetworkPolicy", "AlibabaPlatform", "AutomatedEtcdBackup", "AzureWorkloadIdentity", "BuildCSIVolumes", "CSIDriverSharedResource", "CloudDualStackNodeIPs", "DNSNameResolver", "DynamicResourceAllocation", "ExternalCloudProvider", "ExternalCloudProviderAzure", "ExternalCloudProviderExternal", "ExternalCloudProviderGCP", "GCPClusterHostedDNS", "GCPLabelsTags", "GatewayAPI", "InsightsConfigAPI", "InstallAlternateInfrastructureAWS", "MachineAPIProviderOpenStack", "MachineConfigNodes", "ManagedBootImages", "MaxUnavailableStatefulSet", "MetricsServer", "MixedCPUsAllocation", "NetworkLiveMigration", "NodeSwap", "OnClusterBuild", "OpenShiftPodSecurityAdmission", "PrivateHostedZoneAWS", "RouteExternalCertificate", "SignatureStores", "SigstoreImageVerification", "VSphereControlPlaneMachineSet", "VSphereStaticIPs", "ValidatingAdmissionPolicy"}, Disabled:[]v1.FeatureGateName{"ClusterAPIInstall", "DisableKubeletCloudCredentialProviders", "EventedPLEG", "MachineAPIOperatorDisableMachineHealthCheckController"}}
I1221 06:41:15.783234 1 machine_set_boot_image_controller.go:560] Reconciling machineset rioliu-1221c-h72r5-worker-b on GCP, with arch x86_64
I1221 06:41:15.787995 1 machine_set_boot_image_controller.go:585] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I1221 06:41:15.788017 1 machine_set_boot_image_controller.go:586] Current image: projects/rhcos-cloud/global/images/rhcos-415-92-202310310037-0-gcp-x86-64
I1221 06:41:15.788226 1 machine_set_boot_image_controller.go:402] Patching machineset rioliu-1221c-h72r5-worker-b
I1221 06:41:15.805047 1 machine_set_boot_image_controller.go:585] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I1221 06:41:15.805072 1 machine_set_boot_image_controller.go:586] Current image: projects/rhcos-cloud/global/images/rhcos-415-92-202310310037-0-gcp-x86-64
I1221 06:41:15.820665 1 machine_set_boot_image_controller.go:402] Patching machineset rioliu-1221c-h72r5-worker-a
I1221 06:41:15.958282 1 machine_set_boot_image_controller.go:560] Reconciling machineset rioliu-1221c-h72r5-worker-c on GCP, with arch x86_64
I1221 06:41:15.967055 1 machine_set_boot_image_controller.go:211] Error syncing machineset openshift-machine-api/rioliu-1221c-h72r5-worker-f: failed to fetch architecture type of machineset rioliu-1221c-h72r5-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)
I1221 06:41:15.967131 1 machine_set_boot_image_controller.go:248] MachineSet rioliu-1221c-h72r5-worker-b updated, reconciling all machinesets
I1221 06:41:15.967146 1 machine_set_boot_image_controller.go:248] MachineSet rioliu-1221c-h72r5-worker-a updated, reconciling all machinesets
I1221 06:41:15.967197 1 machine_set_boot_image_controller.go:560] Reconciling machineset rioliu-1221c-h72r5-worker-a on GCP, with arch x86_64
I1221 06:41:15.968450 1 machine_set_boot_image_controller.go:405] No patching required for machineset rioliu-1221c-h72r5-worker-a
I1221 06:41:15.968510 1 machine_set_boot_image_controller.go:560] Reconciling machineset rioliu-1221c-h72r5-worker-b on GCP, with arch x86_64
I1221 06:41:15.968863 1 machine_set_boot_image_controller.go:585] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I1221 06:41:15.968875 1 machine_set_boot_image_controller.go:586] Current image: projects/rhcos-cloud/global/images/rhcos-415-92-202310310037-0-gcp-x86-64
I1221 06:41:15.968914 1 machine_set_boot_image_controller.go:402] Patching machineset rioliu-1221c-h72r5-worker-c double check machinesets are patched with new boot image machineset.machine.openshift.io/rioliu-1221c-h72r5-worker-a
projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
machineset.machine.openshift.io/rioliu-1221c-h72r5-worker-b
projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
machineset.machine.openshift.io/rioliu-1221c-h72r5-worker-c
projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
machineset.machine.openshift.io/rioliu-1221c-h72r5-worker-f -- no ready replica
projects/rhcos-cloud/global/images/rhcos-415-92-202310310037-0-gcp-x86-64 EDIT: use new cluster to do following steps $ oc scale --replicas=2 machineset/rioliu-1221f-zs9zf-worker-a -n openshift-machine-api
machineset.machine.openshift.io/rioliu-1221f-zs9zf-worker-a scaled check boot image info on new provisioned node $ oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-lg8cm | grep -A6 'CoreOS aleph version'
I1221 12:08:08.708830 1504 coreos.go:53] CoreOS aleph version: mtime=2023-11-24 16:50:34.214 +0000 UTC
{
"build": "415.92.202311241643-0",
"imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
"ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
"ref": ""
} |
/test unit |
/hold cancel |
@djoshy: This pull request references MCO-679 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
seems like Cluster-Bot does not support to build arm64 image. maybe we only can test this feature with arm64 nightly in post-merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm for an MVP. Should be safe behind the featuregate and the remaining concerns are covered by the TODOs. Let's iterate on this
Thanks for working on this David!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required |
@djoshy: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/override ci/prow/e2e-gcp-op-single-node Overriding as failure is unrelated and this PR is under feature gate. The test is currently being investigated as it has been broken for some time |
@djoshy: Overrode contexts on behalf of djoshy: ci/prow/e2e-gcp-op-single-node In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
7dae64a
into
openshift:master
[ART PR BUILD NOTIFIER] This PR has been included in build openshift-proxy-pull-test-container-v4.16.0-202401082351.p0.g7dae64a.assembly.stream for distgit openshift-proxy-pull-test. |
@djoshy shall we backport this PR to 4.15 branch? we already verified the case for arm64 with 4.16 build. thanks @sergiordlr |
We don't typically backport features and this feature is only targeted for 4.16 as of now, so I don't think we need to (: |
This implements openshift/enhancements#1496, for the GCP platform.
What does it do?
Adds a new subcontroller
machine_set_boot_image_controller
to the MCC, thatcoreos-bootimages
configmap changesHow do I test this?
coreos-bootimages
configmap will get updated by a CVO manifest to the newest boot images.You should see all the machinesets being patched:
If this doesn't happen, or if logs say that no patching was required for the machinesets, then no skew was found between the boot images and the last upgrade.
Other notes:
Once the feature gate is removed, this reconciliation would take place when at least 1 master node has completed an upgrade(and no other upgrades are pending). This is by design, as one would expect boot image updates to take place during an upgrade.