KCP >= v1.2.8 and >= v1.3.0 doesn't work with certain Kubernetes versions #7833
Comments
cc @fabriziopandini @ykakarap @killianmuldoon @CecileRobertMichon @jackfrancis @furkatgofurov7 @oscr @Ankitasw @chrischdi @lentzi90 (just cc'ed everyone from the old issue in case this one is relevant for you as well). Please let me know if you saw error cases not covered above.
/triage accepted
/assign
The PR has been merged. Now waiting for CI, then merging the cherry-picks as well:
One further follow-up is this PR: #7914. Essentially:
Another follow-up is #7915. We found out during triaging that we have an unexpected double-rollout in our Cluster upgrade test (more details in the linked issue).
The last known remaining edge case is the following. Example:
So tl;dr concurrently joining a Machine with a version using the old registry while upgrading KCP to a version with the new registry will lead to a failed kubeadm join of that machine. We won't fix this edge case because:
Focusing on this, it is recommended to avoid bumping the KCP version while KCP is not stable (e.g. while a Machine is in the process of joining). With ClusterClass, we try to avoid this by waiting for KCP to be stable before triggering upgrades. This is correctly implemented on the Cluster topology controller side, but unfortunately, through a stale cache, it can happen that the KCP controller writes inconsistent status information to a KCP object. I.e. the KCP object looks stable but in fact it isn't. We're currently trying to figure out how to improve patchHelper, or how we call patchHelper, to fix this issue. Stay tuned! :)
Let's close this issue as the issue itself is resolved and we had no further reports. We'll follow up in another issue regarding the patch helper. /close
@sbueringer: Closing this issue. In response to this:
As discussed in #7768, the new KCP versions don't work with all Kubernetes / kubeadm versions. This issue only occurs if `KCP.spec.kubeadmConfigSpec.clusterConfiguration.imageRepository` is one of `""`, `"k8s.gcr.io"`, `"registry.k8s.io"` (i.e. custom registries are not affected).

**Current behavior**
kubeadm:
- `kubeadm init` uses its embedded default registry and uploads it in the workload cluster in the `kubeadm-config` ConfigMap
- `kubeadm join` uses the registry from the `kubeadm-config` ConfigMap
- `kubeadm init` and `kubeadm join` run preflight checks which verify if relevant images (including CoreDNS) can be pulled (both on control plane and worker machines)
- if the `imageRepository` in the `kubeadm-config` ConfigMap (either `.dns.imageRepository` or as fallback `.imageRepository`) is equal to the default registry embedded in `kubeadm`, kubeadm will use `<registry>/coredns` as imageRepository for the CoreDNS image, i.e. it will pull from e.g. `registry.k8s.io/coredns/coredns` (see the example right after this list)
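To illustrate that last rule (my reading of it; the custom registry and the tag are made-up examples), the CoreDNS image kubeadm ends up pulling looks roughly like this:

```yaml
# effective imageRepository -> CoreDNS image kubeadm pulls (tag v1.8.6 as an example)
registry.k8s.io: registry.k8s.io/coredns/coredns:v1.8.6          # equals the kubeadm default -> /coredns is appended
k8s.gcr.io: k8s.gcr.io/coredns/coredns:v1.8.6                    # equals the (old) kubeadm default -> /coredns is appended
my.registry.example/k8s: my.registry.example/k8s/coredns:v1.8.6  # custom registry -> used as-is, not affected by this issue
```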
KCP:
- migrates the imageRepository in the `kubeadm-config` ConfigMap (sketched below) to `registry.k8s.io` for Kubernetes `>= 1.22.0` and `< 1.26.0` (except if an imageRepository has been explicitly set in `KCP.spec.kubeadmConfigSpec.clusterConfiguration`)

CAPD:
- ignores all `kubeadm` preflight errors
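For illustration, a rough sketch (not copied from a real cluster) of the relevant part of the `kubeadm-config` ConfigMap that kubeadm writes and KCP migrates; field names follow kubeadm's `ClusterConfiguration`, all other fields are omitted:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeadm-config
  namespace: kube-system
data:
  ClusterConfiguration: |
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    kubernetesVersion: v1.24.9
    # Registry that kubeadm join reads; KCP migrates this value to
    # registry.k8s.io for Kubernetes >= 1.22.0 and < 1.26.0 unless it was
    # set explicitly in KCP.spec.kubeadmConfigSpec.clusterConfiguration.
    imageRepository: registry.k8s.io
    dns: {}   # .dns.imageRepository can override the registry for CoreDNS only
```

The preflight checks of `kubeadm init` / `kubeadm join` then try to pull the CoreDNS image derived from this value, which is why a mismatch surfaces as a failed image pull.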
**Error cases**

**Pinning to the wrong default registry** (occurred in CAPD: Cluster API v1.1 upgrade v1.23.15 => v1.24.9) (job)
Explanation:
- kubeadm `v1.23.15` and `v1.24.9` use the new registry as default
- `k8s.gcr.io` was set in `KCP.spec.kubeadmConfigSpec.clusterConfiguration`
- pinning the imageRepository to one of the default registries (`k8s.gcr.io` & `registry.k8s.io`) which is not the default registry of the `kubeadm` binary is not supported
- `kubeadm init` did not use `<registry>/coredns` as imageRepository for CoreDNS, thus the CoreDNS Deployment had the `k8s.gcr.io/coredns:v1.8.6` image which doesn't exist (`<registry>/coredns/coredns:v1.8.6` would have been correct)
- `kubeadm init` would have already failed, but that didn't happen because CAPD is ignoring all `kubeadm` preflight errors

Solution:
- only pin the imageRepository when it is really needed and make sure it matches the default registry of the `kubeadm` binary used. In general, if the default registry should be used, it is recommended to not set the imageRepository in KCP and `kubeadm` / KCP will take care of it (see the sketch below).
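To make this concrete, a minimal, hypothetical `KubeadmControlPlane` excerpt for the unsupported configuration (names are made up, other required fields omitted); the recommended fix is simply dropping the pinned `imageRepository`:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-control-plane            # hypothetical name
spec:
  version: v1.24.9                  # this kubeadm binary defaults to registry.k8s.io
  kubeadmConfigSpec:
    clusterConfiguration:
      # Not supported: pinning the "other" default registry. kubeadm then
      # does not append /coredns and tries to pull k8s.gcr.io/coredns:v1.8.6,
      # which does not exist.
      imageRepository: k8s.gcr.io
      # Recommended: omit imageRepository entirely and let kubeadm / KCP
      # pick the default registry matching the kubeadm binary.
```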
**Upgrade to a Kubernetes version >= v1.22.0 which still has a kubeadm binary with the old default registry**

Example: upgrade from `v1.21.14` to `v1.22.16` (imageRepository is not set in `KCP.spec.kubeadmConfigSpec.clusterConfiguration`)

Explanation:
- the error occurs as soon as a `v1.22.16` node is joined
- kubeadm `v1.21.14` and `v1.22.16` use the old registry as default
- `kubeadm init` uses the embedded `k8s.gcr.io` imageRepository and uploads it to the `kubeadm-config` ConfigMap
- when upgrading to `v1.22.16`, KCP will migrate the registry in the ConfigMap to `registry.k8s.io`
- `kubeadm join`s will fail with a preflight error because the `kubeadm` binary only handles the CoreDNS imageRepository for the `k8s.gcr.io` registry correctly (sequence sketched below)
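As a sketch of that failing sequence (values taken from the explanation above, everything else omitted), the relevant `kubeadm-config` ConfigMap data evolves roughly like this:

```yaml
# Written by kubeadm init v1.21.14 (old default registry):
ClusterConfiguration: |
  kubernetesVersion: v1.21.14
  imageRepository: k8s.gcr.io
---
# After KCP starts the upgrade to v1.22.16 it migrates the registry:
ClusterConfiguration: |
  kubernetesVersion: v1.22.16
  imageRepository: registry.k8s.io
# The v1.22.16 kubeadm binary still defaults to k8s.gcr.io, so on join it
# does not append /coredns for registry.k8s.io and the CoreDNS preflight
# image pull fails.
```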
Solution:
- set the `imageRepository` field
Notes:

**Current state: Compatibility of KCP >= v1.2.8 & >= v1.3.0 with Kubernetes / kubeadm**

tl;dr KCP is broken for all `v1.22.x`, `v1.23.x` and `v1.24.x` kubeadm versions which have the old default registry.

The error occurs whenever a new Machine should be joined after KCP sets the new registry in the `kubeadm-config` ConfigMap (which is done whenever a rollout is needed, a Kubernetes upgrade is just one case).

**Background information**
Kubeadm default registries:
- `registry.k8s.io`: >= v1.22.17, >= v1.23.15, >= v1.24.9, >= v1.25.0
- `k8s.gcr.io`: all older kubeadm versions

CoreDNS images available (ignoring all versions < v1.6.0):
- `k8s.gcr.io/coredns` & `registry.k8s.io/coredns`: 1.6.2, 1.6.5, 1.6.6, 1.6.7, 1.7.0
- `k8s.gcr.io/coredns/coredns` & `registry.k8s.io/coredns/coredns`: v1.6.6, v1.6.7, v1.6.9, v1.7.0, v1.7.1, v1.8.0, v1.8.3, v1.8.4, v1.8.5, v1.8.6, v1.9.3, v1.9.4, v1.10.0

/kind bug