Failing tests for 0.4 and 1.0 #5952
Based on the KCP controller logs, it looks like the CoreDNS migration lib we're using only supports older CoreDNS versions: https://storage.googleapis.com/kubernetes-jenkins/logs/periodic-cluster-api-e2e-workload-upgrade-1-22-1-23-release-0-4/1483442871951429632/artifacts/clusters/bootstrap/controllers/capi-kubeadm-control-plane-controller-manager/capi-kubeadm-control-plane-controller-manager-744575bddc-8jfdj/manager.log
Let's:
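For context on the failure mode: during an upgrade, KCP runs the coredns/corefile-migration library to rewrite the Corefile. A minimal sketch of such a call, assuming the library's Migrate signature (the Corefile content is made up):

```go
package main

import (
	"fmt"

	"github.com/coredns/corefile-migration/migration"
)

func main() {
	// A made-up Corefile, for illustration only.
	corefile := `.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa
    forward . /etc/resolv.conf
    cache 30
}`

	// Migrate rewrites a Corefile from one CoreDNS version to another. If
	// either version is unknown to the library, it returns an error - the
	// failure mode visible in the KCP controller logs above.
	migrated, err := migration.Migrate("1.8.4", "1.8.6", corefile, false)
	if err != nil {
		fmt.Println("CoreDNS migration failed:", err)
		return
	}
	fmt.Println(migrated)
}
```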
/assign
Should the doc change be only for the 0.4 section of the book? I'll create a PR against test-infra now to change the upgrade target to 1.8.4.
@killianmuldoon Do you know which versions are supported by the CoreDNS lib? Some context about the upgrade test:
We should now change the jobs so that the CoreDNS version is as close as possible to the target kubeadm CoreDNS version (depending on the versions supported by the CoreDNS migration lib). The worst case is that the "target" CoreDNS version (the one configured in the job) is the same as the one the source kubeadm is using. An example:
Usually we would upgrade to v1.8.6 in the upgrade test. Depending on the supported range of the CoreDNS lib, we should upgrade to v1.8.5 or v1.8.4 instead. We should never try to downgrade CoreDNS; see the sketch below.
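A minimal sketch of that selection policy, assuming the library exports a ValidVersions helper and using the blang/semver package; the pickTarget name is made up:

```go
package main

import (
	"fmt"

	"github.com/blang/semver"
	"github.com/coredns/corefile-migration/migration"
)

// pickTarget (hypothetical helper) returns the highest CoreDNS version the
// migration lib supports that does not exceed the desired target and is
// never below the currently installed version, so we never downgrade.
func pickTarget(current, desired string) string {
	best := semver.MustParse(current) // worst case: keep the current version
	want := semver.MustParse(desired)
	for _, v := range migration.ValidVersions() {
		sv, err := semver.Parse(v)
		if err != nil {
			continue
		}
		if sv.GT(best) && sv.LTE(want) {
			best = sv
		}
	}
	return best.String()
}

func main() {
	// E.g. if the lib only supports up to v1.8.4, this prints "1.8.4".
	fmt.Println("upgrade CoreDNS to:", pickTarget("1.8.0", "1.8.6"))
}
```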
We currently assume that the CoreDNS migration lib doesn't fail when a cluster is created with CoreDNS v1.8.6 (1.23=>latest job) and CoreDNS is "upgraded" to v1.8.6, because in that case the migration shouldn't be executed.
The supported versions, for the version of the library we're using in CAPI, are listed here: https://github.com/coredns/corefile-migration/blob/v1.0.13/migration/versions.go
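For a quick check against the pinned release, here is a sketch that enumerates the supported versions and validates one concrete from=>to pair, assuming the ValidVersions and ValidUpMigration helpers exported by the library:

```go
package main

import (
	"fmt"

	"github.com/coredns/corefile-migration/migration"
)

func main() {
	// Every CoreDNS version this release of the library knows about.
	fmt.Println("supported versions:", migration.ValidVersions())

	// Validate one concrete upgrade pair the jobs care about.
	if err := migration.ValidUpMigration("1.8.4", "1.8.6"); err != nil {
		fmt.Println("unsupported migration:", err)
	}
}
```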
Some more data
Okay, I guess now we only have to calculate the maximum supported version for each job and we have it :)
Btw, the v1.0 jobs are failing too, as expected based on the data: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23
Yeah - I'm updating that to 1.8.5.
That assumption was wrong; even v1.8.6 => v1.8.6 fails: https://storage.googleapis.com/kubernetes-jenkins/logs/periodic-cluster-api-e2e-workload-upgrade-1-23-latest-release-1-0/1483442872060481536/artifacts/clusters/bootstrap/controllers/capi-kubeadm-control-plane-controller-manager/capi-kubeadm-control-plane-controller-manager-744575bddc-qwdqp/manager.log
In my opinion we can and should fix this in KCP (and backport it), so that KCP only runs the migration tool if the CoreDNS version actually changes. Otherwise we have a strict upper limit on which CoreDNS version KCP can manage, even when KCP doesn't have to migrate the CoreDNS configuration files at all. A sketch of the guard follows below.
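A minimal sketch of that guard, with a hypothetical updateCoreDNSCorefile helper standing in for KCP's actual CoreDNS reconciliation code:

```go
package coredns

import "github.com/coredns/corefile-migration/migration"

// updateCoreDNSCorefile is a hypothetical stand-in for KCP's migration step.
func updateCoreDNSCorefile(currentVersion, targetVersion, corefile string) (string, error) {
	// Proposed fix: if the CoreDNS version does not change, there is nothing
	// to migrate, so skip the migration lib entirely. Without this guard, the
	// lib's supported range becomes a hard upper bound on the CoreDNS versions
	// KCP can manage, even when no Corefile migration is needed.
	if currentVersion == targetVersion {
		return corefile, nil
	}
	return migration.Migrate(currentVersion, targetVersion, corefile, false)
}
```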
Agreed - that should be fixed for sure.
I'll take a look at the KCP fix too if that's alright with you.
+1 to the change in KCP to make it possible to upgrade to the same version (I assume this is not only in the webhook, but also in the upgrade logic). With regard to the doc, let's document the Kubernetes version / default CoreDNS version and the CAPI version / supported CoreDNS range in https://cluster-api.sigs.k8s.io/reference/versions.html#kubeadm-control-plane-provider-kubeadm-control-plane-controller
/retitle Failing tests for 0.4 and 1.0
Short update: the CAPI v0.4 & v1.0 upgrades from 1.22=>1.23 are green again. 1.23=>latest will be fixed by improving KCP as proposed above.
/assign
I'm going to tackle the KCP part. Good to see yesterday's change worked!
/reopen
@sbueringer: Reopened this issue.
/close
The PR has been cherry-picked to all releases and testgrid looks good.
@sbueringer: Closing this issue.
The following E2E tests for 0.4 seem to be consistently failing:
https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23
https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-23-latest
/kind failing-test
/milestone v0.4
/kind release-blocking
/area testing