Failing tests for 0.4 and 1.0 #5952

fabriziopandini · 2022-01-19T11:20:10Z

The following E2E tests for 0.4 seem to be failing consistently:

https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23
https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-23-latest

/kind failing-test
/milestone v0.4
/kind release-blocking
/area testing

sbueringer · 2022-01-19T11:45:08Z

Based on KCP controller logs it looks like the CoreDNS migration lib we're using only supports older CoreDNS versions: https://storage.googleapis.com/kubernetes-jenkins/logs/periodic-cluster-api-e2e-workload-upgrade-1-22-1-23-release-0-4/1483442871951429632/artifacts/clusters/bootstrap/controllers/capi-kubeadm-control-plane-controller-manager/capi-kubeadm-control-plane-controller-manager-744575bddc-8jfdj/manager.log

Let's:

- use older CoreDNS versions compatible with KCP in our e2e tests
- document which CoreDNS versions we support / we're testing in the book

killianmuldoon · 2022-01-19T11:56:29Z

/assign

Should the doc change be only for the 0.4 section of the book? I'll create a PR against test-infra now to change the CoreDNS upgrade target to 1.8.4.

sbueringer · 2022-01-19T12:02:22Z

@killianmuldoon Do you know which versions are supported by the CoreDNS lib?

Some context about the upgrade test:

- we're creating the cluster with kubeadm of the source version (i.e. the CoreDNS version of the source version is deployed)
- during the upgrade we then have the jobs configured to upgrade to the CoreDNS version which the kubeadm of the target version is using

We should now change the jobs so that the configured CoreDNS version is as close as possible to the target kubeadm CoreDNS version (depending on the versions supported by the CoreDNS migration lib). Worst case is that the "target" CoreDNS version (the one configured in the job) is the same as the one the source kubeadm is using.

An example:

- kubeadm 1.22 is installing CoreDNS v1.8.4 (https://github.com/kubernetes/kubernetes/blob/release-1.22/cmd/kubeadm/app/constants/constants.go)
- kubeadm 1.23 is installing CoreDNS v1.8.6 (https://github.com/kubernetes/kubernetes/blob/release-1.23/cmd/kubeadm/app/constants/constants.go#L343)

Usually we would upgrade to v1.8.6 in the upgrade test. Depending on the supported range of the CoreDNS lib we should upgrade to v1.8.5 or v1.8.4 instead. We should never try to downgrade CoreDNS.

sbueringer · 2022-01-19T12:04:27Z

We currently assume that the CoreDNS migration lib doesn't fail when a cluster is created with CoreDNS v1.8.6 (1.23=>latest job) and CoreDNS is "upgraded" to v1.8.6 because then the migration shouldn't be executed.

killianmuldoon · 2022-01-19T12:07:32Z

The supported versions are listed here: https://github.com/coredns/corefile-migration/blob/v1.0.13/migration/versions.go

Library version used by each CAPI release, with the highest CoreDNS version it supports:
- 0.3: v1.0.12 (max 1.8.4)
- 0.4: v1.0.12 (max 1.8.4)
- 1.0: v1.0.13 (max 1.8.5)
- 1.1: v1.0.14 (max 1.8.6)

sbueringer · 2022-01-19T12:08:47Z

Some more data

kubeadm => CoreDNS versions
- 1.21 => v1.8.0
- 1.22 => v1.8.4
- 1.23 => v1.8.6
- latest => v1.8.6

Okay I guess now we only have to calculate the maximum supported version for each job and we have it :)
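
A minimal sketch of that calculation, using only the numbers quoted in this thread. The function and variable names are made up for illustration and are not part of CAPI or test-infra. The rule it assumes is the one described above: take the CoreDNS version of the target kubeadm, cap it at the maximum the migration lib supports, and never go below the version the source kubeadm already installed.

```python
# Data copied from the comments in this thread.
KUBEADM_COREDNS = {"1.21": "v1.8.0", "1.22": "v1.8.4", "1.23": "v1.8.6", "latest": "v1.8.6"}
LIB_MAX_COREDNS = {"0.3": "v1.8.4", "0.4": "v1.8.4", "1.0": "v1.8.5", "1.1": "v1.8.6"}


def parse(version):
    """Turn 'v1.8.4' into (1, 8, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def job_target(capi_release, source_k8s, target_k8s):
    """Hypothetical helper: pick the CoreDNS version to configure in an upgrade job."""
    source = KUBEADM_COREDNS[source_k8s]
    wanted = KUBEADM_COREDNS[target_k8s]
    # Cap the desired version at what the migration lib of this CAPI release supports.
    capped = min(wanted, LIB_MAX_COREDNS[capi_release], key=parse)
    # Never downgrade: fall back to the source version if the cap is lower.
    return max(source, capped, key=parse)


for capi in ("0.4", "1.0", "1.1"):
    for src, dst in (("1.22", "1.23"), ("1.23", "latest")):
        print(f"CAPI {capi}, {src}=>{dst}: {job_target(capi, src, dst)}")
```

For the 0.4 1.22=>1.23 job this gives v1.8.4, consistent with the target mentioned earlier. For the 1.23=>latest jobs the source version is already above the lib maximum on 0.4 and 1.0, so the result is the "worst case" where the target equals the source version.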

sbueringer · 2022-01-19T12:10:08Z

Btw v1.0 jobs are failing too as expected based on the data: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23

killianmuldoon · 2022-01-19T12:11:44Z

Yeah - I'm updating that to 1.8.5

sbueringer · 2022-01-19T12:12:10Z

> We currently assume that the CoreDNS migration lib doesn't fail when a cluster is created with CoreDNS v1.8.6 (1.23=>latest job) and CoreDNS is "upgraded" to v1.8.6 because then the migration shouldn't be executed.

That assumption was wrong, even v1.8.6 => v1.8.6 fails: https://storage.googleapis.com/kubernetes-jenkins/logs/periodic-cluster-api-e2e-workload-upgrade-1-23-latest-release-1-0/1483442872060481536/artifacts/clusters/bootstrap/controllers/capi-kubeadm-control-plane-controller-manager/capi-kubeadm-control-plane-controller-manager-744575bddc-qwdqp/manager.log

In my opinion we can and should fix this in KCP (and backport it), so that KCP only runs the migration tool if the CoreDNS version actually changes. Otherwise we have a strict upper limit on which CoreDNS version KCP can manage, even when KCP doesn't have to migrate the CoreDNS configuration at all.
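
To illustrate the idea only (this is not the KCP code, which is written in Go), here is a rough Python sketch. `update_coredns`, `fake_migrate` and the `SUPPORTED` set are hypothetical stand-ins: the fake migration rejects any version it does not know, roughly mimicking a migration tool with a fixed list of supported versions.

```python
SUPPORTED = {"v1.8.0", "v1.8.4"}  # hypothetical list of versions the migration tool knows


def fake_migrate(corefile, from_version, to_version):
    """Stand-in for a migration tool: fails on any version it does not know."""
    for version in (from_version, to_version):
        if version not in SUPPORTED:
            raise ValueError(f"unsupported CoreDNS version {version}")
    return corefile  # a real migration would rewrite the Corefile here


def update_coredns(corefile, current_version, desired_version, migrate=fake_migrate):
    """Only call the migration when the CoreDNS version actually changes."""
    if current_version == desired_version:
        # Nothing to migrate, so an unsupported-but-unchanged version is fine.
        return corefile
    return migrate(corefile, current_version, desired_version)


# v1.8.6 is unknown to the fake tool, but an unchanged version no longer fails.
print(update_coredns("example-corefile", "v1.8.6", "v1.8.6"))
```

With a guard like this, the supported-version list only limits real upgrades; an attempt to go from v1.8.4 to v1.8.6 in this sketch would still raise an error.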

killianmuldoon · 2022-01-19T12:14:43Z

Agreed - that should be fixed for sure.

killianmuldoon · 2022-01-19T12:18:41Z

I'll take a look at the KCP fix too if that's alright with you.

fabriziopandini · 2022-01-19T12:37:01Z

+1 to the change in KCP to make it possible to upgrade to the same version (I assume this is not only in the webhook, but also in the upgrade logic)

WRT the docs, let's document Kubernetes version/default CoreDNS version and CAPI version/CoreDNS ranges in https://cluster-api.sigs.k8s.io/reference/versions.html#kubeadm-control-plane-provider-kubeadm-control-plane-controller

sbueringer · 2022-01-19T13:53:06Z

/retitle Failing tests for 0.4 and 1.0

sbueringer · 2022-01-20T16:44:18Z

Short update. CAPI v0.4 & v1.0 updates from 1.22=>1.23 are green again. 1.23=>latest will be fixed by improving KCP as proposed above.

killianmuldoon · 2022-01-20T16:50:24Z

/assign

I'm going to tackle the KCP part. Good to see yesterday's change worked!

sbueringer · 2022-01-27T15:17:01Z

/reopen
until we merged the cherry-picks for v1.1, v1.0 and v0.4

k8s-ci-robot · 2022-01-27T15:17:16Z

@sbueringer: Reopened this issue.

In response to this:

> /reopen
> until we merged the cherry-picks for v1.1, v1.0 and v0.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sbueringer · 2022-02-01T17:49:58Z

/close

PR has been cherry-picked to all releases and testgrid looks good.

k8s-ci-robot · 2022-02-01T17:50:14Z

@sbueringer: Closing this issue.

In response to this:

> /close
>
> PR has been cherry-picked to all releases and testgrid looks good.

k8s-ci-robot added the kind/failing-test label Jan 19, 2022

k8s-ci-robot added this to the v0.4 milestone Jan 19, 2022

k8s-ci-robot added the kind/release-blocking and area/testing labels Jan 19, 2022

k8s-ci-robot assigned killianmuldoon Jan 19, 2022

killianmuldoon mentioned this issue Jan 19, 2022

fix broken coredns upgrade in tests for Cluster API 0.4 and 1.0 kubernetes/test-infra#24923

Merged

k8s-ci-robot changed the title ~~Failing tests for 0.4~~ Failing tests for 0.4 and 1.0 Jan 19, 2022

k8s-ci-robot closed this as completed in #5986 Jan 27, 2022

k8s-ci-robot reopened this Jan 27, 2022

sbueringer mentioned this issue Jan 27, 2022

🐛 Allow KCP to Update when CoreDNS version doesn't change #6011

Merged

k8s-ci-robot closed this as completed Feb 1, 2022
