Clusterctl upgrade fails due to cert-manager CRDs #6674

Closed
MaxRink opened this issue Jun 20, 2022 · 11 comments · Fixed by #6749
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@MaxRink
Contributor

MaxRink commented Jun 20, 2022

What steps did you take and what happened:
When trying to upgrade our clusters to the latest CAPI version (1.1.4), we noticed that the upgrade fails:

clusterctl upgrade apply --contract v1beta1
Checking cert-manager version...
Deleting cert-manager Version="v1.5.3"
Installing cert-manager Version="v1.7.2"
Error: action failed after 10 attempts: failed to update cert-manager component apiextensions.k8s.io/v1, Kind=CustomResourceDefinition, /certificaterequests.cert-manager.io: CustomResourceDefinition.apiextensions.k8s.io "certificaterequests.cert-manager.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha2": must appear in spec.versions

After closer inspection, this seems to be caused by #6432, as cert-manager v1.6 got skipped.
v1.6 marked v1alpha2 as no longer served, so the upgrade to the v1.7 CRDs is only possible after the v1.6 CRDs have been applied:
cert-manager/cert-manager@c6896b2
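
One way to see which API versions the currently installed CRD still serves (a diagnostic sketch; the CRD name is just the one from the error above):

kubectl get crd certificaterequests.cert-manager.io \
  -o jsonpath='{range .spec.versions[*]}{.name}{"\t"}served={.served}{"\n"}{end}'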

What did you expect to happen:
Clusterctl correctly upgrades cert-manager

Environment:

  • Cluster-api version:
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 20, 2022
@chrischdi
Member

Hey,

I was not able to reproduce the issue and was able to upgrade cert-manager successfully from v1.5.3 to v1.7.2 via clusterctl:

$ clusterctl-v1.1.4 upgrade apply --contract v1beta1
Checking cert-manager version...
Deleting cert-manager Version="v1.5.3"
Installing cert-manager Version="v1.7.2"
Waiting for cert-manager to be available...
Performing upgrade...

Could you please provide some more information?

  • Kubernetes version: (use kubectl version):
  • Cluster-api version: (maybe output of kubectl get providers -A)
  • Do you know if cert-manager was installed manually or via clusterctl?

@MaxRink
Contributor Author

MaxRink commented Jun 20, 2022

The k8s version is 1.21.11, capi is now current (as clusterctl 1.1.3 still works)

kubectl get providers -A
NAMESPACE                           NAME                     TYPE                     PROVIDER      VERSION   WATCH NAMESPACE
capi-kubeadm-bootstrap-system       bootstrap-kubeadm        BootstrapProvider        kubeadm       v1.1.4    
capi-kubeadm-control-plane-system   control-plane-kubeadm    ControlPlaneProvider     kubeadm       v1.1.4    
capi-system                         cluster-api              CoreProvider             cluster-api   v1.1.4    
capv-system                         infrastructure-vsphere   InfrastructureProvider   vsphere       v1.2.0    

And cert-manager is clusterctl-managed.

@chrischdi
Member

chrischdi commented Jun 20, 2022

I think this is unrelated to "skipping cert-manager v1.6".

These APIs are no longer served in cert-manager 1.6 and are fully removed in cert-manager 1.7. If you have a cert-manager installation that is using or has previously used these deprecated APIs you might need to upgrade your cert-manager custom resources and CRDs. This should be done before upgrading to cert-manager 1.6 or later. [0]

The root cause for me seems to be that there are still resources stored in the old v1alpha2 version (which is shown by the error message).

These stored versions were already deprecated in cert-manager v1.4. They also provide a guide on how to migrate: https://cert-manager.io/docs/installation/upgrading/remove-deprecated-apis/#upgrading-existing-cert-manager-resources
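
For example, stale entries show up directly on the affected CRD (a diagnostic sketch; repeat for the other cert-manager CRDs):

kubectl get crd certificaterequests.cert-manager.io -o jsonpath='{.status.storedVersions}{"\n"}'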

Could you try to go through this guide and retry the upgrade using clusterctl v1.1.4 so cert-manager gets upgraded to v1.7.2?

I think this may happen to more users of cluster-api, especially those who migrated over from older versions (including older cert-manager versions). Because of that, we may have to at least add some more information to our docs.

@MaxRink
Contributor Author

MaxRink commented Jun 20, 2022

 cmctl upgrade migrate-api-version

did the trick, and yeah, I guess a lot of people will run into this if they have old management clusters and upgrade to 1.1.4 or later.
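
For reference, after the migration the cert-manager CRDs should only list v1 in status.storedVersions. A hedged check (CRD names as shipped by cert-manager v1.x):

kubectl get crd certificates.cert-manager.io certificaterequests.cert-manager.io \
  issuers.cert-manager.io clusterissuers.cert-manager.io \
  orders.acme.cert-manager.io challenges.acme.cert-manager.io \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.storedVersions}{"\n"}{end}'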

@sbueringer
Member

sbueringer commented Jun 21, 2022

Interesting issue.

Some observations:

  • the last time we deployed v1alpha2 resources (at least in the core CAPI provider) was v0.3.x
  • CertificateRequests is a resource created by cert-manager (as part of reconciling Certificate afaik)

@MaxRink from which version did you upgrade?

I would assume for v0.3 => (v0.4 | v1.0.x) => v1.1.x it should be fine as I would expect old CertificateRequests to be deleted when we re-create the Certificates during clusterctl upgrade.

For v0.3=>v1.1.x I would expect this to be always an issue, but somehow our e2e test v0.3=>v1.1 works:

@chrischdi
Member

Our tests use cert-manager version v1.1.0 -> v1.7.2 in this test, and the storage version in the cert-manager CRDs for v1.1.0 was already v1.

Because of that, no resource is ever created with a storage version < v1, which means the CRDs never end up with an old storage version in status.storedVersions.

I assume that as soon as the cluster has used a cert-manager version < 1.1, status.storedVersions contains an entry for an old version (v1beta1 at the latest, because v1.1 introduced the v1 CRDs and set v1 as the storage version for the first time).

I would propose to add a warning at the end of https://cluster-api.sigs.k8s.io/clusterctl/commands/upgrade.html at least for cert-manager including a link to the upstream docs for the workaround?

@MaxRink
Contributor Author

MaxRink commented Jun 21, 2022

@sbueringer The management cluster has been around since CAPI v1alpha3, got upgraded to v1alpha4 and is now on v1beta1.
And it seems like cert-manager did not upgrade CRs on its own

@sbueringer
Member

sbueringer commented Jun 21, 2022

And it seems like cert-manager did not upgrade CRs on its own

Yup. It's just that clusterctl upgrade deletes/creates all Certificates during an upgrade.

I would propose to add a warning at the end of https://cluster-api.sigs.k8s.io/clusterctl/commands/upgrade.html at least for cert-manager including a link to the upstream docs for the workaround?

I think that makes sense, even if we don't 100% know in which cases this issue occurs / doesn't occur.

@chrischdi
Member

chrischdi commented Jun 22, 2022

The issue should be reproducible for cluster-api users who started using CAPI <= v0.3.14 and are now upgrading to v1.1.4 (using clusterctl v1.1.4).

CAPI v1.1.4 includes cert-manager v1.7.2, which is the first cert-manager version where the old API versions were removed from the CRDs.

To be verified via https://github.com/kubernetes-sigs/cluster-api/pull/6699/files

@fabriziopandini
Member

/assign

@sbueringer
Member

sbueringer commented Jun 27, 2022

We implemented a CRD migration in #6749. It works, but there are some limitations.

The logic is:

If one of the versions in status.storedVersions is about to be dropped from spec.versions during a clusterctl upgrade, we migrate the CRs (re-write them so they are stored at the current storage version) and then drop the old version from status.storedVersions before upgrading the CRD.

This works, but there is the fundamental limitation that a storage version cannot be dropped immediately.
Concretely: if v1alpha2 is the storage version, another version first has to become the storage version before v1alpha2 can be dropped. But this is a limitation of CRDs in general, not of our migration code.
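
For illustration, roughly the same effect can be produced manually (a hedged sketch, not the actual clusterctl code; it assumes v1 is already the storage version, a kubectl that supports --subresource=status (v1.24+), and uses certificaterequests as the example CRD):

# 1. Re-write every CR so the API server re-stores it at the current storage version (v1):
kubectl get certificaterequests.cert-manager.io -A -o json | kubectl replace -f -

# 2. Drop the old entries from status.storedVersions before the CRD gets upgraded:
kubectl patch crd certificaterequests.cert-manager.io --subresource=status --type=merge \
  -p '{"status":{"storedVersions":["v1"]}}'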

A version overview:

1. cert-manager v0.11.0: v1alpha2 (storage)

ClusterAPI v0.3.4 - v0.3.8 => cert-manager v0.11.0

2. cert-manager v0.16.1: v1alpha2 (storage), v1alpha3, v1beta1

ClusterAPI v0.3.9 - v0.3.14 => cert-manager v0.16.1

3. cert-manager v1.0.0 - v1.6.x: v1alpha2, v1alpha3, v1beta1, v1 (storage)

ClusterAPI v0.3.15 - => cert-manager v1.1.0
ClusterAPI v0.4.0 => cert-manager v1.1.0
ClusterAPI v0.4.1 => cert-manager v1.4.0
ClusterAPI v0.4.2 - v0.4.3 => cert-manager v1.5.0
ClusterAPI v0.4.4 - => cert-manager v1.5.3
ClusterAPI v1.0.0 - => cert-manager v1.5.3
ClusterAPI v1.1.0 - v1.1.3 => cert-manager v1.5.3

4. cert-manager >= v1.7.0: v1 (storage)

ClusterAPI v1.1.4 => cert-manager v1.7.2
ClusterAPI v1.2.0 => cert-manager v1.8.2

The limitation above means:

  • It is possible to migrate from ClusterAPI v0.3.4-v0.3.14 to v0.3.15-v1.1.3 and then later to >= v1.1.4
  • It is not possible to migrate directly from v0.3.4-v0.3.14 to >= v1.1.4, as this would mean the storage version v1alpha2 is dropped immediately
