Automatic CA rotation in CAPI #7721

Open
furkatgofurov7 opened this issue Dec 12, 2022 · 11 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@furkatgofurov7
Member

furkatgofurov7 commented Dec 12, 2022

User Story

As a developer, user, or operator, I would like to rotate a Kubernetes cluster CA. Today this is a manual procedure that involves many steps: rolling restarts of pods and updates to other resources (config maps, secrets, service accounts). See https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates/
With CAPI and the ability to deploy many target clusters from one management cluster, I am looking for ways to rotate the CA at scale, since a manual operation on each cluster would be very costly. It would be interesting to know how the community is addressing this issue. Are there any external open-source tools that could be used to tackle this challenge?

Detailed Description
There are also some cases in which the CA of the target clusters might be different from that of the management cluster.

Some use cases:

  • Deploy a management cluster and many target clusters with the same CA. Rotate the cluster CA on the target clusters and the management cluster without impact on traffic.
  • Deploy a management cluster and many target clusters with different CAs. Rotate the cluster CA on the target clusters and the management cluster without impact on traffic.

Ideally, CAPI would have built-in support for CA rotation at scale.

Anything else you would like to add:
I checked the automatic certificate rotation introduced in #6983. It covers control plane machines only, and essentially tackles one part of the original certificate management issue, #5490.


/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 12, 2022
@furkatgofurov7
Member Author

Tagging folks who were involved in the referenced issues/PRs to see how we can move forward with this issue.
/cc @fabriziopandini @ykakarap @sbueringer

@fabriziopandini
Member

/triage accepted
I think this will require a proposal...

Just as a historical note, this was one of the use cases for which we discussed the idea of a kubeadm operator, which never gained traction.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 12, 2022
@furkatgofurov7
Member Author

> I think this will require a proposal...
>
> Just as a historical note, this was one of the use cases for which we discussed the idea of a kubeadm operator, which never gained traction.

@fabriziopandini thanks. I had not heard about the kubeadm operator before; I will look around for it to grasp the initial idea (if you could share any references, that would be great too).

@furkatgofurov7
Member Author

Oh found it, maybe this one: https://hackmd.io/@QlB2bmbhS-aeuDlwOCH9Yw/HkidAVXlS

@furkatgofurov7
Member Author

furkatgofurov7 commented Dec 12, 2022

It turns out there is wider interest in this, and #7044 is also gathering use cases related to this problem.

@furkatgofurov7
Member Author

cc @smoshiur1237

@Zhupku

Zhupku commented Sep 4, 2023

Hi @furkatgofurov7 and @fabriziopandini,

I'm developing a product based on CAPI, and I would like to use CAPI to rotate the CA.

Based on the discussion above, I take it this feature is not available right now?
As far as I can see, CA rotation is an important feature. May I ask why it is blocked?
I have looked into the native Kubernetes support for CA rotation. Is there any technical blocker to implementing CA rotation in CAPI?

As I'm new to this project, could you please give me some help?

Thanks

@fabriziopandini
Member

You are correct, this is an important feature, and unfortunately it is not available yet.
However, nothing blocks you or someone else from working on this topic, which IMO requires a proposal where we describe how to do something similar to https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates/ while respecting CAPI constraints (e.g. immutability / no direct access to the machines).

This requires some research...

@BarthV

BarthV commented Sep 22, 2023

Copy-pasted from my last message on the Kubernetes Slack:

So, after spending days on this topic,
I've finally found the least awful way to rotate the CA in a CAPI-managed cluster.
This method relies on a three-phase machine rollout, including a "big-bang" live certificate renewal on the control planes.

Phase 1:

  • edit the CAPI cluster CA secret to contain the old pubCA (public CA certificate) + the new pubCA (in this order)
  • ensure the controller-manager still signs with the old CA (and not the bundle)
  • edit the cluster's cluster-info ConfigMap to contain both pubCAs (this is used by kubeadm join)
  • roll out all machines & pods running inside the cluster
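
To illustrate the first bullet of Phase 1, here is a minimal Python sketch of how the "old pubCA + new pubCA" bundle and the matching Secret patch could be built. It is not part of the original procedure. The helper names `build_ca_bundle` and `ca_secret_patch` are made up for this example, and the `tls.crt` key and the `<cluster-name>-ca` Secret name are assumptions based on the usual CAPI naming; check them against your own management cluster before relying on this.

```python
import base64
import json
import re

# Matches one PEM certificate block, including its BEGIN/END lines.
_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def build_ca_bundle(*pem_texts):
    """Concatenate every CERTIFICATE block from the inputs, keeping input order.

    Only certificate blocks are copied, so a private key that happens to be
    in the same input text never ends up in the bundle.
    """
    certs = []
    for text in pem_texts:
        found = _CERT_RE.findall(text)
        if not found:
            raise ValueError("input contains no PEM certificate")
        certs.extend(found)
    return "\n".join(certs) + "\n"


def ca_secret_patch(bundle_pem):
    """Build a JSON merge patch that replaces only tls.crt (assumed key name)."""
    encoded = base64.b64encode(bundle_pem.encode("ascii")).decode("ascii")
    return json.dumps({"data": {"tls.crt": encoded}})
```

For Phase 1 the call would be `build_ca_bundle(old_ca_pem, new_ca_pem)`, old certificate first. The resulting patch string could then be passed to something like `kubectl patch secret <cluster-name>-ca --type merge -p '<patch>'` on the management cluster; because the patch touches only `tls.crt`, the existing key stays in place, which matches the requirement that signing still uses the old CA.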

Phase 2, "big-bang" (on all CP nodes at the same time):

  • pause CAPI cluster reconciliation
  • SSH into all control-plane nodes
  • replace ca.key with the new one, edit the ca.crt bundle to newCA + oldCA (in this order), and don't forget the controller-manager signing cert too (newCA)
  • kubeadm certs renew [admin.conf apiserver apiserver-kubelet-client controller-manager.conf front-proxy-client scheduler.conf]
  • crictl stop & restart all control plane containers
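
The bracketed list in the `kubeadm certs renew` bullet reads as shorthand: as far as I know, `kubeadm certs renew` takes one certificate name per invocation (or `all`). The sketch below, which is an illustration and not the author's script, expands the list into one command per certificate and also builds the patch body for pausing reconciliation. The function names are hypothetical, and using `spec.paused` on the CAPI Cluster object as the pause switch is an assumption about how "pause CAPI cluster reconciliation" was done.

```python
import json
import shlex

# Certificate names copied from the list in the comment above.
CERTS_TO_RENEW = (
    "admin.conf",
    "apiserver",
    "apiserver-kubelet-client",
    "controller-manager.conf",
    "front-proxy-client",
    "scheduler.conf",
)


def pause_patch(paused):
    """JSON merge patch for the Cluster object (assumes spec.paused is the switch)."""
    return json.dumps({"spec": {"paused": bool(paused)}})


def renew_commands(certs=CERTS_TO_RENEW):
    """Return one shell-safe 'kubeadm certs renew <name>' line per certificate."""
    return [shlex.join(["kubeadm", "certs", "renew", name]) for name in certs]
```

`renew_commands()` only returns strings; nothing is executed. The lines would still have to be run on every control-plane node after ca.crt and ca.key have been replaced, and `pause_patch(False)` gives the body for resuming reconciliation later.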

Phase 3:

  • edit the CAPI cluster CA secret to contain only the new pubCA & the new CA key
  • update the CAPI cluster kubeconfig secret with the new admin.conf kubeconfig content
  • edit the cluster's cluster-info ConfigMap to expose only the new pubCA
  • resume CAPI cluster reconciliation
  • rollout restart all worker machines
  • rollout restart all CP machines
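
The first two bullets of Phase 3 can be sketched in the same way. This is a hedged example under stated assumptions: the Secret keys `tls.crt`, `tls.key`, and `value`, and the Secret names `<cluster-name>-ca` and `<cluster-name>-kubeconfig`, follow the naming CAPI normally uses but are not spelled out in the thread, and both helper names are invented for illustration.

```python
import base64
import json


def _b64(text):
    """Base64-encode text the way Secret .data values expect."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def final_ca_secret_patch(new_ca_cert_pem, new_ca_key_pem):
    """Merge patch that leaves only the new CA cert and key in the CA Secret."""
    if new_ca_cert_pem.count("-----BEGIN CERTIFICATE-----") != 1:
        raise ValueError("expected exactly one certificate (new CA only)")
    if "PRIVATE KEY-----" not in new_ca_key_pem:
        raise ValueError("expected a PEM private key")
    return json.dumps(
        {"data": {"tls.crt": _b64(new_ca_cert_pem), "tls.key": _b64(new_ca_key_pem)}}
    )


def kubeconfig_secret_patch(admin_conf_text):
    """Merge patch that stores the renewed admin.conf under the 'value' key."""
    return json.dumps({"data": {"value": _b64(admin_conf_text)}})
```

The check for exactly one certificate is a small guard against leaving the Phase 1 or Phase 2 bundle in place by mistake. Both patches would be applied on the management cluster before reconciliation is resumed, so that the rollouts in the last two bullets pick up the new CA.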

And tadaaam ... it works.

This is still very "manual" (I really hate SSH & remote actions), but we're facing multiple CAPI & kubeadm limitations here. They prevent us from automating this CA rollout properly, one node at a time.

@fabriziopandini
Member

It would be great to document this in the book...

@fabriziopandini
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 12, 2024
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 3, 2024
5 participants