diff --git a/keps/sig-cluster-lifecycle/kubeadm/4214-separate-super-user-kubeconfig/README.md b/keps/sig-cluster-lifecycle/kubeadm/4214-separate-super-user-kubeconfig/README.md
new file mode 100644
index 00000000000..d1572e3d5e6
--- /dev/null
+++ b/keps/sig-cluster-lifecycle/kubeadm/4214-separate-super-user-kubeconfig/README.md
@@ -0,0 +1,729 @@
+# KEP-4214: Separate super-user kubeconfig for kubeadm
+
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [User Stories (Optional)](#user-stories-optional)
+    - [Story: Resolving compromised admin credentials](#story-resolving-compromised-admin-credentials)
+    - [Story: Keeping the super-user credential in a safe place](#story-keeping-the-super-user-credential-in-a-safe-place)
+  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
+    - [Caveats](#caveats)
+  - [Risks and Mitigations](#risks-and-mitigations)
+    - [Risk: Implementation complexity during in-place upgrade](#risk-implementation-complexity-during-in-place-upgrade)
+    - [Risk: Implementation complexity during re-place upgrade](#risk-implementation-complexity-during-re-place-upgrade)
+- [Design Details](#design-details)
+  - [Test Plan](#test-plan)
+    - [Prerequisite testing updates](#prerequisite-testing-updates)
+    - [Unit tests](#unit-tests)
+    - [Integration tests](#integration-tests)
+    - [e2e tests](#e2e-tests)
+  - [Graduation Criteria](#graduation-criteria)
+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+  - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+  - [Using a feature gate](#using-a-feature-gate)
+  - [Integration test vs e2e test](#integration-test-vs-e2e-test)
+  - [Signing individual kubeconfig files for control plane nodes](#signing-individual-kubeconfig-files-for-control-plane-nodes)
+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
+
+## Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [x] (R) Graduation criteria is in place
+  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+During the initial control plane node creation (`kubeadm init`), an `admin.conf` file is generated.
+This file currently contains a
+[`cluster-admin`](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles)
+credential that is bound to the `system:masters` group. Instead, create two separate files:
+`admin.conf`, containing a regular Kubernetes cluster-admin credential, and `super-admin.conf`,
+containing a cluster-admin credential bound to the `system:masters` group.
+
+## Motivation
+
+Binding an admin credential to the `system:masters` group means it can
+[bypass RBAC](https://github.com/kubernetes/kubeadm/issues/2414#issue-836108390), i.e.
+its permissions cannot be removed.
+At the time of writing this KEP, Kubernetes does not support
+[certificate revocation](https://github.com/kubernetes/kubernetes/issues/18982). This means
+the only way to revoke the access of an admin credential bound to the `system:masters` group is to
+[rotate the certificate authority](https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates/)
+of the cluster.
+
+A general-purpose `admin.conf` that does not bind to `system:masters` must be created.
+This credential can be shared by kubeadm-deployed control plane nodes. If this `admin.conf`
+credential is compromised, its permissions must be revocable with RBAC.
+
+A separate "break-glass", super-user credential can be managed in `super-admin.conf`.
+This credential can be used to restore the cluster to a normal state in case of disruptive
+admin user activity.
+
+Neither file should be shared with new admin users. Instead, new admin credentials should
+be signed with the `kubeadm kubeconfig user` command.
+
+### Goals
+
+- Start managing a separate `super-admin.conf` that contains a super-user credential.
+- Continue using a file named `admin.conf` for all its current kubeadm uses.
+- Improve the kubeadm documentation on the k8s.io website for creating
+additional admin users.
+
+### Non-Goals
+
+- Do not sign the credentials in the `admin.conf` and `super-admin.conf` files
+with an expiration time longer than 1 year.
+
+## Proposal
+
+To preserve the existing behavior where `admin.conf` has full cluster access,
+a new ClusterRoleBinding called `kubeadm:cluster-admins` will be created.
+It will bind the ClusterRole `cluster-admin` to the `kubeadm:cluster-admins` Group.
+The credential stored in `admin.conf` will have the following subject:
+`O = kubeadm:cluster-admins, CN = kubernetes-admin`. In case of a compromised
+credential, the ClusterRoleBinding `kubeadm:cluster-admins` can be removed or updated.
+The Group `kubeadm:cluster-admins` is recommended for internal kubeadm use only.
+
+The new file `super-admin.conf` will contain the following subject:
+`O = system:masters, CN = kubernetes-super-admin`. It will act as the "break-glass",
+super-user credential that can bypass RBAC. If this credential is compromised,
+the cluster certificate authority must be rotated.
+
+Both the `admin.conf` and `super-admin.conf` files will be renewable by `kubeadm upgrade`
+and `kubeadm certs renew`. A missing `super-admin.conf` will not cause an error.
+This covers the case where the super-admin has manually moved the file to a safe location,
+i.e. it is no longer kept on the primary control plane node where `kubeadm init` was called.
+
+The documentation in the
+[Generating kubeconfig files for additional users](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#kubeconfig-additional-users)
+section will be updated.
+For each new User or Group of users, the recommendation will be to create a new
+ClusterRoleBinding. The `--client-name` and `--org` flags of `kubeadm kubeconfig user`
+control which User or Group a credential belongs to. In case of credential compromise,
+access can then be revoked for a single User or a whole Group.
+Using the existing `system:masters` or `kubeadm:cluster-admins` Groups will not be recommended.
+
+### User Stories (Optional)
+
+#### Story: Resolving compromised admin credentials
+
+As a super-user administrator, I want to be able to use RBAC to remove the access of an
+administrator credential that has been compromised. By removing the `kubeadm:cluster-admins`
+ClusterRoleBinding, all administrator credentials (`admin.conf`) on control plane nodes
+that are signed for the `kubeadm:cluster-admins` Group will stop working.
+I can then sign new `admin.conf` credentials to be used on all nodes
+with a new custom Group bound to the `cluster-admin` ClusterRole.
+
+Alternatively, the certificate authority of the cluster can be rotated,
+which will allow the `kubeadm:cluster-admins` ClusterRoleBinding to be restored
+and used again for new `admin.conf` credentials signed by the new certificate authority
+for the Group `kubeadm:cluster-admins`.
+
+#### Story: Keeping the super-user credential in a safe place
+
+As a super-user administrator, I want to keep a credential that has super powers
+and can bypass RBAC outside of the nodes managed by kubeadm. After `kubeadm init` has
+finished, I can move the `super-admin.conf` file to a secure location and only use it in
+case of an emergency.
+
+### Notes/Constraints/Caveats (Optional)
+
+#### Caveats
+
+Removing the `kubeadm:cluster-admins` ClusterRoleBinding will drop access
+for all users in the `kubeadm:cluster-admins` Group, rendering all `admin.conf`
+files on control plane nodes invalid. However, it will not cause immediate
+downtime, as the `admin.conf` on control plane nodes is only used when executing
+kubeadm commands, such as `kubeadm upgrade`. In such conditions, the `admin.conf`
+should be populated with credentials for a safe, temporary User that is bound
+to the `cluster-admin` ClusterRole.
+
+The super-user admin can then proceed to rotate the cluster certificate authority
+and eventually restore the `kubeadm:cluster-admins` ClusterRoleBinding.
+
+### Risks and Mitigations
+
+#### Risk: Implementation complexity during in-place upgrade
+
+During `kubeadm upgrade apply`, the new ClusterRoleBinding `kubeadm:cluster-admins`
+must be added. The switch to two separate files must be performed on all control plane
+nodes. This means that both `kubeadm upgrade apply` and `kubeadm upgrade node` must
+handle the migration of the `admin.conf` properly. The new `super-admin.conf`
+will be created only on the node where `kubeadm upgrade apply` is
+called. On later upgrades, starting one release after this feature is added, the certificate
+renewal logic of `kubeadm upgrade` must account for a missing `super-admin.conf` file
+and skip rotating it in that case.
+
+The contents of the `kube-system/kubeadm-certs` Secret, where an encrypted
+`admin.conf` is stored, will not be updated during upgrade.
+
+The mitigation here is detailed unit tests and e2e tests that ensure
+the migration for in-place upgrades is handled properly.
+
+#### Risk: Implementation complexity during re-place upgrade
+
+Users or higher-level tools that manage kubeadm re-place upgrades (removing old
+control plane nodes and adding new ones) without calling
+`kubeadm upgrade apply/node` should handle this transition manually.
+The RBAC ClusterRoleBinding `kubeadm:cluster-admins` must be created before
+the upgrade starts. A new `admin.conf` that has the subject
+`O = kubeadm:cluster-admins, CN = kubernetes-admin` must be uploaded to the
+`kube-system/kubeadm-certs` Secret and encrypted with the appropriate certificate key.
+Joining control plane nodes must be able to download and decrypt the new `admin.conf`.
+
+Again, tests will be required to ensure that the `admin.conf` subject is migrated
+properly. The `super-admin.conf` file will not exist at all under such conditions;
+therefore, the administrator can sign one manually with
+the `kubeadm kubeconfig user --client-name=kubernetes-super-admin --org=system:masters`
+command.
+
+The same users or higher-level tools can decide not to opt in to this
+new behavior for existing clusters and continue using the `admin.conf` with
+`system:masters`. However, this means such clusters will drift away
+from the kubeadm security defaults.
+
+## Design Details
+
+### Test Plan
+
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+kubeadm will include new unit tests to ensure the new separate admin files are
+generated properly. The existing [kinder](https://git.k8s.io/kubeadm/kinder)
+upgrade e2e test jobs will exercise this functionality during init, control plane join and upgrade.
+
+One additional integration test can be added in `cmd/kubeadm/test`. It can be maintained
+for one or more releases, until more users have upgraded to the first release where this
+feature is available. It can do the following (subject to implementation details):
+- Calls `kubeadm init phase certs ca`.
+- Calls `kubeadm init phase kubeconfig admin`.
+- Checks whether two admin kubeconfig files are generated.
+- Calls `kubeadm certs renew admin.conf` and verifies whether the kubeconfig files
+are updated.
+
+##### Prerequisite testing updates
+
+None.
+
+##### Unit tests
+
+At least the following kubeadm packages will require updates and new unit tests:
+- `cmd/kubeadm/app/phases/kubeconfig`
+- `cmd/kubeadm/app/phases/certs`
+- `cmd/kubeadm/app/phases/upgrade`
+
+##### Integration tests
+
+One new integration test can be added here:
+- `cmd/kubeadm/test`
+
+##### e2e tests
+
+The functionality will be exercised by the existing regular and upgrade e2e tests
+that use the kinder tool.
+
+### Graduation Criteria
+
+Once released, the feature will take effect immediately, both when upgrading
+to the kubeadm release that added it and when creating a new cluster with
+that release. There are no plans for an opt-in or opt-out feature gate,
+as this is considered a security improvement.
+The feature will graduate immediately after release.
+
+### Upgrade / Downgrade Strategy
+
+On upgrade, the regular kubeconfig and certificate renewal process will be
+performed. The `admin.conf` file will be replaced with a file that has
+the de-escalated privileges. If the `super-admin.conf` file is present on
+a node during a future N+1 release, the file will be replaced with
+updated credentials. If `super-admin.conf` is not present, no errors
+will be returned.
+
+The `kubeadm upgrade apply` command will manage the addition of
+the new ClusterRoleBinding `kubeadm:cluster-admins`. One release
+after the feature was enabled, this logic for the RBAC management
+can be removed.
+
+kubeadm does not support downgrades.
+
+### Version Skew Strategy
+
+The target release of kubeadm will be able to upgrade from N-1 nodes that
+do not have the feature enabled yet. During cluster creation with the
+target kubeadm version, the feature will be enabled immediately.
+
+The `kubeadm upgrade apply` command will manage the addition of
+the new ClusterRoleBinding `kubeadm:cluster-admins`. One release
+after the feature was enabled, this logic for the RBAC management
+can be removed.
+
+## Production Readiness Review Questionnaire
+
+Not applicable for kubeadm. The kubeadm project is considered "out-of-tree".
+
+## Implementation History
+
+- 18.09.2023: KEP created (1.29)
+- 10.10.2023: Address minor feedback. KEP marked as implementable.
+
+## Drawbacks
+
+One anticipated drawback is the change in user expectations.
+Users may expect that the `admin.conf` file will continue to have the
+super powers provided by the `system:masters` Group. This feature breaks
+that expectation. These users will need to take action and sign a new,
+explicit `super-admin.conf` file with the super powers.
+
+## Alternatives
+
+### Using a feature gate
+
+A feature gate was considered that would let users opt in to this
+behavior. The feature could follow the standard Alpha-Beta-GA graduation cycle
+and have the feature gate enabled by default for the Beta release.
+
+An argument against this approach is that the feature would not cause disruption for
+the average user. The `admin.conf` can continue to work as a cluster-admin
+credential. The feature gate would only add complexity and would have no
+well-established benefits, other than granular feature enablement control.
+
+Another argument is that in practice this is a security improvement,
+and preferably users should not be able to opt out of such features.
+
+### Integration test vs e2e test
+
+The kinder e2e testing tool is quite flexible and the nodes do include tools
+such as `openssl` for certificate inspection and `base64` for decoding base64
+strings. However, such an e2e test would have to be written in bash
+or hardcoded in kinder as Go code.
+
+Instead, a Go integration test included in `cmd/kubeadm/test`
+seems preferable. It allows using the Go standard library and existing
+kubeadm utilities for parsing kubeconfig files and x509 certificates.
+
+One downside is that the same integration test will be executed on every
+change in the kubeadm tree under `kubernetes/kubernetes`, instead
+of less frequently, i.e. periodically.
+
+### Signing individual kubeconfig files for control plane nodes
+
+Today, control plane nodes use the same `admin.conf` that is shared
+via the `kube-system/kubeadm-certs` Secret and encrypted with the certificate key.
+This behavior is expected from kubeadm, since it treats control plane
+nodes as (more or less) identical replicas.
+
+When joining control plane nodes, this download of the `admin.conf`
+can be skipped. Instead, the `ca.key` and `ca.crt` pair (already shared)
+can be used to create a new `admin.conf` unique to each control plane node.
+
+For example:
+- Control plane node `foo` wishes to join the cluster.
+- The `ca.key` and `ca.crt` are downloaded from the Secret.
+- The node name is ensured, likely by inspecting the kubelet
+client certificate.
+- A new `admin.conf` is created that has the subject
+`O = kubeadm:cluster-admins, CN = kubernetes-admin-foo`.
+- This User `kubernetes-admin-foo` is bound to the `cluster-admin`
+ClusterRole with an additional RBAC rule.
+
+Similarly, during `kubeadm init`, the `admin.conf` must contain
+the node name in the CN. `kubeadm upgrade` for this node
+must properly maintain the new RBAC rule and the `CN`
+in the `admin.conf`.
+
+This would allow revoking the access of individual control plane
+nodes' `admin.conf` users. Since it adds complexity to the current KEP,
+it could be done in a separate KEP as additional hardening.
+
+However, it also raises questions about node security. Given the on-disk layout,
+if the `admin.conf` of a control plane node has leaked,
+the `ca.key` on that node may have leaked as well, which means the entire cluster
+is compromised.
+
+## Infrastructure Needed (Optional)
+
+None.
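The subjects that the Proposal assigns to `admin.conf` (`O = kubeadm:cluster-admins, CN = kubernetes-admin`) and `super-admin.conf` (`O = system:masters, CN = kubernetes-super-admin`) can be illustrated with a small Go program that uses only the standard library, in the spirit of the integration test described in the Test Plan. This is a minimal sketch under stated assumptions, not kubeadm's implementation: a throwaway CA stands in for the cluster's `ca.crt`/`ca.key`, ECDSA P-256 keys are used for brevity (not necessarily the key type kubeadm uses), and the helpers `newCA`, `newClientCert` and `inGroup` are hypothetical. It does not write kubeconfig files or contact an API server.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// systemMasters is the Group that bypasses RBAC authorization.
const systemMasters = "system:masters"

// signer pairs a CA certificate with its private key. In this hypothetical
// sketch it stands in for the cluster's ca.crt/ca.key.
type signer struct {
	cert *x509.Certificate
	key  *ecdsa.PrivateKey
}

// newCA creates a throwaway, self-signed CA.
func newCA() (*signer, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "kubernetes"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(24 * time.Hour),
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return nil, err
	}
	return &signer{cert: cert, key: key}, nil
}

// newClientCert signs a client certificate whose CN becomes the Kubernetes
// User name and whose O becomes the Group.
func (ca *signer) newClientCert(cn, org string, serial int64) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: cn, Organization: []string{org}},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour), // at most 1 year, per the Non-Goals
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca.cert, &key.PublicKey, ca.key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// inGroup reports whether group appears in the certificate's O field.
func inGroup(cert *x509.Certificate, group string) bool {
	for _, o := range cert.Subject.Organization {
		if o == group {
			return true
		}
	}
	return false
}

func main() {
	ca, err := newCA()
	if err != nil {
		panic(err)
	}
	creds := []struct{ file, cn, org string }{
		{"admin.conf", "kubernetes-admin", "kubeadm:cluster-admins"},
		{"super-admin.conf", "kubernetes-super-admin", systemMasters},
	}
	for i, c := range creds {
		cert, err := ca.newClientCert(c.cn, c.org, int64(i+2))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: subject %q, bypasses RBAC: %v\n",
			c.file, cert.Subject.String(), inGroup(cert, systemMasters))
	}
}
```

With X.509 client certificate authentication, the API server takes the CN as the User name and each O value as a Group, so only the second certificate lands in `system:masters`. Access for the first one depends entirely on the `kubeadm:cluster-admins` ClusterRoleBinding and can therefore be revoked with RBAC.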
diff --git a/keps/sig-cluster-lifecycle/kubeadm/4214-separate-super-user-kubeconfig/kep.yaml b/keps/sig-cluster-lifecycle/kubeadm/4214-separate-super-user-kubeconfig/kep.yaml
new file mode 100644
index 00000000000..fee7a8ed7ed
--- /dev/null
+++ b/keps/sig-cluster-lifecycle/kubeadm/4214-separate-super-user-kubeconfig/kep.yaml
@@ -0,0 +1,19 @@
+title: Separate super-user kubeconfig for kubeadm
+kep-number: 4214
+authors:
+  - "@neolit123"
+owning-sig: sig-cluster-lifecycle
+participating-sigs:
+  - sig-cluster-lifecycle
+status: implementable
+creation-date: 2023-09-18
+last-updated: 2023-10-10
+reviewers:
+  - "@SataQiu"
+  - "@pacoxu"
+  - "@chendave"
+approvers:
+  - "@SataQiu"
+  - "@pacoxu"
+latest-milestone: "0.0"
+stage: "alpha"