KEP-3257: Trust Anchor Sets #3258
Conversation
(force-pushed b7c5319 to b3a278a)
Notes from SIG Auth:
Another comment: What about reliability concerns / how can we safely roll out changes to the contents of a TrustAnchorSets object?
(force-pushed b3a278a to c9df986)
I read through this rather quickly and likely missed things.
connections to the server without downtime, even as I rotate the trust anchors
that back the CA.
But how do you know when all clients have read the updated CA?
As described below, you basically have to do this out of band, by ensuring that clients reread the updated trust anchors file promptly.
A full end-to-end solution would involve each client exporting metrics indicating that it has picked up the new trust anchors, and not allowing the old trust anchors to be retired until you are satisfied with the uptake metrics.
I would also be interested to read a more detailed use case of how CA rotation could be achieved with this mechanism as I think making it possible to rotate CAs easier would be very valuable.
Is there a way to observe when kubelet has updated all pods with the new CAs?
It feels like, if clients need to implement custom logic anyway to export metrics about which CAs they are using, then this doesn't actually make rotation much easier for them (unless there is already some standard way for them to do that).
Metrics from kubelet are a good idea.
They don't cover the end-to-end client experience, though. When I've built a similar product, we basically went with "tell clients they need to notice changes within 5 minutes, and then wait several hours after each step in the CA rotation". Workloads that fail to pick up the new changes will begin crashing or having connection errors, which can be handled by liveness / readiness probes.
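The probe-based failure handling described above could be sketched like this (hedged: the mounted bundle path, backend URL, and timings are hypothetical, not from the KEP):

```yaml
# Hypothetical liveness probe: the container is considered unhealthy once the
# backend's certificate no longer verifies against the mounted trust anchors,
# so the kubelet restarts it and the fresh process rereads the rotated bundle.
livenessProbe:
  exec:
    command:
      - "sh"
      - "-c"
      - "curl --fail --cacert /var/run/trust-anchors/ca-bundle.pem https://backend.example.com/healthz"
  periodSeconds: 60
  failureThreshold: 5
```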
I don't think that the kubelet has any way to know if the client application has started using the updated trust bundle. A kubelet metric could let you know if the mount points were updated. If we assume a fairly small number of ClusterTrustBundles that may be possible to know. We should certainly be able to know the number of ClusterTrustBundle mount point updates and update errors, which is useful for knowing if the kubelet code is generally functional, but not for precise rotation decisions.
I don't think that the kubelet has any way to know if the client application has started using the updated trust bundle.
Is kubelet able to observe reads on disk via the container runtime? I.e., kubelet changes the bundle at time T1 and observes a read of the file at time T2, as a good (but still not perfect) indicator that the app is up to date?
I don't think so, unless we do something drastic like implement projected volumes in terms of FUSE.
Feedback from SIG Auth:
Hi @ahmedtd, any progress on this KEP? Just checking in as enhancements freeze is at 18:00 PST on Thursday June 16, 2022.
(force-pushed 5bdb334 to 026db5a)
I like the design, I think some edge cases and auto-trust behavior need to be filled in prior to being implementable.
In general, ClusterTrustBundle objects are considered publicly-readable within
the cluster. Read permissions will be granted cluster-wide to the
`system:authenticated` group.
This can't work, I think... if you can create a pod, you will be able to get the content of any ClusterTrustBundle. We would need to add a namespace-scoped TrustBundle to support this use case.
And allow some kind of cross-namespace reference. Or we would need to have a cluster-scoped API with a, "show me the ones I can see" view, which we know doesn't scale well. I don't think we should hold this design on that problem.
name: frobnicator.example.com
clientConfig:
  url: https://webhooks.example.com/frobnicator
  caBundle: "<long base64 data>"
Currently an empty caBundle means "use system certificates". This means that aggregated apiservers will need to be updated before the trustAnchors field can be enabled so that the correct trust is used. This isn't a blocker, but should be indicated in release notes.
Hmmm. That's problematic for the rollback case where you enable the feature gate, configure the feature, and then disable the feature gate.
Should we introduce some sort of selector enum, or maybe say that trustAnchors takes precedence over caBundle if it is set (and the feature gate is enabled)?
I've updated the doc so that ClusterTrustBundleName takes precedence over the other TLS fields if it is non-empty. That way, users don't have to unset CABundle in order to try out ClusterTrustBundleName, and they can freely toggle the feature gate.
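Under that precedence rule, a webhook configuration might look roughly like the sketch below. Hedged: the `trustAnchors.clusterTrustBundleName` field shape follows the discussion in this thread, not a settled API, and `frobnicator-ca` is a hypothetical bundle name.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: frobnicator.example.com
webhooks:
  - name: frobnicator.example.com
    clientConfig:
      url: https://webhooks.example.com/frobnicator
      # Kept populated so that a rollback (or disabling the feature gate)
      # falls back to the embedded bundle.
      caBundle: "<long base64 data>"
      # Proposed field: takes precedence over caBundle whenever it is
      # non-empty and the ClusterTrustBundle feature gate is enabled.
      trustAnchors:
        clusterTrustBundleName: frobnicator-ca
```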
### Upgrade / Downgrade Strategy

At present, there are no upgrade / downgrade concerns. The ClusterTrustBundle feature gate controls overall availability of the feature.
I think that admission webhook configuration and APIService behavior with empty caBundles represents a migration concern on downgrade.
Suggestion: "The addition of ClusterTrustBundles offers the opportunity for teams to migrate webhook configurations from directly embedded CA bundles ({mutatingwebhookconfigurations,validatingwebhookconfigurations}.webhooks.clientConfig.caBundle) to external ClusterTrustBundle references. Administrators making use of the new {mutatingwebhookconfigurations,validatingwebhookconfigurations}.webhooks.clientConfig.trustAnchors.clusterTrustBundleName field would need to revert back to using the caBundle field before/immediately after downgrading, particularly when failurePolicy = Fail."
I think this is now handled by saying that both CABundle and ClusterTrustBundleName can be set at the same time, but that ClusterTrustBundleName takes precedence.
### Version Skew Strategy

Both kubelet and kube-apiserver will need to be at 1.26 for the full featureset to be present. If only kube-apiserver is at 1.26 and kubelet is lower, then the pod mounting feature will be cleanly unavailable, but all other aspects of the feature will work.
What error is produced by the kubelet, and how is it observed by the pod author and by the cluster-admin?
I've updated the document to describe that Kubelet will emit events for problems during volume mount (ClusterTrustBundle doesn't exist), and during volume update (ClusterTrustBundle has ceased to exist).
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, if the ClusterTrustBundle feature gate is disabled, the cluster will largely continue to function properly.
Need to indicate how trust by server components changes when their hooks go from "missing caBundle and present clusterTrustBundle" to simply "missing caBundle".
I think this is now handled by specifying that both fields may be set but that clustertrustbundlename takes precedence.
###### Are there any missing metrics that would be useful to have to improve observability of this feature?

If we can find a low-cardinality way for Kubelet to report the object generation of each ClusterTrustBundle object that it is currently exporting to pods, that would help implement the two new SLOs proposed above.
I think you can have a count of volume updates and a count of mount errors even if you don't subdivide by clustertrustbundle. That would inform you of feature problems on a per-kubelet basis.
I've documented that in the metrics section.
In general, ClusterTrustBundle objects are considered publicly-readable within
the cluster. Read permissions will be granted cluster-wide to the
`system:authenticated` group.
I am still in the "make this readable by `system:authenticated`" camp, though I think @mikedanese had concerns. We could give this a dedicated cluster role binding in RBAC to make it easier for admins to opt out of the enumeration. Maybe we also need some kind of dedicated RBAC check for referencing a bundle that is also granted globally by default?
Security: Should individual trust anchor set entries designate an OCSP endpoint
to check for certificate revocation? Or should we require the URL to be
embedded in the issued certificates?
What approach are we planning on here?
This looks pretty good to me. I would still like a better story around revocation, i.e. #3258 (comment)
For each pemTrustAnchors projected volume source in the pod spec:

1. Kubelet will establish a watch on the named ClusterTrustBundle (using a
What cardinality of ClusterTrustBundles do we expect? Tens, hundreds? This is only needed if kubelets won't have read access to all ClusterTrustBundles. Otherwise, might not be worth the additional watches. @wojtek-t wdyt?
I've simplified this section so that it doesn't imply anything about how kubelet actually gets the updates. The amount is probably low enough that we can just have a global watch. The objects will be fairly big, though.
In general, ClusterTrustBundle objects are considered publicly-readable within
the cluster. Read permissions will be granted cluster-wide to the
`system:authenticated` group.
If the primary consumption method for these is a volume projection, why would we not just grant the permission to nodes? We don't grant workloads access to read config maps, etc... and we probably don't want to encourage random clients to get/list/watch these potentially very large objects.
In GKE, we would not be able to grant read to `system:authenticated`.
Pod authors need to know the names of the ClusterTrustBundles available for mounting in their cluster.
There are a couple of different points being discussed here; you may be talking past each other.
On point 2, for what it's worth, I don't think we should grant permissions just because someone is able to authenticate to your cluster. Just because you configure your cluster to trust identity tokens issued by some OIDC IDP doesn't mean you want to grant everybody with an identity issued by that IDP (temps, vendors, employees in unrelated business functions) a bunch of permissions in your cluster. Edit: If convenience is the major reason to want a system:authenticated grant, then I think Kubernetes should ship some predefined ClusterRoles like "Application Developer" and "Cluster Administrator" and document their use.
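One way to reconcile these positions is the dedicated cluster role binding mentioned earlier in the thread: ship read access as a discrete binding that admins can delete or narrow to opt out. A hedged sketch, in which the role name and API group are assumptions rather than settled API:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:cluster-trust-bundle-reader  # hypothetical name
rules:
  - apiGroups: ["certificates.k8s.io"]      # assumed API group for ClusterTrustBundle
    resources: ["clustertrustbundles"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:cluster-trust-bundle-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cluster-trust-bundle-reader
subjects:
  # Deleting this binding (or narrowing the subject) lets an admin opt out
  # of cluster-wide enumeration.
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:authenticated
```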
PRR is approved
This KEP proposes a new ClusterTrustBundle object that signers can use to distribute their X.509 trust anchors. The following integrations are supported: * Kubelet can mount the contents of a ClusterTrustBundle into a pod. * Webhook and aggregated API server config objects can refer to a ClusterTrustBundle object rather than an inline list of trust anchors.
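The pod-mount integration might be consumed roughly as follows. This is a sketch: the projected-source key, bundle name, image, and paths are illustrative of the proposal, not a settled API.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ctb-consumer
spec:
  containers:
    - name: app
      image: example.com/app:latest      # hypothetical image
      volumeMounts:
        - name: trust-anchors
          mountPath: /var/run/trust-anchors
          readOnly: true
  volumes:
    - name: trust-anchors
      projected:
        sources:
          # Proposed projected volume source backed by a ClusterTrustBundle.
          - clusterTrustBundle:
              name: example-ca           # hypothetical bundle name
              path: ca-bundle.pem
```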
(force-pushed 22f37e2 to 68617ab)
lgtm. I've updated the description with the items I know of as outstanding.
ahmedtd#15 is what I would like to see as a starting point for revocation. #3258 (comment) is an okay workaround to
owning-sig: sig-auth
participating-sigs:
  - sig-auth
status: provisional
Update to implementable if this is still targeting v1.26 🤞
We have a last-minute scope reduction. I'm working on a doc update now. I'll incorporate the status change.
That update has been pushed
(force-pushed d273b79 to 434efb5)
(force-pushed 434efb5 to df35249)
LGTM
/lgtm
/approve on behalf of David
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahmedtd, enj, johnbelamaric The full list of commands accepted by this bot can be found here. The pull request process is described here
every pod in the cluster, which will require watches from all Kubelets. When
they are updated, workloads will need to receive the updates fairly quickly
(within 5 minutes across the whole cluster), to accommodate emergency rotation
of trust anchors for a private CA.
I missed the original comment in:
#3258
[Now catching up with some proposals that were merged in 1.26 timeframe]
IIUC, the proposal as of now claims that each Kubelet will have a single watch, watching for all ClusterTrustBundle objects. Do I understand correctly?
If so, assuming that changes to CTBs are infrequent, that sounds fine.
If I misunderstood and you want to have per-pod watches for all the CTBs a pod requires, that will for sure cause a lot of problems, and we would have to revisit this decision.
[In general, high number of watches was hugely problematic in the past, and we don't want to make that even worse with this feature.]
@ahmedtd @mikedanese
@kubernetes/sig-scalability - FYI
yes, intent is single watch per kubelet
@ahmedtd, mind opening an update to clarify that?
Thanks Jordan - that sgtm.
Let's just add it to the KEP.
Clarified in an update.
This KEP proposes a new TrustAnchorSets object that signers can use to distribute their X.509 trust anchors. Additionally, it proposes a Kubelet projected volume source to allow workloads to easily consume data from TrustAnchorSets.
Issue link: #3257
Currently remaining