CA cert in Secret not updated when self-signed CA itself gets renewed. #5851
Thanks for raising this! As we discussed on slack this is a known issue, and one I'd like to address if at all possible. I'm not sure that the expected behaviour you described would work, though: cert-manager cannot safely update already-issued certificates with the "new" CA after the CA is rotated - the certificates would have to be re-issued by the new CA. Plus, we can't necessarily properly detect every certificate which would need re-issuing after the CA certificate is rotated.

I think the smallest useful change we could make would be to emit loud warnings when we're issuing a certificate which would expire after its issuer. That'll help people to detect when things are about to go wrong.

Beyond that, I think most of the solution to this lies in users planning their certificate hierarchy. It's possible for cert-manager to automate a lot of stuff in this space, but when a root certificate is rotated it can be dangerous for us to do anything at all. We could do a lot more towards documenting this!
Not sure I follow this, but then of course I'm not familiar with the exact workflow of the controller(s). When the CA is renewed, couldn't the Certificates issued from it be re-issued as well? It's not so far from the existing auto-reissue behaviour.

As a temporary workaround (in case others land on this issue), one option is just to increase the duration of the CA Certificate.
There's a whole lot of "distributed systems" stuff which can go wrong. We can look for all certs which refer to the changed issuer when the issuer is changed, but that process can fail, leaving us in a halfway state where some certs won't be updated. Plus, there could be a lot of certificates, and reissuing them all could in the worst case involve generating a lot of large RSA keys, which isn't cheap. Even if we update all the certs successfully, they might not be picked up and used immediately by the pods which are mounting the secrets, adding more complexity.

Obviously I'm talking mainly about edge cases there, and it's reasonable to say that the happy path could still be improved even if all the edge cases aren't solved - but it's an illustration of the difficulties we have. We also simply don't have that much time for making changes like this ATM. I'd love for cert-manager to be better at this stuff (it's super annoying to me that we don't have better support for this yet).
I think this is likely to be a good approach - along with scheduling a calendar event to renew manually in, say, 9 months. It sounds very old-fashioned and not-cloud-native, but non-trivial PKI stuff often ends up with some manual steps (albeit usually because a PKI will have an offline root certificate, meaning that rotation of intermediates is necessarily manual).

EDIT: To be clear, I'd rather automate it more than insist on manual steps, for sure. It's just that there are limits to how much we can safely automate (gotta consider that the CA certificate might also be trusted outside the cluster) and how much time we have to do this stuff.
Thanks for the detailed response. A lot of the issues you're listing seem to me more operational than functional, i.e. "If you have a lot of certs managed by cert-manager, that's going to mean a lot of CPU overhead at renewal time" is, imho, a given.
I see this as part of the normal reconciliation loop: every time the CA Secret changes, re-issue the Certificates that reference it.
Understandable, but more an issue for operators. If lots of certs are currently managed by cert-manager the same more-or-less applies, and it's why we have rate-limited queues and CPU scheduling.
Totally true, and an existing issue solved by the current advice of using a tool such as "wave" if the pod process is incapable of detecting & reloading changed certs.
Would you be open to a collaborative PR? As I mentioned in Slack, I work in open source too and luckily my company is very much open to upstream contributions. If the changes as I understand them are possible and could be accepted, I could see it being a valuable investment of my time 😄

I don't mean to sound argumentative, just giving an outsider's perspective and making sure I fully understand the issues at hand 😃
I'm very open to collaboration and I'd gladly review a PR, but I'm also about to go on holiday for a while and realistically I won't personally be able to get back to doing anything until at least 2023-03-22 😦 After that, though, I'll be working a bunch on my KubeCon talk which is related to this use case, so I should be able to help a bit then! @maelvls is currently working on open source stuff and @irbekrm is too (although Irbe is currently on holiday), so they might be able to point you in the right direction if you have questions.

I'd suggest maybe starting with a design doc PR so we can agree on what changes we might want to make. Maybe also start a discussion in #cert-manager-dev on Kubernetes Slack.

I hope that's helpful! I'll check back in when I'm back - and thank you for showing an interest, I love this part of open source 😁
Awesome, thanks for the info.
One implementation could be adding another check to the list of policies that the trigger controller checks before determining whether a cert needs to be re-issued. This check could verify that, if the Certificate is for a CA issuer, the issued certificate still matches the CA cert in the issuer's CA Secret. We should then also update this event handler predicate, as at the moment the trigger handler only runs on Certificate Secret events - the event handler predicate would be crucial to ensure that the controller does get triggered when the CA Secret changes and does not get triggered (and run all the policy checks, some of which are expensive) when any random cluster Secret changes.

The above approach has a problem in that it would introduce more coupling between certificates controllers and the CA issuer, which may make any future refactoring difficult and does not fit the model well. An alternative approach could be to add a new status field (e.g. 'revision') that any issuer could implement, which would signal to the trigger controller that certificates need to be re-issued, together with a new annotation on Certificates recording for which 'revision' the Certificate was last successfully issued. With this, the CA issuer could store something like the CA cert fingerprint on its status and, when it detects that a new CA cert has been stored in the Secret, bump the revision. We would add a new check to the trigger controller to trigger a new issuance if the revision annotation does not match the revision field on the issuer. That would allow us to avoid adding any issuer-specific logic to certificates controllers, and any issuer, including external issuers, could implement their own logic for what counts as a new 'revision' for them. A rough sketch of that shape follows below.

The third alternative would be to have some external component that applies the Issuing condition when it detects that the CA certificate has been updated. That would need to either somehow store the fingerprint of the old certificate or have other means to distinguish Secret updates caused by the CA certificate changing from other updates (e.g. a user added a new label).
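To make the 'revision' idea above concrete, here is a purely hypothetical sketch of what that contract could look like. None of these fields or annotations exist in cert-manager today - the status field, the annotation key and the values are all made up for illustration:

```yaml
# Hypothetical sketch only: status.revision and the annotation below are NOT real cert-manager API.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca
status:
  # Hypothetical: the CA issuer bumps this (e.g. to the fingerprint of the CA cert)
  # whenever it observes a new CA certificate in the referenced Secret.
  revision: "sha256:d4c3b2a1"
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: client
  annotations:
    # Hypothetical: recorded at last successful issuance; the trigger controller would
    # re-issue the Certificate when this no longer matches the issuer's status.revision.
    example.cert-manager.io/issuer-revision: "sha256:d4c3b2a1"
spec:
  commonName: client
  secretName: client-tls
  issuerRef:
    name: internal-ca
    kind: Issuer
```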
Is it OK that a CA issues certificates expiring after the CA's own expiration? Would fixing #5864 partially help to avoid this issue?
* Change certificates expiration from 14 to 13 days

  The reason for this is a bug in cert-manager. The Certificates we have contain the CA that they've been signed with. When a CA has been renewed, the Certificates that use it will not be updated until they themselves are renewed. This causes an issue because the Certificate (in KIAM's case) will renew a few hours after the CA. This is enough time for the CA in the KIAM certificate to expire and cause a flurry of alerts. This has mainly been observed with KIAM, but we decided to change all alerts. See cert-manager/cert-manager#5851

* Update CHANGELOG
I feel like trust-manager is the solution to this problem - updating just the ca.crt in previously issued client secrets would mean the chain is broken in that secret. Reissuing all the certs could take a long time and cause a thundering-herd-style scenario. You really need a way to distribute both the old CA cert and the new CA cert until such time as all certs have been refreshed with the new CA.
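As a rough illustration of that overlap window (Secret names here are made up, and the source Secrets have to live in the trust namespace), a trust-manager Bundle can carry both roots at once:

```yaml
# Sketch: distribute both the outgoing and the incoming CA until every leaf
# certificate has been re-issued from the new CA. Names are assumptions.
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: rotation-bundle
spec:
  sources:
  # Old root: keep it as a source until no workload still presents a chain ending in it.
  - secret:
      name: old-ca
      key: tls.crt
  # New root.
  - secret:
      name: new-ca
      key: tls.crt
  target:
    configMap:
      key: ca-certificates.crt  # consumers mount this ConfigMap instead of the issued Secret's ca.crt
```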
Just linking similar or relevant issues:
The advantage of not using trust-manager is that one can simply use the CSI driver without messing with additional mounts.
I understand that the UX of that is better, but it's really not safe to rely on ca.crt for this. When a root is rotated, you need to trust both the old and new roots for a period, until everything has been updated to use the new chain. It's not possible to do that with ca.crt alone.
Why is it not possible? (What would need to change in cert-manager?)
cert-manager would need to track every existing cert that's still valid and that could've been trusted, and then put all those certs in ca.crt - like an inventory of past valid CA certificates. But it can't know which of those past certs should actually be trusted - you might have manually rotated one because it was exposed, for example - so we can't really just track everything and then throw it in ca.crt! trust-manager is a little bit more manual, because you'd have to track those roots explicitly and add them as sources in trust-manager - but it's safe in part because you have complete control. cert-manager can only guess!
Or am I wrong? (I do not understand why "cert-manager can only guess".)
I think of it like this: Imagine we have two different services, which we'll call X and Y. Assume X and Y need to talk to each other using TLS and both use the ca.crt from their issued Secrets as their trust store.

They both get their certs from an issuer we'll call A. Since they both are issued from A, they both trust A since that's what's in ca.crt.

Imagine that A is about to expire so we rotate it to B. We rotate X's certificate to be issued from B, but that means it no longer trusts Y's certificate which is still issued from A. Also, Y doesn't trust B until it has been rotated. This is the problem we have today - there's always downtime since we can't guarantee that both X and Y have their certs reissued immediately.

We could change it so that the issuer keeps track of all certs it uses, and puts all of them in ca.crt. Instead, we could change it so that we "pre-issue" B, so it appears in ca.crt before anything is issued from it.

Going further, if we then realise we accidentally uploaded the private key for B on GitHub somehow, we need to distrust B and rotate again to C. So we need to add some way for cert-manager to know that B should no longer appear in ca.crt.

At this point though I think we've gone to more effort than it would've been to just use trust-manager, not to mention the many other benefits we'd get from trust-manager in this situation.

Does that make sense? Sorry it's so long!
@SgtCoDFish First, thank you for the time you've spent discussing the solutions.
I think trust-manager does not need to be reinvented, but rather optionally integrated with cert-manager so that its bundle can also be used via CSI driver.
What would have to be done to make it possible to override ca.crt with the trust-manager bundle?
I'm not sure I follow; could you give me an example? If a user is able to mount via CSI driver, why would that user not be able to mount a trust-manager-generated ConfigMap?
To use ca.crt safely, every issuer has to provide the CA which should be trusted for its chain (which isn't necessarily simple), plus a history of past CAs which were used and which are still valid. We'd also at minimum need the ability to filter certain CAs out of whatever is returned.
It is not about impossibility, it is about UX. (As you mentioned earlier: "I understand that the UX of that is better...")
From what you said above: "At this point though I think we've gone to more effort than it would've been to just use trust-manager, not to mention the many other benefits we'd get from trust-manager in this situation." I assume that this complexity is / will be solved by the trust-manager.
Sure. Let's have a K8S cluster. In this cluster we will have two issuers:
As someone who just encountered this problem myself, I am still unable to grasp how (or where) using trust-manager would solve this issue. I am trying to sum up the discussion so far to make sense of what has been written (also - rubber duck mode). There are two problems at hand (note I use the names from the initial post for clarity):
A tempting solution is to use the CA certificate directly via the ca.crt in the issued Secret. So from my point of view, the core issue revolves around the question "when does cert-manager need to re-generate the ca.crt in issued Secrets?"
Just gave this a try and it seems like trust-manager does help with this. Here's the setup I tried:

---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: selfsigned-issuer
spec:
selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: selfsigned-ca-short
namespace: my-trust-namespace
spec:
isCA: true
commonName: selfsigned-ca-short
secretName: ca-short
duration: 1h
privateKey:
algorithm: ECDSA
size: 256
issuerRef:
name: selfsigned-issuer
kind: ClusterIssuer
group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: ca-issuer-short
spec:
ca:
secretName: ca-short
---
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
name: bundle
spec:
sources:
- secret:
name: ca-short
key: tls.crt
target:
configMap:
      key: ca.crt

I first wanted to test the "month 8 case" as mentioned in this comment from a related issue: #2478 (comment)

For my test case, since the CA expires each hour, the valid window is from the 40th to the 60th minute. I also had an MQTT broker service running on an issued leaf cert from this CA with an expiry of 30 days. Around the 40th minute, my observations:
I think this works because, by default, cert-manager doesn't rotate the private key when renewing a cert, CA or not. Combine it with what @Jamstah said in this comment #5864 (comment)
So in my case, since the private key stayed the same, the leaf cert issued from the old CA can still be cryptographically chained to the new CA cert, even though the CA certificate itself has changed. I think this also means that as long as the client gets the newest trust bundle, the leaf cert can be used even though it way outlives the original CA (30 days vs 1 hour). Lastly, after 60 minutes the original CA expired, but I can still send messages and pass openssl verification as long as I use the renewed trust bundle. If I use the old trust bundle, I get an error from mosquitto.
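For reference, the behaviour described above corresponds to the Certificate's privateKey.rotationPolicy field, which defaults to Never. A sketch (reusing the CA Certificate from the setup above) making that default explicit:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-ca-short
  namespace: my-trust-namespace
spec:
  isCA: true
  commonName: selfsigned-ca-short
  secretName: ca-short
  duration: 1h
  privateKey:
    algorithm: ECDSA
    size: 256
    # Never (the default) reuses the existing key pair on renewal, so signatures on
    # previously issued leaf certificates still verify against the renewed CA cert.
    # Setting this to Always would break that property when the CA renews.
    rotationPolicy: Never
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
```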
I encountered the same issue and resolved it by extending the duration of the self-signed CA, ensuring that the renewBefore period is longer than the duration of the issued certificate. This guarantees that the signed certificate always has a valid (non-expired) CA certificate. For example:
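A sketch of that arrangement (durations here are illustrative, not the commenter's original values; the CA issuer name is assumed): because the CA's renewBefore is longer than the leaf's duration, no leaf can be issued that outlives the CA certificate it was signed by.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
spec:
  isCA: true
  commonName: internal-ca
  secretName: internal-ca
  duration: 8760h    # CA valid for 1 year
  renewBefore: 2928h # renew while >4 months remain, i.e. longer than any leaf's duration
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: leaf
spec:
  commonName: leaf
  secretName: leaf-tls
  duration: 2160h    # 90 days, comfortably shorter than the CA's renewBefore
  issuerRef:
    name: internal-ca-issuer
    kind: ClusterIssuer
```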
Describe the bug:
When using a self-signed issuer to manage an internal CA, none of the Certificates issued via that CA receive the updated CA cert when the CA itself gets renewed, so once the old CA cert expires all of the services using it fail to connect due to certificate expiration.
Expected behaviour:
When a CA managed by Cert-Manager is renewed, all Certificates issued by that CA should have their Secret updated with the new CA cert.
Steps to reproduce the bug:
Apply the following to a cluster
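The original manifest isn't reproduced here; a minimal sketch of the kind of setup that shows the problem (names and durations are illustrative) would be:

```yaml
# Sketch of a reproduction, not the reporter's exact manifest.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
  namespace: cert-manager   # the CA ClusterIssuer reads its Secret from the cluster resource namespace
spec:
  isCA: true
  commonName: internal-ca
  secretName: internal-ca
  duration: 1h               # short so the CA renews quickly for the test
  issuerRef:
    name: selfsigned
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: client
  namespace: default
spec:
  commonName: client
  secretName: client-tls
  duration: 24h              # outlives the CA, so its ca.crt goes stale once the CA renews
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
```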
Note the CA fingerprint matches between the CA and the client cert:
Wait an hour for the CA certificate to be renewed...
Then check again:
They no longer match, so the certificate bundle in the client Secret is wrong and workloads attempting to use it will see some sort of "certificate expiration" error.
Environment details:
helm template ... | kubectl apply -f -
/kind bug