
CA cert in Secret not updated when self-signed CA itself gets renewed. #5851

Closed
TimJones opened this issue Mar 6, 2023 · 34 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@TimJones

TimJones commented Mar 6, 2023

Describe the bug:
When using a self-signed issuer to manage an internal CA, none of the Certificates issued via that CA receive the updated CA cert when the CA itself is renewed, so once the old CA cert expires, all of the services relying on it fail to connect with certificate-expiration errors.

Expected behaviour:
When a CA managed by Cert-Manager is renewed, all Certificates issued by that CA should have their Secret updated with the new CA cert.

Steps to reproduce the bug:

Apply the following to a cluster

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-ca
spec:
  isCA: true
  commonName: selfsigned-ca
  secretName: selfsigned-ca
  # Shortest time allowed
  duration: "1h"
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned
    kind: Issuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-ca
spec:
  ca:
    secretName: selfsigned-ca
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-client
spec:
  secretName: selfsigned-client
  commonName: selfsigned-client
  issuerRef:
    name: selfsigned-ca
    kind: Issuer
    group: cert-manager.io

Note the CA fingerprint matches from the CA and client cert:

$ kubectl get secret selfsigned-ca -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout
SHA1 Fingerprint=7C:53:15:A9:EE:75:20:43:88:5A:5F:8C:AE:53:C0:D1:2A:77:77:A3

$ kubectl get secret selfsigned-client -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout
SHA1 Fingerprint=7C:53:15:A9:EE:75:20:43:88:5A:5F:8C:AE:53:C0:D1:2A:77:77:A3

Wait an hour for the CA certificate to be renewed...
Then check again:

$ kubectl get secret selfsigned-ca -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout
SHA1 Fingerprint=B2:48:49:D4:CC:45:F5:46:BF:B9:7D:AB:71:2C:2E:31:7E:7A:FD:59

$ kubectl get secret selfsigned-client -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout
SHA1 Fingerprint=7C:53:15:A9:EE:75:20:43:88:5A:5F:8C:AE:53:C0:D1:2A:77:77:A3

They no longer match, so the certificate bundle in the client Secret is wrong and workloads attempting to use it will see some sort of "certificate expiration" error.
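For anyone reproducing this, a rough way to spot the drift (just a sketch; adjust secret names/namespaces to your setup) is to compare each issued Secret's ca.crt against the CA Secret:

# Rough sketch: flag issued Secrets whose ca.crt no longer matches the CA Secret.
ca_fp=$(kubectl get secret selfsigned-ca -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout)
for s in selfsigned-client; do
  fp=$(kubectl get secret "$s" -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout)
  [ "$fp" = "$ca_fp" ] || echo "$s: ca.crt is stale"
done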

Environment details::

  • Kubernetes version: v1.25.5
  • Cloud-provider/provisioner: N/A (bare-metal)
  • cert-manager version: v1.11.0
  • Install method: Helm via ArgoCD i.e. helm template ... | kubectl apply -f -

/kind bug

@jetstack-bot jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 6, 2023
@SgtCoDFish
Member

Thanks for raising this! As we discussed on slack this is a known issue, and one I'd like to address if at all possible.

I'm not sure that the expected behaviour you described would work, though: cert-manager cannot safely update already-issued certificates with the "new" CA after the CA is rotated - the certificates would have to be re-issued by the new CA.

Plus, we can't necessarily properly detect every certificate which would need re-issuing after the CA certificate is rotated.

I think the smallest useful change we could make would be to emit loud warnings when we're issuing a certificate which would expire after its issuer. That'll help people to detect when things are about to go wrong.
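For illustration, an out-of-band check along those lines might look something like this (just a sketch against the Secrets from this issue, not something cert-manager does today; assumes GNU date):

# Sketch: warn when the leaf cert in a Secret outlives the CA cert bundled with it.
ca_end=$(kubectl get secret selfsigned-client -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -enddate | cut -d= -f2)
leaf_end=$(kubectl get secret selfsigned-client -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate | cut -d= -f2)
if [ "$(date -d "$ca_end" +%s)" -lt "$(date -d "$leaf_end" +%s)" ]; then
  echo "warning: selfsigned-client will outlive the CA cert in its ca.crt"
fi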

Beyond that, I think most of the solution to this lies in users planning their certificate hierarchy. It's possible for cert-manager to automate a lot of stuff in this space, but when a root certificate is rotated it can be dangerous for us to do anything at all. We could do a lot more towards documenting this!

@TimJones
Author

TimJones commented Mar 7, 2023

cert-manager cannot safely update already-issued certificates with the "new" CA after the CA is rotated - the certificates would have to be re-issued by the new CA.

Not sure I follow this, but then of course I'm not familiar with the exact workflow of the controller(s).

When the CA Certificate is renewed, the backing Secret would also change, I assume. The Issuer controller could/should watch that Secret for changes and requeue all the Certificates that match its issuerRef for renewal as well.

It's not so far from the current auto-reissue of Certificates and I'm not sure where the danger is in this process, but then I'm not a PKI expert so happy to be taught 😃

A temporary workaround (in case others land on this issue) is simply to increase the duration of the CA Certificate to something unreasonably long (8760h ≈ 1 year), but I think something more robust is needed. At the very least, a metric you can alert on for each Certificate whose CA bundle is close to expiry.

@SgtCoDFish
Member

SgtCoDFish commented Mar 7, 2023

Not sure I follow this, but then of course I'm not familiar with the exact workflow of the controller(s).

There's a whole lot of "distributed systems" stuff which can go wrong. We can look for all certs which refer to the changed issuer when the issuer is changed, but that process can fail leaving us in a halfway state where some certs won't be updated. Plus, there could be a lot of certificates, and reissuing them all could in the worst case involve generating a lot of large RSA keys which isn't cheap.

Even if we update all the certs successfully, they might not be picked up and used immediately by the pods which are mounting the secrets, adding more complexity.

Obviously I'm talking mainly about edge cases there, and it's reasonable to say that the happy path could still be improved even if all the edge cases aren't solved - but it's an illustration of the difficulties we have.

We also simply don't have that much time for making changes like this ATM. I'd love for cert-manager to be better at this stuff (it's super annoying to me that we don't have support for pathLen yet) but we just don't have that much time. 😦

just to increase the duration of the CA Certificate to something unreasonably long (8760h ~ 1 year)

I think this is likely to be a good approach - and scheduling in a calendar event to renew manually in say 9 months. It sounds very old-fashioned and not-cloud-native, but for non-trivial PKI stuff it often ends up with some manual steps (albeit usually because a PKI will have an offline root certificate meaning that rotation of intermediates is necessarily manual)

EDIT: To be clear, I'd rather automate it more than insist on manual steps, for sure. Just there are limits of how much we can safely automate (gotta consider that the CA certificate might also be trusted outside the cluster) and how much time we have to do this stuff

@TimJones
Author

TimJones commented Mar 7, 2023

Thanks for the detailed response. A lot of the issues you're listing seem to me more operational than functional, i.e. "If you have a lot of certs managed by cert-manager, that's going to mean a lot of CPU overhead at renewal time" is, imho, a given.

We can look for all certs which refer to the changed issuer when the issuer is changed, but that process can fail leaving us in a halfway state where some certs won't be updated.

I see this as part of the normal reconciliation loop: every time the Certificate reconciliation is queued, check the issuerRef matches the ca.crt and if not, start a regular renewal with a CertificateRequest. Nothing should be lost half way for long, even with controller restarts.
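A rough out-of-band approximation of that loop (not the controller itself; assumes cmctl is installed and uses the names from this issue) would be to compare fingerprints and ask for a re-issue when they diverge:

# Sketch: if the client Secret's ca.crt no longer matches the CA Secret, re-issue.
ca_fp=$(kubectl get secret selfsigned-ca -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout)
cl_fp=$(kubectl get secret selfsigned-client -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout)
[ "$ca_fp" = "$cl_fp" ] || cmctl renew selfsigned-client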

Plus, there could be a lot of certificates, and reissuing them all could in the worst case involve generating a lot of large RSA keys which isn't cheap.

Understandable, but more an issue for operators. If lots of certs are currently managed by cert-manager the same more-or-less applies, and it's why we have rate-limited queues and CPU scheduling.

Even if we update all the certs successfully, they might not be picked up and used immediately by the pods which are mounting the secrets, adding more complexity.

Totally true, and an existing issue solved by the current advice of using a tool such as "wave" if the pod process is incapable of detecting & reloading changed certs.

We also simply don't have that much time for making changes like this ATM.

Would you be open to a collaborative PR? As I mentioned in Slack, I work in Open Source too and luckily my company is very much open to upstream contributions. If the changes as I understand them are possible and could be accepted I could see it being a valuable investment of my time 😄

I don't mean to sound argumentative, just giving an outsiders perspective and making sure I fully understand the issues at hand 😃

@SgtCoDFish
Member

I'm very open to collaboration and I'd gladly review a PR, but I'm also about to go on holiday for a while and realistically I won't personally be able to get back to doing anything until at least 2023-03-22 😦 After that, though, I'll be working a bunch on my Kubecon talk which is related to this use case so I should be able to help a bit then!

@maelvls is currently working on open source stuff and @irbekrm is too (although Irbe is currently on holiday), so they might be able to point you in the right direction if you have questions.

I'd suggest maybe starting with a design doc PR so we can agree on what changes we might want to make. Maybe also start a discussion in #cert-manager-dev on Kubernetes slack.

I hope that's helpful! I'll check back in when I'm back - and thank you for showing an interest, I love this part of open source 😁

@TimJones
Author

TimJones commented Mar 7, 2023

Awesome, thanks for the info.

@irbekrm
Contributor

irbekrm commented Mar 13, 2023

One implementation could be adding another check to the list of policies that the trigger controller evaluates before determining whether a cert needs to be re-issued. If the Certificate is for a CA issuer, this check would look up the issuer's CA secret and verify that the issued certificate still matches the CA. We would then also need to update this event handler predicate, as at the moment the trigger handler only runs on Certificate Secret events; the predicate would be crucial to ensure that the controller does get triggered when the CA Secret changes but does not get triggered (and run all the policy checks, some of which are expensive) when any random cluster Secret changes.

The above approach has a problem in that it would introduce more coupling between certificates controllers and the CA issuer, which may make any future refactoring difficult and does not fit the model well.

An alternative approach could be to add a new status field (e.g. 'revision') that any issuer could implement, which would signal to the trigger controller that certificates need to be re-issued, together with a new annotation on Certificates recording the 'revision' for which the Certificate was last successfully issued. With this, the CA issuer could store something like the CA cert fingerprint in its status and, when it detects that a new CA cert has been stored in the Secret, bump the revision. We would add a new check to the trigger controller to trigger new issuance if the revision annotation on the Certificate does not match the revision field on the issuer. That would allow us to avoid adding any issuer-specific logic to the certificates controllers, and any issuer, including external issuers, could implement its own logic for what counts as a new 'revision'.
I've not thought about this in detail, but perhaps this approach would work. The question of whether something like this can be safely switched on in GA for all CA issuers still stands (i.e. whether everyone wants this feature). Perhaps the CA issuer could have some new fields to let users specify whether the revision needs bumping (perhaps too hacky). In theory, the issuer could have all kinds of logic, e.g. letting users specify how long before expiry of the CA cert all certs need to be renewed, or adding some skew to avoid a thundering herd of renewals, but that would be a lot of extra complexity.
I'd be interested to see someone explore this approach in a design doc.
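Purely to illustrate the shape of that idea (hypothetical only - neither the status field nor the annotation below exists today):

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-ca
status:
  revision: "sha256:<fingerprint-of-current-CA-cert>"   # e.g. bumped when the CA Secret changes
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-client
  annotations:
    # recorded at issuance; the trigger controller would re-issue when this
    # no longer matches the issuer's status.revision
    cert-manager.io/issuer-revision: "sha256:<fingerprint-of-current-CA-cert>"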

The third alternative would be to have some external component that applies the Issuing condition when it detects that the CA certificate has been updated. That would need to either somehow store the fingerprint of the old certificate or have other means to distinguish Secret updates caused by the CA certificate changing from other updates (e.g. a user adding a new label).
The benefit of this is that the approach could be tested out and refined before adding the code in-tree.

@n9

n9 commented Apr 26, 2023

Is it OK that a CA issues certificates that expire after the CA itself expires? Would fixing #5864 partially help to avoid this issue?

mnitchev added a commit to giantswarm/prometheus-rules that referenced this issue May 15, 2023
The reason for this is a bug in cert manager. The Certificates we have
contain the CA that they've been signed with. When a CA has been
renewed, the Certificates that use it will not be updated until they
themselves are renewed. This causes an issue because the Certificate (in
KIAM's case) will renew a few hours after the CA. This is enough time
for the CA in the kiam certificate to expire and cause a flurry of
alerts.
This has mainly been observed with KIAM, but we decided to change all
alerts.

See cert-manager/cert-manager#5851
mnitchev added a commit to giantswarm/prometheus-rules that referenced this issue May 15, 2023
* Change certificates expiration from 14 to 13 days


* Update CHANGELOG
@Jamstah

Jamstah commented May 18, 2023

I feel like the trust manager is the solution to this problem - updating just the ca.crt in previously issued client secrets would mean the chain is broken in that secret. Reissuing all the certs could take a long time and cause a thundering herd style scenario. You really need a way to distribute both the old CA cert and the new CA cert until such time as all certs have been refreshed with the new CA.

@jetstack-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 16, 2023
@n9

n9 commented Aug 17, 2023

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2023
@n9

n9 commented Aug 17, 2023

The advantage of not using trust-manager is that one can simply use the CSI driver without messing with additional mounts.

@SgtCoDFish
Member

SgtCoDFish commented Aug 17, 2023

The advantage of not using trust-manager is that one can simply use csi driver without messing with additional mounts.

I understand that the UX of that is better, but it's really not safe to use ca.crt for trust purposes.

When a root X is rotated, you need to trust both the old and new roots for a period, until everything has been updated to use the new chain. It's not possible to do that with ca.crt. If you can't trust both at once, it'll essentially guarantee that something breaks when you rotate.
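Concretely, "trusting both" means clients verify against a bundle that contains both PEMs during the overlap window, which a single-certificate ca.crt can't express. Roughly (filenames are illustrative; this is what trust-manager automates):

# Sketch: a bundle with both roots validates certs issued by either CA.
cat old-ca.crt new-ca.crt > trust-bundle.pem
openssl verify -CAfile trust-bundle.pem server.crt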

@n9

n9 commented Aug 17, 2023

@SgtCoDFish

It's not possible to do that with ca.crt.

Why is it not possible? (What would need to change in cert-manager?)

@SgtCoDFish
Member

Why it is not possible? (What would need to change in cert-manager?)

cert-manager would need to track every existing cert that's still valid and that could've been trusted, and then put all those certs in ca.crt - like an inventory of past valid CA certificates.

But it can't know which of those past certs should actually be trusted - you might have manually rotated one because it was exposed for example - so we can't really just track everything and then throw it in ca.crt!

trust-manager is a little bit more manual, because you'd have to track those roots explicitly and add them as sources in trust-manager - but it's safe in part because you have complete control. cert-manager can only guess!

@n9

n9 commented Aug 17, 2023

@SgtCoDFish

But it can't know which of those past certs should actually be trusted - you might have manually rotated one because it was exposed for example

  • All valid CA certificates for the private key that signed the current tls.crt should be trusted. If the CA's private key was exposed and you manually rotated the CA, the newly signed tls.crt will no longer match the old private key; it will only match the new CA's private key.

Or am I wrong? (I do not understand why "cert-manager can only guess"?)

@SgtCoDFish
Member

I think of it like this:

Imagine we have two different services, which we'll call X and Y. Assume X and Y need to talk to each other using TLS and both use ca.crt for trust purposes.

They both get their certs from an issuer we'll call A. Since they both are issued from A, they both trust A since that's what's in ca.crt.

Imagine that A is about to expire so we rotate it to B. We rotate X's certificate to be issued from B, but that means it no longer trusts Y's certificate which is still issued from A. Also, Y doesn't trust B until it has been rotated. This is the problem we have today - there's always downtime since we can't guarantee that both X and Y have their certs reissued immediately.

We could change it so that the issuer keeps track of all certs it uses, and puts all of them in ca.crt. Now we rotate X's certificate and it trusts Y's cert, but we have a problem because Y still doesn't know about B until its cert is rotated at least once.

Instead, we could change it so that we "pre-issue" B, so it appears in ca.crt but doesn't get used until we choose to use it. So we do that, then issue a new cert for X and Y using A, so their ca.crt now contains both A and B. Then we re-issue X's cert using B, and Y will trust it. No downtime, great. At this point though I think we've gone to more effort than it would've been to just use trust-manager, not to mention the many other benefits we'd get from trust-manager in this situation.

Going further, if we then realise we accidentally uploaded the private key for B on github somehow, we need to distrust B and rotate again to C.

So we need to add some way for cert-manager to know that ca.crt should contain A and C but not B... at this point, we've reinvented trust-manager but worse and harder to use, and this ca.crt logic needs to be re-implemented in an issuer-specific way for every issuer type.

Does that make sense? Sorry it's so long!

@n9

n9 commented Aug 18, 2023

@SgtCoDFish First, thank you for your time you spent discussing the solutions.

... we've reinvented trust-manager but worse and harder to use ...

I think trust-manager does not need to be reinvented, but rather optionally integrated with cert-manager so that its bundle can also be used via CSI driver.

... and this ca.crt logic needs to be re-implemented in an issuer-specific way for every issuer type.

What would have to be done to make it possible to override ca.crt globally?

@SgtCoDFish
Member

I think trust-manager does not need to be reinvented, but rather optionally integrated with cert-manager so that its bundle can also be used via CSI driver

I'm not sure I follow; could you give me an example? If a user is able to mount via CSI driver, why would that user not be able to mount a trust-manager-generated ConfigMap?

What would have to be done to make it possible to override ca.crt globally?

To use ca.crt safely, every issuer has to provide the CA which should be trusted for its chain (which isn't necessarily simple), plus a history of past CAs which were used and which are still valid. We'd also at minimum need the ability to filter certain CAs out of whatever is returned.

@n9

n9 commented Aug 23, 2023

@SgtCoDFish

If a user is able to mount via CSI driver, why would that user not be able to mount a trust-manager-generated ConfigMap?

It is not about impossibility, it is about UX. (As you mentioned earlier: "I understand that the UX of that is better...")

To use ca.crt safely, every issuer has to provide the CA which should be trusted for its chain (which isn't necc. simple), plus a history of past CAs which were used and which are still valid. We'd also at minimum need the ability to filter certain CAs out of whatever is returned.

From what you said above: "At this point though I think we've gone to more effort than it would've been to just use trust-manager, not to mention the many other benefits we'd get from trust-manager in this situation."

I assume that this complexity is / will be solved by the trust-manager.

I'm not sure I follow; could you give me an example?

Sure. Let's have a K8S cluster. In this cluster we will have two issuers:

  • Let's Encrypt for incoming external HTTPS traffic; CA trust is solved by default root CA certs -- no need for trust manager
  • a self-signed issuer for mTLS of internal traffic (the self-signed issuer will issue and rotate certificates) -- the naive approach of using the ca.crt provided by the csi-driver will not work (as discussed above); a solution would be an option to configure cert-manager to override ca.crt with the CA bundle provided by trust-manager.

@kernle32dll

kernle32dll commented Sep 7, 2023

As someone who just encountered this problem myself, I am still unable to grasp how (or where) using trust-manager would solve this issue. I am trying to sum up the discussion so far to make sense of what has been written (also - rubber duck mode).

There are two problems at hand (note I use the names from the initial post for clarity):

  1. Given that a certificate (selfsigned-client) can outlive its CA (selfsigned-ca), we end up with a ca.crt in the certificate (selfsigned-client), which is expired, and thus can't be used at all.
  2. Directly using the CA certificate via selfsigned-ca is not possible either, since when the CA key rotates, the CA certificate gets updated, but that new certificate can't be used to verify the existing client certificate (selfsigned-client), as that has been created with the "old" key.

A tempting solution is to use both the CA certificate directly via selfsigned-ca, and ca.crt from selfsigned-client. This falls flat however, since we still have problem 1, as the certificate selfsigned-client is not re-generated when the CA expires (thus we end up with an expired ca.crt and a CA certificate that can't verify the client certificate - double fail!).

So from my point of view, the core issue revolves around the question "when does cert-manager need to re-generate selfsigned-client?" I would argue: sometime before the old CA expires and after the new one has been generated (rotation). Ideally the ca.crt in selfsigned-client is kept consistent with the certificate it accompanies, and when the certificate gets updated, the ca.crt is replaced with the new one, so the package stays consistent.
However, I can see that this is quite difficult to realize, as there are numerous "distributed-system" and timing things to consider.

@jetstack-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2023
@n9

n9 commented Dec 7, 2023

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 7, 2023
@jetstack-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2024
@n9

n9 commented Mar 6, 2024

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2024
@cert-manager-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale

@cert-manager-prow cert-manager-prow bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2024
@cert-manager-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale

@cert-manager-prow cert-manager-prow bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 4, 2024
@jlian

jlian commented Jul 26, 2024

I feel like the trust manager is the solution to this problem - updating just the ca.crt in previously issued client secrets would mean the chain is broken in that secret. Reissuing all the certs could take a long time and cause a thundering herd style scenario. You really need a way to distribute both the old CA cert and the new CA cert until such time as all certs have been refreshed with the new CA.

Just gave this a try and it seems like trust-manager does help with this.

Here's the setup I tried

---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-ca-short
  namespace: my-trust-namespace
spec:
  isCA: true
  commonName: selfsigned-ca-short
  secretName: ca-short
  duration: 1h
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ca-issuer-short
spec:
  ca:
    secretName: ca-short
---
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: bundle
spec:
  sources:
  - secret:
      name: ca-short
      key: tls.crt
  target:
    configMap:
      key: ca.crt

I first wanted to test the "month 8 case" as mentioned in this comment from a related issue: #2478 (comment)

For my test case, since the CA expires every hour, the window with the renewed CA is from the 40th to the 60th minute (cert-manager renews at two-thirds of the lifetime by default). I also had an MQTT broker service running on a leaf cert issued from this CA with an expiry of 30 days.

Around the 40th minute, my observations:

  1. cert-manager successfully renews the CA and trust-manager picks it up and distributes the new trust bundle correctly.

  2. The leaf certificate isn't renewed, with Kubernetes reporting that its last transition time was when it was issued for the first time. Given the OP of the issue, this is expected.

  3. I create an MQTT client pod with the mosquitto client, mount the renewed trust bundle as a file, and pass it in via --cafile. I try sending a message to the MQTT broker, and it works!

    $ mosquitto_pub -h my-broker -p 8884 -m "hello" -t "source" -d --cafile /var/run/certs/ca.crt -D CONNECT authentication-method 'K8S-SAT' -D CONNECT authentication-data $(cat /var/run/secrets/tokens/mq-sat)
    Client null sending CONNECT
    Client $server-generated/393a34cb-f8a3-4b69-9a9f-4bc666176da9 received CONNACK (0)
    Client $server-generated/393a34cb-f8a3-4b69-9a9f-4bc666176da9 sending PUBLISH (d0, q0, r0, m1, 'source', ... (5 bytes))
    Client $server-generated/393a34cb-f8a3-4b69-9a9f-4bc666176da9 sending DISCONNECT
  4. I use openssl and can verify the leaf cert chain validity with both the old CA and new CA

    $ openssl verify -verbose -CAfile new-ca.crt broker-leaf.crt
    broker-leaf.crt: OK
    $ openssl verify -verbose -CAfile old-ca.crt broker-leaf.crt
    broker-leaf.crt: OK

I think this works because, by default, cert-manager doesn't rotate the private key when renewing a cert, CA or not. Combine it with what @Jamstah said in this comment #5864 (comment)

...leaf certs are signed by the CA key and not by the CA cert. The same CA key can be used to create a new CA cert with new validity dates that would be able to be used in the chain with the leaf cert and validate it successfully, even though the original CA cert has expired. There's a good description here: https://serverfault.com/questions/878919/what-happens-to-code-sign-certificates-when-when-root-ca-expires

So in my case, since the private key stayed the same, the leaf cert issued under the old CA can still be cryptographically chained to the new CA cert, even though the CA certificate itself is different. I think this also means that as long as the client gets the newest trust bundle, the leaf cert can be used even though it far outlives the original CA (30 days vs 1 hour).
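If you want to double-check that the key really was reused across the renewal, comparing the public keys of the old and new CA certs shows they are identical (assuming you saved both PEMs; filenames are illustrative):

# Identical digests mean the CA key pair was reused for the renewed CA cert.
openssl x509 -in old-ca.crt -noout -pubkey | openssl sha256
openssl x509 -in new-ca.crt -noout -pubkey | openssl sha256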

Lastly, after 60 minutes, the original CA expired, but I can still send messages and pass openssl verification as long as I use the renewed trust bundle. If I use the old trust bundle, I get an error from mosquitto saying OpenSSL Error[0]: error:0A000086:SSL routines::certificate verify failed and openssl verify fails with error 10 at 1 depth lookup: certificate has expired / error broker-leaf.crt: verification failed. Shows how important trust manager is.

@cert-manager-bot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close

@cert-manager-prow
Contributor

@cert-manager-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ebuildy

ebuildy commented Sep 24, 2024

/reopen

@cert-manager-prow
Contributor

@ebuildy: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@NoodlesWang2024

I encountered the same issue and resolved it by extending the duration of the self-signed CA, ensuring that the renewBefore period is longer than the duration of the issued certificate. This guarantees that the signed certificate always has a valid (non-expired) CA certificate.

For example:

  • Self-signed CA: duration of 6 months, with renewBefore set to 3 months
  • Issued Certificate: duration of 3 months, with renewBefore set to 1 day
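In manifest form, that split looks roughly like this (names and exact durations are illustrative):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
spec:
  isCA: true
  commonName: internal-ca
  secretName: internal-ca
  duration: 4380h       # ~6 months
  renewBefore: 2190h    # renew ~3 months early, i.e. longer than the leaf's duration
  issuerRef:
    name: selfsigned
    kind: Issuer
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-client
spec:
  commonName: internal-client
  secretName: internal-client
  duration: 2160h       # ~3 months
  renewBefore: 24h
  issuerRef:
    name: internal-ca   # a CA Issuer backed by the internal-ca Secret
    kind: Issuer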
