Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix renewal of user provided CAs #6180

Merged
merged 4 commits into from
Jan 25, 2022
Merged

Conversation

ppatierno
Copy link
Member

@ppatierno ppatierno commented Jan 13, 2022

Type of change

  • Bugfix

Description

This PR fixes #5466.
Today, when a user provides his own CA certificate (cluster or client) but then he tries to follow the renew process as we documented, the procedure doesn't work at all.
The current code doesn't have logic for the CO to detect that a new CA certificate was provided by the user (by following the documentation and having a Secret with both the new CA certificate together with the old one CA-).
With this PR, this first detection issue is resolved by adding the thumbprint of the cluster CA certificate, as an annotation, on all the Secret(s) that contains server certificates signed by that CA certificate. For example, the <cluster>-zookeeper-nodes, <cluster>-kafka-brokers and the same for the CO and EO (TO and UO) related Secret(s).

annotations:
    clients-ca-thumbprint: hCyUdeqWt6gXDUKYdAtaFYDiPoZlWAwCjW7liqtWkps=

Just for the Kafka brokers related Secret(s) there is also a corresponding annotation with the thumbprint of the clients CA certificate.

annotations:
    clients-ca-thumbprint: hCyUdeqWt6gXDUKYdAtaFYDiPoZlWAwCjW7liqtWkps=
    cluster-ca-thumbprint: DoNswf/bRfs6nF1IVx7PhbOVXKvJvjv/wOxuQY8eoTM=

During the periodic reconcile, the CO is able to detect that the thumbprint of the provided CA certificate is different from the one stored in such Secret(s).
When the change is detected, it just sets the renewal type as REPLACE_KEY, starting the same flow which runs when the CA certificates are handled by Strimzi itself (of course a part the CA certificate generation that is skipped).
So it runs the right pods rolls to first trust the new CA certificate and then generating the new server certificates for the different components (CO itself, Kafka, ZooKeeper, TO, UO).

In order to do so there are a couple of things to take into account that needs the documentation to be updated because the user has to follow them in order to renew the CA certificate properly:

  1. The CN (Common Name) of the current CA certificate and the renewed one have to be different. This is because when they will be both trusted, the openssl tool used by stunnel in the tls-sidecar (in the TO) is not able to use both of them to trust the ZooKeeper server certificates but just one of them, so the verification would fail. When the Strimzi auto-creation of CA certificate is used, this is exactly what happens; a new CA is created with a suffix of v1 (compared to the previous v0) in the CN to make them different, so manual renewal should follow same path.
  2. The Secret containing the CA certificate has to have the strimzi.io/ca-cert-generation annotation as well as the one containing the key has to have the strimzi.io/ca-key-generation. This generation should start from 0 and increased on each renewal; even this time this is exactly how the auto-renewal process works in Strimzi doing that for you.

So the above steps need to be documented to make "handmade" renewal the same as the "auto" renewal from Strimzi operator.

Fixing this way, I made a couple of assumptions:

  • when renewing the CA certificate, the user will always create a new key (as we describe in the documentation) so the reason why the fix inject into the flow with a REPLACE_KEY renewal type.
  • the CA- is not deleted by the CO automatically when it expires (as it happens for Strimzi generated CA certificate). It's the user to delete it from the Secret.

Documentation will come with a different PR.

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Make sure all tests pass
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md

@ppatierno ppatierno added this to the 0.28.0 milestone Jan 13, 2022
@scholzj
Copy link
Member

scholzj commented Jan 14, 2022

The Secret containing the CA certificate has to have the strimzi.io/ca-cert-generation annotation as well as the one containing the key has to have the strimzi.io/ca-key-generation. This generation should start from 0 and increased on each renewal; even this time this is exactly how the auto-renewal process works in Strimzi doing that for you.

So, how does this deal with existing clusters which will not have the generation set? You can easily tell new users to set it. But we need to make sure that when an existing user wich custom CA secret without the generation upgrades, it doesn't fall all over.

@scholzj
Copy link
Member

scholzj commented Jan 14, 2022

Just for the Kafka brokers related Secret(s) there is also a corresponding annotation with the thumbprint of the clients CA certificate.

This seems a bit weird ... since the secret has nothing to do with the Clients CA. Will it stay in sync? Because the secret will not change when the Clients CA is renewed.

@ppatierno
Copy link
Member Author

This seems a bit weird ... since the secret has nothing to do with the Clients CA. Will it stay in sync? Because the secret will not change when the Clients CA is renewed.

But the thumbprint is update; the flow follows the Secret creation as usual by creating the same one but with a different thumbprint. I just added that to the usual build Secret creation in this case.

@ppatierno ppatierno marked this pull request as ready for review January 17, 2022 14:47
@ppatierno ppatierno force-pushed the fix-user-ca-renewal branch 3 times, most recently from 9889067 to 04b20dc Compare January 18, 2022 14:50
Comment on lines 408 to 412
/*
if (isCaCertThumbprintChanged(secret)) {
reasons.add("new certificate generated");
}
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should be removed?

Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit wich I think you should address. LGTM otherwise.

@scholzj
Copy link
Member

scholzj commented Jan 21, 2022

/azp run regression

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@see-quick see-quick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors from Azure are not related to this PR and should be fixed here #6245

Copy link
Member

@tombentley tombentley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the companion doc PR (#6240) made me wonder whether we should really be using an extra annotation for the thumbprint when we already have the generation annotation. I'd like to understand if the generation on its own is enough to track the changing dependency (and if not why not).

@ppatierno
Copy link
Member Author

@tombentley it's a good point ... thinking more about it, the generation annotation is on the CA certificate Secret only so when it's replaced by a new CA with a newer generation we lost that information and the CO cannot infer that CA is changed. The thumbprint annotation is on the Secrets (ZK, K, EO, CO, ...) containing server certificates signed with the CA certificate so that on reconcile we can compare the new CA thumbprint with the one on those Secrets. Using the generation could work but not out of box, we would need to store it on the Secrets (instead of the thumbprint) as well to be able to make a comparison and detect the CA certificate update.

@ppatierno
Copy link
Member Author

@tombentley @scholzj I changed the code by saving the strimzi.io/cluster-ca-cert-generation and strimzi.io/clients-ca-cert-generation (only for Kafka) on Secrets instead of the thumbprint. I run some tests and of course CO is able to detect change because the user increments the generation on renewal. I don't see any drawbacks.

Copy link
Member

@tombentley tombentley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ppatierno

@ppatierno
Copy link
Member Author

/azp run regression

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Paolo Patierno <[email protected]>

Fixed unit test

Signed-off-by: Paolo Patierno <[email protected]>

Fixed NPE when no thumbprint annotation is on Secret

Signed-off-by: Paolo Patierno <[email protected]>

Factored out a common method to check CA cert thumbprint is changed
Defined a method to return the annotation for CA cert thumbprint

Signed-off-by: Paolo Patierno <[email protected]>

Addressed different comments

Signed-off-by: Paolo Patierno <[email protected]>

Changed printing CA certificate thumbprint to DEBUG log level

Signed-off-by: Paolo Patierno <[email protected]>
@ppatierno ppatierno force-pushed the fix-user-ca-renewal branch from eba09a9 to dd9e48b Compare January 25, 2022 08:39
@ppatierno
Copy link
Member Author

/azp run regression

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppatierno ppatierno merged commit 6b20f3e into strimzi:main Jan 25, 2022
@ppatierno ppatierno deleted the fix-user-ca-renewal branch January 25, 2022 15:59
@ppatierno ppatierno changed the title Added fix for renewal of user provided CAs Fix renewal of user provided CAs Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The renewal process with custom CA does not work
4 participants