Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server Plugin UpstreamAuthority cert-manager #2274

Merged

Conversation

JoshVanL
Copy link
Contributor

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality

This PR introduces cert-manager as a server upstream authority plugin. This plugin creates CertificateRequests in the target cluster, configured with the referenced issuer. Once signed by the Issuer it returns the signed certificate and CA.

Description of change

Adds cert-manager as a new UA plugin.
Updates k8s dependencies required when importing cert-manager.
Updates test files and kube controllers with new func signatures from updating dependencies.

Let me know if there is anything missing or would like me to split up the PR.

Which issue this PR fixes

fixes #1796

@JoshVanL JoshVanL force-pushed the server-plugin-upstreamauthority-cert-manager branch from ca82960 to 133adda Compare May 12, 2021 13:41
Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this contrib @JoshVanL, we sincerely appreciate it! Very excited to gain support for cert-manager :-D

Sorry for the delay in review - I was out most of last week on vacation. I took a pass today and it looks great, had a few small questions here and there but nothing major.

I'm wondering if we should have an integration test for this. How stable is the CertificateRequest API? Is there anything special we need in terms of role bindings, cert-manager config, etc that we need to make all this work?

Finally, it would be great to hear your feedback on the process of authoring a new plugin. We are putting some attention on that currently and hearing a little about your experience would be helpful.


This plugin will request a signing certificate from cert-manager via a
[CertificateRequest](https://cert-manager.io/docs/concepts/certificaterequest/)
resource. Once the referenced issuer has signed the request,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we lose some text at the end here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have added the rest :)


| Configuration | Description |
| ------------------------- | ----------------------------------------------------------------- |
| kube_config_path | Path to the kubeconfig used to connect to the Kubernetes cluster. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this to kube_config_file and have the default be in-cluster config? That will make it consistent with all the other k8s-y plugins

| ------------------------- | ----------------------------------------------------------------- |
| kube_config_path | Path to the kubeconfig used to connect to the Kubernetes cluster. |
| namespace | The namespace to create CertificateRequests for signing. |
| issuer_name | The name of the issuer to reference in CertificateRequests. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because we can request certs from different issuers using the same cert-manager?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

go.mod Outdated
k8s.io/kube-aggregator v0.18.2
k8s.io/utils v0.0.0-20201110183641-67b214c5f920
sigs.k8s.io/controller-runtime v0.6.0
k8s.io/api v0.20.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azdagron raised a question in a call last week about this dep bump. Is it needed for cert-manager upstream support? To date, I think our version has been held back to give a better compat guarantee with older k8s clusters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it is because of importing the cert-manager repo. After some go mod wrangling, I've struggled to reduce the versions. Happy to instead use an unstructured client to make things easier? This would resolve having to import jetstack/cert-manager.

)

const (
pluginName = "cert-manager"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we normally follow snake case for plugin names ... that said, I know that cert-manager is named cert-manager 😅

I don't know what's worse, inconsistent naming with the other plugins or inconsistent naming with the upstream thing! My initial thought here is to just leave it as is, but thought I'd comment for other reviewers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I went back and forth a few times on this! Happy to change it either way 😄

}
// If no issuer_group given, default to cert-manager.io
if len(config.IssuerGroup) == 0 {
p.log.Warn("configuration has empty issuer_group property, defaulting to 'cert-manager.io'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these warnings are necessary. If the default is worthy of a warning, my preference would be to have a different default or none at all. Is there something specific we're worried about here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I will change the default to a debug.

The defaults are the same that are used in cert-manager so made sense to me to port them here from a UX perspective. Happy to remove them completely though, and just error out if they are not defined.


if cmutil.CertificateRequestIsDenied(cr) {
cond := cmutil.GetCertificateRequestCondition(cr, cmapi.CertificateRequestConditionDenied)
return fmt.Errorf("created CertificateRequest %s/%s has been denied: %s:%s", cr.Namespace, cr.Name, cond.Reason, cond.Message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably log this error with the appropriate fields, and then return a more succinct error message to core. That way we'll get structured logging since core won't extract all of this stuff out from the error it receives.

Similar comment for all the fmt.Errorf messages that are templating details into the error message

}

// If the configured issuer did not populate the CA on the request we cannot
// build the upstream roots. We can only error here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Is this something that can happen "normally"? Do we need to document anything about it e.g. our requirements on a certain cert-manager configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is up to the cert-manager Issuer implementation as to whether they support setting the CA or not, and we can't programmatically determine this before as they are just stringy references.

For cert-manager core Issuers,Vault and CA support setting this field, but ACME for example doesn't.

@JoshVanL
Copy link
Contributor Author

JoshVanL commented Jun 3, 2021

Thanks for the review @evan2645 😄

I'm wondering if we should have an integration test for this. How stable is the CertificateRequest API? Is there anything special we need in terms of role bindings, cert-manager config, etc that we need to make all this work?

Sure thing, will work on an integration test next. The CertificateRequest API is v1 now and should be stable from the client point of view. From a role perspective, the client need only "create" permissions for "CertificateRequests" in the defined namespace. The identity of the client to the API server is important though, as this ties into the approval mechanism. I'll add a note about that in the documentation.

Finally, it would be great to hear your feedback on the process of authoring a new plugin. We are putting some attention on that currently and hearing a little about your experience would be helpful.

Sure! I mostly found it fairly simple to know which bits I should be adding in, and didn't have much trouble find things. Knowing what goes into MintX509CAResponse was a bit tricky, though that is likely due to me not knowing enough about the project yet.

@JoshVanL
Copy link
Contributor Author

JoshVanL commented Jun 7, 2021

@evan2645 Had a go at an integration test for cert-manager, let me know what you think 🙂

The go dependency changes are tricky.. Feel that using an unstructured client may be the way to go, wdyt?

Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woot! This is looking great @JoshVanL I think we are almost there! A few more comments from me, couple things I missed on the last pass :( sorry about that.

Sure! I mostly found it fairly simple to know which bits I should be adding in, and didn't have much trouble find things. Knowing what goes into MintX509CAResponse was a bit tricky, though that is likely due to me not knowing enough about the project yet.

Aah yes ok ... I can see now that you are using our pre-1.0 protos. We have a new set of protos which are better documented, I wonder if they would have helped. You can check them out here, curious if that would have made things better for you. @azdagron do we need to do anything special here since we're still in the middle of this plugin subsystem change?

Had a go at an integration test for cert-manager, let me know what you think 🙂

Awesome, thank you for this!! One of the challenges we've had is maintaining integration code for things we don't know a lot about. I'm sure you all have this problem too, if you have any pointers they'd be very much welcome :-D in the meantime, this integration test goes a long way in helping to mitigate the overhead.

I had some failures trying to run it locally, all related to openssl invocations of course 😆 I think maybe there are a couple typos there. Otherwise, the approach of calling mint and examining the issuer and the bundle sounds good to me.

The go dependency changes are tricky.. Feel that using an unstructured client may be the way to go, wdyt?

I'll defer to @azdagron on what he wants to do here. I am not sure what other implications unstructured client may bring

}

// loadConfig parses and defaults incoming configure requests
func (p *Plugin) loadConfig(req *spi.ConfigureRequest) (*Config, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing a check here for config.Namespace? The docs suggest it's required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that was an oversight

Comment on lines 147 to 149
cmapi.UsageDigitalSignature,
cmapi.UsageKeyEncipherment,
cmapi.UsageCertSign,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the usages that we normally set when cutting a CA cert:

KeyUsage: x509.KeyUsageDigitalSignature |
x509.KeyUsageCertSign |
x509.KeyUsageCRLSign,

The key encipherment usage caught my eye here. I don't think that digital signature is strictly necessary and probably should be omitted across the board, but since it's already there let's just keep it?

Suggested change
cmapi.UsageDigitalSignature,
cmapi.UsageKeyEncipherment,
cmapi.UsageCertSign,
cmapi.UsageDigitalSignature,
cmapi.UsageCertSign,
cmapi.UsageCRLSign,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, I've also added UsageCRLSign

}
// If no issuer_kind given, default to Issuer
if len(config.IssuerKind) == 0 {
p.log.Debug("configuration has empty issuer_kind property, defaulting to 'Issuer'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry I missed this before. All log messages have to be formatted as a sentence without a period. I think most of the messages here can conform to the guideline by simply capitalizing the first letter

Suggested change
p.log.Debug("configuration has empty issuer_kind property, defaulting to 'Issuer'")
p.log.Debug("Configuration has empty issuer_kind property, defaulting to 'Issuer'")

log.Info("waiting for certificaterequest to be signed")

// Poll the CertificateRequest until it is signed. If not signed after 300
// polls, error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that there is some kind of periodic clean up that occurs on cert-manager side or elsewhere? I don't see us removing the resource we created, and if we die at this point we'll leak the one we made and create a new one on next boot

Copy link
Contributor Author

@JoshVanL JoshVanL Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't actually.. we should do something here to clean up.

I think what would be best is that we delete successful CertificateRequests in MintX509CA.

We can also annotate created CRs: cert-manager.spiffe.io/trust-domain: <GlobalConfig.trust-domain>. We can either delete these CRs optimistically at the start of MintX509CA or at the end as part of the same cleanup. If MintX509CA always fails, it makes sense to me to delete at the start. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some prior art to this:

cert-manager revision-manager: This deletes CRs based on a max revision history. This requires running a kube controller which wouldn't be appropriate here AFAIK.
istio-csr optionally deletes the CR after it being successful
cni-lib which optimistically deletes stale CRs during issuance: This is similar to the approach described above^ which I think is the best.

Comment on lines 12 to 14
leafURIResult=$(./bin/kubectl exec -n spire $(./bin/kubectl get pod -n spire -o name) -- cat svid.pem | openssl x509 --noout -ext subjectAltName | grep URI | sed 's/^ *//g')
leafIssuerResult=$(./bin/kubectl exec -n spire $(./bin/kubectl get pod -n spire -o name) -- cat svid.pem | openssl x509 --noout -issuer)
caSubjectResult=$(./bin/kubectl exec -n spire $(./bin/kubectl get pod -n spire -o name) -- cat bundle.pem | openssl x509 --noout --subject)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These flags are single dash i.e. -noout, -subject

When I ran it locally, it also complained about -ext. I couldn't find an equivalent flag for this that made sense given what you're trying to do, I normally use -noout -text

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is based on a commit in our main branch that was published prior to our switch to GitHub Actions ... hence, this PR is currently not triggering CI. We require branches be up-to-date prior to merge, so it's not a problem per se, but if you update your branch now you'll get the CI runners going on this PR which may help for debugging this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These flags are single dash i.e. -noout, -subject

When I ran it locally, it also complained about -ext. I couldn't find an equivalent flag for this that made sense given what you're trying to do, I normally use -noout -text

Thanks, I'll move it over to that. CLIs between OS's is always tricky 😅, sed comes to mind 😆

This PR is based on a commit in our main branch that was published prior to our switch to GitHub Actions ... hence, this PR is currently not triggering CI. We require branches be up-to-date prior to merge, so it's not a problem per se, but if you update your branch now you'll get the CI runners going on this PR which may help for debugging this.

Cheers! Once changing to the unstructured client it should make rebasing a bit easier.

ca_ttl = "12h"
ca_subject {
country = ["GB"]
organization = ["cert-manager.io"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might recommend changing this to something SPIRE specific so that it disambiguates the results when we look at leaf issuer and CA (i.e. we can see clearly that leaf was signed by SPIRE, but the root CA is cert-manager). One failure mode here is that we don't take the Upstream path and instead fall back to SPIRE root for some reason, and this case will be harder to distinguish here if we use cert-manager subject for the SPIRE intermediate.

Copy link
Contributor Author

@JoshVanL JoshVanL Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I'll change that back to the standard values.

@azdagron
Copy link
Member

azdagron commented Jun 7, 2021

@azdagron do we need to do anything special here since we're still in the middle of this plugin subsystem change?

Naw, we can convert this to v1 trivially after it is merged.

I'll defer to @azdagron on what he wants to do here. I am not sure what other implications unstructured client may bring

This isn't the first time folks have wanted to update the k8s client deps. We've generally pushed back when there wasn't a requirement. In this case, the unstructured client seems like a workaround, but it's not clear to me how much work. In the end, latching the client code dependency to our minimum supported version only helps compile time compatibility against the minimum version but says nothing about compatibility with the newest versions. A more robust, comprehensive integration test suite that exercised all of the k8s api interactions with at least the minimum and latest versions would give us better assurances.

All this to say that I'm probably ok if we want to bump this now, unless the unstructured client is a fairly trivial drop-in.

@JoshVanL
Copy link
Contributor Author

JoshVanL commented Jun 8, 2021

@evan2645 @azdagron Thanks both for your reviews!

I'll work on adding the cleanup mechanism described above^ and fixing up the integration openssl commands.

I'm also going to give a go at moving the client to unstructured- this shouldn't be a huge change and resolves the mod issues (also doesn't mean this PR changes the kube controllers elsewhere).

A more robust, comprehensive integration test suite that exercised all of the k8s api interactions with at least the minimum and latest versions would give us better assurances.

The kube compatibility of this integration is effectively tied to the compatibility of the cert-manager installation. Since we use the v1 API which was released in cert-manager v1.0, we have compatibility with kube back till v1.11.

@azdagron
Copy link
Member

azdagron commented Jun 8, 2021

I'm also going to give a go at moving the client to unstructured- this shouldn't be a huge change and resolves the mod issues (also doesn't mean this PR changes the kube controllers elsewhere).

Awesome, I appreciate your willingness! Glad to hear it won't be costly.

The kube compatibility of this integration is effectively tied to the compatibility of the cert-manager installation. Since we use the v1 API which was released in cert-manager v1.0, we have compatibility with kube back till v1.11.

😍 that is really good to hear!

My comment about the integration tests were geared towards SPIRE's k8s integrations in general, not necessarily this particular interaction. I've opened up #2350 to track that separate discussion.

@JoshVanL JoshVanL force-pushed the server-plugin-upstreamauthority-cert-manager branch 2 times, most recently from 142b522 to d2ec5b9 Compare June 9, 2021 18:28
@JoshVanL
Copy link
Contributor Author

JoshVanL commented Jun 9, 2021

Hi @evan2645 @azdagron! After struggling when testing using the unstructured client and the controller-runtime fake client, I have opted to go with adding the Go struct types in an internal package. This seems to work well. I have also added a note that this can be swapped out in future after we create a separate repo with just these types. This is something I want to pick up for the cert-manger v1.5 release.

I have added in a cleanup function, which is called on defer when minting. It will delete any CertificateRequests which the SPIRE server is responsible for (the same namespace with the label cert-manager.spiffe.io/trust-domain=<trust domain>) and are in a terminal state (either failed, denied, or are signed and ready).

I have expanded the unit testing out quite a bit, and now covers the MintX509CA func. The integration test has also been expanded a bit to also check that cleanups are occurring.

JoshVanL added 12 commits June 9, 2021 19:38
Signed-off-by: joshvanl <[email protected]>
considerations and kube_config_file change

Signed-off-by: joshvanl <[email protected]>
clean up stale CertificateRequests

Signed-off-by: joshvanl <[email protected]>
@JoshVanL JoshVanL force-pushed the server-plugin-upstreamauthority-cert-manager branch from d2ec5b9 to 9fb031e Compare June 9, 2021 18:41
Copy link
Member

@azdagron azdagron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am so sorry it took me so long to get back on this. I think what you have here is fine. If/when that change comes in that moves these types to another repo, even better, but this doesn't sound too bad to maintain. I'm good with the direction this takes.

@evan2645, any other thoughts? are we ready to take this?

Signed-off-by: joshvanl <[email protected]>
Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thank you @JoshVanL!

@azdagron azdagron merged commit 34a4b01 into spiffe:main Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add cert-manager as an UpstreamAuthority
3 participants