fix(tls): use fixed-length cert CommonNames #968

Merged: 6 commits, Dec 20, 2024

andrewazores
Member

Fixes: #964

@andrewazores
Member Author

/build_test

github-actions bot commented Nov 5, 2024

/build_test completed successfully ✅.

@andrewazores andrewazores marked this pull request as ready for review November 5, 2024 19:27
@andrewazores andrewazores requested a review from a team November 5, 2024 19:27
Member

@ebaron ebaron left a comment

Seems pretty straightforward, I just want to check first if this breaks anything when upgrading the operator. I'm worried something like #896 will happen.

Member

@ebaron ebaron left a comment

Yeah, I'm getting a 503 error after upgrading using the steps in #897. I think the easiest fix would be to expand the check from #897 to compare the entire certificate spec and not just the secret name. This doesn't solve the certificate expiry issue (#119), but it should at least get us back to the current state.

@andrewazores
Member Author

The check from #897 looks like it's just checking the SecretName referenced by the CA Issuer, and the Issuer's Spec only really contains the SecretName.

SecretName: NewCryostatCACert(gvk, cr).Spec.SecretName,

SecretName: common.ClusterUniqueNameWithPrefix(gvk, "ca", cr.Name, cr.InstallNamespace),

currentSecret := current.SecretName

If I'm understanding the problem correctly, it's that the Certificates' Specs have changed (because the CommonName field has changed) - the Issuers and their Specs actually haven't changed. But I suppose on upgrade, a new CA gets created again and the old one needs to be invalidated and all its Certificates deleted?
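
For context on the fixed-length names discussed here: a name built by hashing its variable parts has the same length whatever the CR name or namespace is. The sketch below is an illustration only. `uniqueName` and its inputs are hypothetical, it is not the operator's `ClusterUniqueNameWithPrefix` implementation, and the choice of SHA-256 is an assumption.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// uniqueName is a hypothetical helper (not the operator's code). It hashes the
// variable parts so that the result has the same length for any input.
func uniqueName(prefix, name, namespace string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + name))
	// %x renders the 32-byte digest as 64 hexadecimal characters.
	return fmt.Sprintf("%s-%x", prefix, sum)
}

func main() {
	short := uniqueName("cryostat-ca", "cryostat-sample", "test")
	long := uniqueName("cryostat-ca", "a-very-long-cryostat-installation-name", "a-very-long-namespace-name")
	// Both names have the same length even though the inputs differ in size.
	fmt.Println(len(short), len(long))
}
```

A name of this shape is 76 characters long with the `cryostat-ca` prefix, which suits a Secret name but would exceed the 64-character X.509 CommonName bound, so a CommonName needs a shorter fixed form.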

@ebaron
Member

ebaron commented Nov 6, 2024

> The check from #897 looks like it's just checking the SecretName referenced by the CA Issuer, and the Issuer's Spec only really contains the SecretName.
>
> SecretName: NewCryostatCACert(gvk, cr).Spec.SecretName,
>
> SecretName: common.ClusterUniqueNameWithPrefix(gvk, "ca", cr.Name, cr.InstallNamespace),
>
> currentSecret := current.SecretName
>
> If I'm understanding the problem correctly, it's that the Certificates' Specs have changed (because the CommonName field has changed) - the Issuers and their Specs actually haven't changed. But I suppose on upgrade, a new CA gets created again and the old one needs to be invalidated and all its Certificates deleted?

Right, that's what needs to happen on upgrade. You're right, detecting this will be a little more complicated than just modifying the check done in #897.

One possibility is treating the certificate spec like an immutable field, and calling the deleteCertChain method when attempting to modify it. Something like:

// Immutable, only updated when the deployment is created
if deploy.CreationTimestamp.IsZero() {
	deploy.Spec.Selector = deployCopy.Spec.Selector
} else if !cmp.Equal(deploy.Spec.Selector, deployCopy.Spec.Selector) {
	// Return error so deployment can be recreated
	return errSelectorModified
}
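
Applied to certificates, the same immutable-field pattern might look roughly like the sketch below. It is a minimal illustration under stated assumptions: `certSpec` and `cert` are simplified stand-ins for the cert-manager types, `Created` stands in for a non-zero `CreationTimestamp`, `applySpec` is a hypothetical helper, and `reflect.DeepEqual` is swapped in for `cmp.Equal` so that the example only needs the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// certSpec is a simplified stand-in for cert-manager's CertificateSpec.
type certSpec struct {
	CommonName string
	DNSNames   []string
	SecretName string
}

// cert is a simplified stand-in for a Certificate object.
type cert struct {
	Created bool // stands in for a non-zero CreationTimestamp
	Spec    certSpec
}

var errCertificateModified = errors.New("certificate has been modified")

// applySpec sets the spec on a certificate that does not exist yet, and
// reports an error when an existing certificate's spec differs from the
// desired one, so that the caller can delete and recreate it.
func applySpec(existing *cert, desired certSpec) error {
	if !existing.Created {
		existing.Spec = desired
		return nil
	}
	if !reflect.DeepEqual(existing.Spec, desired) {
		return errCertificateModified
	}
	return nil
}

func main() {
	old := &cert{Created: true, Spec: certSpec{
		CommonName: "ca.cryostat-sample.cert-manager",
		SecretName: "cryostat-sample-ca",
	}}
	err := applySpec(old, certSpec{
		CommonName: "cryostat-ca-cert-manager",
		SecretName: "cryostat-sample-ca",
	})
	if errors.Is(err, errCertificateModified) {
		fmt.Println("spec changed: delete the certificate chain and recreate it")
	}
}
```

In the operator, the error branch would be where something like the deleteCertChain method mentioned above gets called.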

@andrewazores
Member Author

I found that the cmp.Equal() check on the whole cert Spec didn't behave quite how I expected, so in the latest commit I'm checking specific properties for equality instead. This looks like it works as expected, including in the upgrade case outlined in #897's test steps. Once I figure out exactly what was breaking the deep equality check on the Specs, I can clean it up, and I think this will be ready.

@andrewazores
Member Author

/build_test

github-actions bot commented Nov 7, 2024

/build_test : At least one test failed ❌.

@andrewazores
Member Author

/build_test

github-actions bot commented Nov 7, 2024

/build_test completed successfully ✅.

@andrewazores
Member Author

I'm still not really sure what's causing cmp.Equal() to return false unexpectedly.

diff --git a/internal/controllers/certmanager.go b/internal/controllers/certmanager.go
index 0be2db9..fcf156d 100644
--- a/internal/controllers/certmanager.go
+++ b/internal/controllers/certmanager.go
@@ -410,6 +410,7 @@ func (r *Reconciler) createOrUpdateCertificate(ctx context.Context, cert *certv1
                        }
                }
 
+               fmt.Printf("Cert Name: %s\nDiff:\n%s\n", cert.Name, cmp.Diff(cert.Spec, *specCopy))
                if !cmp.Equal(cert.Spec.CommonName, specCopy.CommonName) &&
                        !cmp.Equal(cert.Spec.DNSNames, specCopy.DNSNames) &&
                        !cmp.Equal(cert.Spec.Duration, specCopy.Duration) &&

I tried running through the #897 upgrade steps again with this log line added so that I could spot whatever difference there is in the specs, but to my surprise:

Cert Name: cryostat-sample-ca
Diff:
  v1.CertificateSpec{
   Subject: nil,
   LiteralSubject: "",
-  CommonName: "ca.cryostat-sample.cert-manager",
+  CommonName: "cryostat-ca-cert-manager",
   Duration: nil,
   RenewBefore: nil,
   ... // 2 identical fields
   URIs: nil,
   EmailAddresses: nil,
-  SecretName: "cryostat-sample-ca",
+  SecretName: "cryostat-ca-d3eeb54a7d5c06da92982118f09f5656cb84f8f9ea567839351b0112a4a83763",
   SecretTemplate: nil,
   Keystores: nil,
   ... // 7 identical fields
  }

Cert Name: cryostat-sample
Diff:
2024-11-21T21:08:27Z INFO controllers.Cryostat Certificate created {"name": "cryostat-sample", "namespace": "test"}
Cert Name: cryostat-sample-reports
Diff:
2024-11-21T21:08:27Z INFO controllers.Cryostat Certificate created {"name": "cryostat-sample-reports", "namespace": "test"}
Cert Name: cryostat-sample-agent-proxy
Diff:

i.e., only the CA had a couple of small and expected changes. That doesn't seem to match the cmp.Equal() behaviour I was seeing, where it returned false on every comparison, so the reconcile loop never settled.

Any ideas on what I could be missing? Or, is the current property-by-property check OK?

		if !cmp.Equal(cert.Spec.CommonName, specCopy.CommonName) &&
			!cmp.Equal(cert.Spec.DNSNames, specCopy.DNSNames) &&
			!cmp.Equal(cert.Spec.Duration, specCopy.Duration) &&
			!cmp.Equal(cert.Spec.RenewBefore, specCopy.RenewBefore) &&
			!cmp.Equal(cert.Spec.IPAddresses, specCopy.IPAddresses) &&
			!cmp.Equal(cert.Spec.URIs, specCopy.URIs) &&
			!cmp.Equal(cert.Spec.EmailAddresses, specCopy.EmailAddresses) &&
			!cmp.Equal(cert.Spec.SecretName, specCopy.SecretName) &&
			!cmp.Equal(cert.Spec.SecretTemplate, specCopy.SecretTemplate) &&
			!cmp.Equal(cert.Spec.IssuerRef, specCopy.IssuerRef) &&
			!cmp.Equal(cert.Spec.IsCA, specCopy.IsCA) &&
			!cmp.Equal(cert.Spec.Keystores, specCopy.Keystores) &&
			!cmp.Equal(cert.Spec.PrivateKey, specCopy.PrivateKey) &&
			!cmp.Equal(cert.Spec.EncodeUsagesInRequest, specCopy.EncodeUsagesInRequest) &&
			!cmp.Equal(cert.Spec.Usages, specCopy.Usages) {
			return errCertificateModified
		}
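
One thing worth noting about the condition as written: because the negated comparisons are joined with `&&`, it is true only when every listed field differs at once. If the intent is to detect any change, the comparisons would be joined with `||`. The sketch below contrasts the two forms on a reduced stand-in struct; `spec`, `allDiffer`, and `anyDiffer` are hypothetical names, and whether the merged code kept the `&&` form is not shown in this thread.

```go
package main

import "fmt"

// spec is a reduced stand-in for the certificate spec fields compared above.
type spec struct {
	CommonName string
	SecretName string
	IsCA       bool
}

// allDiffer mirrors the shape of the condition above: it is true only when
// every field differs.
func allDiffer(a, b spec) bool {
	return a.CommonName != b.CommonName &&
		a.SecretName != b.SecretName &&
		a.IsCA != b.IsCA
}

// anyDiffer is true as soon as a single field differs.
func anyDiffer(a, b spec) bool {
	return a.CommonName != b.CommonName ||
		a.SecretName != b.SecretName ||
		a.IsCA != b.IsCA
}

func main() {
	old := spec{CommonName: "ca.cryostat-sample.cert-manager", SecretName: "cryostat-sample-ca", IsCA: true}
	updated := spec{CommonName: "cryostat-ca-cert-manager", SecretName: "cryostat-sample-ca", IsCA: true}
	// Only CommonName changed here.
	fmt.Println(allDiffer(old, updated), anyDiffer(old, updated)) // prints: false true
}
```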

@andrewazores
Member Author

I noticed one other funny effect from the #897 upgrade steps - the old separate Grafana Route gets left behind. That makes sense since it was an upgrade from 2.4.0 to 4.0.0-dev, and in 2.x we exposed Grafana as a separate Route. In 3.0 it's exposed as a path on the auth proxy with the same Route. I think the old leftover Route is harmless, but also useless.

@tthvo
Member

tthvo commented Nov 22, 2024

> I noticed one other funny effect from the #897 upgrade steps - the old separate Grafana Route gets left behind. That makes sense since it was an upgrade from 2.4.0 to 4.0.0-dev, and in 2.x we exposed Grafana as a separate Route. In 3.0 it's exposed as a path on the auth proxy with the same Route. I think the old leftover Route is harmless, but also useless.

Oh right, how about we delete the old Grafana route in the reconciler, similar to how we do it for CORS modifications?

// Remove CORS modifications from previous operator versions
return r.deleteCorsAllowedOrigins(ctx, cr)
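
A cleanup step for the leftover Route could follow the usual "delete and tolerate not found" shape, so that it is safe to run on every reconcile. The sketch below is only an outline of that idea: `routeStore`, `fakeStore`, `deleteStaleRoute`, `errNotFound`, and the Route name are all hypothetical stand-ins and do not use the real Kubernetes client API.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the API server's "not found" error.
var errNotFound = errors.New("not found")

// routeStore is a hypothetical stand-in for the client used by the reconciler.
type routeStore interface {
	Delete(name string) error
}

// fakeStore is an in-memory routeStore used for illustration.
type fakeStore map[string]bool

func (s fakeStore) Delete(name string) error {
	if !s[name] {
		return errNotFound
	}
	delete(s, name)
	return nil
}

// deleteStaleRoute removes a Route left behind by an older operator version.
// A missing Route is not an error, which keeps the step idempotent.
func deleteStaleRoute(store routeStore, name string) error {
	err := store.Delete(name)
	if err != nil && !errors.Is(err, errNotFound) {
		return err
	}
	return nil
}

func main() {
	store := fakeStore{"cryostat-sample-grafana": true}
	fmt.Println(deleteStaleRoute(store, "cryostat-sample-grafana")) // Route existed and is removed
	fmt.Println(deleteStaleRoute(store, "cryostat-sample-grafana")) // already gone, still no error
}
```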

Member

@tthvo tthvo left a comment

Maybe, we should add another test case in env-test that verifies the reconciler when there is an existing old cert?

@andrewazores
Member Author

> Maybe, we should add another test case in env-test that verifies the reconciler when there is an existing old cert?

diff --git a/internal/controllers/reconciler_test.go b/internal/controllers/reconciler_test.go
index 612c69f..dce34d5 100644
--- a/internal/controllers/reconciler_test.go
+++ b/internal/controllers/reconciler_test.go
@@ -1795,6 +1795,34 @@ func (c *controllerTest) commonTests() {
 				t.expectCertificates()
 			})
 		})
+		Context("with a modified certificate TLS CommonName", func() {
+			var oldCerts []*certv1.Certificate
+			BeforeEach(func() {
+				oldCerts = []*certv1.Certificate{
+					t.NewCryostatCert(),
+					t.NewReportsCert(),
+					t.NewAgentProxyCert(),
+				}
+				t.objs = append(t.objs, t.NewCryostat().Object, t.OtherCAIssuer())
+				for _, cert := range oldCerts {
+					t.objs = append(t.objs, cert)
+				}
+			})
+			JustBeforeEach(func() {
+				cr := t.getCryostatInstance()
+				for _, cert := range oldCerts {
+					// Make the old certs owned by the Cryostat CR
+					err := controllerutil.SetControllerReference(cr.Object, cert, t.Client.Scheme())
+					Expect(err).ToNot(HaveOccurred())
+					err = t.Client.Update(context.Background(), cert)
+					Expect(err).ToNot(HaveOccurred())
+				}
+				t.reconcileCryostatFully()
+			})
+			It("should recreate certificates", func() {
+				t.expectCertificates()
+			})
+		})
 
 		Context("reconciling a multi-namespace request", func() {
 			targetNamespaces := []string{"multi-test-one", "multi-test-two"}

Is this something like what you're thinking of? The t.expectCertificates() looks like it's already making the right assertions. This test passes both with and without this PR's changes, so it isn't really doing anything new, but I guess that is to be expected - the bug is not really with the controller's own behaviour, but with the CommonName length restriction, which isn't present in the test harness (since it isn't using a real cluster/client).
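
To make the length restriction concrete: X.509 bounds the CommonName attribute at 64 characters (ub-common-name in RFC 5280), and a name assembled from the CR name and namespace can exceed that. The sketch below is illustrative only. `oldStyleCommonName` and `fitsCommonName` are hypothetical helpers, the `name.namespace.svc` format is taken from the older-style certificates in this thread, and the long names are made up.

```go
package main

import "fmt"

// maxCommonNameLen is the X.509 upper bound on CommonName (RFC 5280, ub-common-name).
const maxCommonNameLen = 64

// oldStyleCommonName builds a CommonName whose length depends on its inputs.
func oldStyleCommonName(name, namespace string) string {
	return fmt.Sprintf("%s.%s.svc", name, namespace)
}

// fitsCommonName reports whether cn is within the 64-character bound.
func fitsCommonName(cn string) bool {
	return len(cn) <= maxCommonNameLen
}

func main() {
	short := oldStyleCommonName("cryostat-sample", "test")
	long := oldStyleCommonName("cryostat-with-a-rather-long-installation-name", "team-observability-production")
	fmt.Println(short, fitsCommonName(short))
	fmt.Println(long, fitsCommonName(long))
	// A fixed string such as the new CA CommonName always fits.
	fmt.Println(fitsCommonName("cryostat-ca-cert-manager"))
}
```

A check like this lives outside the controller logic, which is consistent with the test passing both with and without the change.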

@tthvo
Member

tthvo commented Nov 26, 2024

> Is this something like what you're thinking of? The t.expectCertificates() looks like it's already making the right assertions. This test passes both with and without this PR's changes, so it isn't really doing anything new, but I guess that is to be expected - the bug is not really with the controller's own behaviour, but with the CommonName length restriction, which isn't present in the test harness (since it isn't using a real cluster/client).

Oh yup, I guess what I had in mind was a scenario where there are existing/old certificates (i.e. with specs defined by previous operator versions), with the test checking that these certificates are reconciled correctly. Since the changes in this PR apply to all managed certificates, we could do something like the below. What do you think?

diff --git a/internal/controllers/reconciler_test.go b/internal/controllers/reconciler_test.go
index 612c69f..0db9d3f 100644
--- a/internal/controllers/reconciler_test.go
+++ b/internal/controllers/reconciler_test.go
@@ -1765,13 +1765,16 @@ func (c *controllerTest) commonTests() {
 				})
 			})
 		})
-		Context("with a modified CA certificate", func() {
+
+		Context("with modified certificates", func() {
 			var oldCerts []*certv1.Certificate
 			BeforeEach(func() {
 				t.objs = append(t.objs, t.NewCryostat().Object, t.OtherCAIssuer())
 				oldCerts = []*certv1.Certificate{
-					t.NewCryostatCert(),
-					t.NewReportsCert(),
+					t.OtherCACert(),
+					t.OtherAgentProxyCert(),
+					t.OtherCryostatCert(),
+					t.OtherReportsCert(),
 				}
 				// Add an annotation for each cert, the test will assert that
 				// the annotation is gone.
diff --git a/internal/test/resources.go b/internal/test/resources.go
index bcad916..b5e89a0 100644
--- a/internal/test/resources.go
+++ b/internal/test/resources.go
@@ -1058,6 +1058,12 @@ func (r *TestResources) NewCryostatCert() *certv1.Certificate {
 	}
 }
 
+func (r *TestResources) OtherCryostatCert() *certv1.Certificate {
+	cert := r.NewCryostatCert()
+	cert.Spec.CommonName = fmt.Sprintf("%s.%s.svc", r.Name, r.Namespace)
+	return cert
+}
+
 func (r *TestResources) NewReportsCert() *certv1.Certificate {
 	return &certv1.Certificate{
 		ObjectMeta: metav1.ObjectMeta{
@@ -1084,6 +1090,12 @@ func (r *TestResources) NewReportsCert() *certv1.Certificate {
 	}
 }
 
+func (r *TestResources) OtherReportsCert() *certv1.Certificate {
+	cert := r.NewReportsCert()
+	cert.Spec.CommonName = fmt.Sprintf("%s-reports.%s.svc", r.Name, r.Namespace)
+	return cert
+}
+
 func (r *TestResources) NewAgentProxyCert() *certv1.Certificate {
 	return &certv1.Certificate{
 		ObjectMeta: metav1.ObjectMeta{
@@ -1110,6 +1122,12 @@ func (r *TestResources) NewAgentProxyCert() *certv1.Certificate {
 	}
 }
 
+func (r *TestResources) OtherAgentProxyCert() *certv1.Certificate {
+	cert := r.NewAgentProxyCert()
+	cert.Spec.CommonName = fmt.Sprintf("%s-agent.%s.svc", r.Name, r.Namespace)
+	return cert
+}
+
 func (r *TestResources) NewCACert() *certv1.Certificate {
 	return &certv1.Certificate{
 		ObjectMeta: metav1.ObjectMeta{
@@ -1127,6 +1145,13 @@ func (r *TestResources) NewCACert() *certv1.Certificate {
 	}
 }
 
+func (r *TestResources) OtherCACert() *certv1.Certificate {
+	cert := r.NewCACert()
+	cert.Spec.CommonName = fmt.Sprintf("ca.%s.cert-manager", r.Name)
+	cert.Spec.SecretName = r.Name + "-ca"
+	return cert
+}
+
 func (r *TestResources) NewAgentCert(namespace string) *certv1.Certificate {
 	name := r.getClusterUniqueNameForAgent(namespace)
 	return &certv1.Certificate{

@andrewazores andrewazores force-pushed the tls-common-name branch 2 times, most recently from 58cd03a to 776b818 Compare December 17, 2024 21:07
@andrewazores
Member Author

/build_test

/build_test completed successfully ✅.

Member

@ebaron ebaron left a comment

Tested using a bundle upgrade, and everything works smoothly. Thanks @andrewazores!

@andrewazores
Member Author

/build_test

/build_test completed successfully ✅.

@andrewazores andrewazores merged commit 4ce5cca into cryostatio:main Dec 20, 2024
7 checks passed
@andrewazores andrewazores deleted the tls-common-name branch December 20, 2024 16:10
Development

Successfully merging this pull request may close these issues.

[Bug] CommonName field in certificates may be too long
3 participants