Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

security: clear client cert expiration cache on SIGHUP #110726

Merged
merged 2 commits into from
Sep 20, 2023

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Sep 15, 2023

Fixes #110493.
Epic: CRDB-28893

Release note (security update): The SIGHUP signal will now clear the
cached expiration times for client certificates that are reported by
the security.certificate.expiration.client metric. (The SIGHUP signal also
will still cause the server to reload certificates from the filesystem, as
it did before.)

@blathers-crl
Copy link

blathers-crl bot commented Sep 15, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)


pkg/security/certificate_manager.go line 179 at r2 (raw file):

					log.StructuredEvent(ctx, &eventpb.CertsReload{Success: true})
				}
				cm.clientCertExpirationCache.Clear()

I'd recommend pulling this up, above the call to LoadCertificates, such that the structured logging message is still guaranteed to be emitted after the reload (and now, flush) completes.

@knz
Copy link
Contributor

knz commented Sep 15, 2023

this would also benefit from a test, you can see TestRotateCerts for an example.

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the pointer to the test!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/security/certificate_manager.go line 179 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I'd recommend pulling this up, above the call to LoadCertificates, such that the structured logging message is still guaranteed to be emitted after the reload (and now, flush) completes.

good call

@rafiss rafiss added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Sep 16, 2023
@rafiss rafiss marked this pull request as ready for review September 16, 2023 07:36
@rafiss rafiss requested review from a team as code owners September 16, 2023 07:36
@rafiss rafiss requested review from mgartner and removed request for a team September 16, 2023 07:36
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for me. Let's however check with @lancel66 whether the customer is OK with a flush of all the metrics at once (an alternative design, not considered here, was to provide a mechanism to flush the metric for just 1 SQL user).

Reviewed 1 of 2 files at r3, 4 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rafiss)


pkg/security/cert_expiry_cache.go line 91 at r4 (raw file):

		OnEvictedEntry: func(entry *cache.Entry) {
			gauge := entry.Value.(*aggmetric.Gauge)
			gauge.Update(0)

Why was this needed?

This will help with making a change to clear the cache on the SIGHUP
signal.

Release note: None
Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @mgartner)


pkg/security/cert_expiry_cache.go line 91 at r4 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Why was this needed?

I noticed this comment on Unlink

// Unlink unlinks this child from the parent, i.e. the parent will no longer// track this child (i.e. won't generate labels for it, etc). However, the child  
// will continue to be functional and reference the parent, meaning updates to  
// it will be reflected in the aggregate stored in the parent.

so it seems more correct to reset the child gauge when unlinking.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rafiss)


pkg/security/cert_expiry_cache.go line 91 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

I noticed this comment on Unlink

// Unlink unlinks this child from the parent, i.e. the parent will no longer// track this child (i.e. won't generate labels for it, etc). However, the child  
// will continue to be functional and reference the parent, meaning updates to  
// it will be reflected in the aggregate stored in the parent.

so it seems more correct to reset the child gauge when unlinking.

probably worth a comment.

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @mgartner)


pkg/security/cert_expiry_cache.go line 91 at r4 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

probably worth a comment.

i've added one now

Copy link
Contributor

@cameronnunez cameronnunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @mgartner, and @rafiss)


pkg/security/cert_expiry_cache.go line 91 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i've added one now

the aggregate value stored in the parent is meaningless, so that's why it wasn't being reset when unlinked. but this change is harmless so up to you


pkg/security/certificate_manager.go line 190 at r5 (raw file):

}

// MaybeUpsertClientExpiration updates the updates or inserts the expiration

nit: "updates" instead of "updates the updates"

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez, @knz, and @mgartner)


pkg/security/cert_expiry_cache.go line 91 at r4 (raw file):

Previously, cameronnunez (Cameron Nunez) wrote…

the aggregate value stored in the parent is meaningless, so that's why it wasn't being reset when unlinked. but this change is harmless so up to you

i'll keep it since it's a good example to set for other usages


pkg/security/certificate_manager.go line 190 at r5 (raw file):

Previously, cameronnunez (Cameron Nunez) wrote…

nit: "updates" instead of "updates the updates"

fixed, thanks!

Release note (security update): The SIGHUP signal will now clear the
cached expiration times for client certificates that are reported by
the security.certificate.expiration.client metric.
@rafiss
Copy link
Collaborator Author

rafiss commented Sep 20, 2023

tftrs!

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 20, 2023

Build succeeded:

@craig craig bot merged commit d2cf1f8 into cockroachdb:master Sep 20, 2023
@blathers-crl
Copy link

blathers-crl bot commented Sep 20, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 40ca81d to blathers/backport-release-23.1-110726: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@rafiss rafiss deleted the sig-cert-cache branch September 20, 2023 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to flush security_certificate_expiration_client metric cache without restarting nodes
4 participants