Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a file watcher for Azure client cert auth #5254

Closed

Conversation

bryan-cox
Copy link
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:
Add a file watcher, utilizing fsnotify, to monitor the Azure client certificate authentication. If the file changes, the pod will need to be restarted since Azure SDK doesn't support hotloading a new certificate yet.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
This is a follow up to #5200

  • cherry-pick candidate

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Nov 11, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jont828 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 11, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @bryan-cox. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 11, 2024
@bryan-cox bryan-cox force-pushed the aro-hcp-cert-follow-up branch from e5a9fa2 to b7649d5 Compare November 11, 2024 17:19
@bryan-cox bryan-cox marked this pull request as ready for review November 11, 2024 17:19
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2024
@mboersma
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 11, 2024
@bryan-cox
Copy link
Contributor Author

/retest

@bryan-cox bryan-cox force-pushed the aro-hcp-cert-follow-up branch 3 times, most recently from 0352e27 to e956714 Compare November 12, 2024 12:25
@bryan-cox
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-test

@bryan-cox bryan-cox force-pushed the aro-hcp-cert-follow-up branch from e956714 to 0b3bf20 Compare November 12, 2024 20:10
Add a file watcher, utilizing fsnotify, to monitor the Azure client
certificate authentication. If the file changes, the pod will need to be
 restarted since Azure SDK doesn't support hotloading a new certificate
 yet.

Signed-off-by: Bryan Cox <[email protected]>
@bryan-cox bryan-cox force-pushed the aro-hcp-cert-follow-up branch from 0b3bf20 to 97352d2 Compare November 12, 2024 20:17
Copy link

codecov bot commented Nov 13, 2024

Codecov Report

Attention: Patch coverage is 54.34783% with 21 lines in your changes missing coverage. Please review.

Project coverage is 52.78%. Comparing base (ec18b58) to head (97352d2).
Report is 33 commits behind head on main.

Files with missing lines Patch % Lines
util/filewatcher/filewatcher.go 57.14% 15 Missing and 3 partials ⚠️
azure/scope/identity.go 25.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5254   +/-   ##
=======================================
  Coverage   52.78%   52.78%           
=======================================
  Files         270      271    +1     
  Lines       29057    29103   +46     
=======================================
+ Hits        15337    15362   +25     
- Misses      12928    12945   +17     
- Partials      792      796    +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

case event, ok := <-watcher.Events:
if ok && (event.Has(fsnotify.Write) || event.Has(fsnotify.Chmod) || event.Has(fsnotify.Remove)) {
klog.Infof("file, %s, was modified, exiting...", event.Name)
os.Exit(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some more graceful way we can handle this? I don't remember off the top of my head what kinds of cleanup things we might be doing now in main (or will need to do in the future), but I'd prefer not to have to worry about that at all 😄.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to alternative suggestions. This is just the pattern we were using in a different project for this specific scenario with Azure SDK not being able to hotload the new cert and requiring the pod to restart in order to use the new cert.

Maybe I'm missing it but I don't see any cleanup things in the main.go. Is that where you were referencing or was it someplace else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Azure SDK doesn't support hotloading a new certificate yet

What exactly does this mean? CAPZ orchestrates reading the cert from disk and using that to create a TokenCredential, and it's already doing that once in every reconciliation. Do successive calls to azidentity.NewClientCertificateCredential with different certificates not produce working creds?

Copy link
Contributor Author

@bryan-cox bryan-cox Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nojnhuh sorry for the delay, I was doing some testing related this I wanted to check first.

So, if we are grabbing the certificate and re-authenticating with Azure, we shouldn't need this filewatcher to reboot the pod. Like:

parsedCertificate, key, err := azidentity.ParseCertificates(certsContent, nil)
	if err != nil {
		return err
	}

	options := &azidentity.ClientCertificateCredentialOptions{
		SendCertificateChain: true,
	}
	creds, err := azidentity.NewClientCertificateCredential(tenantID, options, ... etc.)

However, with #5211 #5283, when something like Secrets Store CSI updates the certificate in the CAPZ pod, that should invalidate the cache right? The invalidated cache would then trigger a re-authentication with Azure with a new certificate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what should happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could validate though that would be great!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll pull your PR tomorrow and make an image out of it and give it a go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr - I'm going to close this PR. I do not think we need it given the caching PRs mentioned previously in this thread.

I created an image off the latest in CAPZ to use in a deployment with a sidecar container to check to see CAPZ didn't error out when the certificates rotated. In my project, I just modified the deployment of CAPZ and added a sidecar container and a shared volume between the two containers. I was able to exec into that pod, see the certificate, manually rotate the certificate in key vault and disable the old certificate, and see the new certificate in the sidecar container.

The CAPZ pod never rebooted, there were no errors in the log, and I was able to successfully scale up my nodepool using the new certificate.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@bryan-cox
Copy link
Contributor Author

/close

See #5254 (comment) for more details on why.

@k8s-ci-robot
Copy link
Contributor

@bryan-cox: Closed this PR.

In response to this:

/close

See #5254 (comment) for more details on why.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants