Ensure system recovers quickly from failures or drift in state #326
- Helm upgrades cause the caBundle on the mutating webhooks to be reset. By reconciling the state of the system every second, we ensure that this drift has minimal impact on the uptime of the system. The system now verifies every second that the certificates and the CA bundle are correct and updates them if they aren't.
Note: I haven't run this code.
This looks pretty good! I've left some comments about improving the tests, but otherwise, this looks good to me. Let me know what you think!
Co-authored-by: Iryna Shustava <[email protected]>
Looks great, @ashwin-venkatesh !!
Add PodSecurityPolicies for server-acl-init
Changes proposed in this PR:
Helm upgrades cause the caBundle on the mutating webhooks to be reset. By reconciling the state of the system every second, we ensure that this drift has minimal impact on the uptime of the system. The system now verifies every second that the certificates and the CA bundle are correct and updates them if they aren't.
This also ensures that any edits made to the secret or the webhook configuration, which could lead to downtime, are recovered from within a second.
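The reconcile-every-second idea can be illustrated with a minimal Go sketch. This is not the code from this PR: the types Bundle and WebhookConfig and the functions reconcileOnce and run are hypothetical stand-ins, and the real implementation would read and write the Kubernetes secret and the MutatingWebhookConfiguration through the API server instead of in-memory structs.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"time"
)

// Bundle is a hypothetical stand-in for the CA certificate the webhook should trust.
type Bundle struct {
	CACert []byte
}

// WebhookConfig is a hypothetical stand-in for the caBundle field of a
// MutatingWebhookConfiguration.
type WebhookConfig struct {
	CABundle []byte
}

// reconcileOnce compares the desired CA with the observed caBundle and
// repairs any drift. It reports whether an update was needed.
func reconcileOnce(desired Bundle, observed *WebhookConfig) bool {
	if bytes.Equal(observed.CABundle, desired.CACert) {
		return false // already in sync, nothing to do
	}
	// Copy the bytes so later changes to desired do not alias observed.
	observed.CABundle = append([]byte(nil), desired.CACert...)
	return true
}

// run calls reconcileOnce on every tick until ctx is cancelled.
func run(ctx context.Context, interval time.Duration, desired Bundle, observed *WebhookConfig) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if reconcileOnce(desired, observed) {
				fmt.Println("drift detected: caBundle restored")
			}
		}
	}
}

func main() {
	desired := Bundle{CACert: []byte("ca-pem")}
	// An empty caBundle simulates the reset caused by a helm upgrade.
	observed := &WebhookConfig{}

	// A short interval and timeout keep the demo fast; the PR describes a one-second interval.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	run(ctx, 10*time.Millisecond, desired, observed)

	fmt.Println(string(observed.CABundle))
}
```

Because reconcileOnce is a no-op when the state already matches, running it on a short interval is cheap, and any drift is corrected within roughly one interval.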
How I've tested this PR:
How I expect reviewers to test this PR:
Run kubectl edit on the secret that holds the certificate and/or on the MutatingWebhookConfiguration (MWC) for the webhook certs. The system should recover within a few seconds, and the webhook and the webhook configuration should be able to communicate with each other successfully.
Image for testing:
ashwinvenkatesh/consul-k8s:webhook-certs@sha256:25b85fd6ccbe7e141cd1541264c40cb15f28c1ba815a4df58c6a3949b583e91f
Checklist: