Lengthening initial backoff time for EndpointSlice controller #89438
Force-pushed from 84c8359 to 7ae3e16
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt, robscott
The full list of commands accepted by this bot can be found here. The pull request process is described here.
/retest
Review the full test history for this PR.
/hold
The EndpointSlice controller has the potential to manage a large number of resources that are updated frequently. Without proper backoffs in place, it could unnecessarily overload the API Server with requests. This makes two significant changes: increasing the base backoff from 5ms to 1s, and delaying all syncs triggered by EndpointSlice changes by at least 1 second to enable batching.
Force-pushed from 7ae3e16 to 94e5537
/hold cancel
/retest
What type of PR is this?
/kind feature
What this PR does / why we need it:
The EndpointSlice controller has the potential to manage a large number of resources that are updated frequently. Without proper backoffs in place, it could unnecessarily overload the API Server with requests. This makes two significant changes:
1. Increasing the base backoff from 5ms to 1s.
2. Delaying all syncs triggered by EndpointSlice changes by at least 1 second to enable batching.
Special notes for your reviewer:
I've tested this with several e2e clusters with 2 failure scenarios:
1. discovery.k8s.io/v1beta1 API Group on the API Server
2. create EndpointSlices from EndpointSlice controller ClusterRole

In both cases the increased thresholds for exponential backoff seemed to be helpful. Additionally, the backoff value was quickly reset when these conditions were removed and successful syncs resumed.
For reference, here are the base backoffs I've seen for other controllers that have custom values (defaults are 5ms and 1000s):
Does this PR introduce a user-facing change?:
/sig network
/priority important-soon
/cc @wojtek-t
/assign @freehan @liggitt