Lengthening initial backoff time for EndpointSlice controller #89438

robscott · 2020-03-24T18:27:16Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
The EndpointSlice controller has the potential to manage a large number of resources that are updated frequently. Without proper backoffs in place, it could unnecessarily overload the API Server with requests. This PR makes two significant changes:

- Increasing the base backoff from 5ms to 1s.
- Delaying all syncs triggered by EndpointSlice changes by at least 1 second to enable batching.

Special notes for your reviewer:
I've tested this on several e2e clusters with 2 failure scenarios:

- Disabling discovery.k8s.io/v1beta1 API Group on the API Server
- Removing ability to create EndpointSlices from EndpointSlice controller ClusterRole

In both cases the increased thresholds for exponential backoff seemed to be helpful. Additionally, the backoff value was quickly reset when these conditions were removed and successful syncs resumed.

For reference, here are the base backoffs I've seen for other controllers that have custom values (defaults are 5ms and 1000s):

| Controller | Min | Max |
| --- | --- | --- |
| Certificate | 200ms | 1000s |
| Job | 10s | 360s |
| Namespace | 5ms | 60s |
| Service | 5s | 300s |

Does this PR introduce a user-facing change?:

EndpointSlice controller waits longer to retry failed sync.

/sig network
/priority important-soon
/cc @wojtek-t
/assign @freehan @liggitt

pkg/controller/endpointslice/endpointslice_controller.go

liggitt · 2020-03-25T01:50:35Z

/lgtm
/approve

k8s-ci-robot · 2020-03-25T01:51:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/controller/endpointslice/OWNERS~~ [liggitt]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fejta-bot · 2020-03-25T06:25:55Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

fejta-bot · 2020-03-25T09:34:54Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

wojtek-t

/hold

wojtek-t · 2020-03-25T18:10:32Z

/hold cancel
/lgtm

robscott · 2020-03-25T20:43:06Z

/retest

k8s-ci-robot assigned freehan and liggitt Mar 24, 2020

k8s-ci-robot added the release-note (Denotes a PR that will be considered when it comes time to generate release notes.) and kind/feature (Categorizes issue or PR as related to a new feature.) labels Mar 24, 2020

k8s-ci-robot requested a review from wojtek-t March 24, 2020 18:27

wojtek-t reviewed Mar 24, 2020

liggitt reviewed Mar 24, 2020

robscott force-pushed the endpointslice-controller-error-backoff branch 3 times, most recently from 84c8359 to 7ae3e16 Compare March 24, 2020 22:24

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2020

wojtek-t reviewed Mar 25, 2020

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2020

robscott force-pushed the endpointslice-controller-error-backoff branch from 7ae3e16 to 94e5537 Compare March 25, 2020 18:00

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2020

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2020

k8s-ci-robot assigned wojtek-t Mar 25, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2020

k8s-ci-robot merged commit c4fd09d into kubernetes:master Mar 25, 2020

k8s-ci-robot added this to the v1.19 milestone Mar 25, 2020

robscott deleted the endpointslice-controller-error-backoff branch March 11, 2021 04:55

robscott mentioned this pull request Mar 9, 2023

Update batching delay for EndpointSlices uses two flags inconsistently #116011

Closed
