🌱 Increase leader election lease values for KCP #3980

vincepri · 2020-12-02T22:40:13Z

Signed-off-by: Vince Prignano [email protected]

What this PR does / why we need it:

To improve the resilience of a self-managed cluster to temporary errors related
to etcd leadership, this change increases all of the lease durations.

The following are the most important values for leader election: we
increase how long non-leader candidates wait (now 1m), and we increase
the renew deadline to 40s instead of 10s, which should give etcd
connectivity enough time to be re-established.

- Lease duration is now 1 minute instead of 15 seconds
- Renew deadline has been increased to 40 seconds instead of 10

In addition:

- Retry period has been increased to 5 seconds instead of 2
  - Avoids overloading the API server / etcd with lease retry requests

Which issue(s) this PR fixes:
Related to #3978


vincepri · 2020-12-02T22:41:31Z

/milestone v0.4.0

fabriziopandini · 2020-12-02T22:51:20Z

/test pull-cluster-api-test-main

detiber

Are these changes going to affect timeouts that we have in place for operations in clusterctl, such as init, and move operations that take place shortly after an init?

I suspect there are places where test timeouts need to be updated to account for this change as well (or where we should ensure that leader election is disabled for tests).

fabriziopandini · 2020-12-03T15:37:48Z

Currently, clusterctl init does not wait for the controllers to be up and running. The same goes for clusterctl move, which assumes that the target cluster is already initialized.
There are timeouts in the E2E tests, so those might require some tuning (although the e2e job on this PR passed without any changes).

vincepri · 2020-12-03T15:43:35Z

/test pull-cluster-api-test-main

vincepri · 2020-12-03T16:35:05Z

@detiber These changes should not impact anything other than giving the controller a bit more time to recover. The current values are a bit too aggressive for KCP, especially in the self-managed scenario, which is going to be very common for management clusters.

detiber · 2020-12-03T19:51:21Z

@vincepri the longer lease duration will affect startup times for controllers when a lock was previously present, but I'm good with dealing with any issues that may arise from that as they come up.
/lgtm

JoelSpeed · 2020-12-03T19:53:21Z

Do we leverage the release-on-cancel feature of the leader election code? That could mitigate the long startup times when a new instance of the controller starts up.

vincepri · 2020-12-03T21:27:16Z

@JoelSpeed I'd expect that to be in controller-runtime, although we need to double-check.

JoelSpeed · 2020-12-04T12:55:36Z

@vincepri It's an option in ctrl.Options that we could leverage. Not sure if it's a blocker for this, but we could plumb it through in the KCP main.go, either as a flag or as an opinionated setting.
https://github.com/kubernetes-sigs/controller-runtime/blob/00e7f851401bb78389db24d6f25fbfbc5f8edbe1/pkg/manager/manager.go#L179-L184

vincepri · 2020-12-04T16:16:17Z

Ah, got it. It seems that option isn't in controller-runtime v0.5.x (the version we use in v0.3.x, our current stable release). That said, I think we should probably enable it in v1alpha4; let's open an issue to track it.

wfernandes · 2020-12-14T23:06:30Z

This PR seems to be for v1alpha4. That said, should we enable LeaderElectionReleaseOnCancel in a separate issue/PR now that we are using controller-runtime v0.7.0-alpha.8?

vincepri · 2020-12-15T20:21:33Z

This commit was made to be backported @wfernandes

vincepri · 2020-12-15T20:21:52Z

/assign @CecileRobertMichon @detiber
for approval

vincepri · 2021-01-04T18:24:06Z

@CecileRobertMichon do you have some time to review these changes?

CecileRobertMichon · 2021-01-11T18:52:20Z

Sorry I thought I had already approved this, I guess I never hit submit...

/approve

k8s-ci-robot · 2021-01-11T18:52:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon


Needs approval from an approver in each of these files:

~~OWNERS~~ [CecileRobertMichon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fabriziopandini · 2021-01-12T11:06:28Z

/test pull-cluster-api-test-main

fabriziopandini · 2021-01-12T11:26:55Z

/test pull-cluster-api-test-main

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 2, 2020

k8s-ci-robot requested review from JoelSpeed and ncdc December 2, 2020 22:40

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 2, 2020

vincepri changed the title from "🌱 Increase leader election lease values" to "🌱 Increase leader election lease values for KCP" Dec 2, 2020

k8s-ci-robot added this to the v0.4.0 milestone Dec 2, 2020

detiber reviewed Dec 3, 2020


k8s-ci-robot assigned detiber Dec 3, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 3, 2020

k8s-ci-robot assigned CecileRobertMichon Dec 15, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 11, 2021

k8s-ci-robot merged commit 52794b5 into kubernetes-sigs:master Jan 12, 2021
