Enable leader election on endpoints for controllers #14094

smarterclayton · 2017-05-08T06:52:52Z

Support the new upstream module for leader election via a new config
field and command line flag (--lock-service-name). If specified, the new
style election will be used. The legacy etcd election (triggered by
controllerTTL > 0) will wait to verify no endpoint object exists before
competing for the etcd lease, and will step down if it detects the
endpoint object is created.

With these changes, the controllers can now be run as static pods on the
masters and talk only to the API. This will allow them to appear in the
API and be scraped by Prometheus.

[test]

liggitt · 2017-05-10T02:12:19Z

pkg/cmd/server/api/v1/types.go

+	// the kube-system namespace to coordinate the lock. This overrides the behavior of
+	// the controllerTTL value, and will instead use the leader election flags defined in
+	// the Kubernetes controllerArguments field.
+	LockServiceName *string `json:"lockServiceName"`


Maybe avoid putting "service" in the config field name and instead specify the resource, so that a new field isn't needed once configmaps are supported (cf. kubernetes/kubernetes#44857)?
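The suggestion amounts to making the lock's backing resource a value rather than part of the field name. A minimal sketch of that idea, assuming a hypothetical `lockConfig` type and `validateLockConfig` helper (neither is from the PR), with the resource kept as a plain lowercase string for simplicity:

```go
package main

import "fmt"

// lockConfig is a hypothetical config shape following the review
// suggestion: the resource backing the lock is a field value rather than
// being baked into the field name (as "service" is in lockServiceName).
type lockConfig struct {
	LockName     string
	LockResource string // e.g. "endpoints"; "configmaps" once supported
}

// supportedLockResources lists the lock resources this sketch accepts.
// Supporting configmaps later means adding an entry, not a new field.
var supportedLockResources = map[string]bool{
	"endpoints": true,
}

// validateLockConfig rejects a missing lock name or an unsupported resource.
func validateLockConfig(c lockConfig) error {
	if c.LockName == "" {
		return fmt.Errorf("lockName must be set")
	}
	if !supportedLockResources[c.LockResource] {
		return fmt.Errorf("unsupported lockResource %q", c.LockResource)
	}
	return nil
}
```

Under this shape, adding configmap locks would be a one-line change to `supportedLockResources` plus the lock implementation, with no change to the serialized config.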

smarterclayton · 2017-05-10T04:41:21Z

Yeah

0xmichalis · 2017-05-10T22:36:24Z

We may want to cherry-pick kubernetes/kubernetes#45478

smarterclayton · 2017-05-12T19:38:32Z

[test]

In reply to an OpenShift Bot notification (Fri, May 12, 2017, 12:54 PM): continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1401/) (Base Commit: f24a57f)

smarterclayton · 2017-05-14T23:02:29Z

[test]

smarterclayton · 2017-05-15T00:59:16Z

Any other comments? I applied the earlier review comment and will follow up with extended tests in a separate PR.

enj

This looks fine to me, but I need more context on how leader election + etcd + controllers work together.

enj · 2017-05-17T23:56:20Z

pkg/cmd/server/origin/leaderelection.go

+			return plug.New(!options.PauseControllers), func() {}, nil
+		}
+
+		client, err := etcd.MakeEtcdClient(options.EtcdClientInfo)


How does this work with etcd3?

Same as before

Goal is to move off this completely by 3.7

Same as before

So we are storing data in both etcd v2 and v3 at the same time?

The server doesn't actually care; they are two different APIs.

0xmichalis · 2017-05-18T08:51:35Z

pkg/cmd/server/origin/leaderelection.go

+func legacyLeaderElectionStart(id, name string, leased *plug.Leased, lock rl.Interface, ttl time.Duration) func() {
+	return func() {
+		glog.V(2).Infof("Verifying no controller manager is running for %s", id)
+		wait.Poll(ttl/2, 0, func() (bool, error) {


Is this PollInfinite?

0xmichalis · 2017-05-20T15:03:13Z

pkg/cmd/server/origin/leaderelection.go

+		})
+		glog.V(2).Infof("Attempting to acquire controller lease as %s, renewing every %s", id, ttl)
+		go leased.Run()
+		go wait.Poll(ttl/2, 0, func() (bool, error) {


This one is infinite too.

0xmichalis · 2017-05-20T15:04:46Z

pkg/cmd/server/api/v1/types.go

+	// controller instance should lead. It defaults to "kube-system"
+	LockNamespace string `json:"lockNamespace"`
+	// LockResource is the group and resource name to use to coordinate for the controller lock.
+	// If unset, defaults to "Endpoints".


s/Endpoints/endpoints/

smarterclayton · 2017-05-20T22:45:48Z

Updated

In reply to Michail Kargakis (Sat, May 20, 2017, 11:04 AM), commenting on pkg/cmd/server/api/v1/types.go:

 	// ServiceServingCert holds configuration for service serving cert signer which creates cert/key pairs for
 	// pods fulfilling a service to serve with.
 	ServiceServingCert ServiceServingCert `json:"serviceServingCert"`
 }

+// ControllerElectionConfig contains configuration values for deciding how a controller
+// will be elected to act as leader.
+type ControllerElectionConfig struct {
+	// LockName is the resource name used to act as the lock for determining which controller
+	// instance should lead.
+	LockName string `json:"lockName"`
+	// LockNamespace is the resource namespace used to act as the lock for determining which
+	// controller instance should lead. It defaults to "kube-system"
+	LockNamespace string `json:"lockNamespace"`
+	// LockResource is the group and resource name to use to coordinate for the controller lock.
+	// If unset, defaults to "Endpoints".

s/Endpoints/endpoints/
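The documented defaults on `ControllerElectionConfig` can be illustrated with a defaulting function. This is a hypothetical sketch, not the PR's defaulting code: the struct is trimmed, JSON tags are dropped, and `LockResource` is simplified to a plain string even though the PR documents it as a group and resource name.

```go
package main

// ControllerElectionConfig is a trimmed, hypothetical copy of the struct
// under review. LockResource is a plain string here for simplicity.
type ControllerElectionConfig struct {
	LockName      string
	LockNamespace string
	LockResource  string
}

// setDefaultsControllerElection fills in the documented defaults without
// overwriting values that were set explicitly.
func setDefaultsControllerElection(c *ControllerElectionConfig) {
	if c.LockNamespace == "" {
		c.LockNamespace = "kube-system"
	}
	if c.LockResource == "" {
		c.LockResource = "endpoints" // lowercase, per the review comment
	}
}
```

Defaulting only empty fields keeps an administrator's explicit namespace or resource intact, while a bare `lockName` is enough to get a working endpoints lock in `kube-system`.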

smarterclayton · 2017-05-21T03:26:49Z

If no other comments, [merge]

smarterclayton · 2017-05-22T14:39:44Z

[merge] exec flake


openshift-bot · 2017-05-23T19:46:00Z

Evaluated for origin test up to 17c4ce7

openshift-bot · 2017-05-23T19:46:10Z

Evaluated for origin merge up to 17c4ce7

openshift-bot · 2017-05-23T21:29:56Z

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1656/) (Base Commit: 16c3f11)

openshift-bot · 2017-05-24T11:37:48Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/770/) (Base Commit: 21254d2) (Image: devenv-rhel7_6260)

PR openshift#14094 added support for leader election on endpoints for controllers, but the legacy (etcd) mode was logging NotFound errors, which would be a normal condition when endpoints were not configured. This change ensures that logging only occurs for errors other than NotFound.

smarterclayton requested review from enj and liggitt May 9, 2017 05:15

smarterclayton force-pushed the elect branch from ae4489b to 82d4e84 May 9, 2017 05:23

liggitt reviewed May 10, 2017

smarterclayton force-pushed the elect branch 4 times, most recently from a2c15a9 to cc2af2c May 12, 2017 15:15

enj reviewed May 18, 2017

0xmichalis reviewed May 18, 2017

smarterclayton force-pushed the elect branch from cc2af2c to 9962471 May 19, 2017 21:56

0xmichalis reviewed May 20, 2017

smarterclayton force-pushed the elect branch from 9962471 to 28fc4ec May 20, 2017 22:43

smarterclayton force-pushed the elect branch from 28fc4ec to fa1c8fe May 21, 2017 01:19

smarterclayton force-pushed the elect branch from fa1c8fe to 17c4ce7 May 23, 2017 19:41

openshift-bot merged commit 67275e1 into openshift:master May 24, 2017

marun mentioned this pull request Jun 15, 2017

Fix leader election logging #14662

Merged
