Enable leader election on endpoints for controllers #14094

Merged
merged 1 commit into from
May 24, 2017

Conversation

smarterclayton
Contributor

Support the new upstream module for leader election via a new config
field and command line flag (--lock-service-name). If specified, the
new-style election will be used. The legacy etcd election (triggered by
controllerTTL > 0) will wait to verify that no endpoint object exists
before competing for the etcd lease, and will step down if it detects
that the endpoint object has been created.

With these changes, the controllers can now be run as static pods on the
masters and talk only to the API. This will allow them to appear in the
API and be scraped by Prometheus.

[test]

// the kube-system namespace to coordinate the lock. This overrides the behavior of
// the controllerTTL value, and will instead use the leader election flags defined in
// the Kubernetes controllerArguments field.
LockServiceName *string `json:"lockServiceName"`
Contributor

Maybe avoid putting "service" in the config field name, and instead specify the resource, to avoid needing a new field once configmaps are supported (cf. kubernetes/kubernetes#44857)?

@smarterclayton
Contributor Author

smarterclayton commented May 10, 2017 via email

@0xmichalis
Contributor

We may want to cherry-pick kubernetes/kubernetes#45478

@smarterclayton
Contributor Author

smarterclayton commented May 12, 2017 via email

@smarterclayton
Contributor Author

[test]

@smarterclayton
Contributor Author

Any other comments? Applied comment from before. Will follow up in a separate PR with extended tests.

Contributor

@enj left a comment

This looks fine to me, but I need more context on how leader election + etcd + controllers work together.

return plug.New(!options.PauseControllers), func() {}, nil
}

client, err := etcd.MakeEtcdClient(options.EtcdClientInfo)
Contributor

How does this work with etcd3?

Contributor Author

Same as before

Contributor Author

Goal is to move off this completely by 3.7

Contributor

> Same as before

So we are storing data in both etcd v2 and v3 at the same time?

Contributor Author

The server doesn't actually care - two different APIs

func legacyLeaderElectionStart(id, name string, leased *plug.Leased, lock rl.Interface, ttl time.Duration) func() {
	return func() {
		glog.V(2).Infof("Verifying no controller manager is running for %s", id)
		wait.Poll(ttl/2, 0, func() (bool, error) {
Contributor

Is this PollInfinite?

Contributor Author

Yes

Contributor Author

updated

		})
		glog.V(2).Infof("Attempting to acquire controller lease as %s, renewing every %s", id, ttl)
		go leased.Run()
		go wait.Poll(ttl/2, 0, func() (bool, error) {
Contributor

This one too is infinite

// controller instance should lead. It defaults to "kube-system"
LockNamespace string `json:"lockNamespace"`
// LockResource is the group and resource name to use to coordinate for the controller lock.
// If unset, defaults to "Endpoints".
Contributor

s/Endpoints/endpoints/
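For readers following the defaults under discussion, a minimal sketch of how they might be applied is shown below. It assumes the defaults stated in the field comments ("kube-system" for the namespace, and the lowercase "endpoints" the reviewer asks for). The `electionConfig` type and `withDefaults` function are hypothetical, and `LockResource` is simplified to a plain string even though the comment describes it as a group and resource name.

```go
package main

import "fmt"

// electionConfig is a hypothetical, trimmed-down mirror of the fields under review.
type electionConfig struct {
	LockNamespace string
	LockResource  string // simplified: the real field carries a group and a resource name
}

// withDefaults fills in unset fields with the defaults named in the field comments.
func withDefaults(c electionConfig) electionConfig {
	if c.LockNamespace == "" {
		c.LockNamespace = "kube-system"
	}
	if c.LockResource == "" {
		c.LockResource = "endpoints"
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", withDefaults(electionConfig{}))
	fmt.Printf("%+v\n", withDefaults(electionConfig{LockNamespace: "custom"}))
}
```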

@smarterclayton
Contributor Author

smarterclayton commented May 20, 2017 via email

@smarterclayton
Contributor Author

If no other comments, [merge]

@smarterclayton
Contributor Author

smarterclayton commented May 22, 2017 via email

@openshift-bot
Contributor

Evaluated for origin test up to 17c4ce7

@openshift-bot
Contributor

Evaluated for origin merge up to 17c4ce7

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1656/) (Base Commit: 16c3f11)

@openshift-bot
Contributor

openshift-bot commented May 24, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/770/) (Base Commit: 21254d2) (Image: devenv-rhel7_6260)

@openshift-bot openshift-bot merged commit 67275e1 into openshift:master May 24, 2017
marun added a commit to marun/origin that referenced this pull request Jun 15, 2017
PR openshift#14094 added support for leader election on endpoints for
controllers, but the legacy (etcd) mode was logging NotFound errors,
which would be a normal condition when endpoints were not configured.
This change ensures that logging only occurs for errors other than
NotFound.