-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable leader election on endpoints for controllers #14094
Conversation
pkg/cmd/server/api/v1/types.go
Outdated
// the kube-system namespace to coordinate the lock. This overrides the behavior of | ||
// the controllerTTL value, and will instead use the leader election flags defined in | ||
// the Kubernetes controllerArguments field. | ||
LockServiceName *string `json:"lockServiceName"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe avoid putting "service" in the config field, and instead specify the resource to avoid needing a new field once configmaps are supported (c.f. kubernetes/kubernetes#44857)?
Yeah
|
We may want to cherry-pick kubernetes/kubernetes#45478 |
a2c15a9
to
cc2af2c
Compare
[test]
…On Fri, May 12, 2017 at 12:54 PM, OpenShift Bot ***@***.***> wrote:
continuous-integration/openshift-jenkins/test FAILURE (
https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1401/)
(Base Commit: f24a57f
<f24a57f>
)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#14094 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p_McgHX0-VfftmSCvytnO9t_ghcHks5r5I69gaJpZM4NTjdO>
.
|
[test] |
Any other comments? Applied comment from before. Will follow up in a separate PR with extended tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me, but I need more context on how leader election + etcd + controllers work together.
return plug.New(!options.PauseControllers), func() {}, nil | ||
} | ||
|
||
client, err := etcd.MakeEtcdClient(options.EtcdClientInfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work with etcd3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Goal is to move off this completely by 3.7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before
So we are storing data in both etcd v2 and v3 at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server doesn't actually care - two different APIs
func legacyLeaderElectionStart(id, name string, leased *plug.Leased, lock rl.Interface, ttl time.Duration) func() { | ||
return func() { | ||
glog.V(2).Infof("Verifying no controller manager is running for %s", id) | ||
wait.Poll(ttl/2, 0, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this PollInfinite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
}) | ||
glog.V(2).Infof("Attempting to acquire controller lease as %s, renewing every %s", id, ttl) | ||
go leased.Run() | ||
go wait.Poll(ttl/2, 0, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one too is infinite
pkg/cmd/server/api/v1/types.go
Outdated
// controller instance should lead. It defaults to "kube-system" | ||
LockNamespace string `json:"lockNamespace"` | ||
// LockResource is the group and resource name to use to coordinate for the controller lock. | ||
// If unset, defaults to "Endpoints". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Endpoints/endpoints/
Updated
…On Sat, May 20, 2017 at 11:04 AM, Michail Kargakis ***@***.*** > wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/cmd/server/api/v1/types.go
<#14094 (comment)>:
> // ServiceServingCert holds configuration for service serving cert signer which creates cert/key pairs for
// pods fulfilling a service to serve with.
ServiceServingCert ServiceServingCert `json:"serviceServingCert"`
}
+// ControllerElectionConfig contains configuration values for deciding how a controller
+// will be elected to act as leader.
+type ControllerElectionConfig struct {
+ // LockName is the resource name used to act as the lock for determining which controller
+ // instance should lead.
+ LockName string `json:"lockName"`
+ // LockNamespace is the resource namespace used to act as the lock for determining which
+ // controller instance should lead. It defaults to "kube-system"
+ LockNamespace string `json:"lockNamespace"`
+ // LockResource is the group and resource name to use to coordinate for the controller lock.
+ // If unset, defaults to "Endpoints".
s/Endpoints/endpoints/
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#14094 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p13VxEU18_2ughG_3uq2e2upAA4Mks5r7wEQgaJpZM4NTjdO>
.
|
If no other comments, [merge] |
[merge] exec flake
|
Support the new upstream module for leader election via a new config field and command line flag (--lock-service-name). If specified, the new style election will be used. The legacy etcd election (triggered by controllerTTL > 0) will wait to verify no endpoint object exists before competing for the etcd lease, and will step down if it detects the endpoint object is created. With these changes, the controllers can now be run as static pods on the masters and talk only to the API. This will allow them to appear in the api and be scraped by prometheus.
Evaluated for origin test up to 17c4ce7 |
Evaluated for origin merge up to 17c4ce7 |
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1656/) (Base Commit: 16c3f11) |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/770/) (Base Commit: 21254d2) (Image: devenv-rhel7_6260) |
PR openshift#14094 added support for leader election on endpoints for controllers, but the legacy (etcd) mode was logging NotFound errors, which would be a normal condition when endpoints were not configured. This change ensures that logging only occurs for errors other than NotFound.
Support the new upstream module for leader election via a new config
field and command line flag (--lock-service-name). If specified, the new
style election will be used. The legacy etcd election (triggered by
controllerTTL > 0) will wait to verify no endpoint object exists before
competing for the etcd lease, and will step down if it detects the
endpoint object is created.
With these changes, the controllers can now be run as static pods on the
masters and talk only to the API. This will allow them to appear in the
api and be scraped by prometheus.
[test]