feat(EG K8S Provider): Enable leader election for EG controller #2694
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff            @@
##              main    #2694    +/-   ##
=========================================
  Coverage    66.45%   66.46%
=========================================
  Files          160      161       +1
  Lines        22553    22632      +79
=========================================
+ Hits         14988    15042      +54
- Misses        6696     6717      +21
- Partials       869      873       +4

☔ View full report in Codecov by Sentry.
Should this PR also include some work in the infrastructure controller?
Yes, the side effect at this point is that infra updates will result in concurrent updates and extra revision conflicts / retries.
Added changes to ensure that infra will be created only by the elected EG instance. |
		return
	case <-r.Elected:
		initInfra()
		r.Logger.Info("started")
this log can also be moved to the initInfra func
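For illustration, a minimal sketch of that suggestion (the select and the initInfra closure follow the diff above; the closure body is assumed):

```go
// Sketch only: fold the "started" log into initInfra so the select case
// stays minimal. r.Elected and r.Logger are the fields from the diff above.
initInfra := func() {
	// ... create the managed infra resources here ...
	r.Logger.Info("started")
}

select {
case <-ctx.Done():
	return
case <-r.Elected:
	initInfra()
}
```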
	}, nil
}

func parseEnvDuration(envVar string, duration **time.Duration) error {
can we also make duration a return value instead of using a secondary pointer? (more like a gopher this way)
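Something like this, as a rough sketch (the lookup and error-wrapping details are assumptions):

```go
import (
	"fmt"
	"os"
	"time"
)

// parseEnvDuration returns the parsed duration instead of writing through
// a double pointer; a nil result means the variable was not set and the
// caller keeps its default (sketch only).
func parseEnvDuration(envVar string) (*time.Duration, error) {
	raw, ok := os.LookupEnv(envVar)
	if !ok || raw == "" {
		return nil, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return nil, fmt.Errorf("failed to parse %s as a duration: %w", envVar, err)
	}
	return &d, nil
}
```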
@@ -73,6 +79,13 @@ func New(cfg *rest.Config, svr *config.Server, resources *message.ProviderResour
		return nil, fmt.Errorf("unable to set up ready check: %w", err)
	}

	// Emit elected & continue with deployment of infra resources
	go func() {
		if <-mgr.Elected(); true {
Suggested change:
-	if <-mgr.Elected(); true {
+	if <-mgr.Elected() {
Switched to using <-mgr.Elected(), given that it's a channel of type struct{}.
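For reference, a minimal sketch of the resulting pattern (mgr.Elected() is the controller-runtime manager's election channel; the body of the goroutine is illustrative):

```go
// Block until this instance wins the election; the Elected channel is
// closed once the manager becomes leader (or immediately when leader
// election is disabled), then continue with the infra rollout.
go func() {
	<-mgr.Elected()
	// ... emit "elected" and continue with deployment of infra resources ...
}()
```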
/retest
1 similar comment
/retest
charts/gateway-helm/values.tmpl.yaml (Outdated)
@@ -1,5 +1,10 @@
deployment:
  envoyGateway:
    leaderElection:
      enabled: true
this doesn't look like the right way to set these fields; suggest creating a field in the Kubernetes section of the EnvoyGateway config:
gateway/api/v1alpha1/envoygateway_types.go, line 183 in deea895:
type EnvoyGatewayKubernetesProvider struct {
also, we don't need an explicit enabled field; we can use the instantiation of a leaderElection ptr type to decide whether to enable it or not.
This will then need to be passed to the provider via:
type Config struct {
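Roughly, the suggested field could look like this (doc comments are illustrative; the key point is the pointer-typed LeaderElection with no explicit Enabled flag):

```go
// EnvoyGatewayKubernetesProvider is the existing Kubernetes section of the
// EnvoyGateway config in envoygateway_types.go (sketch of the addition only).
type EnvoyGatewayKubernetesProvider struct {
	// ... existing fields ...

	// LeaderElection specifies the leader election settings.
	// A nil value means the defaults are applied.
	// +optional
	LeaderElection *LeaderElection `json:"leaderElection,omitempty"`
}
```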
updated
this config needs to be removed
and moved under config.envoyGateway
@@ -113,6 +113,7 @@ install-ratelimit:
	kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
	kubectl rollout status --watch --timeout=5m -n envoy-gateway-system deployment/envoy-gateway
	kubectl wait --timeout=5m -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available
	tools/hack/deployment-exists.sh "app.kubernetes.io/name=envoy-ratelimit" "envoy-gateway-system"
isn't this doing the same thing as L117 ?
The kubectl wait command isn't effective for deployment resources that haven't been created yet.
Therefore, the envoy-ratelimit deployment needs to exist before we can use kubectl wait to ensure it reaches the Available state.
Since Envoy Gateway now requires additional time to start up as leader and manage the ratelimit deployment, this added step mitigates that delay.
internal/provider/runner/runner.go (Outdated)
@@ -43,7 +44,7 @@ func (r *Runner) Start(ctx context.Context) (err error) {
	if err != nil {
		return fmt.Errorf("failed to get kubeconfig: %w", err)
	}
-	p, err := kubernetes.New(cfg, &r.Config.Server, r.ProviderResources)
+	p, err := kubernetes.New(cfg, &r.Config.Server, r.ProviderResources, r.Elected)
this won't be needed, and should be available in r.Config.Server
fixed
	}

	r.Logger.Info("started")
	// Wait for leader if leader election is enabled
maybe add more explicit comments that highlight what is happening and why, e.g. "block, and only allow the leader to create infrastructure resources"
added
finally got to this, thanks for building this out! adding some comments
Thanks!
/retest
api/v1alpha1/envoygateway_types.go (Outdated)
}

type LeaderElection struct {
	// LeaseDuration defines the time non-leader contenders will wait before attempting to claim leadership. It's based on the timestamp of the last acknowledged signal. The default setting is 15 seconds.
can you follow the kubebuilder way of defining defaults, and also add optional tags?
this may not be a CRD, but it helps with doc generation
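Something along these lines, for illustration (only LeaseDuration is shown; the 15-second default mirrors the doc comment above, and the remaining fields would follow the same pattern):

```go
// LeaderElection defines the leader election settings. Sketch of the
// kubebuilder-style markers only; this is not a CRD, so the markers
// mainly feed doc generation.
type LeaderElection struct {
	// LeaseDuration defines the time non-leader contenders will wait before
	// attempting to claim leadership. It's based on the timestamp of the
	// last acknowledged signal.
	//
	// +kubebuilder:default="15s"
	// +optional
	LeaseDuration *time.Duration `json:"leaseDuration,omitempty"`

	// ... RenewDeadline, RetryPeriod, Disable follow the same pattern ...
}
```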
@@ -42,6 +42,14 @@ spec:
        - server
        - --config-path=/config/envoy-gateway.yaml
        env:
        - name: ENVOY_GATEWAY_LEADER_ELECTION_ENABLED
needs to be removed
@arkodg this allows disabling leader election? or do we actually not want to allow configuring it via helm at all, only via the EnvoyGateway spec?
added a related note below
@@ -901,6 +901,13 @@ func (r *gatewayAPIReconciler) addFinalizer(ctx context.Context, gc *gwapiv1.Gat

// watchResources watches gateway api resources.
func (r *gatewayAPIReconciler) watchResources(ctx context.Context, mgr manager.Manager, c controller.Controller) error {
	// trigger a reconcile after getting elected
curious why this is needed
Upon its initial startup, Envoy Gateway will not assume the leader role but will begin reconciling resources, as the main loop of the controller does not require a leader to start processing.
Once the controller is elected as the leader, it will not revisit resources it has previously reconciled. This retrigger is necessary to update the status of resources and to manage the infra resources, such as creating the EG proxies, after a leader is elected.
but it doesn't need to revisit or update status unless some resource has changed, and in that case the Reconcile would be triggered
This will ensure that we reconcile status/infra for resources that have changed before there is an active elected leader.
Use cases:
- Initial startup, when different EG resources can already exist in the cluster.
- The leader goes down during an outage; it takes some time for the new one to become leader, and resource changes can happen during this time.
- EG upgrades / re-deployments.
- etc.
It may take more than a few seconds for a leader to assume leadership, and during this time resource status/infra changes won't be handled; this protects against that.
- init: still trying to understand this use case, since the logic in New that waits for the manager to be elected for the first time should handle it
- leader down: k8s should handle this case and re-elect a new leader; if there's a delay, it's worth debugging this in k8s
@arkodg
k8s won't trigger a reconcile after a leader gets elected. In this impl we trigger it on our end, as we constantly reconcile resources for xDS, even when we are not leaders. We can decide not to do this (reconcile resources only when we are leaders), but then we won't be able to serve xDS from the non-leader EG instances, as they won't be in sync.
- The main reconcile loop does not wait for leader election; it starts immediately on all pods:
func (r *gatewayAPIReconciler) Reconcile(ctx context.Context, _ reconcile.Request) (reconcile.Result, error) {
As there is no way to know which of the pods will get elected, the reconcile loop starts in all instances simultaneously. Only the status / infra components start on leader election, so we have to replay for them the events they have missed.
- Isn't this expected? The pod has to wait a lease duration timeout before it attempts to become leader, and k8s won't trigger a second full reconcile after a leader gets elected; in our case the main reconcile loop is already processing items in all pods, including the one that will become the new leader.
https://github.com/kubernetes/client-go/blob/v0.29.3/tools/leaderelection/leaderelection.go#L133
// LeaseDuration is the duration that non-leader candidates will
// wait to force acquire leadership. This is measured against time of
// last observed ack.
//
// A client needs to wait a full LeaseDuration without observing a change to
// the record before it can attempt to take over.
cool, thanks for explaining. it sounds like we kick the method again to replay the writes, specifically for resources that were updated/created during the leaderless interval? is this a common pattern in most CNCF projects?
Yes, that's right, our goal is to close the leaderless interval.
The typical approach in Kubernetes and CNCF ecosystems is probably a dedicated controller per flow, e.g. one for xDS sync that focuses solely on xDS operations without altering status or infrastructure, plus a separate controller that performs status updates and infrastructure modifications.
Based on this, the xDS controller would not need to assume a leadership role, whereas the controller handling status and infrastructure updates would, and Kubernetes would initiate its reconciliation process on election. So in general there would be two very similar reconcile loops: an xDS one for both non-leaders and leaders, and an extra one for leaders only.
While it's feasible to develop a distinct controller for managing xDS reconciliations and another solely for infrastructure and status modifications, such an overhaul might be extensive and potentially less appealing within the existing EG architecture.
sounds good, let's stick with this. can you also add a comment explaining the why, e.g. why we kick the method again: "Retrigger a reconcile after getting elected to replay the translation (which will trigger status writes and/or infra resource creation), specifically for resources that were updated/created during the leaderless interval"
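As a sketch, the wiring being discussed could look roughly like this (directly invoking Reconcile is one way to express the replay, and the logger field name is an assumption; the actual PR may route this through a watch source instead):

```go
// Retrigger a reconcile after getting elected to replay the translation
// (which will trigger status writes and/or infra resource creation),
// specifically for resources that were updated/created during the
// leaderless interval.
go func() {
	select {
	case <-ctx.Done():
		return
	case <-mgr.Elected():
		// Reconcile ignores the request argument, so an empty one suffices.
		if _, err := r.Reconcile(ctx, reconcile.Request{}); err != nil {
			r.log.Error(err, "failed to re-run reconcile after becoming leader")
		}
	}
}()
```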
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
api/v1alpha1/envoygateway_types.go (Outdated)
	// Default is 15 seconds.
	RetryPeriod *time.Duration `json:"retryPeriod,omitempty"`
	// Disabled allows to disable leader election (enabled by default)
	Disabled *bool `json:"disabled,omitempty"`
we are using Disable in other APIs, let's use Disable here, e.g.:
Disable bool `json:"disable,omitempty"`
indeed, I will change it
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
api/v1alpha1/envoygateway_helpers.go (Outdated)
		r.Kubernetes.LeaderElection = DefaultLeaderElection()
	}

	r.Kubernetes.LeaderElection = DefaultLeaderElection()
why is this needed ?
not needed, cleaning it up.
Signed-off-by: Alexander Volchok <[email protected]>
@@ -195,6 +212,9 @@ func (r *EnvoyGatewayProvider) GetEnvoyGatewayKubeProvider() *EnvoyGatewayKubern

	if r.Kubernetes == nil {
		r.Kubernetes = DefaultEnvoyGatewayKubeProvider()
	if r.Kubernetes.LeaderElection == nil {
what about the case where r.Kubernetes is not nil, but r.Kubernetes.LeaderElection is nil?
added
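For reference, a sketch of how both nil cases can be handled (the helper names follow this PR; exact placement is illustrative):

```go
// Default the Kubernetes provider first, then the LeaderElection block,
// so that a non-nil Kubernetes section without LeaderElection still gets
// the leader election defaults.
if r.Kubernetes == nil {
	r.Kubernetes = DefaultEnvoyGatewayKubeProvider()
}
if r.Kubernetes.LeaderElection == nil {
	r.Kubernetes.LeaderElection = DefaultLeaderElection()
}
```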
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
/retest
Signed-off-by: Alexander Volchok <[email protected]>
…nto leader-election Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alex Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
/retest
LGTM, great to see this feature making it in
thanks for addressing all the feedback!
LGTM. Thank you for all the effort and patience!
What type of PR is this?
feat(controller): implement leader election for EG controller
What this PR does / why we need it:
The implementation follows best practices for distributed systems, enhancing the reliability and stability of the EG controller in multi-instance deployments.
Which issue(s) this PR fixes:
This enhancement facilitates the scaling of xDS services and enables leader election support.
High Level Changes
We don't block controller startup, as we set the NeedLeaderElection flag to false by default; this applies to the GatewayAPIController.
The status handler controller postpones its startup until it wins the leader election. After being elected, the EG GatewayAPIController initiates a reconciliation loop to process all pending statuses, ensuring that the status handler controller picks up all the resources and becomes up-to-date after assuming leadership.
The infra k8s provider is started after a pod is elected as leader; it will directly process all infra resources. No reconcile is required in this case.
It's possible to configure leader election settings to gain more control over how the leader election process happens, how quickly and how resiliently. In case they are not provided, these default to the k8s base values (see the sketch below).
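A sketch of how these settings could map onto controller-runtime's manager options (the manager.Options field names are from controller-runtime; the EG plumbing, the election ID and namespace, and the Disable semantics from the review thread above are assumptions):

```go
import "sigs.k8s.io/controller-runtime/pkg/manager"

// managerOptions translates the optional LeaderElection config into
// controller-runtime manager options; unset durations fall back to the
// library defaults (sketch only).
func managerOptions(le *LeaderElection) manager.Options {
	opts := manager.Options{
		// Enabled by default; an explicit Disable opts out.
		LeaderElection:          le == nil || !le.Disable,
		LeaderElectionID:        "envoy-gateway",        // illustrative ID
		LeaderElectionNamespace: "envoy-gateway-system", // illustrative namespace
	}
	if le != nil {
		opts.LeaseDuration = le.LeaseDuration
		opts.RetryPeriod = le.RetryPeriod
	}
	return opts
}
```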
These use cases were manually validated:
Blocking the gRPC port of the leader (with an iptables rule)
Result: the Envoy proxy is able to discover the xDS rules from at least one of the remaining EGs.
Blocking the GRPC Port of the leader + all secondary pods except one
Result: Despite the widespread blockage, the Envoy proxy manages to obtain xDS rules from the remaining unblocked EG, maintaining operational resilience.
Apply the k8s quickstart (confirm only the leader updates status / infra)
Result: status updates and infrastructure are exclusively managed by the leader controller.
K8S control plane API is not available for the leader instance
Result: the leader restarts; however, other instances continue to serve xDS and take over leadership when possible
#1953