feat(EG K8S Provider): Enable leader election for EG controller #2694

Merged
merged 26 commits into from
Apr 4, 2024
Conversation

alexwo
Contributor

@alexwo alexwo commented Feb 25, 2024

What type of PR is this?

feat(controller): implement leader election for EG controller

What this PR does / why we need it:

  1. This PR introduces leader election for the Envoy Gateway (EG) controller to ensure that only one instance of the controller can perform write operations at any given time.
  2. Ensure consistent availability of xDS (Envoy's discovery service) across the system.

The implementation follows best practices for distributed systems, enhancing the reliability and stability of the EG controller in multi-instance deployments.

Which issue(s) this PR fixes:

This enhancement facilitates the scaling of xDS services and enables leader election support.

High Level Changes

  1. We don't block controller startup, as we set the NeedLeaderElection flag to false by default; this applies to the GatewayAPIController.

  2. The status handler controller postpones its startup until it wins the leader election. After being elected, the EG GatewayAPIController initiates a reconciliation loop to process all pending statuses, ensuring that the status handler controller can pick up all the resources and become up-to-date after assuming leadership.

  3. The infra k8s provider is started after a pod is elected as leader; it will process all infra resources directly, so no reconcile is required in this case.

  4. It's possible to configure leader election settings to gain more control over how the leader election process happens, and how quick and resilient it is. If not provided, these default to the k8s base values.

leaseDuration : 18s
renewDeadline: 10s
retryPeriod: 2s
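
As a hedged sketch, these settings would sit under the Kubernetes provider section of the EnvoyGateway config, per the review suggestion to move them out of Helm values; the exact field names and placement may differ from the merged API:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
provider:
  type: Kubernetes
  kubernetes:
    leaderElection:
      leaseDuration: 18s   # non-leaders wait this long before contending
      renewDeadline: 10s   # leader must renew its lease within this window
      retryPeriod: 2s      # interval between acquire/renew attempts
```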

These use cases were manually validated:

Blocking the gRPC port of the leader (with an iptables rule)
Result: the Envoy proxy is able to discover the xDS rules from at least one of the remaining EGs.

Blocking the gRPC port of the leader + all secondary pods except one
Result: despite the widespread blockage, the Envoy proxy manages to obtain xDS rules from the remaining unblocked EG, maintaining operational resilience.

Apply the k8s quickstart (confirm only the leader updates status / infra)
Result: status updates and infrastructure are exclusively managed by the leader controller.

The k8s control plane API is not available for the leader instance
Result: the leader restarts; other instances continue to serve xDS and take over leadership when possible.
#1953


codecov bot commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 66.66667%, with 32 lines in your changes missing coverage. Please review.

Project coverage is 66.46%. Comparing base (ec0f31b) to head (29da944).

Files Patch % Lines
internal/infrastructure/runner/runner.go 0.00% 17 Missing ⚠️
internal/provider/kubernetes/kubernetes.go 73.52% 6 Missing and 3 partials ⚠️
api/v1alpha1/envoygateway_helpers.go 85.71% 3 Missing ⚠️
internal/provider/kubernetes/controller.go 62.50% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2694   +/-   ##
=======================================
  Coverage   66.45%   66.46%           
=======================================
  Files         160      161    +1     
  Lines       22553    22632   +79     
=======================================
+ Hits        14988    15042   +54     
- Misses       6696     6717   +21     
- Partials      869      873    +4     


@guydc
Contributor

guydc commented Feb 29, 2024

Should this PR also include some work in the infrastructure controller?

@alexwo
Contributor Author

alexwo commented Feb 29, 2024

Should this PR also include some work in the infrastructure controller?
Hi @guydc, we can attempt to ensure that there is only a single controller that can create/update e.g. infra-related resources at a time.

Yes, the side effect at this point is that infra updates will result in concurrent updates and extra revision conflicts / retries.

@alexwo
Contributor Author

alexwo commented Mar 1, 2024

Should this PR also include some work in the infrastructure controller?
Hi @guydc, we can attempt to ensure that there is only a single controller that can create/update e.g. infra-related resources at a time.

The side effect at this point is that infra updates will result in concurrent updates and extra revision conflicts / retries.

Added changes to ensure that infra will be created only by the elected EG instance.

@alexwo alexwo changed the title feat(EG K8S Provider): implement leader election for EG controller feat(EG K8S Provider): Implement leader election for EG controller Mar 3, 2024
@alexwo alexwo changed the title feat(EG K8S Provider): Implement leader election for EG controller feat(EG K8S Provider): Enable leader election for EG controller Mar 3, 2024
@alexwo alexwo marked this pull request as ready for review March 5, 2024 15:53
@alexwo alexwo requested a review from a team as a code owner March 5, 2024 15:53
return
case <-r.Elected:
initInfra()
r.Logger.Info("started")
Contributor

this log can also be moved to initInfra func

}, nil
}

func parseEnvDuration(envVar string, duration **time.Duration) error {
Contributor

can we also make duration a return value instead of using a secondary pointer? (more like a gopher this way)

@@ -73,6 +79,13 @@ func New(cfg *rest.Config, svr *config.Server, resources *message.ProviderResour
return nil, fmt.Errorf("unable to set up ready check: %w", err)
}

// Emit elected & continue with deployment of infra resources
go func() {
if <-mgr.Elected(); true {
Contributor

Suggested change
if <-mgr.Elected(); true {
if <-mgr.Elected() {

Contributor Author

Switched to using <-mgr.Elected(), given that it's a channel of type struct{}.

@alexwo
Contributor Author

alexwo commented Mar 6, 2024

/retest

1 similar comment
@alexwo
Contributor Author

alexwo commented Mar 27, 2024

/retest

@@ -1,5 +1,10 @@
deployment:
envoyGateway:
leaderElection:
enabled: true
Contributor

this doesn't look like the right way to set these fields; suggest creating a field in the Kubernetes section of the EnvoyGateway config

type EnvoyGatewayKubernetesProvider struct {

also we don't need an explicit enabled field; we can use the instantiation of a leaderElection ptr type to decide whether to enable it or not

This will then need to be passed to the provider via

type Config struct {

Contributor Author

updated

Contributor

this config needs to be removed

Contributor

and moved under config.envoyGateway

@@ -113,6 +113,7 @@ install-ratelimit:
kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
kubectl rollout status --watch --timeout=5m -n envoy-gateway-system deployment/envoy-gateway
kubectl wait --timeout=5m -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available
tools/hack/deployment-exists.sh "app.kubernetes.io/name=envoy-ratelimit" "envoy-gateway-system"
Contributor

isn't this doing the same thing as L117?

Contributor Author

The kubectl wait command isn't effective for deployment resources that haven't been created yet.
Therefore, the envoy-ratelimit deployment needs to be in place before we can use kubectl wait to ensure it's in the Available state.

Since Envoy Gateway now requires additional time to start up as leader and manage the rate limit deployment, this addition serves to mitigate that delay.

@@ -43,7 +44,7 @@ func (r *Runner) Start(ctx context.Context) (err error) {
if err != nil {
return fmt.Errorf("failed to get kubeconfig: %w", err)
}
p, err := kubernetes.New(cfg, &r.Config.Server, r.ProviderResources)
p, err := kubernetes.New(cfg, &r.Config.Server, r.ProviderResources, r.Elected)
Contributor

this won't be needed, and should be available in r.Config.Server

Contributor Author

fixed

}

r.Logger.Info("started")
// Wait for leader if leader election is enabled
Contributor

maybe adding more explicit comments that highlight what and why this is happening - "block, and only allow the leader to create infrastructure resources"

Contributor Author

added

@arkodg
Contributor

arkodg commented Mar 29, 2024

finally got to this, thanks for building this out ! adding some comments

@alexwo
Contributor Author

alexwo commented Mar 29, 2024

finally got to this, thanks for building this out ! adding some comments

Thanks!

@alexwo
Contributor Author

alexwo commented Mar 29, 2024

/retest

}

type LeaderElection struct {
// LeaseDuration defines the time non-leader contenders will wait before attempting to claim leadership. It's based on the timestamp of the last acknowledged signal. The default setting is 15 seconds.
Contributor

can you follow the kubebuilder way of defining defaults, and also add optional tags?
this may not be a CRD, but it helps with doc generation

@@ -42,6 +42,14 @@ spec:
- server
- --config-path=/config/envoy-gateway.yaml
env:
- name: ENVOY_GATEWAY_LEADER_ELECTION_ENABLED
Contributor

needs to be removed

Contributor Author

@alexwo alexwo Apr 1, 2024

@arkodg does this allow disabling leader election? Or do we actually not want to allow configuring it via Helm at all, only via the Envoy Gateway spec?

Contributor

added a related note below

internal/cmd/server.go (outdated, resolved)
@@ -901,6 +901,13 @@ func (r *gatewayAPIReconciler) addFinalizer(ctx context.Context, gc *gwapiv1.Gat

// watchResources watches gateway api resources.
func (r *gatewayAPIReconciler) watchResources(ctx context.Context, mgr manager.Manager, c controller.Controller) error {
// trigger a reconcile after getting elected
Contributor

curious why this is needed

Contributor Author

@alexwo alexwo Apr 1, 2024

Upon its initial startup, Envoy Gateway will not assume the leader role but will begin reconciling resources, as the main loop of the controller does not require leadership to start processing.

Once the controller is elected as leader, it will not revisit resources it has previously reconciled. This approach is necessary to update the status of resources and manage the infra resources, such as creating EG proxies, after a leader is elected.

Contributor

but it doesn't need to revisit or update status, unless some resource has changed, and in that case the Reconcile would be triggered

Contributor Author

@alexwo alexwo Apr 1, 2024

This will ensure that we reconcile status/infra for resources that changed before there was an active elected leader.

Use cases:

  • Initial startup, when different EG resources can already exist in the cluster.
  • The leader goes down during an outage; it takes some time for the new one to become leader, and resource changes can happen during this time.
  • EG upgrades / re-deployments.
  • etc.

It may take more than a few seconds for a leader to assume leadership, and during this time resource status/infra changes won't be handled; this protects against that.

Contributor

  • init - still trying to understand this use case, since the logic in New that waits for a manager to be elected for the first time should handle this case
  • leader down - k8s should handle this case and re-elect a new leader; if there's a delay, it's worth debugging this in k8s

Contributor Author

@alexwo alexwo Apr 1, 2024

@arkodg
k8s won't trigger a reconcile after a leader gets elected. In this implementation we trigger it on our end, as we constantly reconcile resources for xDS, even when we are not leaders.

We could decide not to do this (reconcile resources only when we are leaders), but then we won't be able to serve xDS from the non-leader EG instances, as they won't be in sync.

  • The main reconcile loop does not wait for leader election; it starts immediately on all pods.

    func (r *gatewayAPIReconciler) Reconcile(ctx context.Context, _ reconcile.Request) (reconcile.Result, error) {

    As there is no way to know which of the pods will get elected, the reconcile loop starts in all instances simultaneously. Only the status / infra components start on leader election, so we have to replay for them the events they have missed.

  • Isn't this expected? A pod has to wait a full lease duration before it attempts to become leader, and k8s won't trigger a second full reconcile after a leader gets elected, as in our case the main reconcile loop is already processing items in all pods, including the one that will become the new leader.
    https://github.com/kubernetes/client-go/blob/v0.29.3/tools/leaderelection/leaderelection.go#L133

// LeaseDuration is the duration that non-leader candidates will
// wait to force acquire leadership. This is measured against time of
// last observed ack.
//
// A client needs to wait a full LeaseDuration without observing a change to
// the record before it can attempt to take over. 

Contributor

@arkodg arkodg Apr 1, 2024

cool, thanks for explaining. It sounds like we kick the method again to replay the writes, specifically for resources that were updated/created during the leaderless interval? Is this a common pattern in most CNCF projects?

Contributor Author

@alexwo alexwo Apr 1, 2024

Yes, that's right; our goal is to close the leaderless interval.

The typical approach in Kubernetes and CNCF ecosystems is probably a dedicated controller per flow, such as one for xDS sync that focuses solely on xDS operations without altering status or infrastructure, plus a separate controller that performs status updates and infrastructure modifications.

Based on this, the xDS controller does not need to assume a leadership role, whereas the controller handling status and infrastructure updates does; Kubernetes would then initiate the reconciliation process for it. So in general there would be two very similar reconcile loops -> xDS for non-leaders & leaders, and an extra one for leaders only.

While it's feasible to develop a distinct controller for managing xDS reconciliations and another solely for infrastructure and status modifications, such an overhaul might be extensive and potentially less appealing within the existing EG architecture.

Contributor

sounds good, let's stick with this. Can you also add a comment explaining the why, e.g. why we kick the method again: "Retrigger a reconcile after getting elected to replay the translation (which will trigger status writes and/or infra resource creation), specifically for resources that were updated/created during the leaderless interval"

alexwo added 2 commits April 1, 2024 13:22
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
// Default is 15 seconds.
RetryPeriod *time.Duration `json:"retryPeriod,omitempty"`
// Disabled allows to disable leader election (enabled by default)
Disabled *bool `json:"disabled,omitempty"`
Contributor

we are using Disable in other APIs, let's use Disable here
e.g.

Disable bool `json:"disable,omitempty"`

Contributor Author

indeed, I will change it

r.Kubernetes.LeaderElection = DefaultLeaderElection()
}

r.Kubernetes.LeaderElection = DefaultLeaderElection()
Contributor

why is this needed ?

Contributor Author

not needed; cleaning it up.

Signed-off-by: Alexander Volchok <[email protected]>
@@ -195,6 +212,9 @@ func (r *EnvoyGatewayProvider) GetEnvoyGatewayKubeProvider() *EnvoyGatewayKubern

if r.Kubernetes == nil {
r.Kubernetes = DefaultEnvoyGatewayKubeProvider()
if r.Kubernetes.LeaderElection == nil {
Contributor

what about the case where r.Kubernetes is not nil, but r.Kubernetes.LeaderElection is nil

Contributor Author

@alexwo alexwo Apr 3, 2024

added

alexwo added 4 commits April 3, 2024 18:17
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
@alexwo
Contributor Author

alexwo commented Apr 3, 2024

/retest

alexwo added 6 commits April 4, 2024 07:20
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alex Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
Signed-off-by: Alexander Volchok <[email protected]>
@alexwo
Contributor Author

alexwo commented Apr 4, 2024

/retest

Contributor

@arkodg arkodg left a comment

LGTM, great to see this feature making it in
thanks for addressing all the feedback !

@arkodg arkodg requested review from a team and zhaohuabing April 4, 2024 07:07
Member

@zhaohuabing zhaohuabing left a comment

LGTM. Thank you for all the effort and patience!

@arkodg arkodg merged commit b9053b8 into envoyproxy:main Apr 4, 2024
22 checks passed
@alexwo alexwo deleted the leader-election branch April 4, 2024 07:38