Improve EG Gateway xDS & startup Reliability (custom k8s health probe) #2810
Comments
makes sense, a workaround for this until this is implemented is to wake up slowly i.e. set |
please assign to me |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. |
@arkodg But currently EG hasn't exposed gateway/charts/gateway-helm/templates/envoy-gateway-deployment.yaml Lines 66 to 71 in 78fe57a
|
I was referring to |
Since EG is xDS resources provider and envoy is consumer, setting longer Workaround here maybe set longer |
@arkodg As a workaround, could we make this EG readiness |
@aoledk if EG's cache is not ready yet but the xds server is ready, we send an empty response gateway/internal/xds/cache/snapshotcache.go Line 202 in 92760c8
will this cause the proxy listeners to drain ? |
@arkodg If the cache is not ready at all (no snapshot exists), EG returns nil and does not set a snapshot for Envoy; only setting a snapshot triggers sending xDS resources to Envoy. So Envoy keeps using its currently active xDS resources instead of empty ones, and no listeners drain. If the cache is partly ready (a starting EG is still reconciling objects), EG sets a partial snapshot for Envoy. Envoy then receives partial xDS resources that replace its currently active, complete xDS resources, which may lead to listener drain, especially when some ClientTrafficPolicies have not been reconciled yet. gateway/internal/xds/cache/snapshotcache.go Lines 202 to 214 in 33fceb0
ah, so afaik the partly ready case shouldn't happen because there is one big reconciler,
|
@arkodg It's my mistake, and you're right, partly ready xDS resources will never be generated under this big reconciler. |
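The two cases discussed above can be sketched with a toy model. This is only an illustration under stated assumptions: `Snapshot`, `Cache`, `Proxy`, and `Reconcile` are hypothetical names, not the types used in EG or go-control-plane, and the real xDS server pushes updates over a stream rather than being polled. The sketch shows that a proxy keeps its active resources while no snapshot is set, and that a single reconcile pass publishes one complete snapshot rather than a partial one.

```go
package main

// Snapshot is a toy stand-in for a versioned set of xDS resources.
// Hypothetical type for illustration only.
type Snapshot struct {
	Version   string
	Listeners []string
}

// Cache holds at most one snapshot per node ID.
// Hypothetical; not the go-control-plane snapshot cache API.
type Cache struct {
	snapshots map[string]*Snapshot
}

// NewCache returns an empty cache with no snapshot for any node.
func NewCache() *Cache {
	return &Cache{snapshots: map[string]*Snapshot{}}
}

// Set stores the snapshot for a node, replacing any previous one.
func (c *Cache) Set(node string, s *Snapshot) {
	c.snapshots[node] = s
}

// Proxy models an Envoy instance and its currently active listeners.
type Proxy struct {
	Node   string
	Active []string
}

// Sync applies the cached snapshot if one exists and reports whether the
// active configuration was replaced. With no snapshot, nothing is sent and
// the proxy keeps what it already has.
func (p *Proxy) Sync(c *Cache) bool {
	s, ok := c.snapshots[p.Node]
	if !ok || s == nil {
		return false
	}
	p.Active = append([]string(nil), s.Listeners...)
	return true
}

// Reconcile translates every object in one pass and publishes a single
// complete snapshot at the end, so a partial snapshot is never stored.
func Reconcile(c *Cache, node, version string, objects []string) {
	listeners := make([]string, 0, len(objects))
	for _, o := range objects {
		listeners = append(listeners, "listener/"+o)
	}
	c.Set(node, &Snapshot{Version: version, Listeners: listeners})
}
```

In this model the only way a proxy could see partial resources is if `Set` were called before the loop in `Reconcile` finished, which matches the conclusion above about the single big reconciler.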
thanks for cross checking and brainstorming the edge cases @aoledk ! |
can we close this @alexwo |
To consider: move the current healthcheck listener and filter chain from static config (bootstrap) to dynamic config generated by EG. This would mean that readiness checks would only pass once the proxy was programmed at least once. |
great idea @guydc , big +1 |
+1, this will also reduce the size of bootstrap configuration. |
The proposed enhancement changes the controller's "ready" status so that it accurately reflects whether xDS discovery has completed and synchronized.
Specifically, the "ready" status indicator can transition to "true" once xDS discovery has completed and stored its initial snapshot, or when no reconciliation is required (an empty or new deployment).
This ensures that Envoy proxies are always in sync with the latest xDS state and that an EG instance that has started is able to reconcile.
-> If there is nothing to reconcile -> ready = true
-> If there are changes to reconcile, wait for xDS to complete -> ready = true
-> Otherwise -> ready = false
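The three rules above could be expressed roughly as follows. This is a minimal sketch, not EG's implementation: `StartupState`, its fields, `Ready`, and `ReadyzHandler` are hypothetical names, and it assumes the controller can tell when its caches have finished the initial listing and how many objects still await reconciliation.

```go
package main

import "net/http"

// StartupState captures what the controller knows at probe time.
// Hypothetical type; all fields are assumptions made for illustration.
type StartupState struct {
	CachesSynced   bool // initial listing of watched objects has finished
	PendingObjects int  // objects that still need to be reconciled
	SnapshotStored bool // the xDS server has stored its initial snapshot
}

// Ready applies the three rules: nothing to reconcile means ready, pending
// changes mean ready only after the snapshot is stored, anything else is
// not ready.
func Ready(s StartupState) bool {
	switch {
	case !s.CachesSynced:
		return false
	case s.PendingObjects == 0:
		return true
	default:
		return s.SnapshotStored
	}
}

// ReadyzHandler exposes Ready as an HTTP endpoint that a Kubernetes
// readiness probe could call. It returns 503 until Ready reports true.
func ReadyzHandler(current func() StartupState) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if !Ready(current()) {
			http.Error(w, "xDS snapshot not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}
```

Keeping the decision in a pure function makes the rules easy to test separately from the HTTP wiring.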
Currently, there may be cases where xDS is not completely synchronized at startup, which could cause new Envoy proxies to work with an incomplete xDS configuration.
This change provides a stronger guarantee that an operational EG consistently maintains an up-to-date xDS snapshot, and it may also help avoid situations where instances start up but fail during the initial reconcile.
Leader Election and multiple instances use case:
This will improve consistency in environments where multiple instances of EG run simultaneously, by ensuring they start only once the xDS server has persisted the latest state snapshot. #1953