Proxies connected to the secondary gateway do not receive configuration #4845
Comments
We have the same issue: rate limiting is not working in 1.2.3 with multiple controller replicas. We scaled from 2 -> 1 and it seems that everything is working again.
cc @arkodg
Debug logs from the rate limit pods when running v1.2.3 and running multiple GET queries to a configured API endpoint:
This is a regression from #4809, most likely because the secondary status updater will never be ready (wg.Add(1) will not be called for it), so the client will block until the pod becomes the leader. This is a problem because there are two client status calls made directly from the provider goroutine.
Should these function calls be avoided, with the requests sent over to the status updater via watchable, or should we avoid making these calls when the controller is not the leader? @alexwo @zhaohuabing
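For illustration, a minimal sketch of the blocking pattern described above; the statusUpdater type and its methods are hypothetical names, not the actual Envoy Gateway code:

```go
// Sketch: a status updater that only becomes ready once this replica wins
// leader election. A direct status call from the provider goroutine on a
// secondary replica blocks indefinitely because readiness is never signalled.
package main

import (
	"fmt"
	"time"
)

type statusUpdater struct {
	ready chan struct{} // closed only when this replica becomes leader
}

func newStatusUpdater() *statusUpdater {
	return &statusUpdater{ready: make(chan struct{})}
}

// becomeLeader marks the updater as ready; on a secondary replica this is
// never called, so the channel stays open forever.
func (u *statusUpdater) becomeLeader() {
	close(u.ready)
}

// Write blocks until the updater is ready.
func (u *statusUpdater) Write(status string) {
	<-u.ready
	fmt.Println("status written:", status)
}

func main() {
	u := newStatusUpdater()

	done := make(chan struct{})
	go func() {
		// Direct call from the provider goroutine: never returns on a
		// replica that never becomes leader.
		u.Write("GatewayClass accepted")
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("write completed")
	case <-time.After(2 * time.Second):
		fmt.Println("write still blocked on the secondary replica")
	}
}
```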
It looks like we also have a regression of the 503 issue #4685 (comment). My previous PR tried to consolidate all the status updates into the watchable so they wouldn't block the senders, but there is a race condition between the Gateway API runner and the provider runner. I need more time to dig into this if we want to solve it with this approach.
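A minimal, self-contained sketch of that consolidation idea, assuming senders only ever need the latest status per resource; the statusStore type and its Publish/Drain methods are hypothetical stand-ins, not the actual watchable API used by Envoy Gateway:

```go
// Sketch: senders publish status updates into a shared store and never block,
// while a single consumer flushes them only after this replica becomes leader.
package main

import (
	"fmt"
	"sync"
	"time"
)

// statusStore keeps only the latest update per key, like a watchable map.
type statusStore struct {
	mu      sync.Mutex
	pending map[string]string
}

func newStatusStore() *statusStore {
	return &statusStore{pending: map[string]string{}}
}

// Publish never blocks: it records the latest status for a key.
func (s *statusStore) Publish(key, status string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[key] = status
}

// Drain returns and clears everything published so far.
func (s *statusStore) Drain() map[string]string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := s.pending
	s.pending = map[string]string{}
	return out
}

func main() {
	store := newStatusStore()
	elected := make(chan struct{}) // closed when this replica becomes leader

	// Consumer: only the leader writes the drained statuses out.
	go func() {
		<-elected
		ticker := time.NewTicker(100 * time.Millisecond)
		defer ticker.Stop()
		for range ticker.C {
			for key, status := range store.Drain() {
				fmt.Printf("wrote status %q for %s\n", status, key)
			}
		}
	}()

	// Producers (e.g. the provider goroutine) publish without blocking,
	// regardless of leadership.
	store.Publish("gateway/eg", "Accepted")
	store.Publish("httproute/foo", "Programmed")

	close(elected) // simulate winning leader election
	time.Sleep(300 * time.Millisecond)
}
```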
@zhaohuabing AFAIK there is no regression of the 503 issue with 1 EG replica.
Also confirming that this fixes the issue where ratelimit pods do not receive configuration when running multiple gateway replicas, which resulted in requests returning HTTP 200 even when rate limits were hit. The ratelimiter pods seem to work OK now.
Description:
When a proxy is connected to a secondary gateway, it does not receive configuration.
This is new behaviour after the upgrade to 1.2.3 and was not happening with version 1.1.0.
Repro steps:
Have two gateways configured with leader election (the default configuration, version 1.2.3). Scale up the Envoy deployment and check the proxies connected to the secondary gateway. Those proxies will not receive the configuration, complaining that the initial fetch is timing out (see logs).
Those proxies will be able to get the configuration once the primary gateway is deleted and the secondary becomes the new primary.
Environment:
Gateway version: 1.2.3
Envoy version: 1.32.1
Kubernetes: EKS 1.28
Logs:
The default log level is set to debug.
logs from envoy PROXY-POD-NAME:
logs from secondary gateway: