Debugging 503s #3771
Comments
Is envoy returning the 503s?
Hi @bmadhavan, envoy is returning the 503s.
We have a similar question on our side. We do NOT have any ... My application sees: ... I can see on the "sidecar" envoy: ...
Cannot see anything corresponding on the target envoy (next hop). Looks like either the 503 was returned without any log, or the sidecar envoy responded with a 503 immediately? Cannot see anything in the metrics described here: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/circuit_breaking apart from a drop in ... This is interesting, though. Any clue what I am hitting here?
This might be related...
BTW, is there any way to improve the response? To return an actual gRPC response on such an error (or any other error)?
@dnivra26 @Bplotka You can enable HTTP access logging to get the response flags for the 503s.
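For reference, a minimal sketch of what file access logging with response flags could look like inside the http_connection_manager filter config. The path and format string are illustrative and exact field names vary by Envoy version; %RESPONSE_FLAGS% is the format operator that surfaces flags such as UC or UF next to the response code.

```yaml
# Illustrative sketch (v2-era fields): file access log on the HTTP connection manager.
access_log:
  - name: envoy.file_access_log
    config:
      path: /dev/stdout
      # %RESPONSE_FLAGS% prints flags like UC (upstream connection termination)
      # alongside the response code, which helps classify where 503s come from.
      format: "[%START_TIME%] %REQ(:METHOD)% %REQ(:PATH)% %RESPONSE_CODE% %RESPONSE_FLAGS% %DURATION% %UPSTREAM_HOST%\n"
```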
@bmadhavan so from the access logs, the response flags for the 503s are ...
with UC (upstream connection termination) being 75%. What could be the reasons for connection termination? The traffic is under the limit.
Also related: ...
@htuch: ack
Found these in the envoy logs: "disconnect. resetting 1 pending requests", a bunch of these. Could this be the reason for the 503s? If so, how are these caused and what can be done to fix them? @bmadhavan @mattklein123
More logs: "queueing request due to no available connections"
Envoy does connection pooling to upstream, so requests are queued while waiting for an available pooled connection; seeing that log line is expected and not by itself an error.
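As a rough illustration of the knobs involved (cluster name and numbers below are hypothetical, not a recommendation): the cluster's circuit breaker thresholds bound how many upstream connections the pool may open and how many requests may queue waiting for one, so the "queueing request" log becomes more frequent as those limits are approached.

```yaml
# Hypothetical cluster sketch: sizing the upstream connection pool.
clusters:
  - name: target_service             # hypothetical cluster name
    connect_timeout: 0.25s
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_connections: 2048      # upstream connections the pool may open
          max_pending_requests: 2048 # requests allowed to wait for a connection
          max_requests: 4096         # concurrent requests (HTTP/2 upstreams)
          max_retries: 3
```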
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
For others that may be facing similar symptoms (a low but constant rate of HTTP 503s), be sure to check out the idle timeout settings. For example, we had a node.js application that appeared to have an idle timeout set to 5 seconds, but ambassador's default idle timeout is set to 5 minutes. After setting this to 4 seconds, the 503s went away.
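For reference, a hedged sketch of what pinning the upstream idle timeout below the application's own keep-alive timeout can look like on the Envoy cluster. The cluster name is hypothetical, and the exact location of the field depends on the Envoy/Ambassador version in use.

```yaml
# Hypothetical cluster sketch: close pooled connections before the app does.
clusters:
  - name: node_backend               # hypothetical cluster name
    connect_timeout: 0.25s
    common_http_protocol_options:
      # Below the node.js server's ~5s keep-alive, so Envoy never reuses a
      # connection the application has already decided to close.
      idle_timeout: 4s
```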
We had a similar issue with our cluster. After redeploying services and deleting pods like cilium, envoy, contour, and kube-proxy, one of our DevOps engineers did a rolling rebuild of the cluster and resolved the issue.
This was a mind-boggling issue, where we had intermittent slowness, 503s, queued requests, and more. But after the rolling update we were good to go.
Original issue description:
We have a service that is returning 503s for roughly 1% of the total traffic.
The target service has 100 replicas and the calling service has 50 replicas.
We have updated the circuit_breaking configuration as below: ...
The traffic is around 10,000 requests/second and the latency is around 200ms, so I guess this configuration should be sufficient to handle this traffic.
From the stats, there is no upstream_rq_pending_overflow.
We do see upstream_rq_pending_failure_eject and upstream_cx_connect_timeout.
I can understand upstream_cx_connect_timeout, since we have a connection timeout of 0.25s.
But what could be the other reasons for upstream_rq_pending_failure_eject?
Also, any suggestions for debugging these 503 issues would be really helpful.
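A note on the stats above: upstream_rq_pending_failure_eject is incremented when requests queued for an upstream connection are torn down because that connection attempt fails (for example, a connect timeout or reset), so a single failed connect can surface as several 503s at once. Two hedged mitigations to experiment with, sketched below with hypothetical names and values: a slightly more generous connect_timeout, and a route-level retry on connection failures.

```yaml
# Hypothetical sketch: give connection establishment more headroom than 0.25s.
clusters:
  - name: target_service             # hypothetical cluster name
    connect_timeout: 1s

# Hypothetical sketch: retry requests whose connection attempt failed,
# instead of surfacing the failure to the caller as a 503.
routes:
  - match: { prefix: "/" }
    route:
      cluster: target_service
      retry_policy:
        retry_on: "connect-failure"
        num_retries: 2
```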