-
Hi all, I will preface this by saying I am by no means an expert in load testing or .NET web server configuration, and I doubt this is an issue with YARP specifically. Hoping you can shed some light on the situation and guide me in the right direction, please.

Issue

We are performing load testing to ensure our YARP deployment can handle the request throughput we require. For our load test, we have YARP running on .NET 7 in a Docker container, proxying to a dummy .NET 7 Kestrel web server with a single GET endpoint that delays for 1 second before returning the current datetime.

Under low-load scenarios, we see that adding YARP into the request pipeline adds up to a few tens of milliseconds of latency, so the overall request takes just over 1 second. We have added an HttpClientTelemetryConsumer and a ForwarderTelemetryConsumer and are happy to see logs being made at each stage of the request (represented by the dots on the root span).

However, under a certain load, response times begin to climb towards 3 seconds, so approximately ~2 seconds of latency is added by the proxy (or perhaps more precisely, with the proxy in the request pipeline). What is interesting is that the last log we see indicates OnRequestStart is being called in HttpClientTelemetryConsumer (confusingly, within the first …). This query measures the duration of the outgoing HTTP client call from YARP - the bottom graph depicts the climb from 1s to ~2.6s.

What we've tried

We are using AWS and have experimented with scaling either horizontally or vertically. We've tested with 12 servers with 0.25 vCPU and 1GB of memory and seen no issues. We then tested with 6 servers with 2 vCPU and 4GB of memory, resulting in the ~2 seconds of additional latency I mentioned earlier.

These results seem like a clue to me: despite upping the hardware by quite a bit, there is some resource that is more plentiful with 12 boxes. It also appears that the proxy's CPU and memory utilisation are OK - maybe there are bottlenecks I haven't seen in our AWS metrics, such as running out of CPU threads or hitting some connection limit. We think we have ruled the dummy web server out by: …
At this stage, our next step will be to try and get some HTTP metrics out. We want to inspect the queue length, as we noticed there is a limit of 1024 requests, and the first log we don't see is for … Any advice on how to rule YARP in or out, or how to configure YARP to solve this, would be greatly appreciated! Thank you in advance.
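For reference, the dummy backend is essentially just this (a rough sketch; the endpoint path is a placeholder rather than our real one):

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Single GET endpoint that waits 1 second and then returns the current datetime,
// simulating a slow upstream dependency.
app.MapGet("/time", async () =>
{
    await Task.Delay(TimeSpan.FromSeconds(1));
    return Results.Ok(DateTime.UtcNow);
});

app.Run();
```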
-
I should also add that I don't think we have any rate limiting applied to our routes (no rate limiting appears to be the default), though I am struggling to explicitly disable it and test.
I've raised an issue here.
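For reference, what I've been trying (based on my reading of the YARP rate-limiting docs - I haven't confirmed it behaves as expected yet) is setting the route's RateLimiterPolicy to the special "Disable" value:

```csharp
new RouteConfig
{
    RouteId = "exampleRoute",
    ClusterId = "exampleCluster",
    // Special value that should opt this route out of rate limiting entirely,
    // per my reading of the YARP rate-limiting docs.
    RateLimiterPolicy = "Disable",
    Match = new RouteMatch
    {
        Hosts = new[] { "exampleloadtestdomain.example.com" }
    }
}
```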
-
What does your YARP configuration look like - are you using the configuration file / direct forwarding, any custom client configuration, etc.? You mentioned seeing the …
-
It looks like we are using HTTP/2. I don't know much about this but it sounds like HTTP/2 is more performant.
Thanks, that is good to know.
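(As an aside, if we end up wanting to compare against HTTP/1.1, my understanding is that the outgoing version can be pinned per cluster via ForwarderRequestConfig - a rough sketch I haven't tried yet, reusing the same ActivityTimeout as below:)

```csharp
HttpRequest = new ForwarderRequestConfig
{
    ActivityTimeout = TimeSpan.FromSeconds(900),
    // Pin the outgoing request to HTTP/1.1 so we can compare behaviour against HTTP/2.
    Version = HttpVersion.Version11,
    VersionPolicy = HttpVersionPolicy.RequestVersionExact
}
```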
Sure, our config looks like this.

Cluster config - I've trimmed it down a bit to share here, but we actually have 5 routes and 5 clusters (that only differ in name and target). Only one route & cluster is being invoked during the test:

public static IReverseProxyBuilder LoadProxyConfig(this IReverseProxyBuilder builder)
{
    var clusterId = "exampleCluster";
    builder.LoadFromMemory(new[]
        {
            new RouteConfig
            {
                RouteId = "exampleRoute",
                ClusterId = clusterId,
                Match = new RouteMatch
                {
                    Hosts = new [] { "exampleloadtestdomain.example.com" },
                }
            }
        },
        new[]
        {
            new ClusterConfig
            {
                ClusterId = clusterId,
                HttpRequest = new ForwarderRequestConfig() { ActivityTimeout = TimeSpan.FromSeconds(900) }, // long timeout to match one proxied system's timeout
                Destinations = new Dictionary<string, DestinationConfig>(StringComparer.OrdinalIgnoreCase)
                {
                    { "destination1", new DestinationConfig() { Address = "www.example.com" } }
                }
            }
        });

    return builder;
}

and the CustomForwarderHttpClientFactory implementation:

/**
* ref: https://microsoft.github.io/reverse-proxy/articles/http-client-config.html#custom-iforwarderhttpclientfactory
*/
public class CustomForwarderHttpClientFactory : IForwarderHttpClientFactory
{
    public HttpMessageInvoker CreateClient(ForwarderHttpClientContext context)
    {
        var handler = new SocketsHttpHandler
        {
            UseProxy = false,
            AllowAutoRedirect = false,
            AutomaticDecompression = DecompressionMethods.None,
            UseCookies = false,
            ActivityHeadersPropagator = new ReverseProxyPropagator(DistributedContextPropagator.Current),
            ResponseHeaderEncodingSelector = (_, _) => Encoding.UTF8
        };
        var invoker = new HttpMessageInvoker(handler, disposeHandler: true);
        return invoker;
    }
}
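For completeness, this is roughly how we'd expect the two pieces to be wired up, following the docs linked above - a simplified sketch rather than our exact startup code:

```csharp
var builder = WebApplication.CreateBuilder(args);

// Use our custom factory instead of YARP's default HttpMessageInvoker factory.
builder.Services.AddSingleton<IForwarderHttpClientFactory, CustomForwarderHttpClientFactory>();

// Load the in-memory route/cluster config shown above.
builder.Services.AddReverseProxy().LoadProxyConfig();

var app = builder.Build();
app.MapReverseProxy();
app.Run();
```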
Debugging locally I can see that … As for the …, I have actually found a couple of requests that do have logs at the end (but not the start...)! So something is definitely off, with data missing from our OpenTelemetry back end for some reason. Anyway, this might be useful to look at!

PerRequestMetrics: {
"StartTime": "2023-10-17T02:19:22.0820696Z",
"RouteInvokeOffset": 0.1638,
"ProxyStartOffset": 0.1685,
"HttpRequestStartOffset": 1.1,
"HttpConnectionEstablishedOffset": 0,
"HttpRequestLeftQueueOffset": 1109.6377,
"HttpRequestHeadersStartOffset": 1110.1027,
"HttpRequestHeadersStopOffset": 1110.2028,
"HttpRequestContentStartOffset": 0,
"HttpRequestContentStopOffset": 0,
"HttpResponseHeadersStartOffset": 1110.22,
"HttpResponseHeadersStopOffset": 2114.195,
"HttpResponseContentStopOffset": 2114.2544,
"HttpRequestStopOffset": 2114.211,
"ProxyStopOffset": 2114.26,
"Error": 0,
"RequestBodyLength": 0,
"ResponseBodyLength": 69,
"RequestContentIops": 0,
"ResponseContentIops": 2,
"DestinationId": "destination1",
"ClusterId": "adminCluster",
"RouteId": "adminRoute"
}

Request 2 - this one has a slightly different distribution of where waits might be happening?:

PerRequestMetrics: {
"StartTime": "2023-10-17T02:19:00.6066264Z",
"RouteInvokeOffset": 0.5633,
"ProxyStartOffset": 0.5675,
"HttpRequestStartOffset": 269.0689,
"HttpConnectionEstablishedOffset": 0,
"HttpRequestLeftQueueOffset": 1085.0881,
"HttpRequestHeadersStartOffset": 1085.097,
"HttpRequestHeadersStopOffset": 1089.8484,
"HttpRequestContentStartOffset": 0,
"HttpRequestContentStopOffset": 0,
"HttpResponseHeadersStartOffset": 1089.8533,
"HttpResponseHeadersStopOffset": 2093.7458,
"HttpResponseContentStopOffset": 2574.9463,
"HttpRequestStopOffset": 2473.7656,
"ProxyStopOffset": 2574.9622,
"Error": 0,
"RequestBodyLength": 0,
"ResponseBodyLength": 69,
"RequestContentIops": 0,
"ResponseContentIops": 2,
"DestinationId": "destination1",
"ClusterId": "adminCluster",
"RouteId": "adminRoute"
}
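If I'm reading these offsets correctly (all values are milliseconds from StartTime), most of the extra latency is spent queued before the request is actually sent. A quick sanity check on Request 1's numbers:

```csharp
// Hypothetical sanity check using the offsets copied from Request 1's JSON above (in ms).
double httpRequestStart = 1.1;
double httpRequestLeftQueue = 1109.6377;
double responseHeadersStart = 1110.22;
double responseHeadersStop = 2114.195;

double queuedForConnection = httpRequestLeftQueue - httpRequestStart;  // ~1108 ms waiting for a connection/stream
double waitingOnBackend = responseHeadersStop - responseHeadersStart;  // ~1004 ms, i.e. the backend's deliberate 1 s delay

Console.WriteLine($"Queued: {queuedForConnection:F0} ms, backend: {waitingOnBackend:F0} ms");
```

Request 2 shows the same shape, with roughly 816 ms of queuing (1085.1 - 269.1). So the added latency looks like outbound queuing rather than the proxying itself being slow.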
I will have a go at digging these out, as well as enabling DEBUG logging to see what comes up. Thanks for the examples and suggestions, Miha - it is greatly appreciated.
-
This post has been really enlightening to me: dotnet/runtime#35088
-
OK, I think I am onto something here...

To recap, we have 6 proxies proxying to 12 mock servers via an AWS Application Load Balancer. Since we are seeing request queuing, we know that MAX_CONCURRENT_STREAMS is being reached. It seems like the default would be 100, but I couldn't find a better reference than this, unfortunately. I could see that Kestrel's Limits.Http2.MaxStreamsPerConnection defaults to 100, but I think that is inbound.

We have 12 mock servers, so what gives? Shouldn't we theoretically have 6 (proxies) * 100 (streams) * 12 (mock targets) = 7,200 concurrent requests without queuing? Well, because our back end target is just an AWS Application Load Balancer, it is considered the same server, despite having 12 servers behind the scenes. Each of our proxies has 1 connection and is maxing out the 100 streams within it. The …

I am yet to dig into why we are only achieving 1300 requests/second despite a target concurrency of 3000/second. Will respond when I know more. I'm also yet to understand the implications of enabling this setting, and whether it is even relevant/required for production, where we will have several target load balancers rather than just the one in this fabricated scenario.

Thank you! Feeling quite relieved here and nearly certain this isn't a YARP issue.
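For anyone following along, the change we're planning to test is simply enabling multiple HTTP/2 connections on the SocketsHttpHandler in our custom factory, so the handler can open extra connections to the (single) load balancer origin once it hits MAX_CONCURRENT_STREAMS. A sketch of the updated factory - not yet validated under load:

```csharp
public class CustomForwarderHttpClientFactory : IForwarderHttpClientFactory
{
    public HttpMessageInvoker CreateClient(ForwarderHttpClientContext context)
    {
        var handler = new SocketsHttpHandler
        {
            UseProxy = false,
            AllowAutoRedirect = false,
            AutomaticDecompression = DecompressionMethods.None,
            UseCookies = false,
            // Allow more than one HTTP/2 connection per origin, so requests don't
            // queue behind the server's MAX_CONCURRENT_STREAMS limit (commonly 100).
            EnableMultipleHttp2Connections = true,
            ActivityHeadersPropagator = new ReverseProxyPropagator(DistributedContextPropagator.Current),
            ResponseHeaderEncodingSelector = (_, _) => Encoding.UTF8
        };

        return new HttpMessageInvoker(handler, disposeHandler: true);
    }
}
```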