
Respect the Keep-Alive response header on HTTP/1.1 as well #73585

Merged · 2 commits into dotnet:main · Aug 12, 2022

Conversation

MihaZupan (Member)

Followup after #73020
Contributes to #72958

This changes #73020 to apply to all 1.x responses, not just 1.0.

It also brings back the 1 second offset (discussed at #73020 (comment)) that we apply to the timeout, to avoid reusing the connection just as it's about to time out.
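For illustration, here's a minimal sketch of the handling this describes; the names are hypothetical, not the actual SocketsHttpHandler internals:

```csharp
// Illustrative sketch only: parse a "Keep-Alive: timeout=N" response header
// and compute when the connection must stop being reused, now for any
// HTTP/1.x response and with a 1 second safety offset subtracted from the
// server's deadline.
static long? GetConnectionExpiry(Version httpVersion, string? keepAliveHeader)
{
    const long OffsetMs = 1_000;

    // Previously this only applied to HTTP/1.0; the change covers all 1.x.
    if (httpVersion.Major != 1 || keepAliveHeader is null)
        return null;

    foreach (string part in keepAliveHeader.Split(','))
    {
        string token = part.Trim();
        if (token.StartsWith("timeout=", StringComparison.OrdinalIgnoreCase) &&
            int.TryParse(token.AsSpan("timeout=".Length), out int seconds))
        {
            // Stop handing the connection out 1 second before the deadline.
            return Environment.TickCount64 + seconds * 1000L - OffsetMs;
        }
    }

    return null;
}
```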
An example from a real service closing the connection right as we've sent the request, about 5 seconds after the last one:

[screenshot: the server closing the connection just as the request is sent, ~5 seconds after the previous one]

With these changes, an internal service I was testing against no longer sees "Connection reset by peer" request failures.

cc: @halter73

@MihaZupan MihaZupan added this to the 7.0.0 milestone Aug 8, 2022
@ghost ghost assigned MihaZupan Aug 8, 2022
@ghost commented Aug 8, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


Author: MihaZupan
Labels: area-System.Net.Http
Milestone: 7.0.0

@MihaZupan MihaZupan marked this pull request as ready for review August 8, 2022 20:26
@MihaZupan MihaZupan requested a review from a team August 8, 2022 20:26
@MihaZupan (Member, Author)

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephentoub (Member) commented Aug 9, 2022

> It also brings back the 1 second offset

I still don't understand why this is the "right" offset. Why 1 second rather than 0.5 seconds or 2 seconds? Why is 1 second somehow magic? Isn't it likely you could come up with similar examples for any offset you selected? It seems like we're really just saying "that timeout you asked us to respect, we're going to choose a much smaller value in order to minimize the chances of a race condition," which raises the question of why it's ok not to respect the keep-alive value the server is sending (when the whole point of the original PR was to respect it). If we're not going to keep the connection alive for the requested duration, are we really respecting it?

@MihaZupan (Member, Author)

The race we are trying to avoid spans the time from when we decide to reuse a specific HttpConnection, to when the server starts reading the request and decides that the connection is no longer idle.
This includes network latency (including possible time for TCP retransmission) and processing delays at the client and server.

The exact value of the offset we choose has to balance between:

  1. Avoiding as many of these races as possible, since they otherwise lead to request failures
  2. Reusing the connection as much as possible to avoid wasting resources and latency

1 second is an arbitrary choice I made that should cover the vast majority of such races while not completely preventing connection reuse.
I'm happy to change it to any other suggested value, though. I don't see the need to go beyond a second, but even 0.5 seconds would likely be good enough in this case.

While arguably the server should be the one building in this leeway, that does not appear to be the case in the wild, so I do think having some leeway on the client to mitigate that is valuable.
For servers that close the connection right at the timeout, #73020 effectively does nothing without the offset.
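As a rough illustration of where that leeway lands (hypothetical names, not the actual connection pool code), the reuse check effectively becomes:

```csharp
// Sketch of an idle-connection reuse check with a safety offset;
// idleSinceTicks and keepAliveTimeoutMs stand in for whatever the
// connection actually tracks.
static bool IsSafeToReuse(long idleSinceTicks, long keepAliveTimeoutMs)
{
    // Leeway for network latency plus client/server processing delays.
    const long OffsetMs = 1_000;

    // Treat the connection as expired OffsetMs *before* the server's
    // deadline, so the request can still reach the server while it
    // still considers the connection alive.
    return Environment.TickCount64 - idleSinceTicks < keepAliveTimeoutMs - OffsetMs;
}
```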

> it begs the question why it's ok to not respect the keep-alive value the server is sending
> if we're not going to keep it alive for the requested duration, are we really respecting it?

We are respecting it in the sense that we will never send a request after the timeout.
We are also attempting to never send it so late that the server will receive it after the timeout.

@stephentoub (Member) commented Aug 9, 2022

> We are also attempting to never send it so late that the server will receive it after the timeout.

There is no positive value we could pick that would make this a reality.

@MihaZupan (Member, Author)

It's not a guarantee, but it should cover 99% of cases.

@stephentoub (Member)

> It's not a guarantee, but it should cover 99% of cases.

Does it? In the tests you've done for 1 second, were the client and server co-located, or have you validated this 99% with client and server spanning the globe, on slower connections, mobile devices, etc.?

@rzikm (Member) left a comment

LGTM

@MihaZupan (Member, Author)

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author)

Testing HTTP RTT to multiple regions:
[chart: HTTP round-trip time measurements to servers in multiple regions]

The worst P999 was 720 ms in my case, meaning a 360 ms offset (half the round trip, roughly the one-way delay the offset has to cover) would have been sufficient 99.9% of the time when talking to the worst server. The 1 second offset still leaves a ~3x buffer over that for even worse network conditions, maintaining 99.9% in the worst-case delay scenarios.
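For context, a hypothetical sketch of this kind of measurement (the method name, endpoint, and sample count are placeholders; this is not the harness used for these charts):

```csharp
using System.Diagnostics;

// Hypothetical measurement harness: sample HTTP round-trip times against
// one endpoint and report the 99.9th percentile in milliseconds.
static async Task<double> MeasureP999Async(HttpClient client, Uri url, int samples = 10_000)
{
    var rtts = new List<double>(samples);
    var sw = new Stopwatch();

    for (int i = 0; i < samples; i++)
    {
        sw.Restart();
        // Measure until response headers arrive, not the full body download.
        using HttpResponseMessage response =
            await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        rtts.Add(sw.Elapsed.TotalMilliseconds);
    }

    rtts.Sort();
    return rtts[(int)(rtts.Count * 0.999)]; // P999 round-trip time
}
```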

Adjusting the delay between pings for the worst-case failure rate:
[chart: worst-case request failure rate before this change]
And then applying this change:
[chart: request failure rate after applying this change]

We do believe a 1 second offset will be sufficient for the substantial majority of scenarios.

@MihaZupan MihaZupan merged commit 4f1c56f into dotnet:main Aug 12, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 11, 2022