.NET Core 2.1 new HttpClient based on .NET sockets and Span<T>. seems to have a problem #27611

dkhanaferov · 2018-10-11T20:38:12Z

I have a strange issue I'm trying to triage having to do with the new HttpClient on .NET Core 2.1. Since the HttpClient has been completely rewritten to use a different low level library for handling HTTP requests. I'm wondering if anyone has seen any issues with the new implementation.

What I'm seeing is a strange case where my application (.NET Core 2.1) which sends a POST request to some API periodically (every 10 seconds) a few of times every 15 min it will throw an exception with the error: An error occurred while sending the request.; The server returned an invalid or unrecognized response.

No other details are available, it's just an exception when I make a call like this:

using (var res = await _httpClient.PostAsync(uriBuilder.Uri, new StringContent(serializedRequestBody, Encoding.UTF8, "application/json")))
            {
               //Do something here
            }

The exception caught is a System.Net.Http.HttpRequestException and it has some inner exception with the above error message.

So as I mentioned this does NOT happen all time, it happens seemingly at random, or at least I can not discern any particular pattern. All I can say is these POST requests are made once every 10 seconds 24/7 and anywhere between 5% and 10% of the POST requests fail with the above exception.

So used tcpdump and piped it into wireshark to examine the requests to see what's actually happening when the requests fail and what i see is the following:

On a good POST I see: my app sends the request to server, server sends response back, my app sends ACK to server and server responds with FIN,ACK. Done. Good Stuff.

On POST which gets the above exception I see the following: my app sends the request to server, and almost immediately after (like a few milliseconds after) my application sends FIN, ACK to server.

This seems consistent with what I see in my application logs, which show that the request duration is 0 before the exception is thrown.

So what it looks like to me is, my application sends the request and then immediately after closes the connection for some reason. However, I don't understand why this happens. I tried comparing the raw HTTP requests (good POST vs bad POST) to see any differences and I can not see any difference.

One last thing to mention, is that I ONLY see this in applications running on .NET Core 2.1. When I run my application on .NET 2.0 I do not see this problem. Also when I use the same library (where the HTTP call is being made) in the .NET 4.5.1 application (I use multi-targeting to compile the library targeting .net standard and net451) I also do NOT see this problem. So it seems to affect only .NET Core 2.1

The text was updated successfully, but these errors were encountered:

LuizAdolphs · 2018-10-11T20:50:59Z

I don't know if is possible, but maybe you can investigate that part of framework and try to understand what is happening (assuming that the httpclient wasn't completely rewritten)

stephentoub · 2018-10-12T12:39:06Z

@dkhanaferov, any chance you'd be able to simplify your case down to a runnable repro we could try? Could you share the wireshark trace of the failing request?

dkhanaferov · 2018-10-12T21:29:11Z

In the screenshot after the very last POST there is a FIN, ACK and it shows Ack=7187, so I look back I see the previous FIN,ACK had sequence=7186. Now, I'm by far not an expect in TCP or networking so I maybe saying something completely dumb, but does that mean that the last FIN,ACK (which comes from my host to server) is essentially my host FIN,ACK'ing the previous FIN,ACK (from server to my host) and essentially closing the connection.

So since the next POST is made to the same host:port, using the same connection and yet the connection is closed (via that last FIN,ACK) that's why I never get a response back?

Since my host FIN,ACK'ing the previous FIN,ACK just happens to be by chance right after the next POST, the connection is closed as it's supposed to be because FIN,ACK means we're done. But the next POST is attempting to re-use the same connection and it just gets screwed there.

This would explain the apparent randomness of this issue.

dkhanaferov · 2018-10-13T23:01:38Z

Some additional evidence:

I was able to fix the issue i.e. prevent the connection from being closed and exception being thrown by setting the following appcontext switch: AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false); Which according to documentation will revert to using the older libcurl implementation of HttpClient on Linux (I'm only running on Linux so I haven't tested this on Windows)

I then ran the application for 48 hours and did not see a single error. While without setting that switch the error appear a few times every 15 min or so.

I will try to put together a small repo which could be used to reproduce this issue next, apologies for not having done this earlier but this is affecting a prod app so I'm trying to fix it as best as I can first.

geoffkizer · 2018-10-31T18:57:19Z

There's an unavoidable timing issue with these sorts of scenarios. The server is going to close an idle connection sooner or later. If it closes the connection just as we are trying to reuse it for a subsequent request, then things get complicated. We will retry the request on another connection if we are sure it is safe to do so. But if we've sent part of the request body then we won't resend then request, because we can't be sure it's safe to do so.

I'm guessing that's what is happening here.

I'm not sure why you see different behavior with the libcurl implementation. It could be that libcurl just has a different connection reuse policy and so is not hitting this issue. Or it could be that libcurl is more aggressive about retrying requests. It would be interesting to see a trace of libcurl and compare how it is handling this situation.

karelz · 2018-11-02T18:12:03Z

@dkhanaferov any luck on smaller repro we could verify locally?

wfurt · 2019-02-22T19:16:52Z

Should we consider using TCP keep-alive @geoffkizer?
I know it wont completely fix the race condition but we added it to curl handler to help with getting through firewalls and load-ballancers.

Aside from that this does not seems actionable and perhaps should be closed?

geoffkizer · 2019-02-22T20:18:50Z

I don't see how TCP keep-alive would help here. The server is closing the connection because it's decided the connection is idle.

geoffkizer · 2019-02-22T20:20:19Z

I'll echo what I said above:

It would be interesting to see a trace of libcurl and compare how it is handling this situation.

This seems like the best way to investigate further.

wfurt · 2019-02-22T20:20:50Z

In some cases I saw, it was not server it self but networking gear along the way. From client tcpdump, you cannot tell the difference.

geoffkizer · 2019-02-22T20:36:23Z

I honestly don't have a good sense for whether TCP keep alive would help much. I don't know how common it is for a router to drop a connection before the server does. My guess is it's not very common, because servers usually have pretty small connection timeouts (e.g. the default for Apache is 5 seconds).

geoffkizer · 2019-02-22T20:45:13Z

That said, I do think it's worthwhile to understand why libcurl doesn't seem to hit the same issue.

wfurt · 2019-02-22T21:00:05Z

can you isolate runable repro @dkhanaferov? I can investigate more.

davidsh · 2019-04-28T22:18:07Z

@dkhanaferov

We've added more diagnostics and better error messages as of PR dotnet/corefx#37251. We very rarely now have a simple error message of "The server returned an invalid or unrecognized response." The errors will be more descriptive about what part of the server response might be "invalid or unrecognized".

Please try out the latest daily build or wait until .NET Core 3.0 Preview 5. If you are able to repro the problem, please open a new issue. Thanks.

stephentoub · 2019-04-28T22:25:58Z

wait until .NET Core 3.0 Preview 5

Just FYI, the error message changes from dotnet/corefx#37251 will be in Preview 6.

davidsh closed this as completed Apr 28, 2019

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 3.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.NET Core 2.1 new HttpClient based on .NET sockets and Span<T>. seems to have a problem #27611

.NET Core 2.1 new HttpClient based on .NET sockets and Span<T>. seems to have a problem #27611

dkhanaferov commented Oct 11, 2018

LuizAdolphs commented Oct 11, 2018

stephentoub commented Oct 12, 2018

dkhanaferov commented Oct 12, 2018

dkhanaferov commented Oct 13, 2018

geoffkizer commented Oct 31, 2018

karelz commented Nov 2, 2018

wfurt commented Feb 22, 2019

geoffkizer commented Feb 22, 2019

geoffkizer commented Feb 22, 2019

wfurt commented Feb 22, 2019

geoffkizer commented Feb 22, 2019

geoffkizer commented Feb 22, 2019

wfurt commented Feb 22, 2019

davidsh commented Apr 28, 2019

stephentoub commented Apr 28, 2019

.NET Core 2.1 new HttpClient based on .NET sockets and Span<T>. seems to have a problem #27611

.NET Core 2.1 new HttpClient based on .NET sockets and Span<T>. seems to have a problem #27611

Comments

dkhanaferov commented Oct 11, 2018

LuizAdolphs commented Oct 11, 2018

stephentoub commented Oct 12, 2018

dkhanaferov commented Oct 12, 2018

dkhanaferov commented Oct 13, 2018

geoffkizer commented Oct 31, 2018

karelz commented Nov 2, 2018

wfurt commented Feb 22, 2019

geoffkizer commented Feb 22, 2019

geoffkizer commented Feb 22, 2019

wfurt commented Feb 22, 2019

geoffkizer commented Feb 22, 2019

geoffkizer commented Feb 22, 2019

wfurt commented Feb 22, 2019

davidsh commented Apr 28, 2019

stephentoub commented Apr 28, 2019