-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.NET Core 2.1 new HttpClient based on .NET sockets and Span<T>. seems to have a problem #27611
Comments
I don't know if is possible, but maybe you can investigate that part of framework and try to understand what is happening (assuming that the httpclient wasn't completely rewritten) |
@dkhanaferov, any chance you'd be able to simplify your case down to a runnable repro we could try? Could you share the wireshark trace of the failing request? |
In the screenshot after the very last POST there is a FIN, ACK and it shows Ack=7187, so I look back I see the previous FIN,ACK had sequence=7186. Now, I'm by far not an expect in TCP or networking so I maybe saying something completely dumb, but does that mean that the last FIN,ACK (which comes from my host to server) is essentially my host FIN,ACK'ing the previous FIN,ACK (from server to my host) and essentially closing the connection. So since the next POST is made to the same host:port, using the same connection and yet the connection is closed (via that last FIN,ACK) that's why I never get a response back? Since my host FIN,ACK'ing the previous FIN,ACK just happens to be by chance right after the next POST, the connection is closed as it's supposed to be because FIN,ACK means we're done. But the next POST is attempting to re-use the same connection and it just gets screwed there. This would explain the apparent randomness of this issue. |
Some additional evidence: I was able to fix the issue i.e. prevent the connection from being closed and exception being thrown by setting the following appcontext switch: I then ran the application for 48 hours and did not see a single error. While without setting that switch the error appear a few times every 15 min or so. I will try to put together a small repo which could be used to reproduce this issue next, apologies for not having done this earlier but this is affecting a prod app so I'm trying to fix it as best as I can first. |
There's an unavoidable timing issue with these sorts of scenarios. The server is going to close an idle connection sooner or later. If it closes the connection just as we are trying to reuse it for a subsequent request, then things get complicated. We will retry the request on another connection if we are sure it is safe to do so. But if we've sent part of the request body then we won't resend then request, because we can't be sure it's safe to do so. I'm guessing that's what is happening here. I'm not sure why you see different behavior with the libcurl implementation. It could be that libcurl just has a different connection reuse policy and so is not hitting this issue. Or it could be that libcurl is more aggressive about retrying requests. It would be interesting to see a trace of libcurl and compare how it is handling this situation. |
@dkhanaferov any luck on smaller repro we could verify locally? |
Should we consider using TCP keep-alive @geoffkizer? Aside from that this does not seems actionable and perhaps should be closed? |
I don't see how TCP keep-alive would help here. The server is closing the connection because it's decided the connection is idle. |
I'll echo what I said above:
This seems like the best way to investigate further. |
In some cases I saw, it was not server it self but networking gear along the way. From client tcpdump, you cannot tell the difference. |
I honestly don't have a good sense for whether TCP keep alive would help much. I don't know how common it is for a router to drop a connection before the server does. My guess is it's not very common, because servers usually have pretty small connection timeouts (e.g. the default for Apache is 5 seconds). |
That said, I do think it's worthwhile to understand why libcurl doesn't seem to hit the same issue. |
can you isolate runable repro @dkhanaferov? I can investigate more. |
We've added more diagnostics and better error messages as of PR dotnet/corefx#37251. We very rarely now have a simple error message of "The server returned an invalid or unrecognized response." The errors will be more descriptive about what part of the server response might be "invalid or unrecognized". Please try out the latest daily build or wait until .NET Core 3.0 Preview 5. If you are able to repro the problem, please open a new issue. Thanks. |
Just FYI, the error message changes from dotnet/corefx#37251 will be in Preview 6. |
I have a strange issue I'm trying to triage having to do with the new
HttpClient
on .NET Core 2.1. Since theHttpClient
has been completely rewritten to use a different low level library for handling HTTP requests. I'm wondering if anyone has seen any issues with the new implementation.What I'm seeing is a strange case where my application (.NET Core 2.1) which sends a POST request to some API periodically (every 10 seconds) a few of times every 15 min it will throw an exception with the error:
An error occurred while sending the request.; The server returned an invalid or unrecognized response.
No other details are available, it's just an exception when I make a call like this:
The exception caught is a
System.Net.Http.HttpRequestException
and it has some inner exception with the above error message.So as I mentioned this does NOT happen all time, it happens seemingly at random, or at least I can not discern any particular pattern. All I can say is these POST requests are made once every 10 seconds 24/7 and anywhere between 5% and 10% of the POST requests fail with the above exception.
So used tcpdump and piped it into wireshark to examine the requests to see what's actually happening when the requests fail and what i see is the following:
On a good POST I see: my app sends the request to server, server sends response back, my app sends ACK to server and server responds with FIN,ACK. Done. Good Stuff.
On POST which gets the above exception I see the following: my app sends the request to server, and almost immediately after (like a few milliseconds after) my application sends FIN, ACK to server.
This seems consistent with what I see in my application logs, which show that the request duration is 0 before the exception is thrown.
So what it looks like to me is, my application sends the request and then immediately after closes the connection for some reason. However, I don't understand why this happens. I tried comparing the raw HTTP requests (good POST vs bad POST) to see any differences and I can not see any difference.
One last thing to mention, is that I ONLY see this in applications running on .NET Core 2.1. When I run my application on .NET 2.0 I do not see this problem. Also when I use the same library (where the HTTP call is being made) in the .NET 4.5.1 application (I use multi-targeting to compile the library targeting .net standard and net451) I also do NOT see this problem. So it seems to affect only .NET Core 2.1
The text was updated successfully, but these errors were encountered: