Incomplete socket send with fast writer and slow reader #610
Comments
This issue does not reproduce for me if I run It also doesn't reproduce for me if I run It also doesn't reproduce for me if I redirect curl's stdout; for example, |
strace on the Python server:
strace on the curl client:
I'm mildly perturbed that curl is attempting to push its error message through the kernel one byte at a time. But that's presumably after the actual failure has occurred. |
@aseering - Thanks for reporting the issue and the repro steps. I was able to reproduce this issue and I have an understanding of the bug. |
I'm not sure whether the issue I am experiencing is related, but I think it is. I get constant "net::ERR_CONTENT_LENGTH_MISMATCH" errors from Chrome when I try to visit my uWSGI HTTP server. When using Flask as the web server, I get frequent "net::ERR_CONNECTION_RESET" errors (the Flask stack trace sometimes shows Broken Pipe errors, though not always; I will attach them soon). This same app works fine on a virtual machine for both uWSGI and Flask. If you need any more information, please do ask.
Edit: Client Chrome is not running from WSL.
Edit 2: I'm being silly. Let me include some information.
Edit 3: (Running the Flask web server here) When trying to access my app from Firefox running on WSL itself, the file (JS in my case) just doesn't fully transfer, causing the script to fail. So it doesn't matter whether the client is Windows or WSL. When trying to fetch the file with curl (CURL v7.47.0 on WSL) I get
When trying to fetch the same on Windows (CURL v7.49.1 on Windows) I get
Edit 4:
|
@sunilmut -- you mentioned a little while back that you have a repro for this; just wondering where it is on your priority list? There's this ticket and there's #616; if you keep tracing there are about a dozen tickets that refer to those two; etc. It would be great to hear something from y'all about what your plans are for this collection of issues. |
FYI -- this still reproduces on build 14955 and Ubuntu 16.04. Interestingly, during this download, CPU usage on the machine goes up a lot. But that usage doesn't show up in |
@aseering - Thanks for reviving this thread. I was out for a couple of weeks, hence the delay in responding. I will take this up immediately, since I agree that this could be manifesting in many different forms. I will post back once I have something. |
Thank you so much for the update. Once this is fixed I can actually move my dev environment fully to WSL! Thanks to you too @aseering for your continuous support for WSL, you're everywhere ;). |
@aseering - So, I have root-caused the problem. The way the repro is set up, the server (Python) side calls shutdown and closes the socket after sending the last byte. Watching the data transfer under the debugger, it appears that whenever Python closes the socket, there is invariably still pending data to be sent. When there is pending data in the socket (with linger disabled), WSL closes the socket in the background. The background logic waits for the last byte to be sent or for a timeout, whichever comes first, and then force-closes the socket. In this case we are not hitting the timeout, and WSL force-closes the socket after sending the last byte. Force-closing the socket causes an abortive disconnect, which discards any data that the peer (i.e., curl in this case) has not yet received. |
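For readers unfamiliar with the "linger" setting mentioned above, here is a minimal sketch of how an application can inspect and set `SO_LINGER` from Python. The helper names `set_linger` and `get_linger` are hypothetical, and the `"ii"` struct layout is an assumption that holds on Linux (two C ints, `l_onoff` and `l_linger`); other platforms may lay out the structure differently. This is not code from the thread, only an illustration of the option being discussed.

```python
import socket
import struct

# Assumption: Linux layout of `struct linger` is two C ints (l_onoff, l_linger).
LINGER_FORMAT = "ii"


def set_linger(sock: socket.socket, enabled: bool, seconds: int = 0) -> None:
    """Enable or disable SO_LINGER on a socket (hypothetical helper)."""
    value = struct.pack(LINGER_FORMAT, int(enabled), seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)


def get_linger(sock: socket.socket) -> tuple:
    """Return (enabled, seconds) as currently configured on the socket."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize(LINGER_FORMAT)
    )
    onoff, seconds = struct.unpack(LINGER_FORMAT, raw)
    return bool(onoff), seconds
```

With linger disabled (the usual default), `close()` returns immediately and the kernel is expected to keep delivering queued data in the background. With linger enabled and a positive timeout, `close()` blocks for up to that many seconds while pending data is sent. With linger enabled and a timeout of 0, closing typically aborts the connection and discards unsent data.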
@sunilmut -- hm, good question... As a programmer, my instinctive expectation is that there would be no simple timeout. I can't immediately find anything online that contradicts that expectation. I would expect applications that care about such things to manage them explicitly using If the client dies or otherwise goes unresponsive while the server is trying to send data, then the connection will naturally hit TCP's built-in retry-count limit and get torn down. That's true regardless of whether the server is still trying to send, but, given that the kernel buffer is of bounded size, it does guarantee that the socket will close in bounded (though very large in the worst case) time. Basically, TCP itself guarantees that the send must still be making progress. If it's making progress, why kill it? If the answer is "because I'm worried about running out of kernel memory or ephemeral ports", then your app is dealing with high load and should be using features like |
The primary motivation for our design decisions is compatibility.
Do you have anything (a link) that supports this? My searches are coming up empty. Maybe that is what you also mean by |
Yeah, I can't find any documentation regarding a timeout either (unless |
Regularly-scheduled ping: This still reproduces for me on build 14971. Though, interestingly, curl hangs for several seconds before reporting that the transfer has closed -- I thought I remembered it erroring out immediately before. Also, a few more people appear to have hit this issue, and have filed new tickets. (Relevant tickets are linked above. Mentioning them in case anyone is subscribed to just this ticket, as I think github won't then send you a notification e-mail.) |
Today, I have been investigating the possibility of switching to Windows from Ubuntu (since I pretty much only use the command line anyway, apart from gvim); unfortunately, I think I'm seeing this too:
Python 2.7.6. I'm not sure how to find the versions of the Microsoft s/w, but I installed it today. |
So what was the answer to your pretty astute question: "how long does the socket linger in the background (maybe by trying to do a graceful disconnect) after sending the last byte, before forcefully closing the connection?" |
@therealkenc - To get the maximum reliability guarantees from the TCP stack, a TCP socket should not be closed until both ends have fully disconnected (that is, until the full termination handshake is complete). To keep server-side sockets from hanging on misbehaving clients that never close their end, some systems have a built-in timeout. When the user-mode socket is closed, a disconnect request (or |
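The idea of not closing until both ends have disconnected can be sketched at the application level as follows. This is an illustrative pattern, not the WSL fix: the sender half-closes with `shutdown(SHUT_WR)`, waits for the peer's end-of-stream, and only then closes, with a timeout as a guard against clients that never close. The function names and the 5-second default are assumptions, and `socket.socketpair()` stands in for a real TCP connection so the example is self-contained.

```python
import socket
import threading


def send_then_close_gracefully(sock, payload, drain_timeout=5.0):
    """Send payload, half-close, and wait for the peer's EOF before closing."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)   # signal end of data; the read side stays open
    sock.settimeout(drain_timeout)  # guard against peers that never close
    try:
        while sock.recv(4096):      # b"" means the peer has closed its end
            pass
    except socket.timeout:
        pass                        # give up waiting and close anyway
    finally:
        sock.close()


def read_until_eof(sock, chunks):
    """Reader side: collect data in small chunks until EOF, then close."""
    while True:
        data = sock.recv(1024)
        if not data:
            break
        chunks.append(data)
    sock.close()


def transfer_with_graceful_close(size=1_000_000):
    """Return the number of bytes the reader received."""
    writer_sock, reader_sock = socket.socketpair()
    chunks = []
    reader = threading.Thread(target=read_until_eof, args=(reader_sock, chunks))
    reader.start()
    send_then_close_gracefully(writer_sock, b"x" * size)
    reader.join()
    return sum(len(c) for c in chunks)
```

Waiting for the peer's EOF trades a slightly longer-lived socket for confirmation that the peer has read everything that was sent and has closed its own end.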
Is this fix included in insider build 14986? Just wondering so I know I can upgrade and test it |
@xBytez Unfortunately, no. The fix got checked in to the dev branch 4-5 days ago. The release branch runs behind the dev branch by 2-3 weeks on average (this is not a hard and fast rule, just my experience). That should give you an idea of the timeline. I am also eager to get this validated. |
@sunilmut Thanks for the information. Was just curious. I also found the WSL Release notes on the MSDN blog so I'll just have a look at that ;). Thanks again! |
It seems that 14986 made this problem occur more often. I'm using gulp's HTTP server, so this issue is more visible in the browser when loading assets (larger CSS or JS files). Before 14986 I was able to refresh the browser a few times until the asset downloads stopped being interrupted, but now it seems I can't do that anymore. The issue is more persistent. Any idea of the new build's release date (+/-)? |
The fix has made it to the release branch. It depends on which build they snap for the Insider release (which is not in our (WSL team's) control). If not in the upcoming Insider release build, it should definitely be in the next one after that. I don't have any more insight into the timelines/dates etc. Thanks for your patience. |
So far (after an hour of playing around following this morning's upgrade), it works perfectly ;-) Thank you! |
Same here! I was able to run my Flask server just fine. Thank you! |
@krzysztofantczak & @xBytez - Really glad to know that things are working so far. I will keep the issue open for some time before closing it out. |
This is also working great for me. Thanks! |
Glad to know. Thanks @aseering |
This might be related to #493 ; I'm not sure. It certainly feels very similar. Reproduced under build 14376.
Open up two WSL terminals. In the first, run the following two commands:
Then, in the second, run the following:
Expected:
Actual:
The exact number of bytes remaining varies, though it's consistently in the vicinity of 500kb for me.
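The original commands are not preserved in this copy of the thread. As a rough stand-in, the following sketch exercises the same fast-writer, slow-reader pattern over loopback TCP: the writer queues the whole payload and closes immediately, while the reader drains it slowly and counts bytes. All names, the payload size, and the delay are assumptions chosen for illustration; this is not the reporter's repro.

```python
import socket
import threading
import time


def fast_writer(listener, payload):
    """Accept one connection, queue the whole payload, and close right away."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(payload)  # returns once the data is handed to the kernel
    # Leaving the with-block closes the socket while data may still be in flight.


def slow_reader(port, chunk_size=65536, delay=0.001):
    """Read until EOF or reset, pausing between reads; return bytes received."""
    received = 0
    with socket.create_connection(("127.0.0.1", port)) as sock:
        while True:
            try:
                data = sock.recv(chunk_size)
            except ConnectionResetError:
                break          # an abortive close shows up here as a reset
            if not data:
                break
            received += len(data)
            time.sleep(delay)
    return received


def run_transfer(size=2_000_000):
    """Return (bytes_sent, bytes_received) for one transfer."""
    payload = b"x" * size
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        writer = threading.Thread(target=fast_writer, args=(listener, payload))
        writer.start()
        received = slow_reader(port)
        writer.join()
    return size, received
```

On a stack that keeps delivering queued data after `close()`, the two numbers returned by `run_transfer()` should match; a shortfall would correspond to the truncated transfer described in this issue.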