net/http: permanently broken connection with error "read: connection reset by peer" when response body is not closed #36700
Comments
You were probably affected by #24138 |
Ugh. Thanks. This is about (not) having timeouts on the client side, right? It's unfortunate that it affects brand new requests and that it permanently breaks all new requests. Could we add a (when-broken-only?) connection re-use timeout without breaking the compatibility promise? It seems like connection re-use decisions are entirely up to net/http. Any suggestions for workarounds? It's tricky since I am actually doing long polling requests, where I actively want to keep connections open for hours, so I don't want to set aggressive timeouts. Would it help to call |
A different culprit than lack of timeout might be lack of resume-after-suspend notification. That would let us ping the server on resume and reconnect if nec. Also discussed in #36141 (comment) Maybe unrelated, but the CL for #23459 (net.Dial TCP keepalive on by default) didn't touch net/http. Does http.Client not use net.Dial, or is this disabled for http.Client by a different CL? |
Given that we don't know whether 1.13 was affected, I suspect this should not be milestoned to 1.14 at this point. |
@josharian Did you enable http2 on the Server? If so, curious what happens with http/1.1. |
I used net/http out of the box. I will run for a bit with http2 disabled and see what happens. |
In 1.13 (and presumably earlier; not sure), Setting a short keepalive helped in my case (a plain TCP/TLS context, not HTTP). Eventually I plan to disable keepalive and use a server pulse to detect a dead link. From net/http/transport.go:
|
@networkimprov, that seems unrelated. |
I finally tracked this down. It turns out that there was a code path in which I didn't close resp.Body. Timeouts didn't help. Although this was user error (sorry and thanks!), I do wonder whether there's a way to detect this, since it was not easy to find. One option is a vet check that, similar to the lostcancel check, verifies that resp.Body.Close gets called on all exits from a function. Another option would be some kind of check at runtime in the http package, although I don't know net/http well enough to know whether that is feasible. |
Correction: timeouts were also necessary (in addition to properly closing resp.Body). |
Closing as user error. |
@josharian, could you provide some more detail about how this resulting from a missing The documentation for the
It does not say anything about what happens for HTTP/2 connections, so at the very least there seems to be a documentation issue here. |
This also seems like something we could detect and report, using a finalizer-based approach similar to the one described in #24739: if the |
Yeah, I don't think forgetting to close resp.Body should cause anything worse than a TCP connection leak. It certainly shouldn't affect unrelated HTTP fetches. |
I'm honestly not sure. I tried to reproduce using several simpler setups and failed. There was clearly something else going on as well that closing the Body helped with. I went looking to try to understand better, but I've already spent over a full day on this, and I'm not really inclined to spend more time digging. I do think a finalizer-based unclosed unreachable Body detection would be a good idea, since that is definitely a bug, regardless of what precise consequences it has. I suspect it'd be better to log instead of panicking--net/http already logs when it notices an ineffectual WriteHeader call, so there's some precedent for it. |
Not closing the response body presumably caused flakiness in the registry replacer when it tries to read a response body [0]; see [1] for why this can happen. As this seems to be a common mistake in our codebase, this change activates a linter for it. [0] https://issues.redhat.com/browse/DPTP-1692 [1] golang/go#36700
Possibly related: I just saw this error message in our CI system. I couldn't reproduce it, but I can't see why closing idle connections should cause this kind of error:
|
This makes so much sense. Guess which one keeps giving me |
#599 went in the wrong direction, so I revert it here: it makes all `TestMultipleClients` tests flaky, since not all members are necessarily joined after the updates. Instead, to fix the `TestMultipleClientsWithMixedLabelsAndExpectFailure` test, I add a couple more labeled members. That way, even if we reach 3 updates (which happened sometimes), we'll never get to 5 with a single member. Also, try to fix the `TestTLSServerWithLocalhostCertWithClientCertificateEnforcementUsingClientCA1` test by closing GRPC connections. Maybe this (golang/go#36700) is the issue?
* Really fix flaky `TestMultipleClients` tests. #599 went in the wrong direction, so I revert it here: it makes all `TestMultipleClients` tests flaky, since not all members are necessarily joined after the updates. Instead, to fix the `TestMultipleClientsWithMixedLabelsAndExpectFailure` test, I add a couple more labeled members. That way, even if we reach 3 updates (which happened sometimes), we'll never get to 5 with a single member. Also, try to fix the `TestTLSServerWithLocalhostCertWithClientCertificateEnforcementUsingClientCA1` test by closing GRPC connections. Maybe this (golang/go#36700) is the issue?
* Set `MaxIdleConnsPerHost`
* New attempt to fix this test. Could it be aborted connections?
* Retry RSTs
I suppose someone can open a new issue if they can still reproduce it. |
What version of Go are you using (`go version`)?
Go 1.14 beta 1
Does this issue reproduce with the latest release?
Not sure; I can't compile my program with 1.13.
What operating system and processor architecture are you using (`go env`)?
`go env` Output
What did you do?
Made a series of http requests from a Go client to a Go server. The requests were vanilla HTTP requests, using `http.Get`.
I lost connectivity at some point. When I regained connectivity, all subsequent http requests failed with error `read: connection reset by peer`. I waited quite a long time, and it never recovered.
It has happened a couple of times, but doesn't reproduce reliably (which is unsurprising, since me closing my laptop lid is not exactly a precision affair).
This looks very similar to #34978, although I don't know when exactly the connectivity failed.
Tentatively marking as Go 1.14.
cc @bradfitz @fraenkel @tombergan