
net/http: half-closed connection triggers request cancellation #18527

Open
benburkert opened this issue Jan 5, 2017 · 14 comments
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@benburkert
Contributor


What did you do?

https://play.golang.org/p/DnXwH5qJkD

What did you expect to see?

A client that sends a request followed immediately by a FIN can read a response from the server.

What did you see instead?

The half-close is detected and the request is cancelled immediately.

Does this issue reproduce with the latest release (go1.7.4)?

No

System details

go version go1.8beta2 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/benburkert"
GORACE=""
GOROOT="/Users/benburkert/sdk/go1.8beta2"
GOTOOLDIR="/Users/benburkert/sdk/go1.8beta2/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qs/qkt9twmx4qg379d6f8kxl1vm0000gn/T/go-build530443777=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
GOROOT/bin/go version: go version go1.8beta2 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.8beta2 X:framepointer
uname -v: Darwin Kernel Version 16.3.0: Thu Nov 17 20:23:58 PST 2016; root:xnu-3789.31.2~1/RELEASE_X86_64
ProductName:	Mac OS X
ProductVersion:	10.12.2
BuildVersion:	16C67
lldb --version: lldb-360.1.70
@bradfitz
Contributor

bradfitz commented Jan 6, 2017

Is this theoretical or does any popular client in the wild actually use half-closed TCP or TLS connections?

I considered this but couldn't find a problem in practice.

@bradfitz bradfitz added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jan 6, 2017
@benburkert
Contributor Author

We have an internal client that fires and forgets requests to a service. Those requests may take some time to process, and the server uses the requests' contexts for timeouts unrelated to the client connection. AFAIK there is no way to prevent the "client gone" detection from canceling the context.

Other web servers do support this behavior, but it's not covered by any RFC: http://mailman.nginx.org/pipermail/nginx/2008-September/007388.html

Support for clients that want to fire and forget requests could be added by ignoring the read EOF if the request sets the Connection: close header. But this behavior is also unspecified by any RFC. Another alternative is to add an "ignore client gone" field to Server.
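Below is a rough sketch of the Connection: close heuristic proposed above. net/http has no hook of this kind, so `cancelOnEarlyEOF` and `parse` are hypothetical names. The sketch only shows that the parsed request already carries the signal: `Request.Close` is set when the client sends `Connection: close`, or when it speaks HTTP/1.0 without keep-alive.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// cancelOnEarlyEOF is a hypothetical policy (not part of net/http): it reports
// whether a read EOF seen after the request was parsed should cancel the
// request's context. A client that sent "Connection: close" has already said
// it will not send another request, so its FIN is treated as a half-close
// rather than as the client going away.
func cancelOnEarlyEOF(r *http.Request) bool {
	return !r.Close
}

// parse reads a raw HTTP/1.x request so that Request.Close is populated from
// the Connection header and protocol version.
func parse(raw string) *http.Request {
	req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		panic(err)
	}
	return req
}

func main() {
	keepAlive := parse("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
	closing := parse("GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
	fmt.Println(cancelOnEarlyEOF(keepAlive)) // true
	fmt.Println(cancelOnEarlyEOF(closing))   // false
}
```

The catch, raised later in this thread, is that the server's reader sees the same EOF whether the client half-closed or went away entirely. Under this policy, the server would also keep working for `Connection: close` clients that really are gone.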

@bradfitz
Contributor

bradfitz commented Jan 6, 2017

What do you mean by "fire-and-forget"? It goes out of its way to do a shutdown on its TCP connection, rather than writing the request body, reading the response, and calling close once? I've never seen any client do that.

Thanks for the nginx link.

@mnot, is there any HTTP RFC which says how clients and/or servers should behave here with respect to half-closed (closed by client) requests?

@bradfitz bradfitz added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jan 6, 2017
@bradfitz bradfitz added this to the Go1.8Maybe milestone Jan 6, 2017
@bradfitz
Contributor

bradfitz commented Jan 6, 2017

I'm tagging this Go1.8Maybe to think about more, but I think it's too late to change anything here. The workaround is your application can ignore the Handler's Request.Context(). Or you can remove the shutdown from your internal HTTP client, IIUC.

@mnot

mnot commented Jan 6, 2017

The relevant part of the specs is:
http://httpwg.org/specs/rfc7230.html#connection.management

... but that doesn't address this situation. If you think it should, log an issue in https://github.com/httpwg/http11bis

@bradfitz
Contributor

bradfitz commented Jan 6, 2017

@mnot, done: httpwg/http-core#22

@bradfitz
Contributor

bradfitz commented Jan 6, 2017

@benburkert, is your internal client making HTTP/1.0 requests? (See httpwg/http-core#22 (comment))

I'm still trying to understand what your client's motivation is.

@rsc
Contributor

rsc commented Jan 9, 2017

Ping @benburkert

@benburkert
Contributor Author

benburkert commented Jan 9, 2017

I described the issue as a problem with a client that depends on unspecified behavior, hoping to clarify things, but it seems to have had the opposite effect. Clients should not depend on unspecified behavior, and we will fix our client to not half-close connections.

However, from a server's perspective, normal browser behavior can look identical to the fire-and-forget client I described above: a user closes their browser soon after initiating a request for an asset. The server sees a connection, a valid request, and a FIN packet in very short succession.

Deciding whether to abort the request/response isn't so straightforward. A pageview endpoint would likely want to proceed without cancellation so that the page load is recorded. A large asset endpoint probably wants to drop the response as soon as the FIN is detected.

Because the HTTP spec does not cover handling a client that terminates the TCP connection unexpectedly, it seems this is left up to library and application developers. I'm in favor of net/http deciding that an early EOF from the client triggers a cancellation of the request's context, since it is ultimately up to the author of the handler whether to use the context.

But a handler that uses the request's context can behave differently between 1.7 and 1.8 due to EOF-triggered cancellations. This probably won't be noticed by most, but pageview-style handlers will be affected, along with proxying handlers that want to let the backend decide how to handle an early client EOF.

I'm in favor of closing this issue because, even if there were a non-hacky way to support the 1.7 behavior, as @bradfitz said, it's too late in the release cycle to do anything about it. The few handlers that are affected by this will have to be updated to be more explicit about how they handle context cancellation, which seems fine.

@benburkert
Contributor Author

benburkert commented Jan 9, 2017

One last thought: it may be beneficial to use a custom context.WithCancel so that the error on an early client EOF can be something other than context.Canceled. I believe that would make it easier to filter out this event from other context cancellations, though I haven't given it much thought.

@bradfitz
Contributor

bradfitz commented Jan 9, 2017

I went to add support for not canceling contexts for HTTP/1.0 POST/PUT requests and wrote a test I expected to fail. Then I discovered that Go didn't read the request body, which confused me for a second, until I also discovered that HTTP/1.0 always required a Content-Length (httpwg/http-core#22 (comment)).

So I'm inclined to do nothing here, at least for now.

@benburkert, I agree that some Handlers may want to continue processing after the client has gone away. But like you said, they can use a different Context instead.

We can do the release candidate and see how things go. We can revisit this if there are problems.

@bradfitz bradfitz closed this as completed Jan 9, 2017
@DanielMorsing
Contributor

I encountered a variation on this bug today. It turns out a big-name CDN actually does do half-close on http/1.1.

I'm not sure how you'd detect a half-closed TCP connection. We could assume that, when Connection: close is present, the request side being closed means the close is only a half-close, but that leaves us with no way of detecting a regular close in that case.

@bradfitz bradfitz added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. help wanted labels Dec 19, 2017
@bradfitz bradfitz modified the milestones: Go1.8Maybe, Unplanned Dec 19, 2017
@bradfitz bradfitz reopened this Dec 19, 2017
@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Dec 19, 2017
@szuecs

szuecs commented Apr 14, 2022

Half-closed connections also happen if you have an AWS NLB in front.
The test code above produces a different set of TCP packets, because with an NLB you can get the HTTP request inside the FIN packet itself:

```
20:07:52.359590 IP 3.66.162.240.10407 > 10.149.94.96.9999: Flags [FP.], seq 14405:14620, ack 12932, win 48, options [nop,nop,TS val 3651909883 ecr 1329295683], length 215
GET / HTTP/1.1
X-Flow-Id: dffcd3e3-c08c-4742-b418-3dda2865d9df
Host: nlb-test.example.org
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.13 (Java/11.0.9.1)
Accept-Encoding: gzip,deflate
```

Also interesting for us is the Connection: Keep-Alive header in a FIN packet. The client calling through the AWS NLB is trying to keep the connection open, but the NLB seems to decide to close it.
I opened a support case with AWS, but this also shows that half-closed connections happen in the wild more often than we expected. The OK rate is about 99.99% in our measurements, but it can impair p999 latency (don't ask me about the missing 9).

For the record, the test code above looks like this on the wire. The request is not in the FIN packet; the FIN is sent in a separate packet right after the request:

```
11:55:44.839911 IP 127.0.0.1.50402 > 127.0.0.1.9090: Flags [S], seq 1339180433, win 65495, options [mss 65495,sackOK,TS val 640133587 ecr 0,nop,wscale 7], length 0
11:55:44.839934 IP 127.0.0.1.9090 > 127.0.0.1.50402: Flags [S.], seq 1542450215, ack 1339180434, win 65483, options [mss 65495,sackOK,TS val 640133588 ecr 640133587,nop,wscale 7], length 0
11:55:44.839951 IP 127.0.0.1.50402 > 127.0.0.1.9090: Flags [.], ack 1, win 512, options [nop,nop,TS val 640133588 ecr 640133588], length 0
11:55:44.840209 IP 127.0.0.1.50402 > 127.0.0.1.9090: Flags [P.], seq 1:73, ack 1, win 512, options [nop,nop,TS val 640133588 ecr 640133588], length 72
GET / HTTP/1.1
Host: 127.0.0.1:9090
User-Agent: Go-http-client/1.1

11:55:44.840229 IP 127.0.0.1.9090 > 127.0.0.1.50402: Flags [.], ack 73, win 512, options [nop,nop,TS val 640133588 ecr 640133588], length 0
11:55:44.840245 IP 127.0.0.1.50402 > 127.0.0.1.9090: Flags [F.], seq 73, ack 1, win 512, options [nop,nop,TS val 640133588 ecr 640133588], length 0
```

@szuecs

szuecs commented Apr 25, 2022

#22158 seems to be related
