Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shutdown instead of close. #759

Merged
merged 1 commit into from
Sep 10, 2019
Merged

Conversation

kazu-yamamoto
Copy link
Contributor

This patch lets Warp to use shutdown instead of close.

Background: when Warp sends GOAWAY in HTTP/2, close is called immediately. The kernel of the browser sends TCP ACK regarding to GOAWAY, but the socket of the server is already closed. So, TCP RST is sent back from the server to the client.

"UNIX Network Programming" suggests to use shutdown with SHUT_WR. This makes sure that the TCP ACK is processed in the server properly and the socket is closed in the proper timing. Note that the combination of Linger and close does not help.

Cc: @nh2

@nh2
Copy link

nh2 commented Jun 28, 2019

I confirmed that this fixes two bugs which h2spec finds.

@kazu-yamamoto Can you post how h2spec is invoked and the details of the problems it finds?

It'd be very interested to learn how that works.

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend adding

  • some detail to the commit message (so that if this breaks people and they bisect it, they can easily find what's going on), and
  • a comment to the code that explains the situation (because it isn't obvious why connClose doesn't just call close, and it's not an easy topic, so I think comments are warranted)

Ideally also a link to issue #673.

@@ -64,7 +64,7 @@ socketConnection s = do
connSendMany = Sock.sendMany s
, connSendAll = sendall
, connSendFile = sendFile s writeBuf bufferSize sendall
, connClose = close s
, connClose = shutdown s ShutdownSend `E.catch` \(SomeException _) -> return ()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why is it sufficient to call shutdown but never close afterwards?
  2. What if shutdown() returns ENOTCONN, do we leak the socket?
  3. Is the catch-all exception handler good, or should it be more specific, so that we can observe bugs in the code? For example, should it not be at max some IOError? Should it perhaps be even specific to some errno we can expect, like ENOTCONN?
  4. Of the 4 steps I list in Missing response for certain POST requests nodejs/node#12339 (comment), this does only 2 steps. Don't we have to
    • Keep read()ing from the socket until the empty string is read
    • Call close() on the socket
  5. What about the case of misbehaving clients that never close their side? Will they be able to drain resources from the server by keeping ports open? Should there be some timeout/setting/mechanism to close() if the other side doesn't close() after some expected duration?

Even if these are fine, I think it deverses a code comment why they are fine.

@kazu-yamamoto
Copy link
Contributor Author

% h2spec http2/5.4 http2/7
  5. Streams and Multiplexing
    5.4. Error Handling
      5.4.1. Connection Error Handling
        × 1: Sends an invalid PING frame for connection close
          -> The endpoint MUST close the TCP connection
             Expected: Connection closed
               Actual: Error: read tcp 127.0.0.1:56529->127.0.0.1:80: read: connection reset by peer

  7. Error Codes
    × 1: Sends a GOAWAY frame with unknown error code
      -> The endpoint MUST NOT trigger any special behavior.
         Expected: Connection closed
                   PING Frame (length:8, flags:0x01, stream_id:0, opaque_data:h2spec)
           Actual: Error: read tcp 127.0.0.1:56578->127.0.0.1:80: read: connection reset by peer

@kazu-yamamoto
Copy link
Contributor Author

As you can see, h2spec expects connection close in both cases but receives TCP RST.

@kazu-yamamoto
Copy link
Contributor Author

It is already Saturday in Japan. So, I will answer other stuffs in Monday.

@kazu-yamamoto
Copy link
Contributor Author

@snoyberg I guess that it's better for WarpTLS to use shutdown instead of close. I will also take care of this in next Monday.

@snoyberg
Copy link
Member

I'd defer to @nh2 on my review, his points subsume what I noticed. The biggest question I was left with was why not do both shutdown and close.

@snoyberg
Copy link
Member

Oh, also: making the same change to warp-tls would make sense to me.

@kazu-yamamoto
Copy link
Contributor Author

Quick reply: shutdown is a graceful close. There are alternatives. You don't have to call both. Rather, we must not call both for graceful closing.

@nh2
Copy link

nh2 commented Jun 30, 2019

You don't have to call both.

Is that true?

It contradicts my understanding of man 2 shutdown and man 2 close and this StackOverflow answer:

However, close is the way to actually destroy a socket.

With shutdown, you will still be able to receive pending data the peer already sent.

The second sentence suggests that you can continue to use the socket FD in some ways after calling shutdown(); this implies that the socket is still allocated in the kernel.

That is in line with what I wrote in the link and text posted in my detail comment above: #759 (comment)

@kazu-yamamoto
Copy link
Contributor Author

@nh2 You are right. I considered only normal cases. I will consider error cases.

@kazu-yamamoto
Copy link
Contributor Author

@nh2 I rewrote the code and did push -f.

I'm not 100% convinced yet. What happens if TCP FIN is not returned from the client while receiving data? Does TCP timeout surely happen?

@kazu-yamamoto
Copy link
Contributor Author

  • I guess that we cannot use SO_RCVTIMEO which results in EWOULDBLOCK on timeout. Sockets in Haskell are non-blocking where EWOULDBLOCK is also returned in many cases. We cannot distinguish them. What we can do is to set sockets back to blocking, set SO_RCVTIMEO and call recv(2) (not Haskell's recv). All parts are not provided from the network library.
  • SO_KEEPALIVE would be another option. But the timeout is too long in some OSes and the timeout value cannot change for each socket.

@kazu-yamamoto
Copy link
Contributor Author

OK. As one of maintainers of the network library, I decided to implement gracefulClose in network.

haskell/network#417

@kazu-yamamoto
Copy link
Contributor Author

I did push -f to use gracefulClose provided in the network lib.
We should wait for haskell/network#417

@kazu-yamamoto
Copy link
Contributor Author

@snoyberg It appeared that this PR makes the tests very slow. This is probably because http-client does not sent TCP FIN immediately after Manager is not in use. closeManager is now obsoleted. Are there any ways to close all connections under Manager explicitly and immediately?

@snoyberg
Copy link
Member

There's nothing currently. If I added something on master to close out all open connections, would you be able to test that it speeds up the tests?

@kazu-yamamoto
Copy link
Contributor Author

Yes, of course!

@snoyberg
Copy link
Member

Actually, there might be a simpler solution: how about setting managerConnCount to 0? Or maybe managerIdleConnectionCount?

@kazu-yamamoto
Copy link
Contributor Author

@snoyberg It works. Thanks!

But testing is still slow for other reasons. I try to understand where the slow down comes from.

Copy link
Member

@snoyberg snoyberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

is called immediately. This is reported in yesodweb#673.

We can reproduce this with h2spec:

% h2spec http2/5.4 http2/7
  5. Streams and Multiplexing
    5.4. Error Handling
      5.4.1. Connection Error Handling
        × 1: Sends an invalid PING frame for connection close
          -> The endpoint MUST close the TCP connection
             Expected: Connection closed
               Actual: Error: read tcp 127.0.0.1:56529->127.0.0.1:80: read: connection reset by peer

  7. Error Codes
    × 1: Sends a GOAWAY frame with unknown error code
      -> The endpoint MUST NOT trigger any special behavior.
         Expected: Connection closed
                   PING Frame (length:8, flags:0x01, stream_id:0, opaque_data:h2spec)
           Actual: Error: read tcp 127.0.0.1:56578->127.0.0.1:80: read: connection reset by peer

To fix this, we use "gracefulClose" in the network library.
kazu-yamamoto added a commit to kazu-yamamoto/wai that referenced this pull request Sep 10, 2019
@kazu-yamamoto kazu-yamamoto merged commit aee7377 into yesodweb:master Sep 10, 2019
@kazu-yamamoto kazu-yamamoto deleted the shutdown branch September 10, 2019 02:39
@kazu-yamamoto
Copy link
Contributor Author

Since the PR of network has been merged, I merged this PR.
Also, new versions of warp and warp-tls have been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate how warp handles HTTP/2 GOAWAY
3 participants