Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect EOF signaling remote server closed connection #143

Conversation

yeraydiazdiaz
Copy link
Contributor

Fixes #96

Raise ConnectionClosedByRemote and handle on send

Yeray Diaz Diaz added 2 commits July 25, 2019 12:20
Raise ConnectionClosedByRemote and handle on `send`
httpx/dispatch/http11.py Outdated Show resolved Hide resolved
@@ -108,6 +108,9 @@ def __init__(

return data

def is_connection_dropped(self) -> bool:
return self.stream_reader.at_eof()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should clarify this seemingly simplistic implementation. The StreamReaderProtocol.connection_lost callback calls stream_reader.feed_eof which in turn sets the EOF flag which at_eof returns.

@@ -117,6 +117,8 @@ def __init__(
Send a single `h11` event to the network, waiting for the data to
drain before returning.
"""
if self.reader.is_connection_dropped():
raise NotConnected
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the connection status here also prevents us from sending the request event which, as @tomchristie mentioned, makes things ambiguous for non-idempotent requests.

response = await http.request("GET", "http://127.0.0.1:8000/")
await server.shutdown() # shutdown the server to close the connection before receiving the data

with pytest.raises(httpx.exceptions.NotConnected):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is NotConnected the right exception here? Maybe we should wrap it into a more descriptive Exception and error message?

@tomchristie
Copy link
Member

Okay - this is looking great! Nice work.

Now, rather than using NotConnected we should instead expose .is_connection_dropped() on the HTTP11Connection and HTTP2Connection classes, and the general HTTPConnection. Then the connection pool should check it explicitly on acquiring a keep-alive connection. (Rather than attempting to send and catching NotConnected if it occurs.)

@sethmlarson sethmlarson self-requested a review July 25, 2019 16:23
@yeraydiazdiaz
Copy link
Contributor Author

That sounds good for the two-request scenario, but wouldn't we still need an exception to cover to the last test case where the connection is lost before the response body is read?

@tomchristie
Copy link
Member

Not exactly.

We’ll remove NotConnected completely, and treat any stream exception that occurs once we start sending as a “disconnected while sending” case.

We will want to make sure we’ve got well defined exception cases for those events, but that doesn’t need to be part of this PR since it’s a different issue from the “keep alive connection had expired” case.

@njsmith
Copy link

njsmith commented Jul 25, 2019

Yeah, the right logic is: only put idle connections in the pool immediately after a successful call to start_next_cycle. When taking an idle connection from the pool to reuse, first check if the underlying socket is readable (ignoring any TLS layer). Implementing this looks very different on different backends, but they can all do it one way or another. If the socket is readable, close it and discard it and try again; otherwise go ahead and use it.

It seems like y'all are still spending a lot of time reinventing stuff here :-/. Do you have an abstraction over backends yet, or is everything still using asyncio only?

@sethmlarson
Copy link
Contributor

Btw @njsmith you meant "is NOT readable" for your close and disregard condition?

@njsmith
Copy link

njsmith commented Jul 25, 2019

No, I mean readable. Remember, we think the connection is supposed to be idle, so there shouldn't be any data to read. If it's marked readable, then that means the server is either sending some data in violation of the protocol, or is sending a 408 Request Timeout, or (most likely) has simply closed the connection. (select and friends mark a socket readable when the other side has closed it, which is a little weird, but does make sense – recv will return b"", but it won't block.)

But, we shouldn't actually try to read from the connection while doing this check, because if it's not readable, then we'll block forever, and you wouldn't expect taking an idle connection out of the pool to be a blocking operation that needs a timeout. So it's a weird special-case concept. Trio sockets actually have a special method just to handle this case: python-trio/trio#760

@tomchristie
Copy link
Member

Let’s try not to railroad this PR with extraneous stuff.

There’s an isolated issue with the “is this connection still alive” on reaquiry of keep-alive connections, because I’d incorrectly assumed that “try writing to the socket” would be the robust way to check for that case. This PR resolves that, and now just needs a little extra rejigging to stop using an unneeded exception-based flow.

(Nothing to do with h11 state or anything else like that here)

Yup we have a concurrency backend interface, let’s discuss that against a separate issue.

@njsmith
Copy link

njsmith commented Jul 25, 2019

There’s an isolated issue with the “is this connection still alive” on reaquiry of keep-alive connections, because I’d incorrectly assumed that “try writing to the socket” would be the robust way to check for that case.

Right, the problem with using writing to check for connection liveness is that you can't tell whether the server started processing the request or not. So if a write fails, you have to give up and report that back to the user, and let them decide what they want to do about it. But if you can tell that the server closed the connection before you started writing, then you know it's safe to go ahead and open a new connection, so the client library can handle it transparently without getting the user involved.

I believe that the right way to check for readability on different backends is:

  • On asyncio, using the asyncio Stream API: well, you actually can't do it using public APIs. There's StreamReader.at_eof, but that will fail if the HTTP server is polite and sends a 408 Request Timeout before it closes the connection. I think the best you can do is:

    def asyncio_stream_is_readable(stream):
        return stream._buffer or stream._eof

    or else giving up on the Stream API and using protocols/transports directly.

  • On trio, using the trio Stream API: this isn't exposed in Trio's stream abstraction, but we know that we don't have just any stream, but rather a SocketStream, possibly with one or more layers of SSLStream wrapped around it. So you can do:

    def trio_stream_is_readable(stream):
        # Peek through any SSLStream wrappers to get the underlying SocketStream
        while hasattr(stream, "transport_stream"):
            stream = stream.transport_stream
        assert isinstance(stream, trio.SocketStream)
        return stream.socket.is_readable()

    (This requires the next release of trio, which will be out soon.)

  • On raw sockets: If you only care about recent Python 3, then you can do something like: https://github.com/python-trio/trio/blob/a9bb9343487c2c64234267d66e04beeccafe2974/trio/_socket.py#L482

    If you need to be fully portable, then things are more complicated: https://github.com/urllib3/urllib3/blob/master/src/urllib3/util/wait.py

    (But you need most of that anyway to do I/O on raw sockets anyway.)

(Does asyncio provide any way to get at the raw socket underneath a StreamReader? If it does then that might let you do something that doesn't use private APIs, though it'll be pretty convoluted.)

@tomchristie
Copy link
Member

Closing in favor of #145 which adds a last little bit of clean up on top of @yeraydiazdiaz's great work here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants