Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should Transport.request() return a context manager? #145

Closed
florimondmanca opened this issue Aug 9, 2020 · 9 comments
Closed

Should Transport.request() return a context manager? #145

florimondmanca opened this issue Aug 9, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@florimondmanca
Copy link
Member

florimondmanca commented Aug 9, 2020

Prompted by https://github.com/encode/httpx/pull/998#issuecomment-671070588…

cc @tomchristie

Prior art

Right now our transport API looks like this... I'm using a slightly edited version of the README, where we only inspect the response headers:

async with httpcore.AsyncConnectionPool() as http:
    http_version, status_code, reason_phrase, headers, stream = await http.request(
        method=b'GET',
        url=(b'https', b'example.org', 443, b'/'),
        headers=[(b'host', b'example.org'), (b'user-agent', 'httpcore')]
    )

    print(status_code, body)

Although it's definitely not obvious to the un-trained eye, this code could be problematic in some situations: the stream object is a wrapper over an open socket, and we're not enforcing that users close it properly before exiting the context of the associated request.

This example runs fine, but it's mostly by chance. AsyncConnectionPool may open and close connections on a per-request basis, but this doesn't pose a problem because the lifespan of a connection typically outranges that of a request (eg because of Keep-Alive). So the overall machinery occurs at the transport level. In other words, when the AsyncConnectionPool block exits, all underlying connections are closed, and so the underlying socket is closed, and nothing leaks.

But…

Case study: per-request resourceful transport case

Consider the case of ASGITransport in encode/httpx#998, or any other example that requires a per-request I/O or concurrency resource (in the ASGITransport case, a task group for running the ASGI app in the background).

Then currently we must do some unclassy __aenter__()/__aexit__() dance with that resource, so that request() returns a stream containing some kind of reference to that resource, with an aclose_func=... that properly closes said resource. (This is tricky. I think we need to always pass sys.exc_info() to __aexit__() so that the resource is aware that it might be being closed in response to an exception within the block, but I'm not sure).

For example...

from somewhere import create_async_resource

class MyTransport:
    async def request(self, ...):
        resource = create_async_resource()

        await resource.__aenter__()  # "Acquire" resource.

        async def aclose():
            await resource.__aexit__(*sys.exc_info())  # "Cleanup" resource.

        stream = httpcore.AsyncByteStream(..., aclose_func=aclose)

        return (*..., stream)

And now my variant of the README example will be broken for this particular transport:

async with MyTransport() as http:
    http_version, status_code, reason_phrase, headers, stream = await http.request(
        method=b'GET',
        url=(b'https', b'example.org', 443, b'/'),
        headers=[(b'host', b'example.org'), (b'user-agent', 'httpcore')]
    )

    print(status_code, body)

# OOPS! `resource` has leaked...

Leverage transport-level context manager syntax (not satisfactory)

One way to fix it would be to track instances of resource on the transport class, and then on MyTransport.__aexit__() make sure all resource instances are closed. It might not generally be very practical to do this, eg resource might fail if you're trying to close it again. And overall it feels like late patching, instead of fixing the problem as close to its source as possible (the request "context").

Proposal: request-level context manager syntax

On the other hand, what if .request() was returning a context manager, instead of a plain awaitable…?

Then we could do this:

from somewhere import create_async_resource

class MyTransport:
    @asynccontextmanager
    async def request(self, ...):
        async with create_async_resource() as resource:.
            stream = httpcore.AsyncByteStream(...)
            yield (*..., stream)

The usage syntax will change slightly, to become:

async with MyTransport() as http:
    async with http.request(
        method=b'GET',
        url=(b'https', b'example.org', 443, b'/'),
        headers=[(b'host', b'example.org'), (b'user-agent', 'httpcore')]
    ) as (
        http_version,
        status_code,
        reason_phrase,
        headers,
        stream,
    ):
        print(status_code, body)

    # Regardless of whether `stream.aclose()` was called, the stream is now properly closed,
    # and `resource` was cleaned up.

Now the syntax guarantees us that resource will be closed whenever we exit the request context.

Benefits:

  • Stricter following of principles of structured concurrency (which, as we're starting to observe, is generally a good thing).
  • Leads to simpler (as in: "more concise, and less likely to shoot oneself in the foot") implementations for transports that need to hold onto resources for each request.
@tomchristie
Copy link
Member

So, this is a really valuable path to explore, but we do need to be aware of the knock-on implications - there's some big trade-offs to weigh up, and it might well be that it's a bridge too far in purity vs practicality.

I'm fairly sure that having .request() be context managed can't work with the existing httpx streaming API, let's see if I can explain why I think that's the case...

Here's how we handle response streams at the moment.

with httpx.stream(...) as response:
    for chunk in response.iter_text():
        print(chunk)

Now, if the transport .request() method is a context manager, then when we're handling redirects inside the client, we need to step into that context manager, in order to start the request, and have the response information made visible, so as to determine if we need to redirect or not. In pseudo-code it's something like...

with transport.request(...) as response:
    if response.is_redirect:
        # Read the response body, then break out of this context block, and loop making another request.
    else:
         # We've got the final response that we want to return to the user.
         # *However*, returning a response from this point, or exiting the block, will close the response,
         # so we can return an ingested non-streaming response just fine, but we *can't* return a streaming response.

The implication is that the streaming API would need to change, so that any response streaming runs inside the context block. For example...

def handle_stream(response):
    for chunk in response.iter_text():
        print(chunk)

response = httpx.stream(..., on_stream=...)

With an implementation, like...

with transport.request(...) as response:
    if response.is_redirect:
        # Read the response body, then break out of this context block, and loop making another request.
    else:
        if on_stream:
            on_stream(response)
        # Return the response

Anyways...

Pros:

  • It's a nicer API at the plain Transport API level - we're not having to explain "you have to call .close inside a try...finally", instead it's handled for ya.
  • It's possible it'd be a more resilient implementation inside httpcore - having to track the response closing feels a bit fiddly everywhere - might come out cleaner as a context managed API, but we can't really assess that properly without seeing it concretely.
  • Nicer for transports that want eg. a task group context that runs for the lifetime of the request. (Although note that transports can alternately have a task group context that runs for the lifetime of the transport, which is more what you want for eg. background connection management.)

Cons:

  • The high-level streaming API becomes arguably less user friendly.
  • Possibly introduces friction to other existing libraries wanting to adopt httpcore as their backend, because of the constraints it adds on streaming APIs.

@florimondmanca
Copy link
Member Author

@tomchristie I think ExitStack/AsyncExitStack would save the day here, and allow us to nicely convert from the with syntax to the close-callback-based style of the Response model class…

Here's what your redirect example would look like:

from contextlib import ExitStack

exit_stack = ExitStack()

response = exit_stack.enter_context(transport.request(...))

if response.is_redirect:
    # Read the response body, then break out of this context block, and loop making another request.
    for _ in response[4]:  # stream:
        pass
    exit_stack.close()
    return

# We've got the final response that we want to return to the user.
# Return a response which will close the request context on response close.
return Response(..., on_close=exit_stack.close)

Assuming Response.on_close is called properly (which it is, since we enforce streaming w/ with block, or close the response ourselves before returning it in the other case), then this looks like a perfectly valid and correct way of doing things.

Thoughts?

@florimondmanca
Copy link
Member Author

Also…

(Although note that transports can alternately have a task group context that runs for the lifetime of the transport, which is more what you want for eg. background connection management.)

Sure, but I don't really see this as an alternative.

Transport-bound and request-bound contexts are two different things the request API could support. Currently we transport-bound, and this proposal is about adding request-bound support.

(If using trio for example, a nursery that's the same lifespan than the transport won't behave the same way than a nursery tied to a request lifespan, and vice versa. You can cancel a request-level background task on closing the response if you have a request-level nursery, but you can't if all you have is a transport-level nursery. I don't think there's a way around this, at least as trio defines the nursery/task group semantics.)

@tomchristie
Copy link
Member

Def worth digging into yup. Let's just be aware that it would also impact the high-level streaming API, and not get too attached to the issue in case we figure that the practicality / purity trade off isn't worth it. (Eg. if it prevents other libs adopting httpcore because it won't support their streaming API, then it gets a bit awkward.)

@florimondmanca
Copy link
Member Author

@tomchristie Have we already discussed async with http.request() vs async with await http.request()? I believe so, but can't pinpoint where exactly.

My thoughts here are: using async with http.request(), there's no obvious way to switch to the "open a request, close it later" style. http.request() would returns an async context manager, and you'd need to __aenter__() it or feed it to an AsyncExitStack if you'd like to actually kick off the request. OTOH, using async with await http.request(), then it's clearer how to get an "open request": await http.request(). This would return an async manager that just handles the "exit" part — basically sticking very much to trio's concept of an AsyncResource. I'm aware this explanation is a bit abstract but I can't remember what we decided upon, and I think it's worth having the debate.

I think the argument I gave was sticking closer to trio APIs, such as async with await trio.open_file(): ....

@tomchristie
Copy link
Member

Well, let's use this opportunity to talk through this API change...

For these sorts of discussions I find it's generally more valuable to start by talking through the standard sync API, and then look at the corresponding async API.

The existing API that we have is this:

response = http.request(method, url, headers, stream, ext)
status_code, headers, stream, ext = response
try:
    ...
finally:
    stream.close()

The motivation for moving to a context manager is to have a more tightly structured API that always enforces cleaning up when the response stream is no longer required:

with http.request(method, url, headers, stream, ext) as response:
    status_code, headers, stream, ext = response
    ...

Note that stream in this second case is now just a plain old bytes-iterator. - We're not overloading it anymore to also support a .close() method.

The async with await ... dance is motivated by a case where you have an API that can be used either as a context manager, or with an explicit close. In particular, when building an equivalent to the stdlib open(...) function call.

Part of the question there is 1. "Do we want to also support the unconstrained style?" and the other part is 2. "What would the API look like if we did?".

Those answers might well not be the same as trio.open_file(), where they happen to be working against a constraint of "this API should mirror the stdlib open() function".

The argument against supporting both cases is that generally we don't want to make the unconstrained case available. In much the same way that trio's nurseries are deliberate designed to always enforce a context managed style.

Of course we do have an escape hatch for the odd awkward cases...

stack = contextlib.ExitStack()
response = stack.enter_context(http.request(...))
status_code, headers, stream, ext = response
try:
    ...
finally:
    stack.close()

I think the answers to these questions might also be somewhat dependant on if the transport API continues to be a "strictly primitive datatypes only" API, as it currently is.

If we had reasons to switch to an API with "response as an object instance", then there might be a different slant because, for example, this sort of API is pretty typical...

# Either this...
try:
    response = http.request(...)
finally:
    response.close()

# Or this
with http.request(...) as response:
    ...

...and feels more obvious with response.close() as a method, in contrast to the somewhat opaque stream.close().

Anyways, it might be that it's worth us expanding this conversation out a bit, since it really is the core of our last bit of "1.0 API" finalisation. Eg. Perhaps worth chatting out with the urllib3 folks, as we've discussed in the past.

@florimondmanca
Copy link
Member Author

@tomchristie Good point about async with await being about whether we want to allow both usages or just context managers. Given a comment you've made elsewhere that the transport API isn't what 99% of users will be working with, I suppose it's fine if we stick to context managers, and have people figure out that if they need to interface with a callback based client wrapper, then ExitStack is a good way to go.

@florimondmanca
Copy link
Member Author

I'm also quite attached to keeping the transport API as plain as possible, including avoiding any custom models.

We've been able to work with plain data types so far, I'm not sure if this approach has limitations that could be pain points in the future (I'm thinking of special extensions like upgrades, eg WebSocket or H2C), but so far I don't think so.

@tomchristie
Copy link
Member

Closing this now that httpx 0.18 has settled on a plainer style of Transport API

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants