
Auto-detect concurrency backend based on the async environment #389

Closed
florimondmanca opened this issue Sep 25, 2019 · 23 comments
Labels
user-experience Ensuring that users have a good experience using the library

Comments

@florimondmanca
Member

florimondmanca commented Sep 25, 2019

Currently, if users want to run HTTPX on trio they need to manually pass a TrioBackend to the AsyncClient:

import trio
import httpx
from httpx.concurrency.trio import TrioBackend

async def main():
    async with httpx.AsyncClient(backend=TrioBackend()) as client:
        ...

trio.run(main)

On the other hand, running on asyncio doesn't require passing any backend argument, which is a better user experience, and one we'd like to have in all cases.
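
For illustration, a minimal asyncio equivalent of the snippet above (no backend argument needed, since asyncio is currently the default):

import asyncio
import httpx

async def main():
    # asyncio is the current default, so no backend argument is required
    async with httpx.AsyncClient() as client:
        ...

asyncio.run(main())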

We should allow AsyncClient() to auto-detect the async environment it's running in, and use the appropriate backend — and make that the default behavior, instead of asyncio.

Implementation ideas:

  • Create a get_default_concurrency_backend() utility in utils.py. Detecting the async library in use can be done using sniffio (a sketch follows after this list). If no backend is available, raise an appropriate exception. (For example, this should currently be the case if we tried to run on curio, since we don't have a CurioBackend.)
  • Use this utility in all places where we currently default to AsyncioBackend as a backend, e.g. AsyncClient, ConnectionPool, etc.
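
A minimal sketch of that first idea, using sniffio for the detection. (The helper name comes from the bullet above; the module path for AsyncioBackend and the exact exception type are assumptions, not a final API.)

import sniffio

from httpx.concurrency.asyncio import AsyncioBackend  # assumed module path
from httpx.concurrency.trio import TrioBackend


def get_default_concurrency_backend():
    # sniffio reports which async library the current task is running under.
    library = sniffio.current_async_library()  # "asyncio", "trio", "curio", ...
    if library == "trio":
        return TrioBackend()
    if library == "asyncio":
        return AsyncioBackend()
    # e.g. curio: no backend available yet, so fail loudly.
    raise RuntimeError(f"Unsupported async environment: {library!r}")
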
@florimondmanca added the good first issue and user-experience labels Sep 25, 2019
@tomchristie
Member

tomchristie commented Sep 27, 2019

I'm probably a -1 on this, at least at the current point in time. There's no great harm in forcing users to be explicit in complex use cases, and it helps expose what's going on more clearly, rather than trying to hide everything under the carpet.

Some related things that do leap out at me though are...

  • I wonder if by 1.0 we ought to be using concurrency=TrioConcurrency() for the API naming? "backend" could really mean anything, and it isn't clear that what it actually means is "Concurrency Backend".
  • I think we deliberately probably don't want to support a backend=.../concurrency=... argument on the threaded Client, although I've not checked if we do or not.

@tomchristie
Member

I might be swayed on it.

@florimondmanca
Member Author

I think we deliberately probably don't want to support a backend=.../concurrency=... argument on the threaded Client, although I've not checked if we do or not.

We do support it currently, but there’s a (IMO still to be improved) check that the backend is asyncio-based. This is only for testing/mocking purposes though, so we might as well remove the parameter but keep the backend (or concurrency, which I agree would be a nice refactor) attribute for patching in tests.

@tomchristie
Member

so we might as well remove the parameter but keep the backend (or concurrency, which I agree would be a nice refactor) attribute for patching in tests.

Yeah I think that's a good middleground.

@sethmlarson
Contributor

sethmlarson commented Sep 27, 2019

Here's my bigger-picture perspective on things. It's partially outside the scope of HTTPX, but if we're intending the library to be successful, this is something that worries me:

Problem: We want to support multiple async libraries to reduce fragmentation.
Problem: We want people to write third-party libraries that use HTTPX. (elasticsearch-py, github3, etc)
Problem: Third-party libraries that use HTTPX also want to support multiple async libraries to reduce fragmentation.

  • Solution 1: Don't auto-detect async library in use, allow user to designate which library is being used.

    • Works for our direct users as they have full control of the HTTPX API
    • Doesn't work for our indirect users through third-party libraries
    • Third-party libraries either need their own auto-detection, or a way to designate the library just like us and propagate it down
  • Solution 2: Auto-detect which async library to use, but don't use a third-party library that is available outside of HTTPX (e.g. keep using ConcurrencyBackend)

    • Works for our direct users, no need for configuration
    • Third party libraries need to implement their own ConcurrencyBackend-like interface for their own library (or use anyio which IMO is likely)
    • Works for our indirect users as long as all libraries do the above.
  • Solution 3: Use anyio in HTTPX, recommend that library authors using HTTPX as a dependency do the same

    • Async library discovery and handling is left to anyio
    • We would have to switch to structured concurrency (which I believe is where the async space is heading)
    • Works for direct users and third-party users regardless of whether third-party libraries support multiple async libraries or a single async library.

There were discussions elsewhere, but I don't think points like this were brought up, and I wanted to get these ideas out there.

@sethmlarson
Contributor

Libraries that are already starting to do this:

@florimondmanca
Member Author

florimondmanca commented Sep 27, 2019

Thanks for this write up @sethmlarson! Definitely an important topic.

Third-party libraries that use HTTPX also want to support multiple async libraries to reduce fragmentation.

Just a random thought here: libraries yes (or maybe), but most likely end users who write async apps without thinking too much about the underlying machinery (as they should!) probably don't have that as a requirement.

recommend that library authors using HTTPX as a dependency do the same

The code examples in the projects you mentioned use anyio.run(main, backend=...) to bootstrap the program. AnyIO docs don’t mention running with the native run() functions either.

If that’s a requirement of anyio, it would mean anyone that wants to use HTTPX has to use anyio, and that unfortunately is unacceptable.

If it’s not a requirement though (I don’t think it actually is!), then we can transparently provide multi-async support via anyio (but that should be considered an implementation detail), and solution 3 sounds about right to me.

(I’m still worried that the async concurrency style that anyio imposes could generate problems for people using HTTPX in more or less advanced ways with asyncio due to a different paradigm. But that’s not really the matter here.)

What I really think is that our choice of async abstraction layer shouldn't have any impact on how downstream dependents run their programs.

Right now that requirement isn't satisfied, because people who run on anything other than asyncio must pass the concurrency backend explicitly. That may be fine from a first-level user perspective, but it means that this parameter needs to be propagated somehow at higher levels anyway. What's most likely to happen is that dependents will write their own auto-detection to counter the fact that we don't have it.

I really don’t think explicitness at the API level is necessary nor desirable in the realm of async libraries. I think the fact that they’re running on XYZ-io is explicit enough.

Hopefully this clarifies why auto-detection is a must-have if we expect HTTPX to be widely used in non-asyncio environments.

@sethmlarson
Contributor

Third-party libraries that use HTTPX also want to support multiple async libraries to reduce fragmentation.

Just a random thought here: libraries yes (or maybe), but most likely end users who write async apps without thinking too much about the underlying machinery (as they should!) probably don't have that as a requirement.

Yep! That was my intention: applications aren't written for more than one async library, they just pick one and do their thing.

I'm very worried about our third-party user story though. We've done so much for extensibility of the library (middlewares, etc.), but the current setup would not foster that when it comes time to implement and use those extended features.

recommend that library authors using HTTPX as a dependency do the same

The code examples in the projects you mentioned use anyio.run(main, backend=...) to bootstrap the program. AnyIO docs don’t mention running with the native run() functions either.

If that’s a requirement of anyio, it would mean anyone that wants to use HTTPX has to use anyio, and that unfortunately is unacceptable.

If it’s not a requirement though (I don’t think it actually is!), then we can transparently provide multi-async support via anyio (but that should be considered an implementation detail), and solution 3 sounds about right to me.

It's not a requirement; anyio already uses sniffio to determine the async library to use. I put together this little script to demonstrate:

import anyio
import trio
import asyncio


async def parent():
    async def child(x):
        print(f"sleeping for {x}")
        await anyio.sleep(x)
        print(f"slept for {x}")

    print("start parent")
    async with anyio.create_task_group() as group:
        await group.spawn(child, 1)
        await group.spawn(child, 2)
    print("end parent")


asyncio.run(parent())
trio.run(parent)
anyio.run(parent)

(I’m still worried that the async concurrency style that anyio imposes could generate problems for people using HTTPX in more or less advanced ways with asyncio due to a different paradigm. But that’s not really the matter here.)

Definitely a different paradigm (and a good amount of work too!). I'd say that asyncio picking up a lot of what trio is doing right now is a good indicator it's a beneficial paradigm though.

One of the worries I know about was async with being a requirement to use the Client interface. I was wondering if we'd be able to make HTTP/1.1 work without async with, and then have the more complex HTTP/2 and HTTP/3 implementations require async with due to their background activity?

@thebigmunch
Contributor

The code examples in the projects you mentioned use anyio.run(main, backend=...) to bootstrap the program. AnyIO docs don’t mention running with the native run() functions either.

It's not a requirement

And there are examples and caveats in the anyio docs here, FYI.

@sethmlarson
Contributor

That makes sense, since there's no indication of what async library you want from the anyio.run() scope. Either way, our bases are covered, since applications and end users will use the native library's .run() function to enter async-land.

@florimondmanca
Member Author

And there are examples and caveats in the anyio docs here, FYI.

True that; I actually misread the docs. What I said is wrong:

AnyIO docs don’t mention running with the native run() functions either.

@sethmlarson removed the good first issue label Sep 30, 2019
@tomchristie
Member

tomchristie commented Sep 30, 2019

Context managed client instantiation

I've not yet seen any concrete evidence that backgrounding is strictly necessary for HTTP/2 / HTTP/3 (vs. the alternative of allowing any background work to occur during any other interactions). I'm open to it potentially being a requirement, but I've not seen that thoroughly demonstrated at this point in time.

A good point of reference here would be to see how browsers treat HTTP/2 connections to hosts with open tabs - Do they continually ping them in the background? Does the behavior change for inactive tabs?

It'd be a fairly reasonable policy to allow connections to lapse with ping timeouts in cases where we're not performing any network activity with the client at all.

Either way around, something that I'd like to see addressed there is how the client instantiation pattern looks in a standard web app setup. To explain that, it's easiest to just work from the basic threaded case, and a bog-standard WSGI app, and ask "how does client setup look if you want to use a context-managed client instantiation?"

Typically in a flask app or whatever, if you wanted to use a single requests session for all outgoing requests you can do...

client = requests.Session()  # or httpx.Client()

Somewhere in some global app config, and then just import and use client everywhere.

I'm not sure how the pattern would look instead if you want to use a context managed with requests.Session() as client: ... block? (or httpx.Client() block.)

It makes more sense to answer that for the standard client / WSGI case, since it's the simpler variant of the two, but still has the same requirement if you wanted strictly block managed backgrounding.

anyio.

I don't think it really buys us anything given the legwork we've already done in this space.
We'll be compatible with other implementations that are using anyio. Also worth noting that we don't have any additional requirements re. "structured concurrency" - the only place it'd be relevant is #50, which is already designed in line with the structured concurrency constraints.

auto-detecting async environment.

Perhaps. I don't think there's any definitive answer on the implicit vs. explicit choice here.

As it happens I'd still potentially be open to moving trio support from core into a third party. (I'm totally in favor of the design space that Trio is helping push forward, but it's an open question if fully embracing the ecosystem split is a good trade off from our POV, vs. pushing for stdlib to incrementally adopt their design learnings.)

Really I'd kinda like to hear more first from the trio crew about what (if any) blockers they currently have around adopting httpx? (TBH I didn't realise that active work was still ongoing on the urllib3 branch, so I defo think there's more communication needed here.)

@sethmlarson
Contributor

@tomchristie I have more comments but just to provide some evidence re: disconnecting from not responding to pings: #381

@tomchristie
Member

Right, but “transparently reconnect if we try to do something against a disconnected connection” is also a valid remedy there.

Additionally, inactive clients probably should disconnect.

@pquentin
Contributor

pquentin commented Oct 1, 2019

Really I'd kinda like to hear more first from the trio crew about what (if any) blockers they currently have around adopting httpx? (TBH I didn't realise that active work was still ongoing on the urllib3 branch, so I defo think there's more communication needed here)

I agree! @njsmith and I should have spent more time explaining why we're continuing the work on our urllib3 branch, and I hope we'll be able to do this in the next few days.

@sethmlarson
Contributor

@tomchristie

I'm not sure how the pattern would look instead if you want to use a context managed with requests.Session() as client: ... block? (or httpx.Client() block.)

It makes more sense to answer that for the standard client / WSGI case, since it's the simpler variant of the two, but still has the same requirement if you wanted strictly block managed backgrounding.

What are your thoughts on having the Client only allow HTTP/2 within async with blocks?
We're going to hit more issues with this when we try to implement HTTP/3 with retransmission alarms.

anyio.

I don't think it really buys us anything given the legwork we've already done in this space.
We'll be compatible with other implementations that are using anyio. Also worth noting that we don't have any additional requirements re. "structured concurrency" - the only place it'd be relevant is #50, which is already designed in line with the structured concurrency constraints.

I'd argue that this one point would be a very big win for reducing fragmentation. We want library designers to make their libraries compatible with all async libraries, the same as we are.
(i.e. we don't want github-api-asyncio, github-api-trio, etc.)

We talk a lot about shelling functionality out to third-party libraries for things like retries, caching, and middlewares:
those libraries will be harder to implement without a change to what we're currently doing.

With async env auto-detection we at least make it so that libraries don't have to tell us what backend to use, but we basically pass the buck up to the libraries to implement their own pluggable backends for their async tasks. I guarantee the common choice won't be to do what we've done with HTTPX's pluggable backends; it'll be to use anyio.

auto-detecting async environment.

Perhaps. I don't think there's any definitive answer on the implicit vs. explicit choice here.

What sort of benefits would we gain from being "explicit"?

From what I see, either you get the backend setting correct and everything works, or you don't and everything explodes.
When is that better than always getting the backend setting correct?

As it happens I'd still potentially be open to moving trio support from core into a third party. (I'm totally in favor of the design space that Trio is helping push forward, but it's an open question if fully embracing the ecosystem split is a good trade off from our POV, vs. pushing for stdlib to incrementally adopt their design learnings.)

I don't see what benefits we gain from moving Trio support to a third-party package.

@tomchristie
Member

What are your thoughts on having the Client only allow HTTP/2 within async with blocks?
We're going to hit more issues with this when we try to implement HTTP/3 with retransmission alarms.

I could go either way on it. It's not been clearly demonstrated that it's strictly necessary ATM, but I'm open to the possibility.

I don't see what benefits we gain from moving Trio support to a third-party package.

Again - I'm open-minded on this. I don't think there's any clear path on what's better for the ecosystem as a whole between:

  • Supporting stdlib as the built-in, but with Trio as an optional extra. Pushing for more trio-like stdlib primitives and packages. (E.g. we really need a nursery-like primitive for stdlib or PyPI, and to start always pushing users to only ever branch concurrency using structured blocks, rather than fire-and-potentially-forget tasks.)
  • Supporting both stdlib and Trio built-in, preferring neither, and being okay with the trade-off that it'll further fragment the ecosystem between the two.

Wrt. anyio - I'd suggest the sensible approach here would be that if someone's invested in using that, then building out a new concurrency backend that uses anyio would be the thing to do. That'd absolutely demonstrate it, and we'd then have the option of plugging that in and dropping our other backends.

we at least make it so that libraries don't have to tell us what backend to use but we basically pass the buck up to the libraries to implement their own pluggable backends for their async tasks.

Again, it depends what we think is best for the ecosystem here. It's not obvious that writing third-party packages that support both Trio and stdlib is necessarily a good thing to advocate for. (It may well be, but there are also drawbacks.)

I just think that we ought to be a bit cautious about what direction we're generally advocating/documenting for Python's async space right now. "Support all the things" is not necessarily the preferred option.

My rule of thumb here has generally been that Trio is more thoroughly designed, but stdlib is where we are, and that helping guide users towards using stdlib with structured concurrency approaches is likely the happy path (gradually phasing out or sidelining the lower-level primitives), although I'm open to discussion on that.

Short: I think I'm very probably okay with us using sniffio to automatically switch between our two existing options, since that's a small change footprint. I'm also okay with someone independently building out a concurrency backend using anyio - that'd give us options as to where we want to go.

@pquentin
Contributor

pquentin commented Nov 7, 2019

I agree! @njsmith and I should have spent more time explaining why we're continuing the work on our urllib3 branch, and I hope we'll be able to do this in the next few days.

Oops! More than one month has passed by. @njsmith told me he would like to write a proper comparison at some point, but that takes time and energy. I certainly find it mentally taxing to explain why I prefer to contribute to project A and not project B, when the two projects have similar goals!

I know Nathaniel has some technical reasons that I would not be able to explain well. So this will only be part of the story, but I can explain why I have been working on the urllib3 fork since January 2018: I don't want to waste the huge amount of real-world experience urllib3 has. It seems wasteful to have to relearn all of this by waiting for a slow trickle of bug reports. In fact, much of my work on the fork has been upstreamed (here's a recent example), and this code now benefits the millions of urllib3 downloads that happen every day. And we continue to merge the changes made in urllib3 itself regularly, and all the sync tests for the core library actually pass, so we'll soon be able to focus on improving the async support.

@tomchristie
Member

Ah s'all good. The important thing from my POV is that we make sure we're communicating and learning between ourselves - I'm seeing such a positive and constructive attitude on all sides, and yeah I totally dig that sometimes it's easier to just demonstrate first.

There's plenty of difficult trade-offs wrt. maturing the async ecosystem in Python - what I'm really appreciating at the moment is that I'm seeing a healthy respect for different folks taking different tacks onto helping be part of that.

@tomchristie
Member

One thing that I do think would be interesting here would be attempting to implement an urllib3-based dispatch class, with HTTPX's client as the frontend API. Though I don't actually have any extra headroom myself to dig into that right now. 🤣

@tomchristie
Member

Coming back just to the "should we auto-detect the concurrency backend based on the async environment?" question.

Looking at it now, the answer to me (coming to it with a fresh pair of eyes) is a clear yes - of course we should be using auto-detection (with sniffio) - why wouldn't we want to do that?

I think our interface here ought to be...

class Client:
    def __init__(self, ..., backend: typing.Union[str, ConcurrencyBackend] = ...):
        ...

Usage:

client = Client()  # Auto detect backend
client = Client(backend='trio')  # Or 'asyncio'. Potentially also 'anyio', 'curio', 'twisted' options at some point.
client = Client(backend=AsyncioBackend())
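
A rough sketch of how that string-or-instance argument could be resolved, reusing the sniffio-based detection idea from earlier in the thread. (The helper body, the "auto" default, and the import paths here are illustrative assumptions, not the actual httpx implementation.)

import sniffio

from httpx.concurrency.asyncio import AsyncioBackend  # assumed module paths
from httpx.concurrency.base import ConcurrencyBackend
from httpx.concurrency.trio import TrioBackend


def lookup_backend(backend="auto"):
    # Pass instances straight through; resolve strings, with "auto" meaning
    # "detect the running async library via sniffio".
    if isinstance(backend, ConcurrencyBackend):
        return backend
    if backend == "auto":
        backend = sniffio.current_async_library()
    if backend == "trio":
        return TrioBackend()
    if backend == "asyncio":
        return AsyncioBackend()
    raise ValueError(f"Unknown concurrency backend: {backend!r}")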

@millefalcon
Contributor

This has been added now, no?

self.backend = lookup_backend(backend)

@tomchristie
Member

In master now, yup! 😃
