Auto-detect concurrency backend based on the async environment #389
I'm probably a -1 on this, at least at the current point in time. There's no great harm in forcing users to be explicit in complex use cases, and it helps expose what's going on more clearly, rather than trying to hide everything under the carpet. Some related things that do leap out at me, though, are...
|
I might be swayed on it. |
We do support it currently, but there’s a (IMO still to be improved) check that the backend is asyncio-based. This is only for testing/mocking purposes though, so we might as well remove the parameter but keep the backend (or concurrency, which I agree would be a nice refactor) attribute for patching in tests. |
Yeah, I think that's a good middle ground. |
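As a hedged illustration of the testing/mocking use case above, here's what patching a backend (or concurrency) attribute could look like in a test. This is a sketch only: it assumes the client keeps such an attribute as discussed, `client` is assumed to be a test fixture, and `MockBackend` is a made-up stand-in rather than anything in HTTPX.

```python
# Sketch only: assumes the client exposes a `backend` attribute, per the discussion above.
from unittest import mock


def test_request_uses_patched_backend(client):
    mock_backend = mock.Mock(name="MockBackend")  # stand-in test double
    with mock.patch.object(client, "backend", mock_backend):
        ...  # exercise the client here; assert against mock_backend afterwards
```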
Here's my bigger-picture perspective on things. It's partially outside the scope of HTTPX, but if we intend the library to be successful, this is something that worries me. Problem: we want to support multiple async libraries to reduce fragmentation.
There were discussions elsewhere, but I don't think points like this were brought up, and I wanted to get these ideas out there. |
Libraries that are already starting to do this: |
Thanks for this write up @sethmlarson! Definitely an important topic.
Just a random thought here: libraries, yes (or maybe), but most likely end users who write async apps without thinking too much about the underlying machinery (as they should!) probably don't have that as a requirement.
The code examples in the projects you mentioned use anyio.run(main, backend=...) to bootstrap the program. AnyIO docs don't mention running with the native run() functions either. If that's a requirement of anyio, it would mean anyone who wants to use HTTPX has to use anyio, and that unfortunately is unacceptable. If it's not a requirement though (I don't think it actually is!), then we can transparently provide multi-async support via anyio (but that should be considered an implementation detail), and solution 3 sounds about right to me. (I'm still worried that the async concurrency style that anyio imposes could generate problems for people using HTTPX in more or less advanced ways with asyncio, due to a different paradigm. But that's not really the matter here.)

What I really think is that our choice of async abstraction layer shouldn't have any impact on how downstream dependents run their programs. Right now that requirement isn't satisfied, because people who run on anything other than asyncio must pass the concurrency backend explicitly. That may be fine from a first-level user perspective, but it means that this parameter needs to be propagated somehow at higher levels anyway. What's most likely to happen is that dependents will write their own auto-detection to counter the fact that we don't have it.

I really don't think explicitness at the API level is necessary or desirable in the realm of async libraries. I think the fact that they're running on XYZ-io is explicit enough. Hopefully this clarifies why auto-detection is a must-have if we expect HTTPX to be widely used in non-asyncio environments. |
Yep! This was my intention: applications aren't written for more than one async library; they just pick one and do their thing. I'm very worried about our third-party user story, though. We've done so much for extensibility of the library (middlewares, etc.), but the current setup would not foster that when it comes time to implement and use those extended features.
It's not a requirement; anyio already supports entering via the native run() functions:

```python
import anyio
import trio
import asyncio


async def parent():
    async def child(x):
        print(f"sleeping for {x}")
        await anyio.sleep(x)
        print(f"slept for {x}")

    print("start parent")
    async with anyio.create_task_group() as group:
        await group.spawn(child, 1)
        await group.spawn(child, 2)
    print("end parent")


asyncio.run(parent())
trio.run(parent)
anyio.run(parent)
```
Definitely a different paradigm (and a good amount of work too!). I'd say that asyncio picking up a lot of what trio is doing right now is a good indicator that it's a beneficial paradigm, though. One of the worries I know was the |
And there are examples and caveats in the anyio docs here, FYI. |
That makes sense since there's no indication of what async library you want from the anyio.run() scope. Either way our bases are covered since applications and end-users will use the native library .run() function to enter async-land. |
True that, I actually misread the docs. What I said is wrong:
|
**Context-managed client instantiation**

I've not yet seen any concrete evidence that backgrounding is strictly necessary for HTTP/2. A good point of reference here would be to see how browsers treat HTTP/2 connections to hosts with open tabs - do they continually ping them in the background? Does the behavior change for inactive tabs? It'd be a fairly reasonable policy to allow connections to lapse with ping timeouts in cases where we're not performing any network activity with the client at all.

Either way around, something that I'd like to see addressed there is how the client instantiation pattern looks in a standard web app setup. To explain that, it's easiest to just work from the basic threaded case and a bog-standard WSGI app, and ask "how does client setup look if you want to use a context-managed client instantiation?" Typically in a Flask app or whatever, if you wanted to use a single requests session for all outgoing requests you can do...

```python
client = requests.Session()  # or httpx.Client()
```

...somewhere in some global app config, and then just import and use it wherever it's needed. I'm not sure how the pattern would look instead if you want to use a context-managed client (see the sketch below). It makes more sense to answer that for the standard client / WSGI case, since it's the simpler variant of the two, but it still has the same requirement if you wanted strictly block-managed backgrounding.

**anyio**

I don't think it really buys us anything given the legwork we've already done in this space.

**Auto-detecting the async environment**

Perhaps. I don't think there's any definitive answer on the implicit vs. explicit choice here. As it happens, I'd still potentially be open to moving trio support into a third-party package. Really I'd kinda like to hear more first from the trio crew about what (if any) blockers they currently have around adopting httpx? (TBH I didn't realise that active work was still ongoing on the urllib3 branch, so I defo think there's more communication needed here.) |
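To make the pattern question above concrete, here's a hedged sketch of the two setup styles for the sync/WSGI case. It uses requests.Session() as in the snippet above (httpx.Client() would look the same); the create_app/shutdown hooks are illustrative rather than any particular framework's API.

```python
# Sketch of the two setup styles discussed above, for the sync/WSGI case.
import atexit
from contextlib import ExitStack

import requests

# Pattern 1: module-level client, imported and used anywhere, closed at interpreter exit.
client = requests.Session()  # or httpx.Client()
atexit.register(client.close)

# Pattern 2: strictly context-managed client, tied to the app's lifespan by entering
# the context manager at startup and exiting it at shutdown.
_lifespan = ExitStack()


def create_app():
    app_client = _lifespan.enter_context(requests.Session())
    # ...store app_client on the app object / config so request handlers can use it...
    return app_client


def shutdown():
    _lifespan.close()  # exits the Session's context manager, closing the client
```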
@tomchristie I have more comments but just to provide some evidence re: disconnecting from not responding to pings: #381 |
Right, but “transparently reconnect if we try to do something against a disconnected connection” is also a valid remedy there. Additionally, inactive clients probably should disconnect. |
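For illustration, a minimal sketch of the "transparently reconnect" remedy described above, using hypothetical connection-level names rather than actual HTTPX internals.

```python
# Hypothetical sketch: `conn.is_closed`, `conn.send()` and `open_connection()` are
# illustrative stand-ins, not HTTPX's real connection API.
async def send_with_reconnect(conn, request, open_connection):
    if conn.is_closed:
        # The server may have dropped the connection (e.g. for unanswered HTTP/2
        # pings, as in #381); reopen before sending instead of surfacing an error.
        conn = await open_connection()
    response = await conn.send(request)
    return response, conn
```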
I agree! @njsmith and I should have spent more time explaining why we're continuing the work on our urllib3 branch, and I hope we'll be able to do this in the next few days. |
What are your thoughts on having the Client only allow HTTP/2 within
I'd argue that this one point would be a very big win for reducing fragmentation. We want library designers to make their libraries compatible with all async libraries, the same as we are. We talk a lot about shelling functionality out to third-party libraries for retries, caching, middleware: with async env auto-detection we at least make it so that those libraries don't have to tell us what backend to use, but otherwise we basically pass the buck up to the libraries to implement their own pluggable backends for their async tasks (see the sketch below). I guarantee the common choice won't be to do what we've done with HTTPX's pluggable backends, it'll be to use anyio.
What sort of benefits would we gain from being "explicit"? From what I see, either you get the backend setting correct and everything works, or you don't and everything explodes.
I don't see what benefits we gain from moving Trio support to a third-party package. |
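To illustrate the "pass the buck" concern, a hypothetical sketch of a third-party wrapper that has to accept and forward a backend argument it otherwise never touches. RetryingClient is made up for illustration, and the backend= parameter reflects the one discussed in this thread rather than a guaranteed part of the current API.

```python
# Hypothetical third-party extension: without auto-detection it must grow a
# `backend` parameter purely so that trio users can thread it through.
import httpx


class RetryingClient:
    def __init__(self, retries=3, backend=None, **kwargs):
        self.retries = retries
        if backend is not None:
            # Forwarded verbatim; this wrapper never uses it itself.
            kwargs["backend"] = backend
        self._client = httpx.AsyncClient(**kwargs)
```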
I could go either way on it. It's not been clearly demonstrated that it's strictly necessary ATM, but I'm open to the possibility.
Again - I'm open minded on this. I don't think there's any clear path on what's better for the ecosystem as a whole between:
Wrt. anyio - I'd suggest the sensible approach here would be that if someone's invested in using that, then building out a new concurrency backend that uses anyio would be the thing to do. That'd absolutely demonstrate it, and we'd then have the option of plugging that in and dropping our other backends.
Again, it depends what we think is best for the ecosystem here. It's not obvious that writing third-party packages that support both Trio and stdlib is necessarily a good thing to advocate for. (It may well be, but there are also drawbacks.) I just think that we ought to be a bit cautious about what direction we're generally advocating/documenting for Python's async space right now. "Support all the things" is not necessarily the preferred option. My rule of thumb here has generally been that Trio is more thoroughly designed, but stdlib is where we are, and that helping guide users towards using stdlib with structured concurrency approaches is likely the happy path. (Gradually phasing out or sidelining the lower-level primitives.) Although I'm open to discussion on that. Short: I think I'm very probably okay with us using |
Oops! More than one month has passed by. @njsmith told me he would like to write a proper comparison at some point, but that takes time and energy. I certainly find it mentally taxing to explain why I prefer to contribute to project A and not project B, when the two projects have similar goals! I know Nathaniel has some technical reasons that I would not be able to explain well. So this will only be part of the story, but I can explain why I have been working on the urllib3 fork since January 2018: I don't want to waste the huge amount of real-world experience urllib3 has. It seems wasteful to have to relearn all of it by waiting for a slow trickle of bug reports. In fact, much of my work on the fork has been upstreamed (here's a recent example) and this code now benefits the millions of urllib3 downloads that happen every day. And we continue to merge the changes made in urllib3 itself regularly, and all the sync tests for the core library actually pass, so we'll soon be able to focus on improving the async support. |
Ah, s'all good. The important thing from my POV is that we make sure we're communicating and learning between ourselves - I'm seeing such a positive and constructive attitude on all sides, and yeah, I totally dig that sometimes it's easier to just demonstrate first. There are plenty of difficult trade-offs wrt. maturing the async ecosystem in Python - what I'm really appreciating at the moment is that I'm seeing a healthy respect for different folks taking different tacks on helping be part of that. |
One thing that I do think would be interesting here would be attempting to implement an |
Coming back just to the "should we auto-detect the concurrency backend, based on the async environment?" question. Looking at it now, the answer to me (coming to it with a fresh pair of eyes) is a clear yes - of course we should be using autodetection (with ...). I think our interface here ought to be...

```python
class Client:
    def __init__(self, ..., backend: typing.Union[str, ConcurrencyBackend]):
        ...
```

Usage:

```python
client = Client()  # Auto-detect backend
client = Client(backend='trio')  # Or 'asyncio'. Potentially also 'anyio', 'curio', 'twisted' options at some point.
client = Client(backend=AsyncioBackend())
```
|
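A sketch of how the proposed Union[str, ConcurrencyBackend] argument could be resolved internally, assuming a simple lookup table and an "auto" default. The backend class names match those used elsewhere in this thread, but the resolve_backend function itself is illustrative, not the actual HTTPX implementation.

```python
# Illustrative only: string-or-instance lookup for the proposed `backend` parameter.
# ConcurrencyBackend, AutoDetectBackend, AsyncioBackend and TrioBackend are assumed
# to exist per the rest of this thread.
def resolve_backend(backend="auto"):
    if isinstance(backend, ConcurrencyBackend):
        return backend
    registry = {
        "auto": AutoDetectBackend,   # picks asyncio/trio based on the running loop
        "asyncio": AsyncioBackend,
        "trio": TrioBackend,
    }
    return registry[backend]()
```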
This has been added now, no? (See httpx/httpx/dispatch/connection_pool.py, line 95 in 81edb1b.) |
In |
Currently, if users want to run HTTPX on trio they need to manually pass a TrioBackend to the AsyncClient. On the other hand, running on asyncio doesn't require passing any backend argument, which is a better user experience we'd like to have in all cases.

We should allow AsyncClient() to auto-detect the async environment it's running in and use the appropriate backend, and make that the default behavior instead of asyncio.

Implementation ideas:

- Add a get_default_concurrency_backend() utility in utils.py. Detecting the async library in use can be done using sniffio. If no backend is available, raise an appropriate exception. (For example, this should currently be the case if we tried to run on curio, since we don't have a CurioBackend.) See the sketch below.
- Use this utility wherever we currently default to AsyncioBackend as a backend, e.g. AsyncClient, ConnectionPool, etc.
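A minimal sketch of the proposed utility, assuming the backend classes named elsewhere in this thread; sniffio.current_async_library() is the actual sniffio API for detecting the running async library, while the exception type here is just a placeholder.

```python
# Sketch of the first implementation idea above. AsyncioBackend / TrioBackend are the
# backend classes referred to in this thread and are assumed to be importable.
import sniffio


def get_default_concurrency_backend():
    library = sniffio.current_async_library()  # e.g. "asyncio", "trio", "curio"
    if library == "asyncio":
        return AsyncioBackend()
    if library == "trio":
        return TrioBackend()
    # e.g. curio: no CurioBackend exists yet, so fail loudly rather than guessing.
    raise RuntimeError(f"Unsupported async environment: {library!r}")
```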