Parallel requests #50

Closed
tomchristie opened this issue May 9, 2019 · 4 comments

tomchristie (Member) commented May 9, 2019

We could consider adding a concurrency API to simplify making multiple requests in parallel.

I'll give sync examples here, but we'd also have equivalent async cases.

If you have a number of requests that you'd like to send in parallel, then...

with client.parallel() as parallel:
    homepage = parallel.get("http://example.com/homepage")
    another = parallel.get("http://example.com/another")
    # get_response() blocks until that request's response is available.
    homepage_response = homepage.get_response()
    another_response = another.get_response()

Alternatively, if you don't mind getting the responses back out of order:

with client.parallel() as parallel:
    for page_number in range(0, 10):
        parallel.get(f"http://example.com/{page_number}")
    while parallel.pending:
        response = parallel.next_response()
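
For comparison, the completion-order pattern above is already achievable in standard threaded code with nothing but the standard library; a rough sketch, using requests purely for illustration:

import concurrent.futures
import requests

urls = [f"http://example.com/{page_number}" for page_number in range(10)]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(requests.get, url) for url in urls]
    # as_completed() yields futures in completion order, much like next_response() above.
    for future in concurrent.futures.as_completed(futures):
        response = future.result()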

Nice things here:

  • Bring the goodness of async's lightweight parallelization to standard threaded code.
  • For the async case, users don't need to touch lower-level flow-forking primitives such as asyncio.gather.
tomchristie changed the title from "Concurrency" to "Parallel requests" on May 14, 2019
StephenBrown2 (Contributor) commented Jun 25, 2019

Currently I'm using requests-async to generate a bunch of requests to fetch simultaneously, and asyncio.gather to run them all (modified from actual code that splits a large range into multiple "pages"):

import asyncio

async def get_async_results(**kwargs):
    ...
    queries = []
    for i in ...:
        # modify params dict
        queries.append(async_session.get("/services.json", params=params))
    # gather() runs the requests concurrently and returns results in call order.
    return await asyncio.gather(*queries)

results = asyncio.run(get_async_results(**kwargs))
return sorted([inner for outer in results for inner in outer])

And then I join the results together and return the resulting formatted data.

I recently came across the requests-toolbelt again and found it has a solution for threading as well:
https://toolbelt.readthedocs.io/en/latest/threading.html
https://github.com/requests/toolbelt/blob/master/requests_toolbelt/threaded/pool.py
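
Toolbelt aside, the standard library's ThreadPoolExecutor covers the common in-order case too; a minimal sketch (illustrative URLs, requests for the HTTP calls, not toolbelt's actual API):

import concurrent.futures
import requests

urls = [f"http://example.com/{page_number}" for page_number in range(10)]

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    # map() returns responses in the same order as the input URLs.
    responses = list(executor.map(requests.get, urls))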

Is the second example something I could use in my current code? I'd imagine something as simple as:

results = []
with client.parallel() as parallel:
    for page_number in range(0, 10):
        parallel.get(f"http://example.com/{page_number}")
    while parallel.pending:
        response = parallel.next_response()
        # extend with the decoded list from each page, not the Response itself
        results.extend(response.json())
results.sort()

That would get me far.

yunstanford (Contributor) commented

Just curious, why not just asyncio.gather?

StephenBrown2 (Contributor) commented

The ability to loop through several generated requests and extend a list of results is appealing, and I haven't figured out how to do that with asyncio.gather() yet.
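
For reference, asyncio.gather() returns one result per awaitable, in call order, so the loop-and-extend pattern is achievable with it today. A minimal self-contained sketch, with a stand-in coroutine in place of a real HTTP call:

import asyncio

async def fetch_page(page_number):
    # Stand-in for an HTTP request that returns a list of results per page.
    await asyncio.sleep(0)
    return [page_number * 10 + i for i in range(3)]

async def get_all_results():
    # gather() returns one list per page, in the order the coroutines were passed.
    pages = await asyncio.gather(*(fetch_page(n) for n in range(10)))
    results = []
    for page in pages:
        results.extend(page)
    return sorted(results)

results = asyncio.run(get_all_results())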

tomchristie (Member, Author) commented

Let's just leave this up to the primitives of whatever concurrency framework the user is working with.
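
In practice that resolution looks something like the following sketch, written against httpx's eventual AsyncClient API (which postdates this thread) with asyncio as the concurrency framework:

import asyncio
import httpx

async def fetch_all(urls):
    async with httpx.AsyncClient() as client:
        # asyncio.gather is the concurrency-framework primitive referred to above.
        return await asyncio.gather(*(client.get(url) for url in urls))

responses = asyncio.run(fetch_all([f"http://example.com/{n}" for n in range(10)]))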
