gh-104144: Skip scheduling a done callback if a TaskGroup task completes eagerly #104140

Merged 6 commits into python:main on May 5, 2023

@itamaro (Contributor) commented May 3, 2023

gh-97696 introduced the eager task factory, which speeds up some async-heavy workloads by up to 50% when opted in.

Once installed, the eager task factory applies out of the box to tasks created as part of a TaskGroup, e.g.:

asyncio.get_event_loop().set_task_factory(asyncio.eager_task_factory)

async with asyncio.TaskGroup() as tg:
    tg.create_task(coro1)
    tg.create_task(coro2)
    tg.create_task(coro3)

coro{1,2,3} will each eagerly execute their first step, and may complete without ever being scheduled on the event loop if the coroutines don't block.
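
The effect can be observed directly. The following is a minimal sketch, assuming Python 3.12 or later (where asyncio.eager_task_factory is available). The names no_block, blocks_once and observe_eager_start are hypothetical and not part of this PR.

```python
import asyncio


async def no_block():
    # Never suspends, so an eager task can run it to completion in its first step.
    return "done without the loop"


async def blocks_once():
    # sleep(0) yields to the event loop once, so the task has to be scheduled.
    await asyncio.sleep(0)
    return "done after one loop iteration"


async def observe_eager_start():
    # Assumption: Python 3.12+, where asyncio.eager_task_factory exists.
    loop = asyncio.get_running_loop()
    loop.set_task_factory(asyncio.eager_task_factory)

    t1 = loop.create_task(no_block())
    t2 = loop.create_task(blocks_once())

    # Inspect both tasks before this coroutine yields to the event loop.
    states = (t1.done(), t2.done())

    await t2  # let the suspended task finish before returning
    return states


if __name__ == "__main__":
    print(asyncio.run(observe_eager_start()))
```

With the eager factory installed, the first flag should be True (the task finished during create_task) and the second False (the task suspended and was handed to the loop).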

The implementation of TaskGroup uses callbacks internally. These callbacks end up being scheduled on the event loop even if all the tasks were able to finish synchronously. That blocks the coroutine in which the TaskGroup was used, and prevents its own task from completing eagerly even when it otherwise could.
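
The underlying behaviour is that a done callback added to an already-finished future is not run inline; the event loop runs it on a later iteration. A minimal sketch of that, using a plain future rather than TaskGroup internals (the function name show_deferred_callback is hypothetical):

```python
import asyncio


async def show_deferred_callback():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    fut.set_result("ready")  # the future is already done

    calls = []
    fut.add_done_callback(lambda f: calls.append(f.result()))

    # The callback has only been scheduled on the loop; it has not run yet.
    ran_immediately = bool(calls)

    await asyncio.sleep(0)  # yield once so the loop can run the callback
    return ran_immediately, calls


if __name__ == "__main__":
    print(asyncio.run(show_deferred_callback()))
```

Any coroutine that has to wait for such a callback therefore needs at least one trip through the event loop, even though the work itself was already finished.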

Applications that use multiple levels of nested TaskGroups can benefit significantly from eagerly completing multiple levels without blocking. This PR implements that by skipping the scheduling of the done callback if the future is already done.
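
The general pattern can be sketched as follows. This is a simplified illustration of the idea, not the actual TaskGroup code; attach_or_run and demo_attach_or_run are hypothetical names.

```python
import asyncio


def attach_or_run(fut, on_done):
    """Hypothetical helper: handle completion inline when possible."""
    if fut.done():
        # Already finished (e.g. completed eagerly): no need to involve the loop.
        on_done(fut)
    else:
        # Still pending: fall back to a regular done callback.
        fut.add_done_callback(on_done)


async def demo_attach_or_run():
    loop = asyncio.get_running_loop()
    seen = []

    finished = loop.create_future()
    finished.set_result("eager")
    attach_or_run(finished, lambda f: seen.append(f.result()))
    after_finished = list(seen)  # snapshot taken before yielding to the loop

    pending = loop.create_future()
    attach_or_run(pending, lambda f: seen.append(f.result()))
    pending.set_result("later")
    await asyncio.sleep(0)  # the pending future's callback runs via the loop

    return after_finished, seen


if __name__ == "__main__":
    print(asyncio.run(demo_attach_or_run()))
```

The finished future is handled before the coroutine ever yields, while the pending one still goes through the usual callback path.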

Benchmarks

This makes the async pyperformance benchmarks up to 4x faster (!!). The results were measured with a patch to pyperformance that adds "eager" flavors and uses TaskGroups instead of gather.

3.12-base.20230503.async.3.json
===============================

Performance version: 1.0.7
Python version: 3.12.0a7+ (64-bit) revision da1980afcb
Report on Linux-5.15.0-1033-aws-x86_64-with-glibc2.31
Number of logical CPUs: 72
Start date: 2023-05-03 22:18:14.628900
End date: 2023-05-03 22:36:35.397630

3.12-tgcb.20230503.async.3.json
===============================

Performance version: 1.0.7
Python version: 3.12.0a7+ (64-bit) revision 5397cd9f62
Report on Linux-5.15.0-1033-aws-x86_64-with-glibc2.31
Number of logical CPUs: 72
Start date: 2023-05-03 22:38:45.960219
End date: 2023-05-03 22:54:49.008951

+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| Benchmark                     | 3.12-base.20230503.async.3.json | 3.12-tgcb.20230503.async.3.json | Change       | Significance           |
+===============================+=================================+=================================+==============+========================+
| async_tree_cpu_io_mixed       | 843 ms                          | 827 ms                          | 1.02x faster | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_eager              | 383 ms                          | 92.9 ms                         | 4.12x faster | Significant (t=189.26) |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_eager_cpu_io_mixed | 708 ms                          | 412 ms                          | 1.72x faster | Significant (t=180.43) |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_eager_io           | 1.40 sec                        | 1.39 sec                        | 1.01x faster | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_eager_memoization  | 515 ms                          | 232 ms                          | 2.22x faster | Significant (t=183.24) |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_io                 | 1.37 sec                        | 1.35 sec                        | 1.01x faster | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_memoization        | 644 ms                          | 634 ms                          | 1.01x faster | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
| async_tree_none               | 517 ms                          | 508 ms                          | 1.02x faster | Not significant        |
+-------------------------------+---------------------------------+---------------------------------+--------------+------------------------+
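
The Change column is the ratio of the base time to the patched time. As a quick check, the three significant rows can be recomputed from the rounded values shown in the table; since the table values are rounded, the last digit of a recomputed ratio may not always match the reported one. The name speedup is a hypothetical helper.

```python
# Times in milliseconds, copied from the table above: (base, patched).
significant_rows = {
    "async_tree_eager": (383, 92.9),
    "async_tree_eager_cpu_io_mixed": (708, 412),
    "async_tree_eager_memoization": (515, 232),
}


def speedup(base_ms, patched_ms):
    # How many times faster the patched build is than the base build.
    return base_ms / patched_ms


for name, (base_ms, patched_ms) in significant_rows.items():
    print(f"{name}: {speedup(base_ms, patched_ms):.2f}x faster")
```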

@itamaro changed the title from "gh-NNNN: Skip scheduling a done callback if a TaskGroup task completes eagerly" to "gh-104144: Skip scheduling a done callback if a TaskGroup task completes eagerly" on May 3, 2023
@itamaro marked this pull request as ready for review on May 3, 2023, 23:52

@gvanrossum (Member) left a comment

LGTM.

How sure are you that skipping the set-name call if name is None is always correct? (It seems fine to me, but in theory some 3rd party task implementation could do other stuff there.)

@itamaro (Contributor, Author) commented May 5, 2023

> How sure are you that skipping the set-name call if name is None is always correct? (It seems fine to me, but in theory some 3rd party task implementation could do other stuff there.)

I don't know about all possible 3p task implementations. At least looking at uvloop, it seems it uses asyncio.Task directly.
The asyncio Task.set_name returns immediately if name is NULL, so just saving the function call is a pretty minor optimization that we can probably revert with no noticeable regression (I can measure it). If reverting will make interop with 3p Task implementations better, I think it's worth it.

@gvanrossum (Member) commented

I don't think it will be measurable -- it looks to be a very minor part of creating a task (even more so with gh-103767). It just is one more thing we'd have to put in what's new (and it's not covered by the PR title :-).

@itamaro (Contributor, Author) commented May 5, 2023

agreed, let me revert that part!

@gvanrossum merged commit 52d8f36 into python:main on May 5, 2023

@@ -0,0 +1 @@
Optimize :class:`asyncio.TaskGroup` when using :func:`asyncio.eager_task_factory`. Skip scheduling done callbacks when all tasks finish without blocking.
Contributor

Hmm actually the scheduling is skipped on a per task basis rather than when all tasks finish without blocking.

Contributor Author

I have a fix in itamaro@ac1ee82 (PR depends on gh-104251 first)

@itamaro deleted the taskgroups-skip-done-callback branch on May 7, 2023, 22:16
jbower-fb pushed a commit to jbower-fb/cpython-jbowerfb that referenced this pull request May 8, 2023