Python blocking progress improvements #312

Draft · 5 commits to merge into base: branch-0.41

pentschev (Member)

The cost of creating and processing Python asyncio tasks is far from negligible, around 10 µs per task (see https://github.com/pentschev/python-overhead for reference). With that in mind, Python blocking progress mode can be improved by not creating a new task every time the worker is rearmed, and instead reusing a single task that keeps running indefinitely.

The same ~10 µs per-task cost is then saved on each operation in blocking progress mode. In the results below, average roundtrip latency drops from 60713 ns to 40765 ns (about 20 µs, or 33%), and average bandwidth rises from 16.08 kiB/s to 23.96 kiB/s.
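The difference between the two approaches can be sketched in plain asyncio. This is a minimal illustration, not the code from this PR: `FakeWorker` is a hypothetical stand-in for the UCXX worker, and an `asyncio.Event` stands in for the notification that would arrive when the worker's event file descriptor becomes readable. The first function creates a fresh task per wakeup (the old pattern); the second creates one long-lived task and reuses it for every wakeup (the pattern described above).

```python
import asyncio
import time


class FakeWorker:
    """Hypothetical stand-in for a UCX worker: only counts progress() calls."""

    def __init__(self) -> None:
        self.progress_calls = 0

    def progress(self) -> None:
        self.progress_calls += 1


async def rearm_with_new_tasks(worker: FakeWorker, n_wakeups: int) -> int:
    """Old pattern: create (and await) a brand-new task on every wakeup."""

    async def progress_once() -> None:
        worker.progress()

    tasks_created = 0
    for _ in range(n_wakeups):
        # Each wakeup pays for creating, scheduling and finishing a Task.
        await asyncio.create_task(progress_once())
        tasks_created += 1
    return tasks_created


async def rearm_with_persistent_task(worker: FakeWorker, n_wakeups: int) -> int:
    """New pattern: one long-lived task waits on an Event and is reused."""
    wakeup = asyncio.Event()
    stop = False

    async def progress_loop() -> None:
        while True:
            await wakeup.wait()  # sleep until the (simulated) notification fires
            wakeup.clear()       # rearm for the next notification
            if stop:
                return
            worker.progress()

    task = asyncio.create_task(progress_loop())  # the only task ever created
    for _ in range(n_wakeups):
        wakeup.set()            # hypothetical stand-in for the fd-readable callback
        await asyncio.sleep(0)  # yield so the progress loop handles this wakeup
    stop = True
    wakeup.set()
    await task
    return 1


def time_pattern(pattern, n_wakeups: int = 20_000) -> float:
    """Average wall-clock microseconds per wakeup for one pattern."""
    worker = FakeWorker()
    start = time.perf_counter()
    asyncio.run(pattern(worker, n_wakeups))
    return (time.perf_counter() - start) / n_wakeups * 1e6


if __name__ == "__main__":
    for pattern in (rearm_with_new_tasks, rearm_with_persistent_task):
        print(f"{pattern.__name__}: {time_pattern(pattern):.2f} us per wakeup")
```

Timings printed by this sketch depend on the machine and Python version and are only indicative. Note that an `Event` coalesces several `set()` calls made before the loop runs into a single wakeup, which is acceptable only under the assumption that one progress call drains all pending work.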

Before
$ UCX_TLS=tcp python -m ucxx.benchmarks.send_recv --no-detailed-report --n-iter 100000 --n-bytes 1 --backend ucxx-async --progress-mode blocking
Server Running at 10.33.225.163:47421
Client connecting to server at 10.33.225.163:47421
Roundtrip benchmark
================================================================================
Iterations                | 100000
Bytes                     | 1 B
Number of buffers         | 1
Object type               | numpy
Reuse allocation          | False
Transfer API              | TAG
Progress mode             | blocking
UCX_TLS                   | tcp
UCX_NET_DEVICES           | all
================================================================================
Device(s)                 | CPU-only
Server CPU                | affinity not set
Client CPU                | affinity not set
================================================================================
Bandwidth (average)       | 16.08 kiB/s
Bandwidth (median)        | 16.69 kiB/s
Latency (average)         | 60713 ns
Latency (median)          | 58518 ns
After
$ UCX_TLS=tcp python -m ucxx.benchmarks.send_recv --no-detailed-report --n-iter 100000 --n-bytes 1 --backend ucxx-async --progress-mode blocking
Server Running at 10.33.225.163:50523
Client connecting to server at 10.33.225.163:50523
Roundtrip benchmark
================================================================================
Iterations                | 100000
Bytes                     | 1 B
Number of buffers         | 1
Object type               | numpy
Reuse allocation          | False
Transfer API              | TAG
Progress mode             | blocking
UCX_TLS                   | tcp
UCX_NET_DEVICES           | all
================================================================================
Device(s)                 | CPU-only
Server CPU                | affinity not set
Client CPU                | affinity not set
================================================================================
Bandwidth (average)       | 23.96 kiB/s
Bandwidth (median)        | 24.84 kiB/s
Latency (average)         | 40765 ns
Latency (median)          | 39319 ns
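The two reports can be compared directly. The short script below uses only the numbers printed above. It also checks an observation about the output: the bandwidth rows appear to be nothing more than the 1-byte message size divided by the corresponding roundtrip latency (assuming 1 kiB = 1024 B), so the bandwidth gain mirrors the latency reduction.

```python
# Numbers copied verbatim from the "Before" and "After" reports above.
BEFORE = {"bandwidth_avg_kib_s": 16.08, "bandwidth_median_kib_s": 16.69,
          "latency_avg_ns": 60713, "latency_median_ns": 58518}
AFTER = {"bandwidth_avg_kib_s": 23.96, "bandwidth_median_kib_s": 24.84,
         "latency_avg_ns": 40765, "latency_median_ns": 39319}

N_BYTES = 1  # both runs used --n-bytes 1


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def kib_per_s(latency_ns: float, n_bytes: int = N_BYTES) -> float:
    """Bandwidth implied by moving n_bytes once per latency_ns (1 kiB = 1024 B)."""
    return n_bytes / (latency_ns * 1e-9) / 1024


for key in BEFORE:
    print(f"{key:24s} {BEFORE[key]:>9} -> {AFTER[key]:>9} "
          f"({relative_change(BEFORE[key], AFTER[key]):+.1f}%)")

print("average latency saved (ns):",
      BEFORE["latency_avg_ns"] - AFTER["latency_avg_ns"])
print("implied average bandwidth before/after (kiB/s):",
      round(kib_per_s(BEFORE["latency_avg_ns"]), 2),
      round(kib_per_s(AFTER["latency_avg_ns"]), 2))
```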
