[WIP] Async stream handlers #4474
In distributed/pubsub.py:

import weakref

from .core import CommClosedError
from .metrics import time
from .utils import sync, TimeoutError, parse_timedelta
from .protocol.serialize import to_serialize

if typing.TYPE_CHECKING:
I found adding type annotations to the stream_handlers object useful in implementing this (cf #2803), since they let type checkers catch which handlers were async and which were sync. However, I needed this pretty unsightly hack to avoid a circular import, just to get the type name into the module scope. I'm not happy about it, and it could be removed. On the other hand, it's kind of nice for refactoring.
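The guard described above can be sketched as follows. This is a minimal illustration, not the code from this PR: PubSubExtension and its "pubsub-add" handler are hypothetical, and importing Scheduler from distributed.scheduler is an assumption about which name caused the cycle. Because the import sits under typing.TYPE_CHECKING and the annotation is quoted, the module runs even where that import would be circular (or, here, where distributed is not installed at all).

```python
import typing
from typing import Awaitable, Callable, Dict

if typing.TYPE_CHECKING:
    # Only static type checkers (mypy, pyright) follow this branch. At
    # runtime typing.TYPE_CHECKING is False, so the import never executes
    # and cannot take part in a circular import.
    from distributed.scheduler import Scheduler  # assumed source of the cycle

# Any arguments in, an awaitable out: the shape of an async stream handler.
AsyncHandler = Callable[..., Awaitable[None]]


class PubSubExtension:
    """Hypothetical extension that registers async stream handlers."""

    def __init__(self, scheduler: "Scheduler") -> None:
        # The quoted annotation is never evaluated at runtime, so the name
        # Scheduler does not need to exist in this module's namespace.
        self.scheduler = scheduler
        self.stream_handlers: Dict[str, AsyncHandler] = {"pubsub-add": self.add}

    async def add(self, name: str = "", **kwargs: object) -> None:
        """Placeholder handler body (hypothetical)."""
```

A type checker now sees every value in stream_handlers as an async callable, which is what lets it flag a sync handler registered by mistake.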
Does from __future__ import annotations help at all here? Admittedly that is Python 3.7+ only, though we are planning to drop Python 3.6 soon (#4390).
Yeah, I had to remove annotations in 4f1944b for 3.6 support. It didn't help that much, though: it mostly meant I didn't have to quote the type names below. The circular import problem remained.
@jakirkham it doesn't help; you still need those imports. from __future__ import annotations helps when a class accepts or returns its own type in its methods, or when class A in a module accepts or returns class B, which is declared later in the same module.
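The two cases where the __future__ import does help can be shown in a small sketch (all class names here are hypothetical). Under PEP 563 every annotation is stored as a string and never evaluated at definition time, so forward references within one module work without quoting. It does nothing for names that live in another module: those still have to be imported, which is why it cannot remove the TYPE_CHECKING guard.

```python
from __future__ import annotations  # PEP 563: annotations are kept as strings

from typing import Optional


class Node:
    """A class whose methods accept and return its own type."""

    def __init__(self, value: int, tail: Optional[Node] = None) -> None:
        # Without the __future__ import (on Python versions before 3.14),
        # evaluating Optional[Node] here raises NameError, because Node is
        # not bound until the class body has finished executing.
        self.value = value
        self.tail = tail

    def prepend(self, value: int) -> Node:
        return Node(value, self)


class Producer:
    def make(self) -> Consumer:  # Consumer is declared later in the module
        return Consumer()


class Consumer:
    pass
```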
T = TypeVar("T")


def asyncify(func: Callable[..., T]) -> Callable[..., Awaitable[T]]:
In some places I've used an async wrapper to have a lighter footprint on the API, but more functions could just be made async in the first place.
Yeah, I almost raised a question about this until I looked at how these methods were used. I agree that making them sync by default is probably the right choice. This seemed like a good solution to me.
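Only the signature of asyncify appears in the diff excerpt, so the body below is a guess at the simplest wrapper with that signature, not the PR's implementation. add_subscriber is a hypothetical sync function. Note that the wrapped function still runs synchronously on the event loop thread; the wrapper only unifies the calling convention so every stream handler can be awaited.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")


def asyncify(func: Callable[..., T]) -> Callable[..., Awaitable[T]]:
    """Wrap a synchronous function so that callers can always await it."""

    @functools.wraps(func)  # keep __name__, __doc__ etc. of the original
    async def wrapper(*args: Any, **kwargs: Any) -> T:
        # No thread offloading: this just calls func inline and returns.
        return func(*args, **kwargs)

    return wrapper


def add_subscriber(name: str, client: str) -> str:
    """Hypothetical synchronous handler."""
    return f"{client} subscribed to {name}"


async_add_subscriber = asyncify(add_subscriber)
```

Calling asyncio.run(async_add_subscriber("topic", client="c1")) returns the same string the sync function would. The design keeps the sync function usable directly by other callers while the handler table only ever contains awaitables.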
Ooh, fun. I look forward to seeing which of the many different ways of profiling you choose :)
So, a thought just came to me: I wonder if all of this could be solved by an LRU cache.
@@ -3,20 +3,25 @@
from contextlib import suppress
import logging
import threading
import typing
As a personal esthetic preference, I find that using from typing import ... is always the nicer option. It does lead to name collisions with from collections.abc import ..., but those can be worked around once you drop Python 3.6 support, add from __future__ import annotations everywhere, and run a Python 3.9-compatible version of mypy (on Python 3.7+).
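A minimal sketch of the workaround described above, assuming postponed evaluation is enabled module-wide: with from __future__ import annotations, subscripted collections.abc types and builtin dict appear only inside annotations, which are never evaluated at runtime, so the code runs even on Python 3.7/3.8 where those classes are not subscriptable, and nothing needs importing from typing at all. register and on_message are hypothetical names.

```python
from __future__ import annotations  # annotations are never evaluated at runtime

from collections.abc import Awaitable, Callable


def register(
    handlers: dict[str, Callable[..., Awaitable[None]]],
    name: str,
    handler: Callable[..., Awaitable[None]],
) -> None:
    """Hypothetical helper annotated purely with collections.abc generics
    and the builtin dict, so there is no clash with typing.Callable."""
    handlers[name] = handler


async def on_message(**msg: object) -> None:
    """Hypothetical async stream handler."""
```

One caveat: a module-level alias such as Handler = Callable[..., Awaitable[None]] is ordinary runtime code, not an annotation, so it would still fail on Python versions before 3.9.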
I generally agree with you here @crusaderky. In this case I kept it this way because the thing I'm using is typing.TYPE_CHECKING, and I didn't really want that floating around the module namespace. No strong preference either way, though.
Expanding on this a bit more (that thought came to me last night and I wanted to get it out quickly): I think that we actually prefer being able to use both sync and async handlers, but we're choosing not to do this for performance reasons, mostly because the …

is_coroutine_function = functools.lru_cache(inspect.iscoroutinefunction)

This does open up questions about how expensive …
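The caching idea can be sketched as a dispatcher that keeps both sync and async handlers. This is an illustration under assumptions, not the code merged later: call_handler, sync_handler and async_handler are hypothetical, and the cache uses the default maxsize of 128. functools.lru_cache() is called first and then applied, which also works on Python 3.7 (passing the function directly needs 3.8+).

```python
import asyncio
import functools
import inspect
from typing import Any, Callable

# Memoize the coroutine check per handler object; repeat lookups become
# a dict hit instead of code-object introspection.
is_coroutine_function = functools.lru_cache()(inspect.iscoroutinefunction)


async def call_handler(handler: Callable[..., Any], **msg: Any) -> Any:
    """Hypothetical dispatcher that accepts both sync and async handlers."""
    if is_coroutine_function(handler):
        return await handler(**msg)
    return handler(**msg)


def sync_handler(x: int) -> int:
    return x + 1


async def async_handler(x: int) -> int:
    return x * 2


async def main() -> tuple:
    return (await call_handler(sync_handler, x=1),
            await call_handler(async_handler, x=3))
```

Running asyncio.run(main()) twice costs two cache misses on the first run and two hits on the second. One cost to weigh: the cache holds strong references to up to 128 handlers, so bound methods of short-lived objects stay alive until evicted.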
This is a good point, I'll test it out.
Okay, here is a look at some profiling results. I've mostly followed the same strategy detailed by @mrocklin in #4443 (comment) using viztracer and shuffling the same large timeseries. The specific durations for the functions in question are small, but they are typically called thousands of times a second, so it does add up. I've looked at three cases:
Specifically, I'm looking at the profile of
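For a quick local check of the per-call cost, a stdlib timeit micro-benchmark is a lighter alternative to the viztracer workflow described above; it is a swapped-in technique and does not reproduce those profiles. handler and compare are hypothetical names, and no timings are claimed here since they depend on the machine and Python version.

```python
import functools
import inspect
import timeit

cached_iscoroutinefunction = functools.lru_cache()(inspect.iscoroutinefunction)


async def handler(**msg: object) -> None:
    """Stand-in stream handler (hypothetical)."""


def compare(number: int = 100_000) -> dict:
    """Seconds spent on `number` coroutine checks, uncached vs. cached."""
    uncached = timeit.timeit(lambda: inspect.iscoroutinefunction(handler), number=number)
    cached = timeit.timeit(lambda: cached_iscoroutinefunction(handler), number=number)
    return {"uncached_s": uncached, "cached_s": cached}


if __name__ == "__main__":
    print(compare())
```

Since these checks run thousands of times a second, multiplying the per-call difference by the call rate gives a rough sense of how much a cache could save.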
Thanks for the detailed results, Ian 😄 Do we know how many plugins are developed for Distributed?
Oh, sorry, one more question: was the asyncio case using Tornado or uvloop? I wouldn't expect a difference, but it would be good to know if there happened to be one (we are using the latter in performance-sensitive cases atm).
Good question. I'm not aware of any third-party ones myself, but I don't see any reason why they couldn't do something similar to this one.
This was with
Thank you for the detailed analysis, @ian-r-rose. It sounds like we're going with lru?
I'm happy to go with that; it's a simpler change for the time being (though I generally like more predictable async vs. sync APIs where possible).
I'm happy either way, with a mild preference for lru, just because it seems less likely to have unintended consequences.
I'm also totally happy to be overruled here. The work in this PR helps to make things more predictable, which has real value and which we could use more of.
Went ahead and merged Ian's LRU cache PR (#4481). It's a pretty light PR, and this coroutine check only happens in a handful of places. That said, I don't think it rules out this async approach; there may be other reasons (as noted above) for going this route.
I'll close this since #4481 is in, but I'll keep the branch around in case we want to revisit it later.
Fixes #4469.
I haven't done much profiling yet to see what performance improvements (if any) this brings; I'll look at that next and post the results here.