Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performances: How to bind the correct parent span with asyncio? #1187

Closed
apinsard opened this issue Sep 11, 2021 · 3 comments
Closed

Performances: How to bind the correct parent span with asyncio? #1187

apinsard opened this issue Sep 11, 2021 · 3 comments

Comments

@apinsard
Copy link

Hello,

I'm trying to implement Sentry performances tracing in an async context. Unfortunately, when I try to do it in a generic way, using sentry_sdk.Hub.current.scope.span, it fails to find the correct parent span to start a child.

For instance, if I do the following:

# -*- coding: utf-8 -*-

__all__ = ["AioZipFile"]

ZIP_CONFIG = deepcopy(DEFAULT_CONFIG)
ZIP_CONFIG["common_async_attrs"].extend(["extract", "extractall"])


class TracedZipFile(ZipFile):

    def extractall(self, path=None, *args, **kwargs):
        destination = path or "current directory"
        span = sentry_sdk.Hub.current.scope.span
        with span.start_child(op='aio', description=f"Extract zip file to {destination}"):
            return super().extractall(path, *args, **kwargs)


class ZipFileObj(FileObj):
    CONFIG = ZIP_CONFIG


class AioZipFile(FileIO):
    OPEN = TracedZipFile
    FILEOBJ = ZipFileObj

class Command(AsyncBaseCommand):

    async def _url_open(self, url):
        ...

    async def _unpack_zip(self, conf):
        source = conf['source'].format(V=con['version']
        async with AioZipFile(BytesIO(await self._urlopen(source))) as ar:
            await ar.extractall(conf['_tmp_dir'])

    async def get_distlib(self, name, conf):
        with self.main_span.start_child(
            op='get_distlib', description=f"[{name}-{conf['version']}] {conf['source']}"):
            ...
            if conf['type'] == 'zip':
                await self._unpack_zip(conf)
            ...

    async def handle(self, *args, **options):
        ...
        self.main_span = sentry_sdk.Hub.current.scope.span
        await asyncio.gather(*(self.get_distlib(*d) for d in distlibs.items()))

It ends up like this:

Screenshot 2021-09-11 at 14-59-43 Performance - Event Details - cocoonr - Sentry

All zip files extractions are bound to the same span, while each were handled by a different get_distlib span.

Of course I can handle it specifically for this script, and create the span in my code instead of implementing it to my custom TracedZipFile class:

class AioZipFile(FileIO):
    OPEN = ZipFile  # zipfile.ZipFile Instead of custom TracedZipFile
    FILEOBJ = ZipFileObj

class Command(AsyncBaseCommand):

    async def _unpack_zip(self, conf, span):  # Add span argument to get the actual current span
        source = conf['source'].format(V=con['version']
        async with AioZipFile(BytesIO(await self._urlopen(source))) as ar:
            with span.start_child(op='io', description=f"Extract zip file to {conf['_tmp_dir']}"):
                # Explicitely wrap extractall call in a span
                await ar.extractall(conf['_tmp_dir'])

    async def get_distlib(self, name, conf):
        with self.main_span.start_child(
            op='get_distlib', description=f"[{name}-{conf['version']}] {conf['source']}"
        ) as span:
            ...
            if conf['type'] == 'zip':
                await self._unpack_zip(conf, span)  # Pass span through methods calls
            ...

This works perfectly:

Screenshot 2021-09-11 at 15-17-59 Performance - Event Details - cocoonr - Sentry

However, this requires to pass the parent span through the whole chain of function calls, and always create spans manually, which is quite annoying.

I could make it partially generic by allowing to pass a span to my custom TracedZipFile:

class TracedZipFile(ZipFile):

    def __init__(self, *args, **kwargs):
        self.parent_span = kwargs.pop('span', None) or sentry_sdk.Hub.current.scope.span
        super().__init__(*args, **kwargs)

    def extractall(self, path=None, *args, **kwargs):
        destination = path or "current directory"
        with self.parent_span.start_child(op='io', description=f"Extract zip file to {destination}"):
            return super().extractall(path, *args, **kwargs)


class ZipFileObj(FileObj):
    CONFIG = ZIP_CONFIG


class AioZipFile(FileIO):
    OPEN = TracedZipFile
    FILEOBJ = ZipFileObj


class Command(AsyncBaseCommand):

    async def _unpack_zip(self, conf, span=None):
        source = conf['source'].format(V=con['version']
        async with AioZipFile(BytesIO(await self._urlopen(source)), span=span) as ar:
            await ar.extractall(conf['_tmp_dir'])

    async def get_distlib(self, name, conf):
        with self.main_span.start_child(
            op='get_distlib', description=f"[{name}-{conf['version']}] {conf['source']}"
        ) as span:
            ...
            if conf['type'] == 'zip':
                await self._unpack_zip(conf, span)  # Pass span through methods calls
            ...

This avoids me to manually create the io span, but I still have to pass the parent span through function calls, which is not very DRY and not always possible.

So my question is: Is there a way to retrieve the correct parent span?

Thank you very much !

@apinsard
Copy link
Author

I struggled a lot but I think I finally found the way to do it.

Here is my utils/sentry.py that contains utilities to make sentry work in an async context:

from functools import wraps

import sentry_sdk


def traced_asyncio_task(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        with sentry_sdk.Hub(client.Hub.current) as hub:
            with hub.scope.span.start_child(op='aiotask') as span:
                span.set_tag('task', func.__name__)
                return await func(*args, **kwargs)
    return wrapper

Now, I just have to wrap my tasks with traced_asyncio_task without any extra work and anywhere I call sentry_sdk.Hub.current.scope.span, I retrieve the correct span:

from utils import sentry

class Command(AsyncBaseCommand):

    async def _url_open(self, url):
        ...

    async def _unpack_zip(self, conf):
        source = conf['source'].format(V=con['version']
        async with AioZipFile(BytesIO(await self._urlopen(source))) as ar:
            await ar.extractall(conf['_tmp_dir'])

    @sentry.traced_asyncio_task
    async def get_distlib(self, name, conf):
        span = sentry_sdk.Hub.current.scope.span
        span.description = f"[{name}-{conf['version']}] {conf['source']}"
        ...
        if conf['type'] == 'zip':
            await self._unpack_zip(conf)
        ...

    async def handle(self, *args, **options):
        ...
        await asyncio.gather(*(self.get_distlib(*d) for d in distlibs.items()))

Screenshot 2021-09-13 at 14-03-16 Performance - Event Details - cocoonr - Sentry

Can you confirm this is the way to go? Maybe this would benefit from being documented somewhere ?

@AbhiPrasad
Copy link
Member

Yeah that decorator is probably your best bet going forward.

We need to do some work to improve the documentation here, and possibly expose some nice decorators/wrapper func that make async use cases easier to use. Thanks for bringing this up, we will work on it.

@github-actions
Copy link

github-actions bot commented Mar 7, 2022

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants