Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New app signals proposal to track connections and requests #2685

Open
pfreixes opened this issue Jan 23, 2018 · 9 comments
Open

New app signals proposal to track connections and requests #2685

pfreixes opened this issue Jan 23, 2018 · 9 comments

Comments

@pfreixes
Copy link
Contributor

pfreixes commented Jan 23, 2018

Long story short

Right now there is not a set of signals in place to track connections and request for Aiohttp web server. The proposal would implement the following subset of signals, attached to the application.

  • on_connection_created(app, context) Triggered when a new connection has been created.
  • on_connection_closed(app, context) Triggered when a connection has been closed.
  • on_request_start(app, context, request) Triggered when a new request start.
  • on_request_end(app, context, request, response) Triggered when a requests finishes.

All signals will receive as a parameter the app instance and a context that would help the user to pass information to the next semantic signal. For example, the context related with the connection will be instantiated when the on_connection_created is called and the value will be passed as a parameter for both connections methods, having the same for the request flavor.

These signals will be only triggered by the root app.

Taking into account that already exists a group of signals for the App and these differ a bit of the proposed ones, lack of context and so on. We could discuss on have a new module called web_tracing that will implement the same pattern as the client tracing [1], this might imply some name changing.

My 2cent about my cons with other ways to implement that:

  • Middlewares are chained between the root app and nested apps, implement the request signals implicitly as a middleware IMHO is prone error and weak. Better having an ad-hoc set of signals that are never triggered by the nested apps.
  • I would prefer to implement these signals out of the app scope, perhaps having them as a new parameter of the run_app that implements only these new signals. The problem with this solution is the incompatibility with environments that use the workers strategy to start the application such as gunicorn.

[1] https://github.com/aio-libs/aiohttp/blob/master/aiohttp/tracing.py

@asvetlov
Copy link
Member

Agree in general but let's polish client tracing first.

Also I think the best place for landing server signals is not Application but AppRunner.
Signals are decoupled from application logic most likely.

@pfreixes
Copy link
Contributor Author

Stopped until #2686 gets done.

@pfreixes
Copy link
Contributor Author

pfreixes commented Feb 5, 2018

@asvetlov any idea how can we put the signals in the AppRunner making this feature available when gunicorn it's used?

@kowalski
Copy link
Contributor

I would like to throw in two ideas. I have an internal library that does request tracing and saves each request to the database. For our application it was essential that we grab all the outgoing traffic as often a request has financial consequences and we need to maintain full history.

The tracing feature seems very close to what we developed internally, however it missing:

  1. Ability to grab request_body. Is there any plan on adding a signal triggered after writing the body ? Perhaps I could help with that if I get the guidelines on desired naming of the signal.

  2. Ability to make tracing_config take in parameters from the callers context. Let me explain this with code:

            pivots = [{'name': 'balancing_group_id', 'value': group.id}]
            saver = get_request_saver(self.app, pivots)
            async with self.client.request_saver(saver):
                async with self.client.get(url) as resp:
                    ...
    

    In the example above by passing the identifier we can later easily filter all the requests done in scope of the particular group.

    Under the hoods this could be implemented as:

    diff --git a/aiohttp/client.py b/aiohttp/client.py
    index 93eff65..b04abf0 100644
    --- a/aiohttp/client.py
    +++ b/aiohttp/client.py
    @@ -139,10 +138,25 @@ class ClientSession:
             self._response_class = response_class
             self._ws_response_class = ws_response_class
    
    -        self._trace_configs = trace_configs or []
    -        for trace_config in self._trace_configs:
    +        self.task_locals = TaskLocals(loop=loop)
    +        self._static_trace_configs = trace_configs
    +        for trace_config in trace_configs:
                 trace_config.freeze()
    
    +    def _get_trace_configs(self):
    +        return self.task_locals.get('trace_configs', []) + self._static_trace_configs
    +
    +    def _set_trace_configs(self, tracing_configs):
    +        self.task_locals.set('trace_configs', tracing_configs)
    +        for trace_config in trace_configs:
    +            trace_config.freeze()
    +
    +    @async_contextmanager
    +    async def trace_configs(self, trace_config):
    +        self.set_trace_configs(tracing_configs)
    +        yield
    +        self.set_trace_configs([])
    +
         def __init_subclass__(cls):
             warnings.warn("Inheritance class {} from ClientSession "
                           "is discouraged".format(cls.__name__),
    @@ -256,7 +270,7 @@ class ClientSession:
                     trace_config.trace_config_ctx(
                         trace_request_ctx=trace_request_ctx)
                 )
    -            for trace_config in self._trace_configs
    +            for trace_config in self._get_trace_configs()
             ]
    
             for trace in traces:
    

    Implementation above would allow to both have the trace config global for the session as well as adding them dynamically with async with session.trace_configs.

    For the sake of comlicity I should tell what TaskLocals is. It's the same concept as contextvars which will be introduced in python 3.7 (https://docs.python.org/dev/library/contextvars.html#module-contextvars). The implementation I borrowed from peewee_async library. It looks as follows:

    class TaskLocals:
        """Simple `dict` wrapper to get and set values on per `asyncio`
     task basis.
    
     The idea is similar to thread-local data, but actually *much* simpler.
     It's no more than a "sugar" class. Use `get()` and `set()` method like
     you would to for `dict` but values will be get and set in the context
     of currently running `asyncio` task.
    
     When task is done, all saved values is removed from stored data.
     """
        def __init__(self, loop):
            self.loop = loop
            self.data = {}
    
        def get(self, key, *val):
            """Get value stored for current running task. Optionally
         you may provide the default value. Raises `KeyError` when
         can't get the value and no default one is provided.
         """
            data = self.get_data()
            if data is not None:
                return data.get(key, *val)
            elif len(val):
                return val[0]
            else:
                raise KeyError(key)
    
        def set(self, key, val):
            """Set value stored for current running task.
         """
            data = self.get_data(True)
            if data is not None:
                data[key] = val
            else:
                raise RuntimeError("No task is currently running")
    
        def get_data(self, create=False):
            """Get dict stored for current running task. Return `None`
         or an empty dict if no data was found depending on the
         `create` argument value.
    
         :param create: if argument is `True`, create empty dict
                        for task, default: `False`
         """
            task = asyncio.Task.current_task(loop=self.loop)
            if task:
                task_id = id(task)
                if create and not task_id in self.data:
                    self.data[task_id] = {}
                    task.add_done_callback(self.del_data)
                return self.data.get(task_id)
    
        def del_data(self, task):
            """Delete data for task from stored data dict.
         """
            del self.data[id(task)]
    

@pfreixes
Copy link
Contributor Author

Hi @kowalski thanks for your feedback, appreciate it!

Just a couple of comments to try to put all of us in the same page. This issue is about the tracing in the server side, to track those news connections and requests that flows into the server. For the Client side perspective, Aiohttp already has an implementation of the Client tracing since 3.0 [1].

So, first of all. This is about the server or the client side?

Regarding the Client side, current implementation allows passing static and dynamic configurations, maybe the solution is a bit overcomplicated but you should be able to make both things in your trace configs. The usage of one pattern or another was on the air at the moment of developing this feature, contextvars were discarded because Aiohttp gives support for Python versions > 3.5.

So the current implementation does allow to pass contextual information per request using the trace_request_ctx, but the developer has the freedom of using other mechanisms such as contextvars, TaskLocal, aiocontext, .... that will circumvent the current implementation and will allow pass information implicitly between the caller and the signals.

[1] https://docs.aiohttp.org/en/stable/client_advanced.html#client-tracing

@kowalski
Copy link
Contributor

Hi @pfreixes, sorry for confusion. I was not aware this issue is about the server-side tracing.
Indeed my comment was regarding the outgoing traffic, so the requests done with ClientSession.

Kindly point me what is the right way to make suggestions about extending that feature. Should I open another issue ?

I'm aware of tracing mechanism and I that I can pass the trace_request_ctx to get/post/put/delete methods. However I don't think it's very practical. Please consider a situation when a third-party library is used, which wraps the individual http calls into an application-logic methods. Take boto as an example. It doesn't provide access to individual http requests, it exposes services specific to AWS resources. To be able to use TraceConfig with variables passed from the context, boto would have to update it's interfaces to optionally take in trace_request_ctx on all the calls and pass it to underlying methods of ClientSession. It clearly will not happen.

On the other hand I can see a way to have context variables in the TraceConfig, but it's a hack. In my code I could have a TraceConfig subclass which holds the context in TaskLocals and exposes the async_contextmanager to set it.

For this reason my suggestion is to allow dynamic changes of ClientSession._trace_configs itself, without the need of changing calls to methods making requests.

@pfreixes
Copy link
Contributor Author

please, @kowalski let's have a discussion in another issue that relates with client tracing and context integration.

@kowalski
Copy link
Contributor

Ok, I've splitted this to 2 new issues:
#2754 (for context variables)
#2755 (for getting request body)

@tailhook
Copy link
Contributor

Agree in general but let's polish client tracing first.

Any update on this issue? Is client implementation mature enough?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants