Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Middleware factory is run for every single request #2225

Closed
butla opened this issue Aug 25, 2017 · 18 comments
Closed

Middleware factory is run for every single request #2225

butla opened this issue Aug 25, 2017 · 18 comments
Labels

Comments

@butla
Copy link
Contributor

butla commented Aug 25, 2017

Long story short

Maybe that's just my unwaranted assumption, but I thought that a middleware factory should run once for every handler, creating the wrapped handlers. Turns out, it's run for every request. This seems to be terribly inefficient.

Is this an expected behaviour, something that I'm seeing for some strange reasons, or a regression in the library?

Expected behaviour

Run each middleware factory for each handler at one point, at the start of application's lifecycle. Let the new handlers handle requests from that point without being constantly recreated.

Actual behaviour

Middleware factory is run for every request to a handler.

Steps to reproduce

What I'm creating is a middleware that will parse JWT tokens. I need a mechanism to exculde this parsing on some handlers, since some will be accesses without authorization. I didn't use auihttp_security, since it imposes structure that I don't need (I don't have sessions, for example). I may open-source this code later.

The important bits of my code (with some alterations):

# authorization.py

from functools import wraps

from aiohttp import web
import jwt

NON_SECURED_HANDLERS_KEY = "set of endpoints that don't expect a token"
USER_ID_KEY = 'user ID given by token middleware'

async def token_reading_middleware(app, handler):
    """Reads the token from the authorization header and determines the user ID based on the
    scope inside it.

    Raises 401 if there's something wrong with the header or it's missing.
    
    This check can be disabled by adding the function to the set of non secured handlers
    (on the application object, under `NON_SECURED_HANDLERS_KEY`).
    """
    # assume that APP_CONFIG is imported
    app_secret = app[APP_CONFIG].app_secret

    @wraps(handler)
    async def middleware_handler(request):
        """Adds the user ID parsed from the authorization header to request or raises 401 error.
        """
        # let's say that this function is implemented somewhere and returns 401 is something's wrong with the header
        user_id = _get_user_id_from_auth_header(request, app_secret)
        request[USER_ID_KEY] = user_id
        return await handler(request)

    # this is to tell us that the factory code has been run
    print('XXX', flush=True)
    
    # let's say that this function determines if the handler is esxccepted from security or not
    if _is_except_from_security(handler, app):
        return handler
    return middleware_handler

# app.py
from aiohttp import web

async def _health_check(_):
    return web.HTTPNoContent()


loop =  asyncio.get_event_loop()
app = web.Application(
    loop=loop,
    middlewares=[authorization.token_reading_middleware])
app.router.add_route('GET','/', _health_check)

web.run_app(
    app,
    host='0.0.0.0',
    port=8080,
    loop=loop,
)

After starting the app (python app.py) I call http://localhost:8080 consecutively. I'm getting 401s because I don;t pass the authorization header.

The problem is, that he application keeps printing 'XXX' for every request, to I know that it jumps through all the hoops of preparing the wrapper and wrapping the handler every time.

Your environment

AioHTTP 2.2.5
Python 3.6.2
Kubuntu 16.04.3 LTS x64

@asvetlov
Copy link
Member

The behavior is intentional, it's here starting from very first middleware support implementation.
Do you have numbers how slow current implementation is and how it could be improved?

@samuelcolvin
Copy link
Member

I'm slightly shocked by this; until I read this issue I had always assumed factories where called once, not for every request. I never even thought to check.

If they're called for every request, what's the point in having the outer coroutine?

I've just looked now at a project I'm working on, almost every middleware has processing outside the "handler" which I had assumed would only be called once. None of this processing will take very long so when testing or deploying with small or moderate usage it wouldn't be obvious, but if performance becomes important this could have been a problem.

Apparently this surprising behaviour isn't documented. (or am I being blind?)

Solutions (in order of my preference):

  1. call middleware factories once when freezing the app.
  2. remove the outer coroutine so middlewares are just simple coroutines.
  3. add a big fat prominent warning to the docs that the behaviour of middleware is not intuitive.

I'm guessing 1) is too big a change to be accepted? (My guess would be, 90%+ of developers assume this behaviour anyway, so it shouldn't cause problems)

Perhaps 2) would be possible if current behaviour was allowed but just raises a depreciation warning for a few releases?

@justanr
Copy link

justanr commented Aug 27, 2017

Looks like the point of running the middlewares each time to thread the actual request handler through the middleware chain, so number 1 can't be done.

As for number 2, I'd imagine that's so any middleware setup that needs to do IO doesn't block.

Number 3 is probably the best, unless the outer coroutine could be deprecated.

@asvetlov
Copy link
Member

There is closed PR for composing middlewares chain only once: #565

I'm not convinced if we should optimize this code unless benchmark will show significant performance degradation by chain creation.

@butla
Copy link
Contributor Author

butla commented Aug 28, 2017

@asvetlov As for the measurements of performance and implementation suggestions, I'll try to prepare something today. Gotta look at the closed pr as well.

@samuelcolvin
Copy link
Member

I'm not convinced if we should optimize this code unless benchmark will show significant performance degradation by chain creation.

I wouldn't expect it to make any difference under normal conditions (and it doesn't, see below), the point is that their's a risk developers assume the factories are only called once and therefore wrongly call setup code in the outer coroutine.

This is a particularly big problem if people release helper packages implemented as middleware where some setup is required, because in this case there's no way at all to add initialisation logic when just adding middleware.

I've tried removing the factory:

        if resp is None:
            handler = match_info.handler
            for app in match_info.apps[::-1]:
                # for factory in app._middlewares:
                #     handler = yield from factory(app, handler)
                for m in app._middlewares:
                    handler = partial(m, handler=handler)

            resp = yield from handler(request)

so middleware is defined as just a simple coroutine:

async def my_middleware(request, handler):
    return await handler(request)

it seems to has no effect on performance.

@asvetlov
Copy link
Member

Initial idea for middleware was that middleware itself looks like regular handler: it accepts request and returns response instance (or raises http exception).
To process middlewares chain there is a need for middleware factory which accepts next handler (doesn't matter is it another middleware or final web handler) and returns a new middleware instance.
Proposal from @samuelcolvin works pretty well but the ship has sailed, I don't sure if we need adding new api with existing approach deprecation without any benefit.

Regarding to docs update for more explicit describing current behavior -- I very appreciate it.

@samuelcolvin
Copy link
Member

samuelcolvin commented Aug 28, 2017

It would be pretty easy to allow "old style middleware" and just raise a warning but process it as we do now.

I completely understand what you mean by "without any benefit" (no bugs fixed, no new feature), but I think there are some benefits:

  1. The new style is cleaner, less code, quicker to read, quicker to write.
  2. As I said before the old style leaves a big bad bear trap for developers to fall into: assuming the outer coroutine is only called once and using it for setup code.

Perhaps we should add a note to the docs and leave it for now. If other developers admit they fell into this trap, then perhaps there's sufficient reason to make the change?

@asvetlov
Copy link
Member

@samuelcolvin I'm +0 for proposed change.
Let's do the following: you'll make a Pull Request with full implementation and test coverage.
Then we'll open a kind of voting for collecting feedback from other devs.

@butla
Copy link
Contributor Author

butla commented Aug 29, 2017

I like the simplicity of @samuelcolvin's solution, it's definitely a more straightforward way of getting the current functionality. But I also think, that there may be a benefit to middleware factories that are run just once when freezing the app.

I know it's not an acceptable implementation, but my POC looks like this:

diff --git a/aiohttp/web.py b/aiohttp/web.py
index 6caee4c..65f1cf4 100644
--- a/aiohttp/web.py
+++ b/aiohttp/web.py
@@ -301,8 +301,8 @@ class Application(MutableMapping):
         if resp is None:
             handler = match_info.handler
             for app in match_info.apps[::-1]:
-                for factory in app._middlewares:
-                    handler = yield from factory(app, handler)
+                # assert below needs to look at the middlewares of the last app...
+                pass
 
             resp = yield from handler(request)
 
@@ -446,6 +446,16 @@ def run_app(app, *, host=None, port=None, path=None, sock=None,
                 # add_signal_handler is not implemented on Windows
                 pass
 
+        # I know I'm doing that for only one app, which won't fly with subapps.
+        # This is just a hack to show my thinking.
+        for resource in app.router.resources():
+            # another dirty hack; implementing this whole thing properly will probably take some
+            # thinking about interfaces of a few classes.
+            http_handler = resource._routes[0]._handler
+            for factory in app._middlewares:
+                http_handler = loop.run_until_complete(factory(app, http_handler))
+            resource._routes[0]._handler = http_handler
+
         try:
             print("======== Running on {} ========\n"
                   "(Press CTRL+C to quit)".format(', '.join(uris)))

Available here.

For my application (and middleware factory), the app can respond with 401 about twice as fast: 2488 requests/second on commit d2236cfe (the last one from master) vs 4090 reqs/sec on 3ab39dd97 (mine)

It'll matter when the middleware factory does a bit more complex setup, or calls something over the network, e.g. get some public keys to verify JWTs.

Right now, with the current implementation of middlewares, I'll be doing that this one-time preparation outside of the framework, so I'll have a factory, that produces a factory, that produces the wrapped handler. Smells like corporate Java :)

I know that changing the way it works right now can potentially break someones code, so we totally shouldn't do it. If we'd be going with my approach, we can introduce a new parameter - middleware_factories that would be mutually exclusive with middlewares. As for the approach with streamlined middleware coroutines, I don't know what be a good name for the parameter. But I also think that setting it should turn off the old middlewares.

@samuelcolvin
Copy link
Member

It'll matter when the middleware factory does a bit more complex setup

You should be doing this in startup coroutines and attaching the result to the app with app['public_keys'] = await get_public_keys(). That will avoid calling the logic after startup and will be easier to understand for anyone else who looks at your app.

@butla
Copy link
Contributor Author

butla commented Aug 29, 2017

Yeah, that's one way of doing it. The thing about being easy to understand depends on the specific cases, I think. For example, with the way I'm setting up my JWT parsing middleware right now I have a high cohesion and decoupling. The middleware's client (the code that set's up the application) doesn't worry about the needs of the middleware, the middleware code sets itself up.

But right now I'll be doing more or less what you describe, because middlewares don't work the way I assumed :) To be precise it'll be a factory function creating the current standard middleware factory, and it will be called from the code that does the application setup (in a file called app_setup.py, in my case).

@butla
Copy link
Contributor Author

butla commented Aug 30, 2017

Another thing occurred to me about validity of my approach. I need to exclude some endpoints from the authorization checks. I want implicit inclusion and explicit exclusion in authorization to avaid security issues by omission of including a handler in authorization.

So I don't only need to have a startup code for the middleware - I need it to be able to go through the handlers and decide whether it will wrap them or not. And my case is a bit more complex than looking up a handler function in a set to see if it's excluded:

async def token_reading_middleware(app, handler):
    @wraps(handler)
    async def middleware_handler(request):
        # ...omitted...

    # Getting to the original functions, so we can disregard the wrappers that AioHTTP creates
    # and other potential middlewares on the processing chain.
    handler_original_function = _get_original_function(handler)
    non_secured_original_handlers = {_get_original_function(f)
                                     for f in app.get(NON_SECURED_HANDLERS_KEY, [])}
    if handler_original_function in non_secured_original_handlers:
        return handler
    return middleware_handler

def _get_original_function(function):
    """Assuming that `function` is wrapped in any level of decorators, this will return the original
    function without any decorators.
    """
    while hasattr(function, '__wrapped__'):
        function = function.__wrapped__
    return function

True, that can be done during handling of every request, and the slowdown isn't significant compared to all the thing that the framework is doing, but by gradually cutting down these kinds of things we can get a faster framework. And if I'm faced with picking frameworks that have similar functionality, I will pick the faster one.

@samuelcolvin
Copy link
Member

I suspect you'd be better off using request.match_info as is done here I know you can get the route name, but without looking I imagine it should also be able to get the coroutine as well if that's what you really need.

@butla
Copy link
Contributor Author

butla commented Aug 30, 2017

@samuelcolvin That's a good idea, I'll do that :) But my general case remains. @asvetlov You would consider my one time factories when I provide a proper implementation?

@butla
Copy link
Contributor Author

butla commented Oct 20, 2017

@samuelcolvin Thanks for the work :)
The recommended way of getting the app object in middlewares now is to get it from the request?

@asvetlov
Copy link
Member

Yes. It's the same object.

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants