FastAPI/Starlette - Inconsistent placement of the OpenTelemetryMiddleware #795

Open
phillipuniverse opened this issue Nov 9, 2021 · 11 comments

@phillipuniverse
Contributor

The Starlette application class (which is the parent of the FastAPI application class) hardcodes the ServerErrorMiddleware to appear as the very first middleware. This means that if an error escapes some downstream middleware or route, ServerErrorMiddleware handles the request without an active trace, because the OpenTelemetry span has already ended by that point.
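For reference, this is roughly what Starlette's build_middleware_stack assembles (a paraphrased sketch, not the verbatim source; details vary by Starlette version):

    # Sketch of Starlette.build_middleware_stack: ServerErrorMiddleware is always
    # placed outermost, and user middleware (including the OpenTelemetryMiddleware
    # registered via add_middleware) is wrapped inside it.
    middleware = (
        [Middleware(ServerErrorMiddleware, handler=error_handler, debug=debug)]
        + self.user_middleware
        + [Middleware(ExceptionMiddleware, handlers=exception_handlers, debug=debug)]
    )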

Also, if you happen to do something like this:

app = FastAPI(...)

app.add_middleware(...)
app.add_middleware(...)

Any middleware added after FastAPI's __init__ (which is where the OpenTelemetry middleware gets installed) ends up outside the OpenTelemetry middleware, so it will not have an active trace established either.

There is a relatively easy way around this to ensure that the OpenTelemetry middleware is always first:

class _InstrumentedFastAPI(fastapi.FastAPI):
    _tracer_provider = None
    _excluded_urls = None
    _server_request_hook: _ServerRequestHookT = None
    _client_request_hook: _ClientRequestHookT = None
    _client_response_hook: _ClientResponseHookT = None

    def build_middleware_stack(self) -> ASGIApp:
        original_mw_app = super().build_middleware_stack()
        return OpenTelemetryMiddleware(
            original_mw_app,
            excluded_urls=_InstrumentedFastAPI._excluded_urls,
            default_span_details=_get_route_details,
            server_request_hook=_InstrumentedFastAPI._server_request_hook,
            client_request_hook=_InstrumentedFastAPI._client_request_hook,
            client_response_hook=_InstrumentedFastAPI._client_response_hook,
            tracer_provider=_InstrumentedFastAPI._tracer_provider,
        )

FastAPI version - 0.70.0
Starlette version - 0.16.0
Python version - 3.9

phillipuniverse added the bug label on Nov 9, 2021
@phillipuniverse
Contributor Author

Slightly more concrete use case:

    import logging

    from fastapi import FastAPI, Request
    from fastapi.encoders import jsonable_encoder
    from fastapi.responses import JSONResponse
    from starlette.status import HTTP_500_INTERNAL_SERVER_ERROR

    logger = logging.getLogger(__name__)

    application = FastAPI(
        title="My API"
    )

    async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
        logger.exception("Unhandled exception")

        return JSONResponse(
            status_code=HTTP_500_INTERNAL_SERVER_ERROR,
            content=jsonable_encoder({"title": "Unhandled error"}),
        )

    application.add_exception_handler(Exception, unhandled_exception_handler)

In this case the add_exception_handler is hooked up as the error handler for ServerErrorMiddleware. Because of this and because the OpenTelemetry middleware is installed as "user middleware", the logger.exception invocation within the unhandled_exception_handler will not have a trace id or span id as the span has already ended.
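To make this visible (a minimal sketch building on the snippet above; trace.get_current_span() is the standard OpenTelemetry API), you can check the current span inside the handler:

    from opentelemetry import trace

    async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
        # Because OpenTelemetryMiddleware sits inside ServerErrorMiddleware, its span
        # has already ended by the time this handler runs, so the current span is a
        # non-recording placeholder and the log record carries no trace/span ids.
        current_span = trace.get_current_span()
        logger.exception("Unhandled exception, recording=%s", current_span.is_recording())

        return JSONResponse(
            status_code=HTTP_500_INTERNAL_SERVER_ERROR,
            content=jsonable_encoder({"title": "Unhandled error"}),
        )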

@phillipuniverse
Contributor Author

phillipuniverse commented Nov 9, 2021

The workaround for now is to add an opentelemetry_pre_instrument entrypoint. Here is my patching function:

import opentelemetry
import typing
from starlette.types import ASGIApp
from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware
from opentelemetry.instrumentation.fastapi import _InstrumentedFastAPI
from opentelemetry.instrumentation.fastapi import _get_route_details


def patch_middleware_ordering():
    """
    Replaces the default OpenTelemetry instrumentation class with one that is assured to always be added to the very
    beginning of the middleware stack, even before ServerErrorMiddleware. Necessary until
    https://github.com/open-telemetry/opentelemetry-python-contrib/issues/795 is resolved.
    """

    class OpenTelemetryFirstMiddlewareFastApi(_InstrumentedFastAPI):

        def add_middleware(self, middleware_class: type, **options: typing.Any) -> None:
            if middleware_class == OpenTelemetryMiddleware:
                return
            else:
                super().add_middleware(middleware_class, **options)

        def build_middleware_stack(self) -> ASGIApp:
            app = super().build_middleware_stack()
            return OpenTelemetryMiddleware(
                app,
                excluded_urls=_InstrumentedFastAPI._excluded_urls,
                default_span_details=_get_route_details,
                server_request_hook=_InstrumentedFastAPI._server_request_hook,
                client_request_hook=_InstrumentedFastAPI._client_request_hook,
                client_response_hook=_InstrumentedFastAPI._client_response_hook,
                tracer_provider=_InstrumentedFastAPI._tracer_provider,
            )


    opentelemetry.instrumentation.fastapi._InstrumentedFastAPI = OpenTelemetryFirstMiddlewareFastApi

Then you need an entry point for fastapi under opentelemetry_pre_instrument. In my case I am using Poetry, so my pyproject.toml looks like this:

[tool.poetry.plugins."opentelemetry_pre_instrument"]
fastapi = "mylibrary.tracing.contrib.fastapi:patch_middleware_ordering"
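
If you are not using Poetry, the same entry point can be declared with plain setuptools (a sketch mirroring the pyproject.toml above; the package name is hypothetical):

    # setup.py (sketch)
    from setuptools import setup

    setup(
        name="mylibrary",
        entry_points={
            "opentelemetry_pre_instrument": [
                "fastapi = mylibrary.tracing.contrib.fastapi:patch_middleware_ordering",
            ],
        },
    )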

With this change I am able to see trace/span ids in logger invocations from ServerErrorMiddleware.

I would like to contribute this back, but the problem is I don't think this will work for the non-auto-instrument case, which uses add_middleware:

@staticmethod
def instrument_app(
    app: fastapi.FastAPI,
    server_request_hook: _ServerRequestHookT = None,
    client_request_hook: _ClientRequestHookT = None,
    client_response_hook: _ClientResponseHookT = None,
    tracer_provider=None,
    excluded_urls=None,
):
    """Instrument an uninstrumented FastAPI application."""
    if not hasattr(app, "_is_instrumented_by_opentelemetry"):
        app._is_instrumented_by_opentelemetry = False
    if not getattr(app, "_is_instrumented_by_opentelemetry", False):
        if excluded_urls is None:
            excluded_urls = _excluded_urls_from_env
        else:
            excluded_urls = parse_excluded_urls(excluded_urls)
        app.add_middleware(
            OpenTelemetryMiddleware,
            excluded_urls=excluded_urls,
            default_span_details=_get_route_details,
            server_request_hook=server_request_hook,
            client_request_hook=client_request_hook,
            client_response_hook=client_response_hook,
            tracer_provider=tracer_provider,
        )
        app._is_instrumented_by_opentelemetry = True
    else:
        _logger.warning(
            "Attempting to instrument FastAPI app while already instrumented"
        )
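
For context, that non-auto-instrument path is what runs when the instrumentor is invoked manually on an existing app (a minimal usage sketch):

    from fastapi import FastAPI
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

    app = FastAPI()
    # instrument_app() goes through app.add_middleware(), so the OTel middleware
    # still ends up inside Starlette's hardcoded ServerErrorMiddleware.
    FastAPIInstrumentor.instrument_app(app)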

This whole process is a little shady anyway. It seems a little strange to me that Starlette/FastAPI does not give first-class support to fully control the final middleware ordering.

Also let me just say, I have been really impressed with all of the extension points that I have seen within this project. It was really nice to be able to self-serve and unblock myself in a not-so-nasty way. Great work!!

@lskrajny-marcura

Hi, is it possible to monkey patch it without Poetry? I have tried to do it in multiple ways, but the OpenTelemetryFirstMiddlewareFastApi class is not used :/

@phillipuniverse
Contributor Author

@lskrajny-marcura depends on how you are kicking off the instrumentation process. The Poetry part above is not important per se; the important part is that you register the monkey patching as an entry point so that it runs prior to any default OpenTelemetry instrumentation.

@lskrajny-marcura

I have a simple app with a main.py where I first create the FastAPI/Starlette app and then manually add all the OpenTelemetry instrumentations.

Putting the patching at the top of main.py seems "too late".

@lzchen
Contributor

lzchen commented Mar 28, 2023

@phillipuniverse
Do you foresee any repercussions for adding the tracing middleware first? Would it conflict with other middlewares?

@phillipuniverse
Contributor Author

phillipuniverse commented Mar 28, 2023

Do you foresee any repercussions for adding the tracing middleware first? Would it conflict with other middlewares?

@lzchen it has not in my experience. I have been running this patch in production since November 9, 2021 with no issues.

I did have to make one change in FastAPI 0.92/Starlette 0.24 with the change to lazily-instantiated middleware but it was pretty straightforward (I'll update my answer above to include the change, it's not there currently).

In my scenario this change is crucial for us to be able to annotate unhandled exceptions in our application with a trace id in our external error reporting system.

FWIW I would consider the main problem to be a FastAPI problem where it limits full control over the entire middleware list.

@lzchen
Contributor

lzchen commented Mar 28, 2023

@phillipuniverse
Makes sense to me. Would you consider creating a PR for this?

@phillipuniverse
Contributor Author

No problem!

@Kyle-sandeman-mrdfood

Hi, this is still an issue even in fastapi==0.103.2.
I am using both app.add_middleware(MyCustomMW) and custom exception handlers

Here are my debugging steps, in case it helps anyone understand the problem better.

In the asgi instrumentation's __init__.py I added two print statements to show exactly when the span starts and whether it is recording (i.e. in the with ..use_span block and the if ..is_recording check).
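Roughly where those prints sit (a sketch based on the use_span/is_recording references above; the real asgi middleware code is more involved):

    # inside opentelemetry.instrumentation.asgi, OpenTelemetryMiddleware.__call__ (sketch)
    with trace.use_span(span, end_on_exit=True) as current_span:
        print("---- OTelMW: started span")
        if current_span.is_recording():
            print("---- OTelMW: recording")
        # ... the wrapped app runs here; a finally clause prints on the way out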
With auto instrumentation I get these logs

[request begins]
------- MyCustomMW: current span = None 
---- OTelMW: started span
---- OTelMW: recording
[other logs]
------- MyCustomMW: cleanup step, span = None
---- OTelMW: finally clause reached

And after using a patch similar to the one posted earlier in this issue:

otel_mw = find_otel_mw()
cls, options = otel_mw

app.user_middleware.remove(otel_mw)
stack = app.build_middleware_stack()
app.middleware_stack = cls(app=stack, **options)
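
find_otel_mw is not defined above; a sketch of what it might look like (hypothetical, assuming the OTel middleware was registered through add_middleware and that Starlette's Middleware entries expose a .cls attribute):

    from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware

    def find_otel_mw():
        # Return the user_middleware entry that holds the OTel middleware, if any.
        for mw in app.user_middleware:
            if mw.cls is OpenTelemetryMiddleware:
                return mw
        return None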

Then I get:

[request begins]
---- OTelMW: started span
---- OTelMW: recording
------- MyCustomMW: current span = _Span(name...)
[other logs]
------- MyCustomMW: cleanup step, span = _Span(name=...)
---- OTelMW: finally clause reached

And just to inspect the middleware stack creation, I added prints to the various __init__ functions:
original:

init OpenTelemetryInstrumentation middleware
init MyCustomMW middleware

patched:

init MyCustomMW middleware
init OpenTelemetryInstrumentation middleware

@alexmojaki
Contributor

+1 for this. It seems that unhandled errors cause the span to be missing response headers and a status code, and that the response itself is missing the traceresponse header.

However, we should assume that OpenTelemetryMiddleware might itself raise errors, either from bugs in OTel or from user code. For example, nothing currently handles exceptions raised by the provided server_request_hook. If these errors aren't handled then I think that might mean a response isn't sent at all. There could of course be a giant try/except around all the OTel code, but it seems most sensible to defer to any custom error handling via a second ServerErrorMiddleware, i.e.:

def build_middleware_stack(self) -> ASGIApp:
    inner_server_error_middleware = super().build_middleware_stack()
    otel_middleware = OpenTelemetryMiddleware(inner_server_error_middleware, ...)
    if isinstance(inner_server_error_middleware, ServerErrorMiddleware):  # usually true
        outer_server_error_middleware = ServerErrorMiddleware(
            app=otel_middleware,
            handler=inner_server_error_middleware.handler,
            debug=inner_server_error_middleware.debug,
        )
    else:
        # Something else seems to have patched things, or maybe Starlette changed.
        # Just create a default ServerErrorMiddleware.
        outer_server_error_middleware = ServerErrorMiddleware(app=otel_middleware)
    return outer_server_error_middleware

Or maybe:

import copy

def build_middleware_stack(self) -> ASGIApp:
    final_middleware = super().build_middleware_stack()
    otel_middleware = OpenTelemetryMiddleware(final_middleware, ...)
    if hasattr(final_middleware, "app"):
        final_middleware_copy = copy.copy(final_middleware)
        final_middleware_copy.app = otel_middleware
        return final_middleware_copy
    else:
        return ServerErrorMiddleware(otel_middleware)
