
TraceId, SpanId 0 in logs from exception handler for Exception class - FastAPI app #3477

Open
amit12297 opened this issue Oct 16, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@amit12297

amit12297 commented Oct 16, 2023

Describe your environment
python = 3.10.12
fastapi = 0.92.0
uvicorn = 0.20.0
opentelemetry-distro = 0.41b0

Steps to reproduce
When we log from an exception handler in FastAPI, the traceId and spanId come out as 0. Here is an example:

import logging

import uvicorn
from fastapi import FastAPI
from starlette.requests import Request
from starlette.responses import JSONResponse

logger = logging.getLogger("activity")


async def unhandled_exception_handler(request: Request, exc: Exception):
    logger.error("Log with 0 trace-id and span-id")
    return JSONResponse(content={"message": "Something went wrong"}, status_code=500)


app = FastAPI(exception_handlers={Exception: unhandled_exception_handler})


@app.get("/")
def read_root():
    logger.info("Log with valid trace-id and span-id")
    return {"Hello": "World"}


@app.get("/break")
def break_path():
    logger.info("Log with valid trace-id and span-id")
    1 / 0
    return {"Unreachable": "Code"}


if __name__ == "__main__":
    uvicorn.run(app=app, port=8080, host="127.0.0.1", reload=False)

I have created a GitHub repo to demonstrate the issue. Please check out https://github.com/amit12297/python-otel-log-issue and follow the README.

What is the expected behavior?
Actual (non-zero) traceId and spanId in logs from the exception handler

What is the actual behavior?
traceId=0 and spanId=0 in logs from the exception handler

2023-10-17 00:45:22,334 INFO [activity] [main.py:29] [trace_id=8799f34465cd9905efcee0dce40633b4 span_id=32efef8c434b3a07 resource.service.name=log-issue-repro trace_sampled=True] - Log with valid trace-id and span-id
2023-10-17 00:45:22,339 ERROR [activity] [main.py:14] [trace_id=0 span_id=0 resource.service.name=log-issue-repro trace_sampled=False] - Log with 0 trace-id and span-id
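(The trace_id / span_id fields in these lines look like the default format of the logging instrumentation; a minimal sketch of enabling it, assuming opentelemetry-instrumentation-logging is what injects these attributes in the repro:)

from opentelemetry.instrumentation.logging import LoggingInstrumentor

# set_logging_format=True installs the instrumentation's default log format, which
# renders the injected record attributes, e.g.
# "... [trace_id=%(otelTraceID)s span_id=%(otelSpanID)s ... trace_sampled=%(otelTraceSampled)s] ..."
LoggingInstrumentor().instrument(set_logging_format=True)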

Additional context
As you can see in the code above, I have registered an exception handler for the base Exception class:

exception_handlers={Exception: unhandled_exception_handler}

FastAPI checks whether any handler is registered for 500/Exception; if so, it sets that handler on ServerErrorMiddleware.
Here is the FastAPI code:

        for key, value in self.exception_handlers.items():
            if key in (500, Exception):
                error_handler = value
            else:
                exception_handlers[key] = value

        middleware = (
            [Middleware(ServerErrorMiddleware, handler=error_handler, debug=debug)]
            + self.user_middleware
            + [
                Middleware(
                    ExceptionMiddleware, handlers=exception_handlers, debug=debug
                ),
                # Add FastAPI-specific AsyncExitStackMiddleware for dependencies with
                # contextvars.
                # This needs to happen after user middlewares because those create a
                # new contextvars context copy by using a new AnyIO task group.
                # The initial part of dependencies with yield is executed in the
                # FastAPI code, inside all the middlewares, but the teardown part
                # (after yield) is executed in the AsyncExitStack in this middleware,
                # if the AsyncExitStack lived outside of the custom middlewares and
                # contextvars were set in a dependency with yield in that internal
                # contextvars context, the values would not be available in the
                # outside context of the AsyncExitStack.
                # By putting the middleware and the AsyncExitStack here, inside all
                # user middlewares, the code before and after yield in dependencies
                # with yield is executed in the same contextvars context, so all values
                # set in contextvars before yield is still available after yield as
                # would be expected.
                # Additionally, by having this AsyncExitStack here, after the
                # ExceptionMiddleware, now dependencies can catch handled exceptions,
                # e.g. HTTPException, to customize the teardown code (e.g. DB session
                # rollback).
                Middleware(AsyncExitStackMiddleware),
            ]
        )

As far as I know, ServerErrorMiddleware is the outermost middleware.

When there is an error, the context gets detached by OpenTelemetryMiddleware before control reaches the ServerErrorMiddleware exception handler that I registered. Hence, when the log is written it does not get a traceId and spanId.
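To make this concrete, here is a minimal sketch (handler body illustrative, not from the repro) of what the handler observes: with no span context attached any more, trace.get_current_span() returns the invalid span, whose ids are zero, which is exactly what shows up in the log.

from opentelemetry import trace
from starlette.requests import Request
from starlette.responses import JSONResponse


async def unhandled_exception_handler(request: Request, exc: Exception):
    # By the time ServerErrorMiddleware calls this handler, the ASGI middleware
    # has already detached its context, so the "current" span is the invalid span.
    ctx = trace.get_current_span().get_span_context()
    assert ctx.trace_id == 0 and ctx.span_id == 0  # hence trace_id=0 / span_id=0 in the log
    return JSONResponse(content={"message": "Something went wrong"}, status_code=500)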

I tested this out, and made changes in

opentelemetry/trace/__init__.py -> use_span function

@contextmanager
def use_span(
    span: Span,
    end_on_exit: bool = False,
    record_exception: bool = True,
    set_status_on_exception: bool = True,
) -> Iterator[Span]:
    """Takes a non-active span and activates it in the current context.

    Args:
        span: The span that should be activated in the current context.
        end_on_exit: Whether to end the span automatically when leaving the
            context manager scope.
        record_exception: Whether to record any exceptions raised within the
            context as error event on the span.
        set_status_on_exception: Only relevant if the returned span is used
            in a with/context manager. Defines whether the span status will
            be automatically set to ERROR when an uncaught exception is
            raised in the span with block. The span status won't be set by
            this mechanism if it was previously set manually.
    """
    try:
        excep = None
        token = context_api.attach(context_api.set_value(_SPAN_KEY, span))
        try:
            yield span
        except Exception as exc:
            excep = exc
            raise exc
        finally:
            if excep is None:
                context_api.detach(token)

    except Exception as exc:  # pylint: disable=broad-except
        if isinstance(span, Span) and span.is_recording():
            # Record the exception as an event
            if record_exception:
                span.record_exception(exc)

            # Set status in case exception was raised
            if set_status_on_exception:
                span.set_status(
                    Status(
                        status_code=StatusCode.ERROR,
                        description=f"{type(exc).__name__}: {exc}",
                    )
                )
        raise

    finally:
        if end_on_exit:
            span.end()

and
opentelemetry/instrumentation/asgi/__init__.py -> __call__ function

    async def __call__(self, scope, receive, send):
        """The ASGI application

        Args:
            scope: An ASGI environment.
            receive: An awaitable callable yielding dictionaries
            send: An awaitable callable taking a single dictionary as argument.
        """
        if scope["type"] not in ("http", "websocket"):
            return await self.app(scope, receive, send)

        _, _, url = get_host_port_url_tuple(scope)
        if self.excluded_urls and self.excluded_urls.url_disabled(url):
            return await self.app(scope, receive, send)

        span_name, additional_attributes = self.default_span_details(scope)

        span, token = _start_internal_or_server_span(
            tracer=self.tracer,
            span_name=span_name,
            start_time=None,
            context_carrier=scope,
            context_getter=asgi_getter,
        )
        attributes = collect_request_attributes(scope)
        attributes.update(additional_attributes)
        active_requests_count_attrs = _parse_active_request_count_attrs(
            attributes
        )
        duration_attrs = _parse_duration_attrs(attributes)

        if scope["type"] == "http":
            self.active_requests_counter.add(1, active_requests_count_attrs)
        try:
            excep = None
            with trace.use_span(span, end_on_exit=True) as current_span:
                if current_span.is_recording():
                    for key, value in attributes.items():
                        current_span.set_attribute(key, value)

                    if current_span.kind == trace.SpanKind.SERVER:
                        custom_attributes = (
                            collect_custom_request_headers_attributes(scope)
                        )
                        if len(custom_attributes) > 0:
                            current_span.set_attributes(custom_attributes)

                if callable(self.server_request_hook):
                    self.server_request_hook(current_span, scope)

                otel_receive = self._get_otel_receive(
                    span_name, scope, receive
                )

                otel_send = self._get_otel_send(
                    current_span,
                    span_name,
                    scope,
                    send,
                    duration_attrs,
                )
                start = default_timer()

                await self.app(scope, otel_receive, otel_send)
        except Exception as exc:
            excep = exc
            raise exc
        finally:
            if scope["type"] == "http":
                target = _collect_target_attribute(scope)
                if target:
                    duration_attrs[SpanAttributes.HTTP_TARGET] = target
                duration = max(round((default_timer() - start) * 1000), 0)
                self.duration_histogram.record(duration, duration_attrs)
                self.active_requests_counter.add(
                    -1, active_requests_count_attrs
                )
            if token:
                if excep is None:
                    context.detach(token)

on my local machine. Basically, I am not detaching the context if there is an exception.
After these changes I am getting an actual traceId and spanId.

But I am not sure whether the solution I have used is correct or not. Requesting input from the Python and OTel experts out here.

@amit12297 amit12297 added the bug Something isn't working label Oct 16, 2023
@lenalebt

lenalebt commented Nov 8, 2023

I have the exact same problem and thought I was doing something wrong. Thanks, you saved me a few hours of debugging by posting this :-)!
I wanted to return the trace id in my error response, because it's really helpful, when people open support tickets, to actually have a trace id to follow up on. My current situation is basically that I can return trace ids for all calls that actually return fine, and no trace id for those that fail 😅.
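For what it's worth, a minimal sketch of returning the trace id in the error response (assuming the context problem is worked around, e.g. with the middleware wrapping suggested later in this thread; names here are illustrative):

from opentelemetry import trace
from starlette.requests import Request
from starlette.responses import JSONResponse


async def unhandled_exception_handler(request: Request, exc: Exception):
    ctx = trace.get_current_span().get_span_context()
    # Only report an id when a span is actually active, i.e. the workaround is in place.
    trace_id = format(ctx.trace_id, "032x") if ctx.is_valid else None
    return JSONResponse(
        content={"message": "Something went wrong", "trace_id": trace_id},
        status_code=500,
    )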

@dextercai

Also have the same problem.

@diogobaeder

I'm having the same issue with the gRPC instrumentor. This is pretty bad, because the most valuable logs for me are the ones related to exceptions; if I can't have the context there, it means I'm in the dark when trying to debug them.

@joriskoris

Is there any progress on this?

@rikkkx

rikkkx commented Jan 11, 2024

I've run into the same issue, but I have a possible solution.

Solution

Just use opentelemetry-instrumentation-asgi (link) instead of opentelemetry-instrumentation-fastapi:

from fastapi import FastAPI
from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware
from opentelemetry.sdk.trace import TracerProvider

app = FastAPI(...)

# add your FastAPI application routes and settings here

# you need an initialized TracerProvider
tracer_provider: TracerProvider = ...

app = OpenTelemetryMiddleware(app, tracer_provider=tracer_provider)

Explanation

opentelemetry-instrumentation-fastapi adds OpenTelemetryMiddleware via app.add_middleware. This puts OpenTelemetryMiddleware in the Starlette app's user_middleware collection (source).

When Starlette handles an HTTP request it builds the middleware stack (source). When building, it places user_middleware under ServerErrorMiddleware, so by the time response handling comes back to ServerErrorMiddleware the user middlewares have already exited.

So all you need is to make OpenTelemetryMiddleware the top (outermost) middleware: wrap your FastAPI ASGI app with OpenTelemetryMiddleware and it will work correctly.

@ocelotl
Contributor

ocelotl commented Jan 16, 2024

@amit12297 @rikkkx can you please open PR with your fix?

@CaueP

CaueP commented Jan 29, 2024

I had the same issue using the opentelemetry-instrument agent with FastAPI automatic instrumentation.

My solution was similar to @rikkkx 's (thanks!), but I didn't remove opentelemetry-instrumentation-fastapi because the exception log is only added to FastAPI spans, and is missing on ASGI spans.

The FastAPI initialization was the following. Note that TracerProvider wasn't needed, probably due to OpenTelemetry agent automatic setup.

from fastapi import FastAPI
from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware

app = FastAPI(...)

# add your FastAPI application routes and settings here

app = OpenTelemetryMiddleware(app)

The issue I noticed with this approach is that the HTTP request spans got duplicated (2 extra spans) due to the ASGI instrumentation, and I couldn't find a way to disable it easily. Probably filtering out spans would do the trick, but it would make this workaround way too complex.
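For reference, a hedged sketch of what filtering the extra spans could look like: wrap the exporter and drop spans coming from the ASGI instrumentation scope. The scope name here is an assumption based on the instrumentation module's name, and the class is hypothetical.

from typing import Sequence

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult


class DropAsgiSpansExporter(SpanExporter):
    """Hypothetical wrapper exporter that skips spans produced by the ASGI instrumentation."""

    def __init__(self, wrapped: SpanExporter):
        self._wrapped = wrapped

    def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
        kept = [
            s for s in spans
            if s.instrumentation_scope is None
            or s.instrumentation_scope.name != "opentelemetry.instrumentation.asgi"
        ]
        return self._wrapped.export(kept) if kept else SpanExportResult.SUCCESS

    def shutdown(self) -> None:
        self._wrapped.shutdown()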

@CaueP

CaueP commented Jul 1, 2024

After using this workaround for a while, I noticed that using the ASGI lib as a wrapper causes high cardinality in the span name attribute, because the parameters received in the URL become the name of the root span. See the example in the image below.
[image]

I found a solution by injecting the ASGI middleware through FastAPI, as follows:

from fastapi import FastAPI
from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware
from opentelemetry.instrumentation.fastapi import (
    _excluded_urls_from_env,
    _get_default_span_details,
)


app = FastAPI(...)

# add your FastAPI application routes and settings here

# Adds ASGI OTel middleware with FastAPI's add_middleware:
app.add_middleware(
    OpenTelemetryMiddleware,
    default_span_details=_get_default_span_details,     # Uses FastAPI's span details builder to set span name
    excluded_urls=_excluded_urls_from_env,     # bonus: make it able to exclude URLs through the env var OTEL_PYTHON_EXCLUDED_URLS
)

Root span name after the change:
[image]
