
Trace propagation not working on async python with custom instrumentation #3619

rodolfoBee opened this issue Oct 7, 2024 · 5 comments


How do you use Sentry?

Sentry SaaS (sentry.io)

Version

2.15.0

Steps to Reproduce

The issue is happening in a custom Python framework consisting of:

  • microservices running a FastAPI instance
  • an aiokafka consumer and producer running as async tasks

Everything runs in the same process as separate Python tasks (asyncio.create_task). Tracing is done with custom instrumentation and custom trace propagation. A simple example is available here: https://github.com/antonpirker/testing-sentry/blob/main/test-asyncio/main.py

The goal is to have a trace for each message across all services.

Additional question: the Python SDK creates a span for each task created with asyncio.create_task, but does this also create a new local scope for each task? Scope management might be the root cause, as the sample uses get_current_scope and get_isolation_scope.
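
For context, a minimal sketch of how an isolation scope could be forked explicitly per task if that turns out to be necessary; handle_message, the transaction name, and the consume helper are placeholders, not part of the linked example:

import asyncio
import sentry_sdk

async def handle_message(message):
    # Fork a fresh isolation scope so this task's transaction and tags
    # do not leak into sibling tasks (sketch only).
    with sentry_sdk.isolation_scope():
        with sentry_sdk.start_transaction(op="queue.process", name="handle_message"):
            ...  # process the message

async def consume(messages):
    # One task per message, each with its own forked scope.
    await asyncio.gather(*(asyncio.create_task(handle_message(m)) for m in messages))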

Expected Result

All transactions created for a message going through the services share the same trace ID, allowing Sentry to build the complete distributed trace on the server side.

Actual Result

Each transaction has its own trace ID, breaking the distributed trace.


gigaverse-oz commented Oct 13, 2024

Thank you very much for the example and for taking the time to review this issue.

I'd like to clarify the expected results and the complexity to help refine the question.

Consider the following flow per message across two services, similar to the provided example:

INPUT → { [microservice1: consumer 1 → producer 1] ---KAFKA MESSAGE---> [microservice2: consumer 2 → producer 2] } → OUTPUT

Each microservice (running on a different machine) is marked by [ ], and the full trace is enclosed in { }. We are aiming for a unified trace across all services for each message.

Currently, for single-message processing, we successfully achieve the expected trace with the following:

  1. TRACEPARENT and BAGGAGE are included in the Kafka message to propagate the trace, using sentry_sdk.continue_trace to link traces.
  2. Instrumentation is done for each microservice, using asyncio.create_task for concurrent task execution.

Example Code Template:

# Microservice 1 example
async def process_message(self, input_message):
    with sentry_sdk.start_transaction(source=source, *args, **kwargs):
        # Process the message and create tasks
        # Send Kafka message with TRACEPARENT and BAGGAGE
        ...

# Microservice 2 example
async def process_message(self, input_message):
    # Continue the trace using sentry_sdk.continue_trace
    with sentry_sdk.start_transaction(
        sentry_sdk.continue_trace(
            input_message.event_trace_details,
            source=source,
            *args,
            **kwargs,
        )
    ):
        # Process the message and create tasks
        # Send Kafka message with TRACEPARENT and BAGGAGE
        ...
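
For completeness, a rough sketch of how the "Send Kafka message with TRACEPARENT and BAGGAGE" step could look with aiokafka; the producer, topic, payload, and helper names are placeholders, and it assumes sentry_sdk.get_traceparent() / sentry_sdk.get_baggage() return the values for the currently active trace:

import sentry_sdk

async def send_with_trace_headers(producer, payload):
    # Producer side (called inside the active transaction); topic name is a placeholder.
    headers = []
    traceparent = sentry_sdk.get_traceparent()  # may be None if no trace is active
    baggage = sentry_sdk.get_baggage()
    if traceparent:
        headers.append(("sentry-trace", traceparent.encode("utf-8")))
    if baggage:
        headers.append(("baggage", baggage.encode("utf-8")))
    await producer.send_and_wait("next-topic", value=payload, headers=headers)

def transaction_from_kafka_message(message):
    # Consumer side: rebuild the dict that continue_trace expects from the Kafka headers.
    incoming = {key: value.decode("utf-8") for key, value in (message.headers or ())}
    return sentry_sdk.continue_trace(incoming, op="queue.process", name="microservice2")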

Main Question:

If we begin processing multiple messages concurrently (e.g., while processing input 1, microservice 1 receives inputs 2 and 3), will the tracing framework properly manage separate scopes for each message?

  1. Will traces for each input remain distinct?
  2. When calling get_current_scope inside an active with block, will the framework maintain correct scope handling for each async task?

Your insights into how Sentry manages scopes in this concurrent async context would be greatly appreciated.

antonpirker removed their assignment Oct 14, 2024
szokeasaurusrex (Member) commented

@gigaverse-oz, I am a bit confused by your message. Are you still experiencing the problem described by @rodolfoBee, or are you describing a separate problem in your comment? Also, could you please clarify the difference between what you say is already working, and what you are asking about in the "Main Question"?

gigaverse-oz commented

Hi @szokeasaurusrex,

Thank you for the follow-up. I apologize for any confusion; I was aiming to clarify the problem.

The main difference between “what is already working” and the “Main Question” is about concurrency. Currently, processing works as expected for a single message at a time, where all async tasks are tied to that one message.

The “Main Question” relates to handling concurrency across multiple messages. Specifically, if the microservice begins processing multiple incoming messages simultaneously, will Sentry’s scope management correctly separate and handle the scopes for each message? Each message’s processing involves multiple asyncio.create_task calls (e.g., data fetching, analysis, transformations), and now we want to process multiple messages concurrently with multiple async tasks for each one.

If it would help clarify, I’d be happy to share the code and discuss further in a quick call. Thank you again for your time!

szokeasaurusrex (Member) commented


Have you tried running the microservice with concurrency? I expect this might work, but I am unsure. I would recommend trying out the code and seeing whether it works. If it doesn't, we can look into what changes we would need to make to fix it.
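
As a starting point, here is a minimal, self-contained sketch (the message IDs, sleep, and transaction names are placeholders) that could be used to check whether concurrently processed messages keep separate trace IDs:

import asyncio
import sentry_sdk

sentry_sdk.init(traces_sample_rate=1.0)  # DSN omitted in this sketch

async def process_message(message_id):
    # Fork an isolation scope per message so concurrent messages do not share scope state.
    with sentry_sdk.isolation_scope():
        with sentry_sdk.start_transaction(op="queue.process", name=f"message-{message_id}"):
            await asyncio.sleep(0.1)  # simulate work
            print(message_id, sentry_sdk.get_traceparent())  # trace IDs should differ per message

async def main():
    # Process several messages concurrently, as the microservice would.
    await asyncio.gather(*(asyncio.create_task(process_message(i)) for i in range(3)))

asyncio.run(main())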

gigaverse-oz commented

Alright, I’ll get started on that. There are quite a few changes to make, so I wanted to confirm Sentry support before diving into the heavy lifting.

I’ll keep you updated—it may take a few days to a couple of weeks.
