
Potential Bug with Dummy Metadata Subsegments #402

Closed
danielebra opened this issue Dec 4, 2023 · 6 comments · Fixed by #403

Comments

@danielebra
Contributor

Hi there,

I am experiencing an issue adding DataDog Python Tracing in a Lambda via the handler redirection approach. Would greatly appreciate insight from other members of the community or core maintainers.

Expected Behavior

An application instrumented with ddtrace (Python) emits an event to EventBridge. The event is picked up by SQS and then triggers a Lambda (Python) that is instrumented with Datadog. The traversal through these services creates a distributed trace.

Actual Behavior

When the Python Lambda is invoked, an exception is raised during the preparation of the lambda handler.

Steps to Reproduce the Problem

Unit test that attempts to replicate this behaviour: https://github.com/danielebra/datadog-lambda-python/pull/1/files
Event to invoke Lambda with

{
  "Records": [
    {
      "attributes": {
        "ApproximateFirstReceiveTimestamp": 1701063012630,
        "ApproximateReceiveCount": 1,
        "SenderId": "xxx",
        "SentTimestamp": 1701063012628
      },
      "awsRegion": "us-east-1",
      "body": "{\"version\": \"0\", \"id\": \"b77541ed-xxxx-02ee-xxxx-175e4a8036b1\", \"detail-type\": \"xxx\", \"source\": \"xxx\", \"account\": \"xxx\", \"time\": \"2023-11-27T14:01:03Z\", \"region\": \"us-east-1\", \"resources\": [], \"detail\": {\"headers\": {\"correlation_id\": \"1894532486775893021\", \"message_id\": \"f279eb3a-xxxx-4beb-xxxx-8fe56bb76f45\", \"operation_id\": \"2ecf80ef-xxxx-4cf1-xxxx-64f493d7d323\", \"transaction_id\": \"7f81f05e-xxxx-468d-xxxx-c5c684b232f2\", \"event_type\": \"xxx\", \"event_ts\": \"2023-11-27T14:01:03.065691\"}, \"payload\": {}, \"_datadog\": {\"x-datadog-trace-id\": \"1894532486775893021\", \"x-datadog-parent-id\": \"8491654256788888308\", \"x-datadog-sampling-priority\": \"1\", \"x-datadog-tags\": \"_dd.p.dm=-1\", \"traceparent\": \"00-00000000000000001a4abb4251b6ec1d-75d86a8e4b55daf4-01\", \"tracestate\": \"dd=s:1;t.dm:-1\"}}}",
      "eventSource": "aws:sqs",
      "eventSourceARN": "arn:aws:sqs:us-east-1:xxx:xxx",
      "md5OfBody": "3633dx84215b6ceee9x3x2270910de21",
      "messageId": "8d6c7x0d-cxbf-4f38-895x-73265dx1635c",
      "receiptHandle": "AQEB7w4lNhlheNWNmZI/pqJtjmElnVjAGSIqB+X/zIvChMyCbKle3sFU1VKnlrzIM3DrMwSVJxOqqCy/rhkL3u1l1zbxEj2hckrHRObuxyoExKG+fEbPAzHjq2hIZSeZM/uYwsHR7halpRasv5Djnkc+FEcGiMEq8YSreNddN1nk/LhBdn3cwuzegNsu9I9V+ERW+nv/bOgUyeo1uRpMkESlH1A7CInF1y+0vyJyqRGTfAbBXfGmS6OxxxJ9lOyBPhkLvHe8J8ceLuGM+zunavrLY2GRkr/KXxxxU9vJseXbHicrjV0vxz7KyDBK+Dyo1euFjeS4T7ieua28Afmu0fQs63xcUDal6sxCv+1OZ/qlUS5cSU1a+EGBlFaJ61CqKdD1xFbXLMjWUupUz9aHKJONfZtz0ZmEp22jftl79MiD22o="
    }
  ]
}

DataDog Environment variables

DD_ENV: xxx
DD_LOG_LEVEL: debug
DD_SERVICE: xxx
DD_VERSION: xxx
DD_LAMBDA_HANDLER: "handler.handler"
DD_TRACE_ENABLED: true
DD_LOGS_INJECTION: false
DD_COLD_START_TRACING: false
DD_MERGE_XRAY_TRACES: true
DD_LOGS_ENABLED: false
DD_API_KEY: xxx

Logical breakdown

The _before method in the LambdaDecorator extracts the trace context:

dd_context, trace_context_source, event_source = extract_dd_trace_context(
    event,
    context,
    extractor=self.trace_extractor,
    decode_authorizer_context=self.decode_authorizer_context,
)

An SQS context is extracted via:

elif event_source.equals(EventTypes.SNS) or event_source.equals(EventTypes.SQS):
    context = extract_context_from_sqs_or_sns_event_or_context(
        event, lambda_context
    )

We know the above produced a complete context because `Extracted Datadog trace context from event or context` is logged.
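For context, the extraction boils down to reading the `_datadog` carrier that the producer injected into the EventBridge event wrapped in the SQS body. A minimal illustrative sketch (the helper name here is hypothetical, not the library's actual code):

```python
import json

def extract_dd_headers_from_sqs_record(event):
    """Hypothetical sketch: pull the `_datadog` carrier out of an SQS record
    whose body is an EventBridge event, as in the sample event above."""
    body = json.loads(event["Records"][0]["body"])
    # The producer injected the trace context under detail._datadog
    return body.get("detail", {}).get("_datadog", {})

# Minimal stand-in for the event shown above
event = {
    "Records": [{"body": json.dumps({
        "detail": {"_datadog": {
            "x-datadog-trace-id": "1894532486775893021",
            "x-datadog-parent-id": "8491654256788888308",
            "x-datadog-sampling-priority": "1",
        }}
    })}]
}
carrier = extract_dd_headers_from_sqs_record(event)
print(carrier["x-datadog-trace-id"])  # → 1894532486775893021
```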

Then the dd_context is used as a subsegment (this is where the error occurs):

if dd_context and trace_context_source == TraceContextSource.EVENT:
    create_dd_dummy_metadata_subsegment(
        dd_context, XraySubsegment.TRACE_KEY
    )
create_dd_dummy_metadata_subsegment is defined as:

def create_dd_dummy_metadata_subsegment(
    subsegment_metadata_value, subsegment_metadata_key
):
    """
    Create a Datadog subsegment to pass the Datadog trace context or Lambda function
    tags into its metadata field, so the X-Ray trace can be converted to a Datadog
    trace in the Datadog backend with the correct context.
    """
    send_segment(subsegment_metadata_key, subsegment_metadata_value)
send_segment then builds the segment and sends it to the X-Ray daemon:

def send_segment(key, metadata):
    host_port_tuple = get_xray_host_port(
        os.environ.get(XrayDaemon.XRAY_DAEMON_ADDRESS, "")
    )
    if host_port_tuple is None:
        return None
    context = parse_xray_header(
        os.environ.get(XrayDaemon.XRAY_TRACE_ID_HEADER_NAME, "")
    )
    if context is None:
        logger.debug(
            "Failed to create segment since it was not possible to get trace context from header"
        )
        return None

    # Skip adding segment, if the xray trace is going to be sampled away.
    if context["sampled"] == "0":
        logger.debug("Skipping sending metadata, x-ray trace was sampled out")
        return None
    segment = build_segment(context, key, metadata) # <----
    segment_payload = build_segment_payload(segment)
    send(host_port_tuple, segment_payload)
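As an aside, the X-Ray trace header that parse_xray_header consumes is visible in the debug logs further down, in the form Root=...;Parent=...;Sampled=1. Parsing it amounts to roughly the following (an illustrative sketch only; the real parse_xray_header lives in datadog_lambda/xray.py and may differ):

```python
def parse_xray_header_sketch(header):
    """Illustrative only: split 'Root=...;Parent=...;Sampled=...' into a
    context dict, returning None when any required field is missing."""
    parts = dict(kv.split("=", 1) for kv in header.split(";") if "=" in kv)
    if not {"Root", "Parent", "Sampled"} <= parts.keys():
        return None
    return {
        "trace_id": parts["Root"],
        "parent_id": parts["Parent"],
        "sampled": parts["Sampled"],
    }

# Header taken from the debug logs in this issue
ctx = parse_xray_header_sketch(
    "Root=1-656d4951-707b16ac7efd22d06914c84f;Parent=82d9a73d124414dc;Sampled=1"
)
print(ctx)
```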

The build_segment call above performs the json.dumps:

def build_segment(context, key, metadata):
    segment = json.dumps(
        {
            "id": generate_random_id(),
            "trace_id": context["trace_id"],
            "parent_id": context["parent_id"],
            "name": XraySubsegment.NAME,
            "start_time": time.time(),
            "end_time": time.time(),
            "type": "subsegment",
            "metadata": {
                XraySubsegment.NAMESPACE: {
                    key: metadata,
                }
            },
        }
    )
    return segment

However, metadata here is a ddtrace.context.Context object, which json.dumps cannot serialize.
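The failure is easy to reproduce in isolation: json.dumps raises TypeError for any object it has no encoder for. The sketch below uses a stand-in class (not the real ddtrace.context.Context) and shows one possible workaround, serializing a plain-dict view of the context instead of the object itself; the actual fix shipped in the linked PR may differ:

```python
import json

class Context:
    """Stand-in for ddtrace.context.Context; not the real class."""
    def __init__(self, trace_id, span_id):
        self.trace_id = trace_id
        self.span_id = span_id

ctx = Context(1894532486775893021, 8491654256788888308)

# Reproduces the error from the stacktrace below
try:
    json.dumps({"metadata": ctx})
except TypeError as e:
    print(e)  # Object of type Context is not JSON serializable

# One possible workaround: serialize a plain-dict view of the context
payload = json.dumps({"metadata": {
    "trace-id": str(ctx.trace_id),
    "parent-id": str(ctx.span_id),
}})
print(payload)
```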

Additionally, I have gathered these datapoints at runtime in AWS, where I am experiencing this issue:

    dd_context, trace_context_source, event_source = extract_dd_trace_context(
        event,
        context,
        extractor=None,
        decode_authorizer_context=True,
    )
    print("-" * 8)
    print(create_dd_dummy_metadata_subsegment(dd_context, XraySubsegment.TRACE_KEY))
    print(dd_context, trace_context_source, event_source)
    print("dd_context type", type(dd_context))
    # dd_context type <class 'ddtrace.context.Context'>
    print("dd_context is:", str(dd_context))
    # dd_context is: Context(trace_id=1894532486775893021, span_id=8491654256788888308, _meta={'traceparent': '00-00000000000000001a4abb4251b6ec1d-75d86a8e4b55daf4-01', 'tracestate': 'dd=s:1;t.dm:-1', '_dd.p.dm': '-1'}, _metrics={'_sampling_priority_v1': 1}, _span_links=[])

Based on the above, I believe dd_context to be a Context object.

Specifications

  • Datadog Lambda Layer version: 49
  • Python version: 3.9.18

Stacktrace

Datadog debug logs

[ERROR] 2023-12-04T03:36:53.853Z f28f8879-3c70-581d-a0ec-d59e77f2a3bb Error Object of type Context is not JSON serializable. Traceback: Traceback (most recent call last): File "/var/task/datadog_lambda/wrapper.py", line 283, in _before create_dd_dummy_metadata_subsegment( File "/var/task/datadog_lambda/tracing.py", line 146, in create_dd_dummy_metadata_subsegment send_segment(subsegment_metadata_key, subsegment_metadata_value) File "/var/task/datadog_lambda/xray.py", line 116, in send_segment segment = build_segment(context, key, metadata) File "/var/task/datadog_lambda/xray.py", line 78, in build_segment segment = json.dumps( File "/var/lang/lib/python3.9/json/__init__.py", line 231, in dumps return _default_encoder.encode(obj) File "/var/lang/lib/python3.9/json/encoder.py", line 199, in encode chunks = self.iterencode(o, _one_shot=True) File "/var/lang/lib/python3.9/json/encoder.py", line 257, in iterencode return _iterencode(o, 0) File "/v
[DEBUG] 2023-12-04T03:36:53.852Z f28f8879-3c70-581d-a0ec-d59e77f2a3bb Reading trace context from env var Root=1-656d4951-707b16ac7efd22d06914c84f;Parent=82d9a73d124414dc;Sampled=1
[DEBUG] 2023-12-04T03:36:53.852Z f28f8879-3c70-581d-a0ec-d59e77f2a3bb extracted dd trace context Context(trace_id=7464428422997804233, span_id=10276534880103989625, _meta={'traceparent': '00-00000000000000006796fa1a3c1d04c9-8e9d9606aa828179-01', 'tracestate': 'dd=s:1;t.dm:-1', '_dd.p.dm': '-1'}, _metrics={'_sampling_priority_v1': 1}, _span_links=[])
[DEBUG] 2023-12-04T03:36:53.851Z f28f8879-3c70-581d-a0ec-d59e77f2a3bb Extracted Datadog trace context from event or context
[DEBUG] 2023-12-04T03:36:53.851Z f28f8879-3c70-581d-a0ec-d59e77f2a3bb Sending metric aws.lambda.enhanced.invocations value 1 to Datadog via extension
File "/var/task/datadog_lambda/wrapper.py", line 283, in _before
  create_dd_dummy_metadata_subsegment(
File "/var/task/datadog_lambda/tracing.py", line 146, in create_dd_dummy_metadata_subsegment
  send_segment(subsegment_metadata_key, subsegment_metadata_value)
File "/var/task/datadog_lambda/xray.py", line 116, in send_segment
  segment = build_segment(context, key, metadata)
File "/var/task/datadog_lambda/xray.py", line 78, in build_segment
  segment = json.dumps(
File "/var/lang/lib/python3.9/json/__init__.py", line 231, in dumps
  return _default_encoder.encode(obj)
File "/var/lang/lib/python3.9/json/encoder.py", line 199, in encode
  chunks = self.iterencode(o, _one_shot=True)
File "/var/lang/lib/python3.9/json/encoder.py", line 257, in iterencode
  return _iterencode(o, 0)
File "/var/lang/lib/python3.9/json/encoder.py", line 179, in default
  raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Context is not JSON serializable
@rickydua

rickydua commented Dec 4, 2023

Excellent breakdown of the problem @danielebra.

@astuyve
Contributor

astuyve commented Dec 4, 2023

Thanks for the great summary here @danielebra!
We'll take a look right away.

@purple4reina
Contributor

Hi again @danielebra! I'm taking a look at this now.

@purple4reina
Contributor

For others who are also encountering this issue, please roll back the version of your Datadog Python Lambda layer to v82. We are working on a fix for this now in a new version. We'll post back here once that new version has been released.

purple4reina added a commit that referenced this issue Dec 4, 2023
Fixes error `TypeError: Object of type Context is not JSON serializable`
#402
@danielebra
Contributor Author

Thanks for the swift support on this one @purple4reina @astuyve 🙏

@purple4reina
Contributor

@danielebra a new layer has been released! Thanks again for reporting this so quickly. Very much appreciated.
