[SVLS-5034] Create trace context from Step Function execution details #27988
Conversation
Regression Detector Results
Run ID: 1f0988c4-8ca3-4ffc-8487-2233241d0098 · Metrics dashboard · Target profiles · Baseline: a0be446
Performance changes are noted in the perf column of each table.
No significant changes in experiment optimization goals (confidence level: 90.00%). There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
---|---|---|---|---|---|---|
➖ | pycheck_lots_of_tags | % cpu utilization | +0.65 | [-1.92, +3.22] | 1 | Logs |
➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | +0.54 | [-0.23, +1.31] | 1 | Logs |
➖ | uds_dogstatsd_to_api | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | Logs |
➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.01, +0.01] | 1 | Logs |
➖ | file_tree | memory utilization | -0.24 | [-0.36, -0.12] | 1 | Logs |
➖ | basic_py_check | % cpu utilization | -0.28 | [-3.16, +2.61] | 1 | Logs |
➖ | idle | memory utilization | -0.34 | [-0.38, -0.30] | 1 | Logs |
➖ | otel_to_otel_logs | ingress throughput | -0.42 | [-1.25, +0.41] | 1 | Logs |
➖ | tcp_syslog_to_blackhole | ingress throughput | -0.72 | [-0.77, -0.67] | 1 | Logs |
Bounds Checks
perf | experiment | bounds_check_name | replicates_passed |
---|---|---|---|
✅ | idle | memory_usage | 10/10 |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
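The three criteria above can be sketched as a small Go predicate. This is illustrative only: the struct fields and helper name are hypothetical, not the Regression Detector's actual code; the thresholds come from the text.

```go
package main

import (
	"fmt"
	"math"
)

// experiment mirrors the values reported in the tables above.
// Field names are hypothetical, for illustration only.
type experiment struct {
	deltaMeanPct  float64 // "Δ mean %"
	ciLow, ciHigh float64 // bounds of the 90.00% "Δ mean % CI"
	markedErratic bool    // configuration flag
}

// isRegression applies the three criteria from the explanation:
// |Δ mean %| >= 5.00%, the CI excludes zero, and not marked "erratic".
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.deltaMeanPct) >= 5.0
	ciExcludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && ciExcludesZero && !e.markedErratic
}

func main() {
	// The pycheck_lots_of_tags row from the table: +0.65 [-1.92, +3.22].
	// Small shift and the CI spans zero, so it is not flagged.
	fmt.Println(isRegression(experiment{0.65, -1.92, 3.22, false}))
}
```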
Test changes on VM: Use this command from test-infra-definitions to manually test this PR's changes on a VM: `inv create-vm --pipeline-id=44720602 --os-family=ubuntu`. Note: This applies to commit 1c5f028
@@ -112,6 +113,11 @@ func (e Extractor) extract(event interface{}) (*TraceContext, error) {
		carrier, err = headersCarrier(ev.Headers)
	case events.LambdaFunctionURLRequest:
		carrier, err = headersCarrier(ev.Headers)
	case events.StepFunctionEvent:
I am not a Go expert. Does it mean that if the event includes the three Step Functions-specific metadata fields, it matches `StepFunctionEvent`?
I was mistaken -- that happens in `serverless/trigger/events.go`, in https://github.com/DataDog/datadog-agent/pull/27988/files#diff-77b95cca617aa4d8daf59d6bf15540a3504815b769b963c5b78679dc1819fe21R277-R284
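A hypothetical sketch of the detection being discussed: an event is treated as a Step Function event only when the Step Functions context fields are all present after unmarshaling. The struct and function names here are illustrative, not the agent's actual code in `serverless/trigger/events.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sfProbe holds only the fields needed to recognize the event shape.
type sfProbe struct {
	Execution struct {
		ID string `json:"Id"`
	} `json:"Execution"`
	StateMachine struct {
		ID string `json:"Id"`
	} `json:"StateMachine"`
	State struct {
		Name string `json:"Name"`
	} `json:"State"`
}

// looksLikeStepFunctionEvent reports whether all three Step Functions
// context objects are populated in the raw invocation payload.
func looksLikeStepFunctionEvent(raw []byte) bool {
	var p sfProbe
	if err := json.Unmarshal(raw, &p); err != nil {
		return false
	}
	return p.Execution.ID != "" && p.StateMachine.ID != "" && p.State.Name != ""
}

func main() {
	sf := []byte(`{"Execution":{"Id":"arn:x"},"StateMachine":{"Id":"arn:y"},"State":{"Name":"step1"}}`)
	fmt.Println(looksLikeStepFunctionEvent(sf))                       // matches
	fmt.Println(looksLikeStepFunctionEvent([]byte(`{"Headers":{}}`))) // does not
}
```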
ParentID uint64
SamplingPriority sampler.SamplingPriority
TraceID uint64
TraceIDUpper64Hex string
So, Java, .NET, and Go tracers will automatically recognize this new attribute in the trace context, and add it to the span metadata?
That's a good question. My understanding is the extension starts the trace and adds the TraceID+ Upper 64. The Java, .Net, or Go tracer creates spans and then sends them to the extension. The extension adds them to the trace it created. I'll need to verify that the extension is doing enough with respect to the Upper 64 bits (i.e. do they need to be added to the _meta._dd.p.tid field for every span? Or just the root span?)
@@ -189,3 +189,7 @@ func (lp *LifecycleProcessor) initFromLambdaFunctionURLEvent(event events.Lambda
	lp.addTag(tagFunctionTriggerEventSourceArn, fmt.Sprintf("arn:aws:lambda:%v:%v:url:%v", region, accountID, functionName))
	lp.addTags(trigger.GetTagsFromLambdaFunctionURLRequest(event))
}

func (lp *LifecycleProcessor) initFromStepFunctionEvent(event events.StepFunctionEvent) {
	lp.requestHandler.event = event.Payload
In every other method in this file, we set `lp.requestHandler.event = event`. Why are we setting it to `event.Payload` in this case?
That was the consequence of a hack to get around the fact that the trace context was wrapped in a `Payload` field, per the instructions here: https://docs.datadoghq.com/serverless/step_functions/installation/?tab=custom#:~:text=The%20JsonMerge%20intrinsic%20function%20merges%20the%20Step%20Functions%20context%20object%20(%24%24)%20with%20the%20original%20Lambda%E2%80%99s%20input%20payload%20(%24).%20Fields%20of%20the%20original%20payload%20overwrite%20the%20Step%20Functions%20context%20object%20if%20their%20keys%20are%20the%20same.
The actual event is a definitionless JSON object, so deserializing it and storing it is going to be slightly tricky. I'll think about it a little more.
Personally, I think it's best to store the event in its original form. This way, all the way through the call stack, we can assume that the event is the actual event, and not just a piece of it.
As for json deserialization, you've already made that a bit easier by adding definitions to `pkg/serverless/trigger/events/events.go`. I created that file so that during deserialization, we only need to store the keys we're actually going to use. It really cuts down the number of allocs we make.
👍
I recommend you save the entire event, not just the payload. In addition to the reasons listed above, it makes your logic easier below.
Or maybe I'm misunderstanding the order of operations here? Please correct me if I am wrong.
@@ -63,6 +63,9 @@ func (lp *LifecycleProcessor) startExecutionSpan(event interface{}, rawPayload [
		inferredSpan.Span.TraceID = traceContext.TraceID
		inferredSpan.Span.ParentID = traceContext.ParentID
	}
	if traceContext.TraceIDUpper64Hex != "" {
		lp.requestHandler.SetMetaTag(ddTraceIDUpper64BitsHeader, traceContext.TraceIDUpper64Hex)
Am I understanding this correctly, the upper 64 bits are set as a span tag? They are not included in any in/outbound trace context headers?
Okay, this is an actual concern that I believe could become an actual problem.
Since the `requestHandler` is not reset on each invocation, it's feasible that one invocation has the upper 64 and one later on does not. The later one would then still get the meta tag from the previous invocation. That's a problem.
Hm. That is a concern. A lambda function could be invoked by a Step Function and then a non-Step-Function-thing. That would be rare, but not impossible. Would it be reasonable to remove that Meta tag in OnInvokeEnd?
I've removed the meta tag here; instead I'm passing the upper 64 in the response that goes back to the tracer.
tc, err := createTraceContextFromStepFunctionInput(ev)
if err == nil {
	return tc, nil
}
First looking at this, I thought there was a better way to do this. But after closer inspection, there is another way to do this, but I'm not so sure it's better.
My biggest concern is consistency. We want to defer as much work to dd-trace-go as possible. That's what all these carriers are for -- so we can pass the set of found headers to the dd-trace-go propagator for extraction.
I threw together what this could look like, here 8ed2a33. While this version makes the code more consistent, I'm not convinced this is any better, and in fact I think it is worse because it means creating the headers map, extracting from that map, just to later recreate it. Dumb.
So, in the end, this PR is good. I'm sad we can't be more consistent about how we are extracting trace context from step functions. Ultimately though, they are different enough to get their own way of doing things.
}
sfe := events.StepFunctionEvent{Payload: eventPayload}
ev = eventPayload
lp.initFromStepFunctionEvent(sfe)
Nice. I like this. This way, the event passed out of this method for both of these step function types will be the same.
@@ -63,6 +64,9 @@ func (lp *LifecycleProcessor) startExecutionSpan(event interface{}, rawPayload [
		inferredSpan.Span.TraceID = traceContext.TraceID
		inferredSpan.Span.ParentID = traceContext.ParentID
	}
	if traceContext.TraceIDUpper64Hex != "" {
		executionContext.TraceIDUpper64Hex = traceContext.TraceIDUpper64Hex
nit: I think you do not need the conditional here. You can just always set the upper 64 from the context.
On each invocation, the execution context is reset. When execution reaches this point, `executionContext.TraceIDUpper64Hex` will therefore always be `""`. Therefore, there is no need to avoid setting `executionContext.TraceIDUpper64Hex` to `""`.
What's more expensive, `if traceContext.TraceIDUpper64Hex != ""` or `executionContext.TraceIDUpper64Hex = traceContext.TraceIDUpper64Hex`?
Not sure actually. But I just realized that we shouldn't set the `Upper64BitsTag` tag if the upper 64 is empty. So we should leave the conditional, at least for that portion.
In fact 💡, instead of doing the delete of the `Upper64BitsTag` key at end invocation, you could do it here. That would keep all logic that mutates the requestHandler's meta tags in a single location.
hm, clever
Okay, correct me if I am wrong, but I think there are a couple things that we've missed. The extension creates several spans of its own (
Do you think we want to set
…t' of https://github.com/datadog/datadog-agent into chris.agocs/create_trace_context_from_stepfunction_input
@purple4reina I went ahead and added the upper64 bits to the meta tags, and then removed them after `endExecutionSpan`
	log.Debugf("Failed to unmarshal %s event: %s", stepFunction, err)
	break
}
sfe := events.StepFunctionEvent{Payload: eventPayload}
If you set the event to the full event and not just `event.Payload` above, you wouldn't need to do this here.
Because Legacy and non-Legacy step function lambda invocations unmarshal into different types, we have to do a little awkward juggling regardless.
Ahhh, I think you mentioned this before. Can you point me again to what those two json blobs look like? I'm now wondering if there's a way we could unify them into a single event type, even if they look a bit different from one another.
Legacy:
{
"Payload": {
"Execution": {
"Id": "arn:aws:states:us-east-1:425362996713:execution:agocsTestSF:bc9f281c-3daa-4e5a-9a60-471a3810bf44",
"Input": {},
"StartTime": "2024-07-30T19:55:52.976Z",
"Name": "bc9f281c-3daa-4e5a-9a60-471a3810bf44",
"RoleArn": "arn:aws:iam::425362996713:role/test-serverless-stepfunctions-dev-AgocsTestSFRole-tRkeFXScjyk4",
"RedriveCount": 0
},
"StateMachine": {
"Id": "arn:aws:states:us-east-1:425362996713:stateMachine:agocsTestSF",
"Name": "agocsTestSF"
},
"State": {
"Name": "agocsTest1",
"EnteredTime": "2024-07-30T19:55:53.018Z",
"RetryCount": 0
}
}
}
Non-legacy
{
"Execution": {
"Id": "arn:aws:states:us-east-1:425362996713:execution:agocsTestSF:bc9f281c-3daa-4e5a-9a60-471a3810bf44",
"Input": {},
"StartTime": "2024-07-30T19:55:52.976Z",
"Name": "bc9f281c-3daa-4e5a-9a60-471a3810bf44",
"RoleArn": "arn:aws:iam::425362996713:role/test-serverless-stepfunctions-dev-AgocsTestSFRole-tRkeFXScjyk4",
"RedriveCount": 0
},
"StateMachine": {
"Id": "arn:aws:states:us-east-1:425362996713:stateMachine:agocsTestSF",
"Name": "agocsTestSF"
},
"State": {
"Name": "agocsTest1",
"EnteredTime": "2024-07-30T19:55:53.018Z",
"RetryCount": 0
}
}
Serverless Benchmark Results
tl;dr: Use these benchmarks as an insight tool during development.
What is this benchmarking? The benchmark is run using a large variety of lambda request payloads. In the charts below, there is one row for each event payload type.
How do I interpret these charts? The charts below come from benchstat. The benchstat docs explain how to interpret these charts.
I need more help: First off, do not worry if the benchmarks are failing. They are not tests. The intention is for them to be a tool for you to use during development. If you would like a hand interpreting the results, come chat with us.
Benchmark stats
	executionContext.TraceIDUpper64Hex = traceContext.TraceIDUpper64Hex
	lp.requestHandler.SetMetaTag(Upper64BitsTag, traceContext.TraceIDUpper64Hex)
} else {
	delete(lp.requestHandler.triggerTags, Upper64BitsTag)
cool cool, and I just confirmed that attempting to delete a key that does not exist does not panic. Nor does it panic if the map is actually nil. 👍🏽
Oh good lookin' out!
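The property confirmed above can be demonstrated in a few lines of Go: `delete` is a no-op, not a panic, when the key is absent, and even when the map itself is nil. The tag names here are just placeholders.

```go
package main

import "fmt"

// removeTag deletes a key from a tag map. It is safe to call even when
// the map is nil or the key is not present.
func removeTag(tags map[string]string, key string) {
	delete(tags, key)
}

func main() {
	var nilMap map[string]string   // nil map, never initialized
	removeTag(nilMap, "_dd.p.tid") // no panic on a nil map

	tags := map[string]string{"trigger": "step-function"}
	removeTag(tags, "_dd.p.tid") // key absent: nothing happens
	fmt.Println(len(tags))       // the existing entry is untouched
}
```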
/merge
🚂 MergeQueue: waiting for PR to be ready. This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
🚂 MergeQueue: pull request added to the queue.
## Summary of changes

I've updated the Lambda extension so it is capable of returning a 128-bit trace ID when a tracer calls the `/lambda/start-invocation` endpoint [in this PR](DataDog/datadog-agent#27988). As per [the RFC](https://datadoghq.atlassian.net/wiki/spaces/RUMP/pages/3545630931/RFC+Support+128+bit+trace+IDs+in+RUM+SDKs#:~:text=For%20Datadog%20headers%2C%20the%20128%20bit%20trace%20id%20is%20sent%20in%20two%20parts%2C%20lower%2064%20bits%20in%20x%2Ddatadog%2Dtrace%2Did%20(decimal)%20and%20the%20higher%2064%20bits%20in%20x%2Ddatadog%2Dtags%20header%20under%20_dd.p.tid%20(hex)%20tag), the

> lower 64 bits in `x-datadog-trace-id` (decimal) and the higher 64 bits in `x-datadog-tags` header under `_dd.p.tid` (hex) tag.

This change modifies the function that calls `/lambda/start-invocation`, allowing it to pick out the upper 64 bits of the trace ID and set the resulting 128-bit trace ID in the extracted context.

## Reason for change

The Lambda Extension may now return a 128-bit trace ID when a Step Function invokes a Lambda Function.

## Implementation details

I rewrote LambdaCommon's `CreatePlaceholderScope` so it uses `SpanContextPropagator.Instance.Extract` rather than extracting trace context elements one by one.

## Test coverage

Added a unit test for 128-bit trace IDs. Fixed existing unit tests so they pass a dictionary of headers to `CreatePlaceholderScope`. Removed a unit test that only passes `SamplingPriority`, since a distributed trace with only a sampling priority is hardly a distributed trace at all.

## Other details

Backported to 2.x in (TODO)
What does this PR do?
Deterministically create a trace context from Step Function execution ARN, state name, and state entered time for Universal Instrumentation runtimes.
A Step Function has no way of generating a Datadog trace context. We generate the trace context for a Step Function execution after the fact in the Logs to Traces reducer.
Trace ID: sha256(execution ARN)
Span ID: sha256(execution ARN + state name + state entered time)
When a Step Function invokes a Lambda Function, it can be made to include its execution context in the invocation. This change allows the extension to inspect a Step Function invocation event, extract the Step Function execution context, and use the same math to generate a Trace ID and Parent ID that match the Trace ID and Span ID that will be generated in Logs to Traces.
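A minimal sketch of the derivation described above, assuming SHA-256 digest halves are taken big-endian: hash the execution details, then carry the 128-bit trace ID as a uint64 lower half plus a hex upper half. The exact digest bytes the extension selects, any sign-bit masking, and how the span-ID inputs are concatenated may differ; this is illustrative only.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// sha256Halves hashes s and returns the first and second 8 bytes of the
// digest as big-endian uint64s (an assumed byte selection, for illustration).
func sha256Halves(s string) (upper, lower uint64) {
	h := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(h[:8]), binary.BigEndian.Uint64(h[8:16])
}

func main() {
	arn := "arn:aws:states:us-east-1:425362996713:execution:agocsTestSF:bc9f281c-3daa-4e5a-9a60-471a3810bf44"

	// Trace ID from the execution ARN; the upper half travels as hex.
	upper, lower := sha256Halves(arn)
	// Span ID from ARN + state name + state entered time (concatenation
	// scheme assumed here).
	_, spanID := sha256Halves(arn + "agocsTest1" + "2024-07-30T19:55:53.018Z")

	fmt.Printf("TraceID (lower 64, decimal): %d\n", lower)
	fmt.Printf("_dd.p.tid (upper 64, hex):   %016x\n", upper)
	fmt.Printf("Span/Parent ID:              %d\n", spanID)
}
```

The key property is determinism: the Logs to Traces reducer and the extension run the same math on the same inputs and arrive at the same IDs without ever communicating.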
Of Note
The Trace ID generated in Logs to Traces is a 128-bit Trace ID. The common practice (Python, Node) to support 128-bit trace IDs is to split them in half: the lower 64 bits are expressed as the uint64 Trace ID, and the upper 64 bits are encoded as a hexadecimal string and written to the root span's `_meta["_dd.p.tid"]` tag.
How does it work?
I defined a Step Function event type in `events.go`. Then, in `extractor.go`, I added a case to the type switch for the `StepFunctionEvent` type. That case calls `createTraceContextFromStepFunctionInput` in `carriers.go`. `createTraceContextFromStepFunctionInput` extracts the Execution ID, the state name, and the state entered time, and uses them to create a trace context with a 128-bit trace ID.
In order to support the 128-bit trace ID, I had to modify the `TraceContext` struct. I added a string field, `TraceIDUpper64Hex`. Because it's a string, the zero value is `""`.
Anywho, the case statement I added in `extractor.go` returns the created trace context. `lp.Extractor.Extract()` is called in the `startExecutionSpan` function in `trace.go`. The only change I made there was to check for the existence of `traceContext.TraceIDUpper64Hex`, and set a Meta Tag in the `lifecycleProcessor.requestHandler`: `_dd.p.tid:{TraceIDUpper64Hex}`.
Possible Drawbacks / Trade-offs
This change adds `crypto/sha256` as a dependency in `carriers.go`, which might increase the size of the Extension binary if that library is not already linked somewhere else.
Describe how to test/QA your changes
Screenshot of a Step Function invoking a .Net and a Java lambda function: