-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[serverless] Inject trace context into SQS/SNS/EventBridge #2917
[serverless] Inject trace context into SQS/SNS/EventBridge #2917
Conversation
BenchmarksBenchmark execution time: 2024-10-16 21:48:16 Comparing candidate commit edb0584 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 58 metrics, 1 unstable metrics. |
5920275
to
7237e7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor changes to make sure we don't expose packages if its not necessary 🙏
73ba8f2
to
f917961
Compare
a76e6a2
to
f95440c
Compare
8479f85
to
80be4e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if it seems like I am nit picking you a lot here. Let me explain.
Customers care a lot about the performance of their apps. Since the instrumentation you are writing here gets run for every message send performed, it's important that we keep the overhead as low as possible.
Since our customers are expecting low overhead, this can sometimes lead to code that is more complex than otherwise necessary.
…race context bytes only once and reusing it for batch requests
… it for SNS/SQS, and (2) checking new eventbridge payload size before marshaling the updated detail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, just echoing @purple4reina's comments about extracting the repeated logic out of the for loop.
if err != nil { | ||
log.Debug("Unable to get trace context: %s", err.Error()) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit.
I think you could simplify this out even further. Since both handlePublish
and handlePublishBatch
call getTraceContext
, you could make the getTraceContext
call just once, outside of the handle functions and then pass the context objects into the handle functions.
This makes no functional change, but it would DRY up the code a bit. That would be an improvement, since if we need to update the getTraceContext
call in the future, we'd right now need to update it in two locations. Making the getTraceContext
call just once would mean updating code in a single location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is an easy fix and I'm open to doing it. But I'm trying to structure the EnrichOperation
func in a way that allows other teams to modify other operations in SQS/SNS. I think there could be other operations where they won't need the trace context, so it might make more sense to keep the getTraceContext()
calls in the handle funcs. For example, this in the java tracer:
https://github.com/DataDog/dd-trace-java/blob/master/dd-java-agent/instrumentation/aws-java-sqs-2.0/src/main/java/datadog/trace/instrumentation/aws/v2/sqs/SqsInterceptor.java#L84-L90
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you do in fact have an opportunity to pull a lot of work in eventbridge out of the for loop so it is only performed a single time.
Additionally, you did a great job with the java and dotnet tracers to inspect json payloads as strings and adding trace context manually. I think you can do the same thing here too.
…s and trace context. Reuse shared trace context for multiple entries
|
||
// Prepare the reused trace context string | ||
startTimeMillis := time.Now().UnixMilli() | ||
reusedTraceContext := fmt.Sprintf(`%s,"%s":"%d"`, carrierJSON[:len(carrierJSON)-1], startTimeKey, startTimeMillis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test that sets the trace prop style to none
? I'm concerned in that case we'd end up with invalid json, like {,"start":123}
.
I think the env var you want is DD_TRACE_PROPAGATION_STYLE=none
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And actually, could probably avoid any such issues if you manually add the start time key to the carrier.
carrier[startTimeKey] = strconv.Itoa(startTimeMillis)
carrierJSON, err := json.Marshal(carrier)
etc...
This way you're always guaranteed at least one item in the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dope
What does this PR do?
This PR injects trace context into:
SendMessage
andSendMessageBatch
messagesPublish
andPublishBatch
messagesPutEvents
eventsThis allows the Lambda extension to use this trace context to create inferred spans and combine these spans into a single trace. (See PR 1 and PR 2 for these changes)
Motivation
SNS and SQS are already supported in Java and .NET. However, this Go tracer did not have support for SQS, SNS, or EventBridge.
I am also working on a similar PR in dd-trace-java and dd-trace-dotnet to add EventBridge trace context injection. This PR differs because it adds support for SQS, SNS, and EventBridge.
Notes for reviewers
eventbridge.go
, we have a switch statement with a single case right now. But this will make it easier for future operations to be added.Testing
I added unit tests, which can be ran with the commands:
I also did manual tests. Here are what the flame graphs look like after these changes:
System tests covering SNS/SQS: DataDog/system-tests#3204
Reviewer's Checklist
Unsure? Have a question? Request a review!