Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddtrace/tracer: setup GLS-stored context #2761

Merged
merged 9 commits into from
Jul 10, 2024
Merged

Conversation

eliottness
Copy link
Contributor

@eliottness eliottness commented Jun 26, 2024

What does this PR do?

This PR adds a new internal package called orchestrion to better interop with orchestrion. It's first feature is to interact with a compile-time added field in runtime.g to act as a Goroutine Local Storage (GLS) the same way C does with it's Thread Local Storage (TLS).

Sister PR of DataDog/orchestrion#127

From there the integration of this new package cascasdes into:

  • the ddtrace/tracer/context.go and internal/trace_context.go files to support storing and reading spans/traces from the GLS
  • the internal/appsec/dyngo package to refactor and add support for context-stored appsec operations
  • the internal/appsec/emitter/... and the graphql contribs packages to move the old way of storing context values in the new "dyngo"-way

Motivation

Auto-instrumentation comes and breaks the current way that spans are passed around using Go context.Context key value store. Using the GLS is the appropriate solution to have a better coverage where we can't pass a context around.

Reviewer's Checklist

For every reviewer, this seems interesting to read the internal/orchestrion package first.
For the appsec reviewer (@RomainMuller) reading what's inside internal/appsec/dyngo second also seems useful.

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

@pr-commenter
Copy link

pr-commenter bot commented Jun 26, 2024

Benchmarks

Benchmark execution time: 2024-07-10 15:08:06

Comparing candidate commit c82c374 in PR branch eliott.bouhana/APPSEC-53654 with baseline commit 6f26c4c in branch main.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 45 metrics, 0 unstable metrics.

scenario:BenchmarkOTelApiWithCustomTags/otel_api-24

  • 🟥 execution_time [+223.324ns; +294.876ns] or [+2.979%; +3.934%]

scenario:BenchmarkStartSpanConcurrent-24

  • 🟥 execution_time [+120.618ns; +270.582ns] or [+2.235%; +5.013%]

@eliottness eliottness force-pushed the eliott.bouhana/APPSEC-53654 branch from 07c2865 to 6d925c5 Compare June 26, 2024 13:41
@eliottness eliottness marked this pull request as ready for review June 27, 2024 17:12
@eliottness eliottness requested review from a team as code owners June 27, 2024 17:12
@eliottness eliottness changed the title orchestrion: setup GLS-stored context internal/orchestrion: setup GLS-stored context Jun 27, 2024
@Julio-Guerra Julio-Guerra requested review from felixge and darccio June 28, 2024 07:57
@Julio-Guerra Julio-Guerra changed the title internal/orchestrion: setup GLS-stored context ddtrace/tracer: setup GLS-stored context Jun 28, 2024
@Julio-Guerra Julio-Guerra marked this pull request as draft June 28, 2024 08:01
@darccio
Copy link
Member

darccio commented Jun 28, 2024

I'll approve it when Julio's suggestions are included and the PR is back to open again 😁

@eliottness
Copy link
Contributor Author

I believe my PR is missing a crucial component so I will put it back to draft

@eliottness eliottness force-pushed the eliott.bouhana/APPSEC-53654 branch from 8a805e3 to 8ca3a8c Compare July 1, 2024 12:18
@github-actions github-actions bot added the apm:ecosystem contrib/* related feature requests or bugs label Jul 1, 2024
Copy link
Contributor

@RomainMuller RomainMuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly stylistic comments, and some ergonomics -- some of which we have discussed offline but I left there "for posterity", but you can feel free to ignore where we essentially agree they are difficult to implement without causing a QoL degradation somewhere else.

internal/appsec/dyngo/operation.go Show resolved Hide resolved
internal/appsec/emitter/graphqlsec/request.go Show resolved Hide resolved
internal/orchestrion/context.go Show resolved Hide resolved
internal/orchestrion/context.go Outdated Show resolved Hide resolved
internal/orchestrion/context_stack.go Outdated Show resolved Hide resolved
internal/orchestrion/context_stack.go Outdated Show resolved Hide resolved
internal/orchestrion/context_stack.go Outdated Show resolved Hide resolved
internal/orchestrion/context_stack.go Outdated Show resolved Hide resolved
@eliottness eliottness marked this pull request as ready for review July 2, 2024 15:04
@eliottness eliottness requested a review from a team as a code owner July 2, 2024 15:04
@eliottness eliottness requested a review from RomainMuller July 2, 2024 15:06
Copy link
Member

@darccio darccio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general. Let's just change orchestrion.FromCtxOrGLS to orchestrion.WrapContext.

@eliottness eliottness force-pushed the eliott.bouhana/APPSEC-53654 branch from c6953e3 to 18f3f92 Compare July 3, 2024 08:34
Signed-off-by: Eliott Bouhana <[email protected]>
@eliottness eliottness force-pushed the eliott.bouhana/APPSEC-53654 branch 2 times, most recently from 204abc9 to 1163073 Compare July 10, 2024 11:34
Copy link
Member

@felixge felixge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this to unblock progress on orchestrion. This is based on:

  • Monday's call with Romain and Elliot where we discussed the two PRs.
  • Further review of mine today to ensure that this PR does not pose a risk to existing users of dd-trace-go that don't use orchestrion yet (I took some notes).

One important thing we need going forward is a suite of integration tests that verify that all of this is working as expected, even for the known edge cases.

In particular I'm thinking about scenarios like this:

  • Parent span finishes before child span(s)
  • User spawns a goroutine that calls an orchestrion instrumentation method without passing an explicit ctx.
  • etc.

⚠️ The tests are failing as I'm writing this. Seems like a data race. This should be fixed before merging.

@@ -104,12 +104,12 @@ func (t *gqlTracer) Validate(_ graphql.ExecutableSchema) error {
func (t *gqlTracer) InterceptOperation(ctx context.Context, next graphql.OperationHandler) graphql.ResponseHandler {
opCtx := graphql.GetOperationContext(ctx)
span, ctx := t.createRootSpan(ctx, opCtx)
ctx, req := graphqlsec.StartRequestOperation(ctx, nil /* root */, span, types.RequestOperationArgs{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we pass nil instead of span to this func before, and why are we changing it now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed a parameter to the function StartRequestOperation because we always passed nil as a second argument to it. Romain also noted this here

@@ -503,6 +504,7 @@ func (s *span) Finish(opts ...ddtrace.FinishOption) {
}

s.finish(t)
orchestrion.GLSPopValue(sharedinternal.ActiveSpanKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a span finishes before its children?

Copy link
Contributor Author

@eliottness eliottness Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a case where this could happen in the same goroutine. 🤔 In a multiple goroutine setup where the parent ends in another gouroutine this is a real issue that is not taken care of today. We definitively need a kind of cross goroutine GLS in the future

@eliottness eliottness force-pushed the eliott.bouhana/APPSEC-53654 branch from 1163073 to b33d2bf Compare July 10, 2024 13:07
@eliottness eliottness enabled auto-merge (squash) July 10, 2024 14:45
auto-merge was automatically disabled July 10, 2024 14:49

Pull Request is not mergeable

@eliottness eliottness merged commit 577c776 into main Jul 10, 2024
185 checks passed
@eliottness eliottness deleted the eliott.bouhana/APPSEC-53654 branch July 10, 2024 15:34
github-merge-queue bot pushed a commit to DataDog/orchestrion that referenced this pull request Jul 15, 2024
In order to be able to lazily link against GLS accessors, these need to
be declared as variables instead of functions. This allows creation of
bindings on the other side that can be conditionally satisffied by the
existence of initialized declarations somewhere else.

Adds an integration test to demonstrate it is possible to build programs
with and without the runtime modification using this technique.

Enables: DataDog/dd-trace-go#2761

---------

Signed-off-by: Eliott Bouhana <[email protected]>
Signed-off-by: github-actions on behalf of eliottness <[email protected]>
Co-authored-by: Eliott Bouhana <[email protected]>
Co-authored-by: github-actions on behalf of eliottness <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apm:ecosystem contrib/* related feature requests or bugs appsec
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants