Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: centralized span registry #61407

Closed
irfansharif opened this issue Mar 3, 2021 · 6 comments
Closed

tracing: centralized span registry #61407

irfansharif opened this issue Mar 3, 2021 · 6 comments
Assignees
Labels
A-tracing Relating to tracing in CockroachDB. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@irfansharif
Copy link
Contributor

Describe the problem

The registry of inflight spans today only captures root spans. We construct an in-memory tree of child spans that derive from the parent spans, and are thus only accessible through tree traversal. This makes a few things more cumbersome, compared to the alternative (below).

// activeSpans is a map that references all non-Finish'ed local root spans,
// i.e. those for which no WithLocalParent(<non-nil>) option was supplied.
// It also elides spans created using WithBypassRegistry.
// The map is keyed on the span ID, which is deterministically unique.
//
// In normal operation, a local root Span is inserted on creation and
// removed on .Finish().
//
// The map can be introspected by `Tracer.VisitSpans`. A Span can also be
// retrieved from its ID by `Tracer.GetActiveSpanFromID`.
activeSpans struct {
// NB: it might be tempting to use a sync.Map here, but
// this incurs an allocation per Span (sync.Map does
// not use a sync.Pool for its internal *entry type).
//
// The bare map approach is essentially allocation-free once the map
// has grown to accommodate the usual number of active local root spans,
// and the critical sections of the mutex are very small.
syncutil.Mutex
m map[uint64]*Span
}

// children contains the list of child spans started after this Span
// started recording.
children []*crdbSpan

Proposed improvement

We can centralize the storage of spans by storing them directly in the same in-memory registry (keyed by Span ID) and have parent spans only refer to child spans by IDs, rather than direct pointers. See #59310 for a prototype of what that could look like.

It comes with a few upsides.

  • Memory accounting: we'll have a centralized place for where it all happens, instead of threading it in through individual spans and child spans. It also lets us introduce a global debug switch to drop all structured recordings we’re holding onto, or control what the RSS is for the current set of observable structured events.
  • Historical introspection: by decoupling the garbage collection of structured events from the lifecycle of the span itself, we can go back in time (very briefly) and see what the collected events were. Right now we can only introspect in-flight spans. This might not come in super handy, but it also might, given spans are very short-lived in practice.
  • We can refer to child spans by Span ID, instead of building an in-memory graph to traverse through. It helps obviate memory issues and trickiness we're observing with forked spans that outlive parents, and parents that might still want to hold onto references to child spans for traversal. [dnm] tracing: clean up use of auto collection across tasks #59391.
  • Another perk of having span-centric storage comes from util/tracing,sql: add builtin to set trace spans' verbosity #61353. At the time of writing, only root spans are accessible through the registry. So a built-in that's able to selectively toggle the verbosity of a span is of limited use; we can't isolate a random child span because it's only indirectly accessible (we'd need to traverse the tree of the root span).
@irfansharif irfansharif added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-tracing Relating to tracing in CockroachDB. labels Mar 3, 2021
@angelapwen
Copy link

Thanks for the writeup (and prototype) @irfansharif — very nice to have!

@angelapwen
Copy link

I also want to link #60784 as a related PR whose stated functionality will work as expected once we have a centralized span registry. Currently the builtin added is called payloads_for_span, which implies that we can surface payloads for any span ID passed into the builtin — but the builtin will only work appropriately for a root span ID. If a non-root span ID is entered, the builtin will return false (indicating failure).

Once this is fixed, we will likely need to make a change to #60922 payloads_for_trace, where we are executing payloads_for_span internally. Right now this builtin returns a row for each payload in all spans of a trace, but when every span is appropriately surfaced in payloads_for_span, we will see significant redundancy in payloads_for_trace. A span's payload(s) would then be surfaced once for that span, and then one more time for each ancestor span. 🛠️

@irfansharif
Copy link
Contributor Author

Filed #62020 following our discussions from last week.

@angelapwen
Copy link

I'm tying up loose ends as I offboard this week, and unassigning myself for now. I'll be eagerly following along any subsequent changes 😄

@angelapwen
Copy link

Ah, additionally, wanted to note this comment in a PR I am about to close off — #61694 (review)@irfansharif mentions that we actually will still need the in-memory graph structure to power session tracing.

@irfansharif
Copy link
Contributor Author

We're unlikely to do anything with this anytime soon. Especially now that traces are only created for sampled statements (#61777), we have a global knob to tune how much memory tracing uses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tracing Relating to tracing in CockroachDB. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants