Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/tracing: trim trace recordings in a smarter way #88414

Merged
merged 4 commits into from
Sep 30, 2022

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Sep 21, 2022

Before this patch, when the recording of a child span was being added to
the parent, if the number of spans in the child recording + the number
of spans in the parent's recording were greater than the span limit
(1000), then the child's recording was completely dropped (apart from
the structured events, which were still retained). So, for example, if
the parent had a recording of 1 span, and the child has a recording of
1000 spans, the whole 1000 spans were dropped.
This patch improves things by always combining the parent trace and the
child trace, and then trimming the result according to the following
arbitrary algorithm:

  • start at the root of the trace and sort its children by size, desc
  • drop the fattest children (including their descendents) until the
    remaining number of spans to drop becomes smaller than the size of the
    fattest non-dropped child
  • recurse into that child, with an adjusted number of spans to drop

So, the idea is that, recursively, we drop parts of the largest child -
including dropping the whole child if needed.

Fixes #87536

Release note: None

@andreimatei andreimatei requested review from abarganier, aadityasondhi and a team September 21, 2022 22:28
@andreimatei andreimatei requested review from a team as code owners September 21, 2022 22:28
@andreimatei andreimatei requested a review from a team September 21, 2022 22:28
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei andreimatei removed request for a team September 21, 2022 22:29
@andreimatei
Copy link
Contributor Author

cc @yuzefovich

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @andreimatei)


-- commits line 4 at r1:
I was thinking that we'd backport some improvements to 22.2, but this commit goes against that - what's your take?

BTW we should be getting the dataset from which #87536 was filed some time soon.

@andreimatei andreimatei changed the title util/tracing: remove old tags field from RecordedSpan util/tracing: trim trace recordings in a smarter way Sep 22, 2022
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @yuzefovich)


-- commits line 4 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I was thinking that we'd backport some improvements to 22.2, but this commit goes against that - what's your take?

BTW we should be getting the dataset from which #87536 was filed some time soon.

If we backport, I'll deal with it in the backport.
But I don't think I'll backport this PR; it's a bit large. Now that we think we understand what's going on, do you think we necessarily need to backport anything at all?
I was thinking that one thing we could backport, if we have to, is an increase in the trace size limit (perhaps from 1k to 5k spans) for traces gathered through statement diagnostics.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @yuzefovich)

Copy link
Collaborator

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @yuzefovich)

This field was maintained for 22.1 compatibility.

Release note: None
The referenced field went away a while ago.

Release note: None
Before this patch, trace recordings were represented as []RecordedSpan.
This flat slice doesn't make it easy to reconstruct the tree form. This
patch switches the tracing library to represent traces as trees. The
interface outside of the tracing library stays the flat format, at least
for now. The point of the change is that a further patch will use the
tree shape to make smarter decisions about which spans to drop from a
trace when the size of the trace is too large.

Release note: None
Before this patch, when the recording of a child span was being added to
the parent, if the number of spans in the child recording + the number
of spans in the parent's recording were greater than the span limit
(1000), then the child's recording was completely dropped (apart from
the structured events, which were still retained). So, for example, if
the parent had a recording of 1 span, and the child has a recording of
1000 spans, the whole 1000 spans were dropped.
This patch improves things by always combining the parent trace and the
child trace, and then trimming the result according to the following
arbitrary algorithm:
- start at the root of the trace and sort its children by size, desc
- drop the fattest children (including their descendents) until the
  remaining number of spans to drop becomes smaller than the size of the
  fattest non-dropped child
- recurse into that child, with an adjusted number of spans to drop

So, the idea is that, recursively, we drop parts of the largest child -
including dropping the whole child if needed.

Fixes cockroachdb#87536

Release note: None
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier and @yuzefovich)

@craig
Copy link
Contributor

craig bot commented Sep 30, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tracing: possible regression in the amount of things dropped from the trace
4 participants