-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util/tracing: trim trace recordings in a smarter way #88414
Conversation
cc @yuzefovich |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @andreimatei)
-- commits
line 4 at r1:
I was thinking that we'd backport some improvements to 22.2, but this commit goes against that - what's your take?
BTW we should be getting the dataset from which #87536 was filed some time soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I was thinking that we'd backport some improvements to 22.2, but this commit goes against that - what's your take?
BTW we should be getting the dataset from which #87536 was filed some time soon.
If we backport, I'll deal with it in the backport.
But I don't think I'll backport this PR; it's a bit large. Now that we think we understand what's going on, do you think we necessarily need to backport anything at all?
I was thinking that one thing we could backport, if we have to, is an increase in the trace size limit (perhaps from 1k to 5k spans) for traces gathered through statement diagnostics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
friendly ping
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @yuzefovich)
This field was maintained for 22.1 compatibility. Release note: None
The referenced field went away a while ago. Release note: None
Before this patch, trace recordings were represented as []RecordedSpan. This flat slice doesn't make it easy to reconstruct the tree form. This patch switches the tracing library to represent traces as trees. The interface outside of the tracing library stays the flat format, at least for now. The point of the change is that a further patch will use the tree shape to make smarter decisions about which spans to drop from a trace when the size of the trace is too large. Release note: None
Before this patch, when the recording of a child span was being added to the parent, if the number of spans in the child recording + the number of spans in the parent's recording were greater than the span limit (1000), then the child's recording was completely dropped (apart from the structured events, which were still retained). So, for example, if the parent had a recording of 1 span, and the child has a recording of 1000 spans, the whole 1000 spans were dropped. This patch improves things by always combining the parent trace and the child trace, and then trimming the result according to the following arbitrary algorithm: - start at the root of the trace and sort its children by size, desc - drop the fattest children (including their descendents) until the remaining number of spans to drop becomes smaller than the size of the fattest non-dropped child - recurse into that child, with an adjusted number of spans to drop So, the idea is that, recursively, we drop parts of the largest child - including dropping the whole child if needed. Fixes cockroachdb#87536 Release note: None
b809388
to
b03a24d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier and @yuzefovich)
Build succeeded: |
Before this patch, when the recording of a child span was being added to
the parent, if the number of spans in the child recording + the number
of spans in the parent's recording were greater than the span limit
(1000), then the child's recording was completely dropped (apart from
the structured events, which were still retained). So, for example, if
the parent had a recording of 1 span, and the child has a recording of
1000 spans, the whole 1000 spans were dropped.
This patch improves things by always combining the parent trace and the
child trace, and then trimming the result according to the following
arbitrary algorithm:
remaining number of spans to drop becomes smaller than the size of the
fattest non-dropped child
So, the idea is that, recursively, we drop parts of the largest child -
including dropping the whole child if needed.
Fixes #87536
Release note: None