tracing: sampling in event logs for better coverage #100790
Labels
A-observability-inf
C-enhancement
O-support
P-3
T-observability
Traces limit the number and size of the log messages they can collect (see #87539). When a recording exceeds these limits, it is trimmed (see #88414). Currently, trimming retains only the tail of the events in the log.
Sometimes it is useful to see both the beginning and the end of a span. For example, one span had many identical events (individual `Get` evaluations) and was truncated to show only the tail of a 30s period. It would have been useful to also see the head of this span, to rule out a slow start or blockage before the long run of evaluations began.

So, when trimming is necessary, the sweet spot may be to retain both the head and the tail of the log and remove events from the middle. More generally, events in the middle could be sampled to provide some coverage there too. If so, discontiguous parts of the log should be marked as such, so that the engineer reading it is not confused.
Jira issue: CRDB-26623
Epic CRDB-32402