Support for tail-based sampling #1720
Labels
area:sampling
Related to trace sampling
area:sdk
Related to the SDK
spec:trace
Related to the specification/trace directory
What are you trying to achieve?
As far as I can tell, there is no way to determine from a single span whether any further spans may follow it. If that is the case, a collector which receives spans cannot be confident that a trace is complete and that tail-based sampling is safe to perform.
For short-lived traces it will normally be fine to use a window of X seconds and to sample the trace at the end of that window. Longer-lived traces, which potentially cover a large number of services, may rarely complete within X seconds, in which case they may never be sampled properly.
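To make the limitation concrete, here is a minimal sketch of a fixed-window buffer. It is not how any particular collector implements it; `Span`, `WindowedTraceBuffer`, and the explicit `now` arguments are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    span_id: str


class WindowedTraceBuffer:
    """Buffers spans per trace and releases a trace once its window has elapsed."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._spans = defaultdict(list)   # trace_id -> buffered spans
        self._first_seen = {}             # trace_id -> arrival time of first span

    def add(self, span: Span, now: float) -> None:
        # The window starts when the first span of a trace arrives.
        self._first_seen.setdefault(span.trace_id, now)
        self._spans[span.trace_id].append(span)

    def flush_expired(self, now: float) -> dict:
        # Hand over every trace whose window has elapsed, complete or not.
        expired = [t for t, seen in self._first_seen.items()
                   if now - seen >= self.window_seconds]
        flushed = {}
        for trace_id in expired:
            flushed[trace_id] = self._spans.pop(trace_id)
            del self._first_seen[trace_id]
        return flushed
```

With a 10 second window, a span that arrives 25 seconds after the first span of its trace lands in a fresh buffer entry, so the sampling decision for the earlier spans was made without it.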
Additional context.
I wonder if a non-breaking solution to this is to introduce an attribute called something like `PropagateCount` on Spans, which is set whenever a Span is finalised. If a Span propagated its context a couple of times then `PropagateCount` would equal 2. Obviously there are no guarantees that the receiver of the propagated context goes on to use it. However, it would allow for more flexible decision making when tail sampling, as there could be a timeout per Span, rather than a timeout per Trace, before sampling occurs.
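As a rough sketch of how a collector might use such an attribute: count the children that arrived through propagation for each span, compare that with the span's `PropagateCount`, and only keep waiting (up to a per-Span timeout) on spans that still have propagations unaccounted for. Everything here is an assumption for illustration: `propagate_count` is the attribute proposed above, and `remote_parent` is a hypothetical flag marking spans whose parent context arrived via propagation, so that in-process children are not counted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    end_time: float               # seconds, when the span was finalised
    propagate_count: int = 0      # hypothetical attribute proposed in this issue
    remote_parent: bool = False   # assumption: parent context arrived via propagation


def outstanding_children(spans):
    """Map span_id -> number of propagated contexts with no child span seen yet."""
    remote_children = Counter(
        s.parent_id for s in spans if s.parent_id is not None and s.remote_parent
    )
    return {
        s.span_id: s.propagate_count - remote_children[s.span_id]
        for s in spans
        if s.propagate_count > remote_children[s.span_id]
    }


def ready_to_sample(spans, now, per_span_timeout):
    """True once every span with missing children has waited out its own timeout."""
    by_id = {s.span_id: s for s in spans}
    return all(
        now - by_id[span_id].end_time >= per_span_timeout
        for span_id in outstanding_children(spans)
    )
```

A trace whose spans all have their propagations accounted for can be sampled immediately. A span whose receiver never used the context only delays the decision by `per_span_timeout`, measured from that span's end time rather than from the start of the trace.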
Perhaps an extension to this idea would even be for parent Spans to be able (optionally) to specify expectations which they have of their children, e.g. I expect this child to complete within 10,000ms. This could further enhance the decision making when sampling and potentially drive other interesting workflows.