Probability sampling: Encode Span's head-adjusted count #170

jmacd · 2021-07-27T19:35:43Z

This text is reproduced from #148, a review that became too long after several revisions to the document.

This proposes specification text for the sampler.name and sampler.adjusted_count attributes, used to convey sampling probability in recorded SpanData messages.

A companion OTEP #168 discusses how to propagate sampling probability when using the Parent sampler.

jmacd · 2021-07-27T21:13:36Z

@oertl @yurishkuro @paulosman this document is unchanged from the most recent discussion in #148. I thought that opening a new PR would improve our chances of getting this specification merged.

This specification includes a lot of background to establish that there are many ways to sample and count spans, but all of them involve recording an adjusted count.

yurishkuro

I am in favor of merging the narrative, but I do not fully agree with the proposed spec changes.

text/trace/0170-sampling-probability.md

paulosman · 2021-08-05T14:55:52Z

> @oertl @yurishkuro @paulosman this document is unchanged from the most recent discussion in #148. I thought that opening a new PR would improve our chances of getting this specification merged.
>
> This specification includes a lot of background to establish that there are many ways to sample and count spans, but all of them involve recording an adjusted count.

Thank you. I can at least provide a vendor perspective, as a consumer of sampled telemetry events.

Adding an adjusted count solves a real problem for us: At the moment we depend on the return value from ShouldSample including attributes that are added to the span. We have custom plugins that add a sample rate to this list of attributes, which is then read at ingest and used to make estimates about the population. If no adjusted count is included, we assume a value of 1.
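As a rough illustration of the ingest-side estimate described above, here is a minimal sketch. It assumes each span's attributes arrive as a plain dictionary and that the count is stored under the sampler.adjusted_count key proposed in this PR; the function name `estimate_span_count` is hypothetical and does not come from any vendor's API.

```python
def estimate_span_count(span_attributes, key="sampler.adjusted_count"):
    """Estimate the population size from a list of sampled spans.

    Each element is one span's attribute dict (an assumed shape).
    A span without the attribute is treated as having adjusted count 1.
    """
    return sum(attrs.get(key, 1) for attrs in span_attributes)


# Two spans sampled at 1-in-4 plus one span with no recorded count.
sampled = [
    {"sampler.adjusted_count": 4},
    {"sampler.adjusted_count": 4},
    {},
]
print(estimate_span_count(sampled))  # 9
```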

This works but requires customers to use custom sampling plugins that provide no real value above the ability to communicate the adjusted count (the plugins use a deterministic algorithm that is similar to the TraceIdRatioBased sampler). Having the adjusted count be part of the spec would allow us to get rid of the need for these plugins.

For tail sampling, we accept OTLP data to our sampling proxy and then convert the data to our HTTP API which includes an adjusted count. We'd prefer to be able to do end-to-end OTLP, which the spec change described in this OTEP would allow us to do.

So 👍 from me. Thanks again for writing this up.

jmacd · 2021-08-09T22:52:50Z

@open-telemetry/specs-trace-approvers Please take a look.

I've completed a prototype of this proposal in https://github.com/open-telemetry/opentelemetry-go/compare/main...jmacd:jmacd/propagate?expand=1. The text in this PR matches the prototype.

text/trace/0170-sampling-probability.md

jmacd · 2021-08-10T07:33:49Z

There is an error condition implied by #168 that would occur when the random variable R is not received in the trace state, here.

This illustrates the benefits of deriving R from the bits of the TraceID. If we depend on tracestate to make a consistent decision and do not receive that field, the TraceIDRatio Sampler will not know how to make its sampling decision. We could fall back to whatever behavior was being implemented by each SDK before this specification, which might be labeled ProbabilitySampler. This would lead to a situation where non-root spans are sampled with known adjusted count, but without having been selected using a consistent decision.

As specified, @oertl, the sampler.adjusted_count attribute can be correctly unbiased without requiring the sampling decision to be made in a consistent way. If an arbitrary Sampler records adjusted counts in this way, how much do you care to know that another sampler was used? None of the built-in samplers would do this, except for a hypothetical TraceIDRatio Sampler when the random variable R is not received in the trace state. We can avoid any built-in Samplers doing this by adding requirements to the TraceID, probably via W3C traceparent.

yurishkuro · 2021-08-12T15:12:16Z

Suggestion: we may want to consider adding top-level Span fields for count and sampler name, given how important these concepts are and the space savings we'd achieve by recording values only.

jmacd · 2021-08-21T06:08:30Z

I revised this OTEP based on the Thursday Sampling SIG discussion, in which it was proposed that we introduce just a single integer field to encode head sampling probability, ignoring (for the time being) the topic of tail sampling. The proposed field would take 65 possible values, as follows:

Value	Head Adjusted Count
0	Unknown
1	1
2	2
3	4
4	8
5	16
6	32
...	...
X	2^(X-1)
...	...
63	2^62
64	0

and the SpanData would be updated with a new field in v1/trace.proto.

  // Log-head-adjusted count is the logarithm of adjusted count for
  // this span as calculated at the head, offset by +1, with the
  // following recognized values.
  //
  // 0: The zero value represents an UNKNOWN adjusted count.
  //    Consumers of these Spans cannot compute span metrics.
  //
  // 1: An adjusted count of 1.
  // 
  // 2-63: Values 2 through 63 represent an adjusted count of 2^(Value-1)
  //
  // 64: Value 64 represents an adjusted count of zero.
  //
  // Values greater than 64 are unrecognized.
  uint32 log_head_adjusted_count = <next_tag>;
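
For readers who want to see the 65-value table in executable form, the following sketch decodes a field value into an adjusted count. The function name `decode_log_head_adjusted_count` is hypothetical, and returning `None` for the unknown case is an assumption of this sketch, not part of the proposal.

```python
from typing import Optional


def decode_log_head_adjusted_count(value: int) -> Optional[int]:
    """Map a field value (0..64) to a head adjusted count.

    Returns None for the UNKNOWN value 0 (a choice made for this sketch).
    """
    if value == 0:
        return None                 # unknown adjusted count
    if 1 <= value <= 63:
        return 1 << (value - 1)     # 2^(value-1)
    if value == 64:
        return 0                    # adjusted count of zero
    raise ValueError(f"unrecognized log_head_adjusted_count: {value}")
```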

oertl · 2021-08-21T08:48:59Z

@jmacd, maybe it is better to sacrifice the sampling rate 1/2^62 (which will never be used in practice I guess) in order to fit the state into 6 bits?

jmacd · 2021-08-23T21:35:07Z

> @jmacd, maybe it is better to sacrifice the sampling rate 1/2^62 (which will never be used in practice I guess) in order to fit the state into 6 bits?

I agree, and have applied this change in both OTEPs.

jmacd · 2021-08-25T17:59:44Z

@yurishkuro I responded to your comment above. This PR is reduced to proposing a single uint32 field in the protocol, defined as:

  // Log-head-adjusted count is the logarithm of adjusted count for
  // this span as calculated at the head, offset by +1, with the
  // following recognized values.
  //
  // 0: The zero value represents an UNKNOWN adjusted count.
  //    Consumers of these Spans cannot compute span metrics.
  //
  // 1: An adjusted count of 1.
  // 
  // 2-62: Values 2 through 62 represent an adjusted count of 2^(Value-1)
  //
  // 63: Value 63 represents an adjusted count of zero.
  //
  // Values greater than 63 are unrecognized.
  uint32 log_head_adjusted_count = <next_tag>;

This matches OTEP #168, which is also ready for another look, please.
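
A small sketch of the encoding direction for the revised layout (0 unknown, 1 through 62 for powers of two, 63 for zero) may help show why every recognized value fits in 6 bits. The function name `encode_log_head_adjusted_count` and the use of `None` for an unknown count are assumptions of this sketch.

```python
from typing import Optional


def encode_log_head_adjusted_count(adjusted_count: Optional[int]) -> int:
    """Encode a head adjusted count into the revised 0..63 field value."""
    if adjusted_count is None:
        return 0                    # unknown
    if adjusted_count == 0:
        return 63                   # adjusted count of zero
    # Only positive powers of two are representable.
    if adjusted_count < 0 or adjusted_count & (adjusted_count - 1):
        raise ValueError("adjusted count must be a power of two")
    value = adjusted_count.bit_length()   # 2^(value-1) has bit_length == value
    if value > 62:
        raise ValueError("adjusted count too large for the 6-bit layout")
    return value
```

With this layout the largest encodable count is 2^61, and the largest field value is 63, which is the maximum of a 6-bit unsigned integer.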

yurishkuro

Overall LGTM (minor nit on v=63), but I think this OTEP is not actionable in its current form: it needs to propose changes to the Sampler SPI to return inclusion probability (or something) to the Tracer.

text/trace/0170-sampling-probability.md

oertl · 2021-08-26T13:56:43Z

@jmacd, @yurishkuro I have implemented an abstract Sampler base class prototype in Java, see here. Any derived implementation will meet the requirements for consistent sampling. It takes care of generating the geometric random value and propagates it together with the exponent of the sampling probability. There are currently two prototype implementations: ConsistentParentRateSampler and ConsistentFixedRateSampler.
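
The idea of a geometric random value paired with a sampling-probability exponent can be sketched as follows. This is not the linked Java prototype; it is a minimal illustration under stated assumptions: `r` is the number of leading zero bits in a random bit string (capped at a hypothetical `MAX_R`), `p` is the exponent of a sampling probability 2^-p, and a span is kept when p <= r, which is how I understand the rule in companion OTEP #168. The names `geometric_r` and `should_sample` are hypothetical.

```python
import random

MAX_R = 62  # assumed cap for this sketch


def geometric_r(rng: random.Random, max_r: int = MAX_R) -> int:
    """Count leading zero bits of a random bit stream, capped at max_r.

    For k <= max_r, P(r >= k) = 2^-k.
    """
    r = 0
    while r < max_r and rng.getrandbits(1) == 0:
        r += 1
    return r


def should_sample(p: int, r: int) -> bool:
    """Keep the span with probability 2^-p by comparing against the shared r."""
    return p <= r


rng = random.Random(0)
r = geometric_r(rng)          # generated once at the root, then propagated
print(should_sample(0, r))    # True: p = 0 means probability 1
```

Because every sampler in a trace compares its own p against the same r, a sampler with a smaller p keeps every span that a sampler with a larger p keeps, which is what makes the decisions consistent.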

jmacd · 2021-08-27T21:07:40Z

To make this more actionable, I've added a section describing a new SamplingResult field, which is the same as the new Span field proposed here. 4e6c69a

This is also the same as @oertl's prototype:

https://github.com/dynatrace-research/opentelemetry-sampling-poc/blob/087741b7bf4a33353ba7d4dccac4c99d82b10de7/poc/src/main/java/com/dynatrace/research/otelsampling/sampling/AbstractConsistentSampler.java#L166

tedsuo

This OTEP LGTM; any small changes can be worked out during the spec process.

text/trace/0170-sampling-probability.md

jmacd · 2021-09-09T22:51:44Z

The only remaining question here, possibly, is whether the Span field name should be log_head_adjusted_count or something similar. I propose that we merge this OTEP. 😀

…ry#170) * draft from OTEP 148 * renumber * typo in header * typos * formatting * clean TOC * TOC edit * Clarify the counting algorithm * typos * grammar * grammar * two paragraphs * Summarize from the prototype * Remove exported count from proposed spec language * statement about not dropping sampler attributes * from Thursday's SIG, limit proposal to head sampling probability * log_head_adjusted_count * Use 6 bits * update the proto text * add detail on SamplerResult * remove metrics examples, add to span-to-metrics examples * whitespace * lint

draft from OTEP 148

fcd0d6f

jmacd requested review from a team July 27, 2021 19:35

renumber

f2a7efc

jmacd mentioned this pull request Jul 27, 2021

Probability sampling basics for telemetry events #148

Closed

Joshua MacDonald added 5 commits July 27, 2021 12:38

typo in header

fefb309

typos

df6b3ee

formatting

b1c2f83

clean TOC

159f9cf

TOC edit

ee6252a

jmacd mentioned this pull request Jul 27, 2021

Specify how to propagate consistent head sampling probability #168

Merged

yurishkuro reviewed Jul 28, 2021

Joshua MacDonald added 5 commits July 28, 2021 11:46

Clarify the counting algorithm

f76d11e

typos

b847d46

grammar

398649c

grammar

00e64ef

two paragraphs

ab95faa

paulosman approved these changes Aug 5, 2021

Summarize from the prototype

d38c719

yurishkuro reviewed Aug 10, 2021

Remove exported count from proposed spec language

4ab3df6

jmacd mentioned this pull request Aug 10, 2021

Prototype of OTEPs 168 and 170 Probability Sampling open-telemetry/opentelemetry-go#2177

Closed

jmacd mentioned this pull request Aug 10, 2021

Leaky-bucket rate limiting sampler open-telemetry/opentelemetry-specification#1769

Open

statement about not dropping sampler attributes

43c661f

carlosalberto approved these changes Aug 10, 2021

jmacd changed the title ~~Probability sampling: Sampler Name and Adjusted Count attributes~~ Probability sampling: Encode Span's head-adjusted count Aug 21, 2021

Use 6 bits

fb8563c

update the proto text

02b06d0

yurishkuro reviewed Aug 26, 2021

add detail on SamplerResult

4e6c69a

jmacd mentioned this pull request Aug 30, 2021

Probability sampling specification open-telemetry/opentelemetry-specification#1899

Closed

tedsuo approved these changes Aug 31, 2021

yurishkuro approved these changes Aug 31, 2021

oertl reviewed Sep 1, 2021

jsuereth reviewed Sep 7, 2021

jsuereth reviewed Sep 7, 2021

jsuereth approved these changes Sep 7, 2021

reyang approved these changes Sep 8, 2021

remove metrics examples, add to span-to-metrics examples

f6259eb

jmacd assigned jsuereth Sep 9, 2021

Joshua MacDonald added 2 commits September 9, 2021 15:48

whitespace

ec3b41d

lint

1d10203

jsuereth merged commit eee3cb8 into open-telemetry:main Sep 9, 2021

oertl mentioned this pull request Jun 14, 2022

REQUEST: New membership for @oertl open-telemetry/community#1078

Closed

6 tasks
