Non-power-of-two consistent tail probability sampling in TraceState #226
Changes from 5 commits
4d3b94b
c3f1ed2
03f693c
df6b1d0
3c507de
a276ea1
14ad23c
4380c6b
8940b66
9a5e9ce
cfa1b44
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,18 +2,37 @@ | |
|
||
## Motivation | ||
|
||
The existing, experimental [specification for probability sampling using TraceState](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md) | ||
The existing, experimental [specification for probability sampling | ||
using | ||
TraceState](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md) | ||
supporting Span-to-Metrics pipelines is limited to powers-of-two | ||
probabilities and is designed to work without making assumptions about | ||
TraceID randomness. | ||
probabilities and is designed to work without making assumptions about | ||
TraceID randomness. The existing mechanism could only achieve | ||
non-power-of-two sampling using interpolation between powers of two, | ||
which was only possible at the head sampling time. It could not be | ||
used with non-power-of-two sampling probabilities for span sampling in | ||
the rest of the collection path. This proposal aims to address the | ||
above two limitations for a couple of reasons: | ||
|
||
1. Certain customers want support for non-powers-of-two probabilities | ||
(e.g., 10% sampling rate or 75% sampling rate) and it should be | ||
possible to do it cleanly irrespective of where the sampling is | ||
happening. | ||
2. There is a need for consistent sampling in the collection path | ||
(outside of the head-sampling paths) and using the inherent | ||
randomness in the traceID is a less-expensive solution than | ||
referencing a custom "r-value" from the tracestate in every span. | ||
|
||
In this proposal, we will cover how this new mechanism can be used in | ||
both head-based sampling and different forms of tail-based sampling. | ||
|
||
The term "Tail sampling" is in common use to describe _various_ forms | ||
of sampling that take place after a span starts. The term "Tail" in | ||
this phrase distinguishes other techniques from head sampling, however | ||
the term is only broadly descriptive. | ||
|
||
Head sampling requires the use of TraceState to propagate context | ||
about sampling decisions parent spans to child spans. With sampling | ||
about sampling decisions from parent spans to child spans. With sampling | ||
information included in the TraceState, spans can be labeled with their | ||
effective adjusted count, making it possible to count spans as they | ||
arrive at their destination in real time, meaning before assembling | ||
|
@@ -37,10 +56,11 @@ This proposal makes use of the [draft-standard W3C tracecontext | |
`random` | ||
flag](https://w3c.github.io/trace-context/#random-trace-id-flag), | ||
which is an indicator that 56 bits of true randomness are available | ||
for probability sampler decisions. As an added benefit, we find that | ||
this proposal _also works for Head sampling_, and that when 56 bits of | ||
definite randomness are available in the TraceID we can use simpler | ||
sampling logic compared with the p-value, r-value approach. | ||
for probability sampler decisions. The benefit of this is that this | ||
inherently random value can be used by intermediate span samplers to | ||
make _consistent_ sampling decisions. It would be a less-expensive | ||
solution than the earlier proposal of looking up the r-value from the | ||
tracestate of each span. | ||
|
||
This proposes to create a specification with support for 56-bit | ||
precision consistent Head and Intermediate Span sampling. Because | ||
|
@@ -55,25 +75,63 @@ with equivalent use and interpretation as the (W3C trace-context) | |
TraceState field. It would be appropriate to name this field | ||
`LogState`. | ||
|
||
This proposal does makes r-value an optional 56-bit number as opposed | ||
to a required 6-bit number. When the r-value is supplied, it acts as | ||
an alternative source of randomness which allows tail-samplers to | ||
support versions of tracecontext without the `random` bit as well as | ||
more advanced use-cases. For example, independent traces can be | ||
consistently sampled by starting them with identical r-values. | ||
|
||
This proposal deprecates the experimental p-value. For existing | ||
stored data, the specification may recommend replacing `p:X` with an | ||
equivalent t-value; for example, `p:2` can be replaced by `t:4` and | ||
`p:20` can be replaced by `t:0x1p-20`. | ||
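The p-value to t-value replacement described above can be sketched as follows. This is only an illustration: the function name `p_to_t_forms` is hypothetical, and it assumes that `p:X` denotes sampling probability 2^-X, so the equivalent t-value is either the adjusted count 2^X or the probability written as a hexadecimal float.

```python
def p_to_t_forms(p: int) -> tuple[str, str]:
    """Return two equivalent t-value spellings for a legacy p-value.

    Hypothetical helper: assumes p:X means probability 2**-X.
    The first form is the integer adjusted count, the second is the
    probability as a hexadecimal floating-point literal.
    """
    if p < 0:
        raise ValueError("p-value must be non-negative")
    adjusted_count_form = f"t:{2 ** p}"
    probability_form = f"t:0x1p-{p}"
    return adjusted_count_form, probability_form


print(p_to_t_forms(2)[0])   # t:4
print(p_to_t_forms(20)[1])  # t:0x1p-20
```

Both spellings describe the same probability; which one a converter should emit is left open here.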
|
||
## Explanation | ||
|
||
This document recommends deprecating the experimental p-value, r-value | ||
specification. | ||
This document proposes a new OpenTelemetry specific tracestate value | ||
called t-value. This t-value encodes either the sampling probability | ||
(a floating point value) directly or the "adjusted count" of a span | ||
(an integer). The letter "t" here is a shorthand for "threshold". The | ||
value encoded here can be mapped to a threshold value that a sampler | ||
can compare to a value formed using the rightmost 7 bytes of the | ||
traceID. | ||
|
||
The syntax of the r-value changes in this proposal, as it contains 56 | ||
bits of information. The recommended syntax is to use 14 hexadecimal | ||
characters (e.g., `r:1a2b3c4d5e6f78`). The specification will | ||
recommend samplers drop invalid r-values, so that existing | ||
implementations of r-value are not mistakenly sampled. | ||
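A minimal sketch of the r-value validation described above, assuming that "valid" means exactly 14 hexadecimal characters and that either letter case is acceptable (the text does not say). The name `parse_r_value` is hypothetical.

```python
import re
from typing import Optional

# Assumption: a valid r-value is exactly 14 hex characters (56 bits).
_R_VALUE_RE = re.compile(r"[0-9a-fA-F]{14}")


def parse_r_value(text: str) -> Optional[int]:
    """Return the 56-bit integer for a well-formed r-value, else None.

    Returning None models "drop the invalid r-value", so that values
    written by older r-value implementations are not misinterpreted.
    """
    if _R_VALUE_RE.fullmatch(text) is None:
        return None
    return int(text, 16)


print(parse_r_value("1a2b3c4d5e6f78") == 0x1A2B3C4D5E6F78)  # True
print(parse_r_value("17"))  # None (too short to be a 56-bit r-value)
```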
|
||
Like the existing specification, r-values will be synthesized as | ||
necessary. However, the specification will recommend that r-values | ||
not be synthesized automatically when the W3C tracecontext `random` | ||
flag is set. To achieve the advanced use-case involving multiple | ||
traces with the same r-value, users should set the `r-value` in the | ||
tracestate before starting correlated trace root spans. | ||
|
||
### Detailed design | ||
|
||
Let's look at the details of how this threshold can be calculated. | ||
This proposal defines the sampling "threshold" as a 7-byte string used | ||
to make consistent sampling decisions, as follows. | ||
|
||
1. Bytes 9-16 of the TraceID are interpreted as a 56-bit random | ||
value in big-endian byte order. | ||
2. The sampling probability (range `[0x1p-56, 1]`) is multipled by | ||
1. When the r-value is present and parses as a 56-bit random value, | ||
use it, otherwise bytes 10-16 of the TraceID are interpreted as a | ||
Review comment: It is worth specifying whether this counts from 0 or 1, or, even better, including an annotated traceID here, just for clarity. |
||
56-bit random value in big-endian byte order | ||
2. The sampling probability (range `[0x1p-56, 1]`) is multiplied by | ||
Review comment: I think most readers will be unfamiliar with floating point hex notation (I was) and this is probably needlessly terse. One way to express it would be Similarly below, I might say 2^56 rather than using the hex notation. |
||
`0x1p+56`, yielding an unsigned Threshold value in the range `[1, | ||
0x1p+56]`. | ||
3. If the unsigned TraceID random value (range `[0, 0x1p+56)`) is | ||
less-than the sampling Threshold, the span is sampled, otherwise it | ||
is discarded. | ||
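The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the proposal's reference implementation: the function names are hypothetical, the TraceID is taken as 16 raw bytes, the random value is read from its rightmost 7 bytes (matching the "rightmost 7 bytes" wording in the Explanation), and the threshold is rounded to the nearest integer.

```python
from typing import Optional

TWO_POW_56 = 2 ** 56  # written 0x1p+56 in the text


def random_value(trace_id: bytes, r_value: Optional[int] = None) -> int:
    """Step 1: use the r-value if present, else the last 7 TraceID bytes."""
    if r_value is not None:
        return r_value
    if len(trace_id) != 16:
        raise ValueError("TraceID must be 16 bytes")
    return int.from_bytes(trace_id[-7:], "big")


def threshold(probability: float) -> int:
    """Step 2: scale a probability in [2**-56, 1] to the range [1, 2**56]."""
    if not (2.0 ** -56 <= probability <= 1.0):
        raise ValueError("probability out of range")
    # Multiplying by a power of two is exact in binary floating point.
    return round(probability * TWO_POW_56)


def should_sample(trace_id: bytes, probability: float,
                  r_value: Optional[int] = None) -> bool:
    """Step 3: keep the span when the random value is below the threshold."""
    return random_value(trace_id, r_value) < threshold(probability)


tid = bytes.fromhex("0123456789abcdef0123456789abcdef")
print(hex(random_value(tid)))   # 0x23456789abcdef
print(should_sample(tid, 0.5))  # True, since 0x23... < 0x80000000000000
```

Because every sampler derives the same random value from the same TraceID, a span kept at a lower probability is also kept at any higher one, which is what makes the decision consistent.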
|
||
|
||
For head samplers, there is an opportunity to synthesize a new r-value | ||
when the tracecontext does not set the `random` bit (as the existing | ||
specification recommends synthesizing r-values for head samplers | ||
whenever there is none). However, this opportunity is not available | ||
to tail samplers. | ||
|
||
To calculate the Sampling threshold, we began with an IEEE-754 | ||
standard double-precision floating point number. With 52-bits of | ||
significand and a floating exponent, the probability value used to | ||
Review comment: Double-precision floating-point values have a 52-bit mantissa but are able to represent 53-bit significands (except for subnormal values). See https://cs.stackexchange.com/a/152267/102560. |
||
|
@@ -95,7 +153,7 @@ to machine precision) the adjusted count of each span. For example, | |
given a sampling probability encoded as "0.1", we first compute the | ||
nearest base-2 floating point, which is exactly 0x1.999999999999ap-04, | ||
which is approximately 0.10000000000000000555. The exact quantity in | ||
this example, 0x1.999999999999ap-04, is multipled by `0x1p+56` and | ||
this example, 0x1.999999999999ap-04, is multiplied by `0x1p+56` and | ||
rounded to an unsigned integer (7205759403792794). This specification | ||
says that to carry out sampling probability "0.1", we should keep | ||
Traces whose least-significant 56 bits form an unsigned value less | ||
|
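The arithmetic in the "0.1" example above can be checked with a few lines of Python. This sketch only reproduces the numbers quoted in the text and assumes rounding to the nearest integer.

```python
probability = 0.1

# The nearest double to 0.1, shown as a hexadecimal float.
print(probability.hex())  # 0x1.999999999999ap-4

# Scaling by 2**56 is exact, so the product is already an integer.
threshold = round(probability * 2 ** 56)
print(threshold)       # 7205759403792794
print(hex(threshold))  # 0x1999999999999a
```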
@@ -166,26 +224,49 @@ threshold and compared against the new threshold. These are two cases: | |
Sampler's threshold, the span passes through with the current | ||
sampler's t-value, otherwise the span is discarded. | ||
|
||
## S-value encoding for non-consistent adjusted counts | ||
|
||
There are cases where sampling does not need to be consistent or is | ||
intentionally not consistent. Existing samplers often apply a simple | ||
probability test, for example. This specification recommends | ||
introducing a new tracestate member `s-value` for conveying the | ||
accumulation of adjusted count due to independent sampling stages. | ||
|
||
Unlike resampling with `t-value`, independent non-consistent samplers | ||
will multiply the effect of their sampling into `s-value`. | ||
|
||
## Examples | ||
Review comment: It would be good to add two more examples that show how consistent probability sampling can be achieved across multiple participants. Example 1:
Example 2:
Review reply: These examples sound good to me! Will do. |
||
|
||
### 90% Intermediate Span sampling | ||
### 90% consistent intermediate span sampling | ||
|
||
A span that has been sampled at 90% by an intermediate processor will | ||
have `ot=t:0.9` added to its TraceState field in the Span record. The | ||
sampling threshold is `0.9 * 0x1p+56`. | ||
|
||
### 90% Head sampling | ||
### 90% head consistent sampling | ||
|
||
A span that has been sampled at 90% by a head sampler will add | ||
`ot=t:0.9` to the TraceState context propagated to its children and | ||
record the same in its Span record. The sampling threshold is `0.9 * | ||
0x1p+56`. | ||
|
||
### 1-in-3 sampling | ||
### 1-in-3 consistent sampling | ||
|
||
The tracestate value `ot=t:3` corresponds with 1-in-3 sampling. The | ||
sampling threshold is `1/3 * 0x1p+56`. | ||
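The t-value examples above can be turned into thresholds with a sketch like the one below. The function names are hypothetical, and the rule used to tell the two encodings apart (a value of 1 or more is an adjusted count, a value below 1 is a probability) is an assumption made for illustration; the proposal text shown here does not spell it out.

```python
def t_value_to_probability(t_value: str) -> float:
    """Map a t-value string such as "0.9", "3" or "0x1p-20" to a probability."""
    if t_value.lower().startswith("0x"):
        value = float.fromhex(t_value)  # hexadecimal float, e.g. 0x1p-20
    else:
        value = float(t_value)
    if value <= 0:
        raise ValueError("t-value must be positive")
    # Assumption: values >= 1 are adjusted counts, values < 1 are probabilities.
    return 1.0 / value if value >= 1.0 else value


def t_value_to_threshold(t_value: str) -> int:
    """Scale the probability by 2**56 and round to the nearest integer."""
    return round(t_value_to_probability(t_value) * 2 ** 56)


for t in ("0.9", "3", "4", "0x1p-20"):
    print(t, t_value_to_threshold(t))
```

With this rule `t:1` is unambiguous, since an adjusted count of 1 and a probability of 1 both mean "keep everything".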
|
||
### 30% simple probability sampling | ||
|
||
The tracestate value `ot=s:0.3` corresponds with 30% sampling by one | ||
or more sampling stages. This would be the tracestate recorded by | ||
`probabilisticsampler` when using a `HashSeed` configuration instead | ||
of the consistent approach. | ||
|
||
### 10% probability sampling twice | ||
|
||
The tracestate value `ot=s:0.01` corresponds with 10% sampling by one | ||
Review comment: Maybe expand this to show how the tracestate would be modified at each stage? |
||
stage and then 10% sampling by a second stage. | ||
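One way to picture how the s-value changes at each stage is the sketch below. It assumes that each independent sampler multiplies its own probability into the existing s-value, as described in the S-value section, and that a span without an s-value starts at 1. The helper names and the 12-significant-digit formatting are hypothetical.

```python
from typing import Optional


def accumulate_s_value(current: Optional[float], stage_probability: float) -> float:
    """Multiply one independent sampling stage into the running s-value."""
    if not (0.0 < stage_probability <= 1.0):
        raise ValueError("probability out of range")
    # Assumption: a missing s-value means the span has not been sampled yet.
    return (1.0 if current is None else current) * stage_probability


def format_s_value(s_value: float) -> str:
    """Render the tracestate member; the rounding here is illustrative only."""
    return f"ot=s:{s_value:.12g}"


s = accumulate_s_value(None, 0.1)
print(format_s_value(s))  # ot=s:0.1   (after the first 10% stage)
s = accumulate_s_value(s, 0.1)
print(format_s_value(s))  # ot=s:0.01  (after the second 10% stage)
```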
|
||
## Trade-offs and mitigations | ||
|
||
Support for encoding t-value as either a probability or an adjusted | ||
|
Review comment: typo: This proposal makes...