Skip to content
This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Commit

Permalink
draft updates based on recent SIG discussions
Browse files Browse the repository at this point in the history
  • Loading branch information
Joshua MacDonald committed May 4, 2023
1 parent c3f1ed2 commit 03f693c
Showing 1 changed file with 172 additions and 4 deletions.
176 changes: 172 additions & 4 deletions text/trace/0226-sampling-random-traceids.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@

## Motivation

**Status*: CURRENT

The existing, experimental [specification for probability sampling using TraceState](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md)
supporting Span-to-Metrics pipelines is limited to powers-of-two
probabilities and is designed to work without making assumptions about
Expand All @@ -19,14 +21,16 @@ for probability sampler decisions.

## Explanation

**Status*: CURRENT

The existing, experimental TraceState probability sampling
specification relies on two variables known as **r-value** and
**p-value**. The r-value carries the source of randomness and the
p-value carries the effective sampling probability.

Given this specification, a ConsistentProbabilitySampler can be
applied as a head sampler for non-power-of-two sampling probabilities
using interpolation. For example, a neffective sampling probability
using interpolation. For example, an effective sampling probability
of 1-in-3 can be achieved by alternating between 25% and 50% sampling.
However, interpolation only works for trace roots, otherwise
"consistent" sampling can only be achieved at the next smaller power
Expand All @@ -43,7 +47,7 @@ This proposal avoids r-value by using 7 bytes of intrinsic randomness
in the TraceID, the ones (draft-) specified [in the W3C tracecontext
`random` flag](https://w3c.github.io/trace-context/#random-trace-id-flag).
Since this Sampler is expected to behave consistently with or without
the `random` flag, we assumes the bits are random and do not actually
the `random` flag, we assume the bits are random and do not actually
check the W3C random flag.

This document propose extending the existing p-value, r-value
Expand All @@ -56,9 +60,161 @@ the TraceID.
As proposed, t-value and p-value are mutually exclusive; p-value
remains the preferred encoding for probability sampling when a
power-of-two sampling probability is used. P-value also remains the
specified way to encode zero adjusted count (i.e., p=63).
specified way to encode zero adjusted count (i.e., p=63). T-value MAY
be used to encode power-of-two probabilities, although typically the
equivalent p-value uses fewer bytes.

### T-Value encoding: Requirements

**Status*: NEW-DRAFT

#### Exactness

This proposal is required to be support precision Span-to-Metrics
pipelines. This means that effective sampling probabilities are
limited to discrete values that can be exactly represented. The number
of discrete steps between powers of two is limited by the number of
remaining bits of randomness in the TraceID.

To achieve exactly 1-in-2^56 sampling, a sampler can select all traces
with 56 `0`s of TraceID randomness. It is not possible to achieve a
smaller sampling probability than 1-in-2^56.

The next larger, exactly representable sampling probability is
1-in-2^55. At this probability, a sampler can select all traces with
55 leading `0`s of TraceID randomness (i.e., 55 `0`s followed by a `1`
and 55 `0`s followed by a `0`). There are no exact probabilities
representable between 1-in-2^55 and 1-in-2^56.

The next larger, exactly representable power-of-two sampling
probability is 1-in-2^54. At this probability, a sampler can select
all traces with 54 leading `0`s of TraceID randomness. At 4 out of
2^56, this sampling probability includes the two TraceID-randomess
values selected at smaller powers-of-two (i.e., 1-in-2^55 and
1-in-2^56) plus two new TraceID-randomness values. One of the two new
TraceID-randomness values corresponds with exactly 1-in-2^54 sampling,
the other of these is the smallest exactly-representable
non-power-of-two sampling probability according to this scheme. It
lies halfway between 1-in-2^54 and 1-in-2^55; in binary floating point
representation, this value is displayed as `0x1.8p-55`.

Continuing this pattern, the next larger power-of-two sampling
probability is 1-in-2^53, which is 8 out of 2^56, 4 of which were
covered above and 4 of which are new. Of the four new, 1 is the exact
power-of-two and there are three available non-power-of-two
probabilities in this range. These probabilities are (exactly)
`0x1.Cp-54`, `0x1.8p-54`, and `0x1.4p-54`.

In the pattern developed here, the number of sampling probabilities in
the open interval `(2^-N, 2^-(N+1))` equals `(2^(56-N))-1`.

Note we are disregarding the fact that a TraceID with all zeros (i.e.,
128 `0` bits) is specified invalid by OpenTelemetry, which makes the
all-zeros TraceID-randomness value slightly less probable than other
values.

#### Correspondence with R-value

**Status*: NEW-DRAFT

There are reasons to maintain compatibility with r-values in the range
[0, 56] as developed in the earlier specification, particularly
because it enables intentionally-consistent sampling across multiple
traces. We require that when r-value is used, r-value takes
precendece over builtin TraceID-randomness.

In this specification, the use of r-values greater than 56 is deprecated.

We require the correspondence with non-power-of-two sampling
probabilities exact to be exact. This can be achieved as follows by
calculating an *effective TraceID-randomness value* from the r-value
combined with the original randomness.

When r-value is set to the value `x` (where `x < 56`), the effective
TraceID-randomness value used is calculated as `x` leading `0`s,
followed by a `1`, followed by the original `56-x-1` trailing bits of
TraceID-randomness.

R-value propgation rules are unmodified. R-value consistency-checking
rules will be updated to detect inconsistent t-values, similar to the
current specification's rules for detecting inconsistent p-values..

#### Sampling decision logic

**Status*: NEW-DRAFT

An implementation of a head or tail sampler is expected to perform a
simple comparison between the 56 bits of TraceID-randomness value and
a threshold value. The encoded t-value will correspond with one of
the exactly representable values of TraceID-randomness, such that a
simple less-than-or-equal comparison achieves exactly the correct
sampling probability.

#### Consistency between head and tail sampling

**Status*: NEW-DRAFT

The correspondence with r-value is meant to ensure that head samplers
and tail samplers will make a consistent decision at non-power-of-two
sampling probabilities. Whereas the existing specification states
that head samplers should use random interpolation between
powers-of-two, the updated consistent sampling specification will use
the deterministic algorithm for head and tail developed above.

#### Deterministic mapping to integer adjusted counts

### T-Value encoding
**Status*: NEW-DRAFT

One requirement remains to be developed. A nice-to-have feature
developed in the earlier specification is that when interpolating
between power-of-two sampling probabilities, the final p-value would
nevertheless be output with one of the nearby power-of-two adjusted
counts.

Using the smallest representable non-power-of-two sampling probability
`0x1.8p-55` as an example--this value lies exactly half-way between
two powers-of-two so we require a deterministic, unbiased way to
select `0x1p-54` 1-out-of-3 times and `0x1p-55` 2-out-of-3 times.

Can we use the SpanID bits to make this selection consistently at the
consumer for each Span? This would allow an exactly-encoded
non-power-of-two `t-value` to nevertheless be mapped into integer
(power-of-two) adjusted counts.

TODO: This is an ongoing investigation.

#### Summary of sampling algorithm

**Status*: NEW-DRAFT

The steps to perform a sampling decision are the same for both head
and tail samplers.

First, select an exactly representable sampling probability. If the
input is an arbitrary floating point value, it will have to be rounded
to a nearby exact probablity. Then, the probability is converted in
two ways:

1. The t-value is calculated that encodes the exact effective samping
probability.
2. The 56-bit threshold for comparing against TraceID-randomness is
calculated as described above.

For each span, the sampler extracts 56 bits of presumed randomness
from the TraceID, the so-called TraceID-randomness value.

When r-value is set to `x` in the span's context, the sampler modifies
the leading `x+1` bits of TraceID-randomness value with `x` `0`s and
followed by a `1`.

A simple comparison is made between the threshold and the effective
TraceID-randomness value. If the effective TraceID-randomess value is
less than or equal to the threshold, the span is selected with the
calculated t-value. Otherwise, the span is not selected.

### T-Value encoding: Original draft

**Status*: OUT-OF-DATE

Since we have 7 bytes, or 56 bits of randomness available, there are
2^56 non-zero sampling probabilities that can be encoded. These
Expand Down Expand Up @@ -88,6 +244,8 @@ pipelines.

### Converting between Thresholds and Probabilities

**Status*: OUT-OF-DATE

Sampling probabilities in the range (0, 1] can be mapped onto 56-bit
encoded t-values in the range [0, 0xffffffffffffff]. For a given
sampling threshold, the corresponding probability is expressed as a
Expand All @@ -108,6 +266,8 @@ pipeline.

#### Probability to Hex Threshold

**Status*: OUT-OF-DATE

Note that the procedure here only works for probabilities greater than
or equal to 2^-52.

Expand All @@ -130,6 +290,8 @@ be exactly represented in 56 bits.

#### Hex Threshold to Probability

**Status*: OUT-OF-DATE

To convert a hex threshold string to the corresponding probability, we
perform that opposite of the above.

Expand All @@ -147,6 +309,8 @@ that is all the precision a double-wide floating point number has.

## Examples

**Status*: OUT-OF-DATE

### 90% sampling

The following header
Expand Down Expand Up @@ -175,11 +339,15 @@ corresponds with 0.33333% sampling.

## Trade-offs and mitigations

**Status*: OUT-OF-DATE

Note that the t-value encoding is not efficient for encoding
power-of-two probabilities (e.g., "ffffffffffffff" corresponds with
100% sampling). That is why the use of p-value is recommended when
the configured sampling probability is an exact power-of-two.

## Prior art and alternatives

**Status*: OUT-OF-DATE

An earlier draft of proposal was explored [here](https://github.com/jmacd/opentelemetry-collector-contrib/pull/2925).

0 comments on commit 03f693c

Please sign in to comment.