Probability Samplers based on W3C Trace Context Level 2 #3910

Closed
wants to merge 6 commits into from
18 changes: 18 additions & 0 deletions specification/context/api-propagators.md
Expand Up @@ -355,6 +355,24 @@ Additional `Propagator`s implementing vendor-specific protocols such as AWS
X-Ray trace header protocol MUST NOT be maintained or distributed as part of
the Core OpenTelemetry repositories.

### W3C Trace Context Requirements

A W3C Trace Context propagator is expected to implement the
`traceparent` and `tracestate` contexts fields specified in [W3C Trace
Member

Suggested change
`traceparent` and `tracestate` contexts fields specified in [W3C Trace
`traceparent` and `tracestate` context fields specified in [W3C Trace

Context Level 2](https://www.w3.org/TR/trace-context-2/).

When injecting and extracting trace context to or from a carrier, the
following fields are propagated.

- TraceID (16 bytes)
- SpanID (8 bytes)
- TraceFlags (8 bits)
- TraceState (string)

Propagators MUST NOT assume that bits 2-7 of `TraceFlags` (the 6 most
significant bits) will be zero, as they are reserved for future use and
are expected to propagate with the context.
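
As an illustration of the requirement above, here is a minimal sketch of a propagator that carries all four fields and passes every `TraceFlags` bit through unchanged. The `SpanContext` dataclass, the `extract` and `inject` helpers, and the dict carrier are hypothetical names chosen for this example, not an SDK API, and header validation is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes     # 16 bytes
    span_id: bytes      # 8 bytes
    trace_flags: int    # 8 bits, kept whole
    trace_state: str    # raw tracestate header value


def extract(carrier: dict) -> SpanContext:
    """Parse `traceparent` and `tracestate` from a dict carrier (no validation)."""
    _version, trace_id, span_id, flags = carrier["traceparent"].split("-")[:4]
    return SpanContext(
        trace_id=bytes.fromhex(trace_id),
        span_id=bytes.fromhex(span_id),
        trace_flags=int(flags, 16),  # no masking, so bits 2-7 survive
        trace_state=carrier.get("tracestate", ""),
    )


def inject(ctx: SpanContext, carrier: dict) -> None:
    """Write the context back out, re-emitting all 8 flag bits."""
    carrier["traceparent"] = (
        f"00-{ctx.trace_id.hex()}-{ctx.span_id.hex()}-{ctx.trace_flags:02x}"
    )
    if ctx.trace_state:
        carrier["tracestate"] = ctx.trace_state
```

A propagator that masked the flags with `0x01` or `0x03` on extraction would silently drop any future flag bits, which is the behavior the requirement rules out.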

### B3 Requirements

B3 has both single and multi-header encodings. It also has semantics that do not
Expand Down
5 changes: 4 additions & 1 deletion specification/trace/api.md
Expand Up @@ -221,7 +221,10 @@ byte.

`TraceFlags` contain details about the trace. Unlike TraceState values,
TraceFlags are present in all traces. The current version of the specification
only supports a single flag called [sampled](https://www.w3.org/TR/trace-context/#sampled-flag).
supports two flags:

- [Sampled](https://www.w3.org/TR/trace-context/#sampled-flag)
- [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag)
Comment on lines +226 to +227
Member

Suggested change
- [Sampled](https://www.w3.org/TR/trace-context/#sampled-flag)
- [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag)
- [Sampled (value 0x1)](https://www.w3.org/TR/trace-context/#sampled-flag)
- [Random (value 0x2)](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag)

I think it would help to know which bits we're talking about.


`TraceState` carries vendor-specific trace identification data, represented as a list
of key-value pairs. TraceState allows multiple tracing
Expand Down
107 changes: 87 additions & 20 deletions specification/trace/sdk.md
Expand Up @@ -244,6 +244,27 @@ When asked to create a Span, the SDK MUST act as if doing the following in order
`Span` is created without an SDK installed or as described in
[wrapping a SpanContext in a Span](api.md#wrapping-a-spancontext-in-a-span).

#### Span flags

The OTLP representation for Span and Span Link includes a 32-bit field
declared as Span Flags.

Bits 0-7 of the Span Flags field are reserved for the 8 bits of Trace
Context flags, specified in [W3C Trace
Context](https://www.w3.org/TR/trace-context-2/). [See the list of
recognized flags](./api.md#spancontext).

Bits 8 and 9 are defined to report the Remote property associated with
the SpanContext `IsRemote` property. SDKs should report this
information as follows:

- IsRemote = `true`: Bits 8 and 9 are set in the flags (i.e., `0x300`).
- IsRemote = `false`: Bit 8 is set in the flags (i.e., `0x100`).

For example, if the Span's incoming context has flags 0x3 (indicating
`Sampled` and `Random`) and the parent SpanContext `IsRemote`, the
resulting Span Flags will equal `0x303`.
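
The bit layout described in this section can be sketched as follows. The constant and function names are illustrative assumptions, not identifiers taken from the specification or from OTLP.

```python
TRACE_FLAGS_MASK = 0xFF  # bits 0-7: W3C Trace Context flags
HAS_IS_REMOTE = 0x100    # bit 8: the IsRemote property is being reported
IS_REMOTE = 0x200        # bit 9: the reported IsRemote value is true


def span_flags(trace_flags: int, parent_is_remote: bool) -> int:
    """Combine the 8 trace flags with the IsRemote bits into Span Flags."""
    flags = (trace_flags & TRACE_FLAGS_MASK) | HAS_IS_REMOTE
    if parent_is_remote:
        flags |= IS_REMOTE
    return flags
```

With these definitions, `span_flags(0x3, True)` yields `0x303`, matching the example in the text, and `span_flags(0x3, False)` yields `0x103`.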

### Sampler

`Sampler` interface allows users to create custom samplers which will return a
Expand Down Expand Up @@ -312,21 +333,39 @@ The default sampler is `ParentBased(root=AlwaysOn)`.

#### TraceIdRatioBased

* The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
Suggested change
The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
The `TraceIdRatioBased` sampler MUST ignore the parent `SampledFlag`. To respect the

parent `SampledFlag`, the `TraceIdRatioBased` should be used as a delegate of
the `ParentBased` sampler specified below.
* Description MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
with `RATIO` replaced with the Sampler instance's trace sampling ratio
represented as a decimal number. The precision of the number SHOULD follow
implementation language standards and SHOULD be high enough to identify when
Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
had a sampling ratio of 1 to every 10,000 spans it COULD return
`"TraceIdRatioBased{0.000100}"` as its description.

TODO: Add details about how the `TraceIdRatioBased` is implemented as a function
of the `TraceID`. [#1413](https://github.com/open-telemetry/opentelemetry-specification/issues/1413)
##### `TraceIdRatioBased` sampler implementation overview

SDKs SHOULD conform with the [consistent-probability
`TraceIdRatioBased` Sampler
requirements](./tracestate-probability-sampling.md).

The implementation has the following steps:

* Sampling probabilities are restricted to the range `2**-56` through 1.
* A rejection threshold is calculated, expressing as an integer how
many out of `2**56` trace IDs should be rejected.
* The threshold is encoded as a "T-value", expressing the threshold
using up to 4 recommended digits of precision.
* Sampler decisions are made by comparing the Trace ID randomness
against the rejection threshold.
* When Sampled, T-value is included in the [OpenTelemetry TraceState
@kalyanaj, Feb 29, 2024

It will be good to rephrase this to an active voice of what implementations must do. For example, something like: "When sampled, an implementation MUST update the tracestate header to specify the T-value. <+ any additional details>"

header](./tracestate-handling.md), identified by sub-key `th`,

Should we update the tracestate-handling.md to include the examples for the sub-key 'th'? Right now, it seems to be referring to the earlier p values and r value examples.

indicating the sampling probability of the associated Context. See
[Randomness requirements](#randomness-requirements), below.
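
The steps listed above can be sketched as follows. This is a rough illustration under stated assumptions, not the normative algorithm: it assumes the rejection threshold is `(1 - probability) * 2**56`, that a trace is sampled when its 56-bit randomness is greater than or equal to the threshold, and that the T-value is the threshold written as 14 hexadecimal digits with trailing zeros removed. All function names are hypothetical, and the step that limits the T-value to a recommended precision is omitted.

```python
MAX_THRESHOLD = 1 << 56  # number of distinct 56-bit randomness values


def rejection_threshold(probability: float) -> int:
    """Number of the 2**56 randomness values to reject (assumed definition)."""
    if not (2.0 ** -56 <= probability <= 1.0):
        raise ValueError("probability must be in the range 2**-56 through 1")
    return MAX_THRESHOLD - round(probability * MAX_THRESHOLD)


def encode_t_value(threshold: int) -> str:
    """Hex-encode the threshold, dropping trailing zeros (assumed encoding)."""
    return f"{threshold:014x}".rstrip("0") or "0"


def randomness(trace_id: bytes) -> int:
    """Least significant 7 bytes (56 bits) of the TraceID as an integer."""
    return int.from_bytes(trace_id[-7:], "big")


def should_sample(trace_id: bytes, threshold: int) -> bool:
    """Sample when the randomness is not below the rejection threshold."""
    return randomness(trace_id) >= threshold
```

Under these assumptions, a probability of 0.5 gives a threshold of `2**55`, which encodes as the single digit `8`, and a probability of 1 gives a threshold of zero, which encodes as `0`.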

When this implementation is used, the Sampler description SHOULD
return a string of the form `"TraceIdRatioBased{RATIO;tv:TVALUE}"`
with `RATIO` the configured probability and `TVALUE` replaced by the
Comment on lines +361 to +362
Suggested change
return a string of the form `"TraceIdRatioBased{RATIO;tv:TVALUE}"`
with `RATIO` the configured probability and `TVALUE` replaced by the
return a string of the form `"TraceIdRatioBased{RATIO;th:THRESHOLD}"`
with `RATIO` the configured probability and `THRESHOLD` replaced by the

encoded T-Value, as the Sampler Description.

##### Requirements for `TraceIdRatioBased` sampler algorithm
##### Former requirements for `TraceIdRatioBased` sampler algorithm

SDKs MAY use the former requirements of the `TraceIdRatioBased`
Sampler before transitioning to the modern requirements stated above.

* The sampling algorithm MUST be deterministic. A trace identified by a given
`TraceId` is sampled or not independent of language, time, etc. To achieve this,
Expand All @@ -338,18 +377,22 @@ of the `TraceID`. [#1413](https://github.com/open-telemetry/opentelemetry-specif
sample. This is important when a backend system may want to run with a higher
sampling rate than the frontend system, this way all frontend traces will
still be sampled and extra traces will be sampled on the backend only.
* **WARNING:** Since the exact algorithm is not specified yet (see TODO above),
there will probably be changes to it in any language SDK once it is, which
would break code that relies on the algorithm results.
Only the configuration and creation APIs can be considered stable.
It is recommended to use this sampler algorithm only for root spans
(in combination with [`ParentBased`](#parentbased)) because different language
SDKs or even different versions of the same language SDKs may produce inconsistent
results for the same input.

When this implementation is used, Description MAY return a string of
the form `"TraceIdRatioBased{RATIO}"` with `RATIO` replaced with the
Sampler instance's trace sampling ratio represented as a decimal
number. The precision of the number SHOULD follow implementation
language standards and SHOULD be high enough to identify when Samplers
have different ratios. For example, if a TraceIdRatioBased Sampler had
a sampling ratio of 1 to every 10,000 spans it could return
`"TraceIdRatioBased{0.000100}"` as its description.
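
For illustration, a description in the former format could be produced like this. The `describe` function is a hypothetical helper, and six decimal places is an arbitrary choice that happens to reproduce the example above; an SDK would follow its own language conventions.

```python
def describe(ratio: float) -> str:
    """Format the sampler description with a fixed six-decimal ratio."""
    return f"TraceIdRatioBased{{{ratio:.6f}}}"
```

A fixed precision like this only distinguishes ratios down to `0.000001`, so samplers configured with smaller ratios would need more digits to remain distinguishable.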

#### ParentBased

* This is a sampler decorator. `ParentBased` helps distinguish between the
SDKs SHOULD conform with the [consistent-probability `ParentBased`
Sampler requirements](./tracestate-probability-sampling.md).

This is a sampler decorator. `ParentBased` helps distinguish between the
following cases:
* No parent (root span).
* Remote parent (`SpanContext.IsRemote() == true`) with `SampledFlag` set
Expand Down Expand Up @@ -460,6 +503,30 @@ Additional `IdGenerator` implementing vendor-specific protocols such as AWS
X-Ray trace id generator MUST NOT be maintained or distributed as part of the
Core OpenTelemetry repositories.

### Randomness requirements

The SDK SHOULD implement the TraceID randomness requirements specified
in the W3C [Trace Context Level
2](https://www.w3.org/TR/trace-context-2/#randomness-of-trace-id)
Candidate Recommendation.

This states that the SDK should fill least significant 7 bytes (i.e., 56
bits) of the TraceID are genuinely random or pseudo-random., so they
can be used for probability sampling.
Comment on lines +513 to +515
Member

Suggested change
This states that the SDK should fill least significant 7 bytes (i.e., 56
bits) of the TraceID are genuinely random or pseudo-random., so they
can be used for probability sampling.
This states that the SDK should fill the least significant 7 bytes (i.e., 56
bits) of the TraceID with bits that are genuinely random or pseudo-random,
so that they can be used for probability sampling.


#### Randomness trace context flag

The Trace Context `Random` flag, having value `0x2`, SHOULD be set in
the W3C Trace Context that is propagated when the SDK originates a new
TraceID that meets the Randomness requirement.

#### Randomness requirements for IdGenerators

If the SDK uses an `IdGenerator` extension point, the SDK SHOULD
enable it to declare whether it meets the Randomness requirement, in
which case the `Random` flag SHOULD be set in the W3C Trace Context
that is propagated when the SDK originates a new TraceID.
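
One way to read the two subsections above is sketched below. The `IdGenerator` protocol, its `is_trace_id_random` declaration hook, and `new_root_trace` are hypothetical names invented for this example; the specification text does not define how a generator declares randomness.

```python
import secrets
from typing import Protocol

RANDOM_FLAG = 0x2  # W3C Trace Context Level 2 `Random` flag


class IdGenerator(Protocol):
    def generate_trace_id(self) -> bytes: ...
    def is_trace_id_random(self) -> bool: ...  # hypothetical declaration hook


class RandomIdGenerator:
    """Fills all 16 bytes randomly, so the low 56 bits are random too."""

    def generate_trace_id(self) -> bytes:
        while True:
            trace_id = secrets.token_bytes(16)
            if any(trace_id):  # an all-zero TraceID is invalid
                return trace_id

    def is_trace_id_random(self) -> bool:
        return True


def new_root_trace(generator: IdGenerator, trace_flags: int = 0) -> tuple[bytes, int]:
    """Originate a TraceID and set `Random` only if the generator declares it."""
    trace_id = generator.generate_trace_id()
    if generator.is_trace_id_random():
        trace_flags |= RANDOM_FLAG
    return trace_id, trace_flags
```

A generator that derives TraceIDs from timestamps or counters would return `False` from the declaration hook, leaving the `Random` flag unset so that downstream samplers do not treat its low 56 bits as randomness.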

## Span processor

Span processor is an interface which allows hooks for span start and end method
Expand Down