-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Probability Samplers based on W3C Trace Context Level 2 #3910
Conversation
@kentquirk @oertl @PeterF778 @kalyanaj Please take a look. |
@kentquirk @oertl @PeterF778 @kalyanaj |
`traceparent` and `tracestate` contexts fields specified in [W3C Trace | ||
Context Level 2](https://www.w3.org/TR/trace-context-2/). | ||
|
||
When injecting and extracting trace context to ro from a carrier, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When injecting and extracting trace context to ro from a carrier, the | |
When injecting and extracting trace context to or from a carrier, the |
@@ -312,21 +333,46 @@ The default sampler is `ParentBased(root=AlwaysOn)`. | |||
|
|||
#### TraceIdRatioBased | |||
|
|||
* The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the | |||
The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the | |
The `TraceIdRatioBased` sampler MUST ignore the parent `SampledFlag`. To respect the |
specification/trace/sdk.md
Outdated
* A rejection threshold is calculated, expressing as an integer how | ||
many out of 2**56 trace IDs should be sampled. | ||
* The threshold is encoded as a "T-value", expressing the threshold | ||
using up to 5 hexadecimal digits of precision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using up to 5 hexadecimal digits of precision. | |
using up to 14 hexadecimal digits of precision. |
Samplers SHOULD unset `p` when the invariant between the `sampled`, | ||
`p`, and `r` values is violated before using the `tracestate` to make | ||
a sampling decision or interpret adjusted count. | ||
Samplers SHOULD unset T-value (by erasing the `tv` field) when the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Samplers SHOULD unset T-value (by erasing the `tv` field) when the | |
Samplers SHOULD unset T-value (by erasing the `th` field) when the |
|
||
``` | ||
base10(r) = 3 | ||
base10(p) = 2 | ||
tracestate: ot=tv:c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracestate: ot=tv:c | |
tracestate: ot=th:c |
return a string of the form `"TraceIdRatioBased{RATIO;tv:TVALUE}"` | ||
with `RATIO` the configured probability and `TVALUE` replaced by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return a string of the form `"TraceIdRatioBased{RATIO;tv:TVALUE}"` | |
with `RATIO` the configured probability and `TVALUE` replaced by the | |
return a string of the form `"TraceIdRatioBased{RATIO;th:THRESHOLD}"` | |
with `RATIO` the configured probability and `THRESHOLD` replaced by the |
Here, R-value ce929d0e0e4736 indicates that a consistent probability | ||
sampler configured with probability greater than approximately 19.3%, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, R-value ce929d0e0e4736 indicates that a consistent probability | |
sampler configured with probability greater than approximately 19.3%, | |
The set sampled flag together with the R-value being ce929d0e0e4736 indicate that the parent used a consistent probability sampler configured with probability greater than approximately 19.3%, |
T-value is not set, consistent with an unsampled context. In this | ||
case, the parent context could have been sampled at less than 19.3% | ||
probability or used the `AlwaysOff` sampler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we know that the parent was not consistently sampled. If any other sampling mechanism is used, the probability could be any value. Mentioning 19.3% here, is therefore a little bit confusing.
To combine a consistent probability Sampler decision with a | ||
non-probability Sampler decision, T-value should be set according to | ||
the probability sampler(s). If no probability sampler decides to | ||
sample, no T-value should be set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To combine a consistent probability Sampler decision with a | |
non-probability Sampler decision, T-value should be set according to | |
the probability sampler(s). If no probability sampler decides to | |
sample, no T-value should be set. | |
If the sampling decision is the result of multiple samplers that are not all consistent samplers, no T-value should be set. |
The `TraceIdRatioBased` and `ParentBased` samplers can be used as | ||
delegates of another sampler, for conditioning the choice of sampler | ||
on span and other fixed attributes. However, for adjusted counts to | ||
be trustworthy, the choice of non-root sampler cannot be conditioned | ||
on the parent's sampled trace flag or the OpenTelemetry tracestate | ||
R-value or T-value, as these decisions would lead to incorrect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `TraceIdRatioBased` and `ParentBased` samplers can be used as | |
delegates of another sampler, for conditioning the choice of sampler | |
on span and other fixed attributes. However, for adjusted counts to | |
be trustworthy, the choice of non-root sampler cannot be conditioned | |
on the parent's sampled trace flag or the OpenTelemetry tracestate | |
R-value or T-value, as these decisions would lead to incorrect | |
The `TraceIdRatioBased` and `ParentBased` samplers can be used as | |
delegates of another sampler, for conditioning the choice of sampler | |
on span and other fixed attributes. However, for adjusted counts to | |
be trustworthy, the choice of non-root sampler cannot be conditioned | |
on the parent's sampled trace flag or the OpenTelemetry tracestate | |
R-value, as these decisions would lead to incorrect |
Choosing the T-value based on the parent T-value is fine. This is exactly what is done for parent-based sampling.
- TraceFlags (8 bits) | ||
- TraceState (string) | ||
|
||
Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, let's say "bits 2-7 of TraceFlags".
specification/trace/sdk.md
Outdated
|
||
* Ratio values are restricted to the range `2**-56` through 1 | ||
* A rejection threshold is calculated, expressing as an integer how | ||
many out of 2**56 trace IDs should be sampled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"sampled" => "dropped"
and r-value are independent settings, each can be meaningfully set | ||
without the other present. | ||
The [Sampler API](sdk.md#sampler) is responsible for setting the | ||
`sampled` and `random` flags and the `tracestate`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is problematic. A sampler should not change the random
flag at all. This flag must be set when the TraceID is created which is ahead of making the sampling decision for the root span.
bits. | ||
Head Samplers (i.e., trace SDKs) SHOULD by default generate the 56 | ||
random bits specified for TraceIDs by the W3C Trace Context Level 2, | ||
and set the Random trace flag, when generating new contexts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the SDK specification, the TraceID is already generated even when sampling ROOT spans. This needs more discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a fan of this. I think you've done a good job here and based on what I see, I don't think we need to be rewriting it from scratch. Our past history leads me to believe that might be a long process. Let's push forward!
### W3C Trace Context Requirements | ||
|
||
A W3C Trace Context propagator is expected to implement the | ||
`traceparent` and `tracestate` contexts fields specified in [W3C Trace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`traceparent` and `tracestate` contexts fields specified in [W3C Trace | |
`traceparent` and `tracestate` context fields specified in [W3C Trace |
- TraceFlags (8 bits) | ||
- TraceState (string) | ||
|
||
Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero. | |
Propagators MUST NOT assume that bits 2-7 of TraceFlags (the 6 most significant bits) will be zero. |
- [Sampled](https://www.w3.org/TR/trace-context/#sampled-flag) | ||
- [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [Sampled](https://www.w3.org/TR/trace-context/#sampled-flag) | |
- [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag) | |
- [Sampled (value 0x1)](https://www.w3.org/TR/trace-context/#sampled-flag) | |
- [Random (value 0x2)](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag) |
I think it would help to know which bits we're talking about.
specification/trace/sdk.md
Outdated
* A rejection threshold is calculated, expressing as an integer how | ||
many out of 2**56 trace IDs should be sampled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* A rejection threshold is calculated, expressing as an integer how | |
many out of 2**56 trace IDs should be sampled. | |
* A rejection threshold is calculated, expressing as an integer the number | |
out of 2**56 trace IDs that should be dropped. |
This states that the SDK should fill least significant 7 bytes (i.e., 56 | ||
bits) of the TraceID are genuinely random or pseudo-random., so they | ||
can be used for probability sampling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This states that the SDK should fill least significant 7 bytes (i.e., 56 | |
bits) of the TraceID are genuinely random or pseudo-random., so they | |
can be used for probability sampling. | |
This states that the SDK should fill the least significant 7 bytes (i.e., 56 | |
bits) of the TraceID with bits that are genuinely random or pseudo-random, | |
so that they can be used for probability sampling. |
decimal value is greater than 63 before using the `tracestate` to make | ||
a sampling decision or interpret adjusted count. | ||
These points, taken together, means there will be a long delay before | ||
the Random trace flag functions as intended in ther OpenTelemetry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the Random trace flag functions as intended in ther OpenTelemetry | |
the Random trace flag functions as intended in the OpenTelemetry |
|
||
##### Example: Probability sampled context | ||
Tail Samplers SHOULD ignore the W3C Trace Context Level 2 Random flag | ||
when interpreting span data. This is a compromise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when interpreting span data. This is a compromise | |
when interpreting span data. This is a compromise. |
|
||
Consider a trace context with the following headers: | ||
When there is an explicit R-value set, the Sampler will anyway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there is an explicit R-value set, the Sampler will anyway | |
When there is an explicit R-value set, the Sampler will |
* Sampler decisions are made by comparing the Trace ID randomness | ||
against the rejection threshold. | ||
* When Sampled, T-value is included in the [OpenTelemetry TraceState | ||
header](./tracestate-handling.md), identified by sub-key `th`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update the tracestate-handling.md to include the examples for the sub-key 'th'? Right now, it seems to be referring to the earlier p values and r value examples.
using up to 5 hexadecimal digits of precision. | ||
* Sampler decisions are made by comparing the Trace ID randomness | ||
against the rejection threshold. | ||
* When Sampled, T-value is included in the [OpenTelemetry TraceState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be good to rephrase this to an active voice of what implementations must do. For example, something like: "When sampled, an implementation MUST update the tracestate header to specify the T-value. <+ any additional details>"
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Fixes #1413.
Fixes #3602.
Changes
Updates Trace SDK and Propagator specifications with complete support for consistent probability sampling as described in OTEP 235.
CHANGELOG.md
TODOspec-compliance-matrix.md
TODO