Sampling weight -> sampling rate
axw committed Aug 5, 2020
1 parent 6cca0b6 commit af072f0
Showing 2 changed files with 33 additions and 42 deletions.
4 changes: 2 additions & 2 deletions docs/agents/distributed-tracing.md
@@ -19,7 +19,7 @@ header.
## Tracestate

For our own `elastic` `tracestate` entry we will introduce a `key:value` formatted list of attributes.
This is used to propagate the sample weight downstream, for example.
This is used to propagate the sampling rate downstream, for example.
See the [sampling](sampling.md) specification for more details.

The general `tracestate` format is:
@@ -28,7 +28,7 @@ The general `tracestate` format is:

For example:

tracestate: elastic=w:5,othervendor=<opaque>
tracestate: elastic=s:0.1,othervendor=<opaque>


### Validation and length limits
71 changes: 31 additions & 40 deletions docs/agents/sampling.md
@@ -9,65 +9,56 @@ or outcome of the trace are known, and propagated throughout the trace.

Agents can be configured to sample probabilistically, by specifying a
sampling probability in the range \[0,1\] using the configuration
`ELASTIC_APM_TRANSACTION_SAMPLE_RATE`. For example:
`ELASTIC_APM_TRANSACTION_SAMPLE_RATE`. e.g.

- `ELASTIC_APM_TRANSACTION_SAMPLE_RATE=1` means all transactions will be sampled (the default)
- `ELASTIC_APM_TRANSACTION_SAMPLE_RATE=0` means no transactions will be sampled
- `ELASTIC_APM_TRANSACTION_SAMPLE_RATE=0.5` means approximately 50% of transactions will be sampled
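
As a rough illustration of the three cases listed above, probabilistic sampling can be sketched as comparing a uniform random number against the configured rate. The function name `is_sampled` and the injectable `rng` parameter are hypothetical and exist only to make the sketch testable; real agents may implement the decision differently.

```python
import random
from typing import Callable


def is_sampled(sample_rate: float, rng: Callable[[], float] = random.random) -> bool:
    """Return True if a new root transaction should be sampled.

    `rng` must return a float in [0, 1), so a rate of 1 always samples
    and a rate of 0 never samples.
    """
    return rng() < sample_rate
```

With this comparison, a rate of 0.5 samples roughly half of all transactions over many calls.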

## Sampling weight
## Sampling rate

At the time of making a sampling decision, a "sampling weight" must be calculated.
This value represents the approximate number of traces that the sampled trace is
representative of. Every transaction and span in the trace must have the same weight.
At the time of making a sampling decision, the sampling rate must be recorded
so that it can be associated with every transaction and span in the trace. This
will be used for scaling metrics.

For probabilistic sampling, the weight is the inverse of the sampling rate.
e.g. for a sampling rate of 0.5, the weight is 1/0.5=2; for a sampling rate of 0.2,
the weight is 1/0.2=5.
The sampling rate must be recorded on transactions and spans as `sample_rate`. e.g.

Sampling weight must be recorded on transactions and spans as "weight". e.g.
{"transaction":{"name":"GET /","sample_rate":0.1,...}}
{"span":{"name":"SELECT FROM table","sample_rate":0.1,...}}

{"transaction":{"name":"GET /","weight":5,...}}
{"span":{"name":"SELECT FROM table","weight":5,...}}

For non-sampled transactions the weight must be recorded as 0. For backwards
compatibility the server will assume a value of 1 if unspecified.
For non-sampled transactions the `sample_rate` field _must_ be set to 0.
For backwards compatibility the server will assume a value of 1 where the field is unspecified,
resulting in all transactions (sampled and unsampled) being counted equally in metrics.

## Non-sampled transactions

Currently, _all_ transactions are captured by Elastic APM agents. Sampling
controls how much data is captured for transactions: sampled transactions
have complete context recorded, and include spans; non-sampled transactions
have limited context, and no spans.
Currently _all_ transactions are captured by Elastic APM agents.
Sampling controls how much data is captured for transactions: sampled transactions have complete context recorded, and include spans;
non-sampled transactions have limited context and no spans.

For non-sampled transactions set the transaction attributes `sampled: false` and `sample_rate: 0`, and omit `context`.
No spans should be captured.

For non-sampled transactions, set the transaction attributes `sampled: false`
and `weight: 0`, and omit `context`. No spans should be captured. In the future
we may introduce options to agents to stop sending non-sampled transactions
altogether.
In the future we may introduce options to agents to stop sending non-sampled transactions altogether.
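
The difference between sampled and non-sampled transaction events might be sketched as follows. The helper `build_transaction` is hypothetical, and only the `name`, `sampled`, `sample_rate` and `context` fields mentioned in this document are modelled; a real event carries many more fields.

```python
from typing import Any, Dict, Optional


def build_transaction(
    name: str,
    sampled: bool,
    sample_rate: float,
    context: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a minimal transaction event (illustrative subset of fields)."""
    if sampled:
        txn: Dict[str, Any] = {"name": name, "sampled": True, "sample_rate": sample_rate}
        if context is not None:
            txn["context"] = context
        return {"transaction": txn}
    # Non-sampled: sample_rate must be 0 and context is omitted.
    return {"transaction": {"name": name, "sampled": False, "sample_rate": 0}}
```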

## Propagation

As mentioned above, the sampling decision must be propagated throughout the trace.
We adhere to the W3C Trace-Context spec for this, propagating the decision through
trace-flags: https://www.w3.org/TR/trace-context/#sampled-flag

In addition to propagating the sampling decision (boolean), agents must also propagate
the sampling weight to ensure a consistent weight is applied to all events in the trace.
This is achieved by adding a `w` attribute to our [`elastic` `tracestate` key](distributed-tracing.md#tracestate) when calculating the
sampling weight.
We adhere to the W3C Trace-Context spec for this, propagating the decision through trace-flags: https://www.w3.org/TR/trace-context/#sampled-flag

For example:
In addition to propagating the sampling decision (boolean), agents must also propagate the sampling rate to ensure it is consistently attached to all events in the trace.
This is achieved by adding an `s` attribute to our [`elastic` `tracestate` key](distributed-tracing.md#tracestate) with the value of the sampling rate.
e.g.

tracestate: elastic=w:5,othervendor=<opaque>
tracestate: elastic=s:0.1,othervendor=<opaque>

The "w" attribute should be a number. As `tracestate` has modest size limits, we must
keep the size down. If "w" has more than 5 significant figures before the decimal point,
then round half away from zero to the nearest integer. Otherwise, round half away from
zero to 5 significant figures. e.g.
As `tracestate` has modest size limits we must keep the size down.
When recording `s` in `tracestate` the sampling rate should be rounded half away from zero to 3 decimal places.
e.g.

1.23455 -> 1.2346
12345.5 -> 12346
0.5554 -> 0.555
0.5555 -> 0.556
0.5556 -> 0.556
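
One way to get "round half away from zero to 3 decimal places" is Python's `decimal` module, whose `ROUND_HALF_UP` mode rounds ties away from zero. This is a minimal sketch; the function name `format_sample_rate` and the choice to strip trailing zeros are assumptions, not part of the specification.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_sample_rate(rate: float) -> str:
    """Format a sampling rate for the `s` attribute, rounded to 3 decimal places."""
    # Go through str() so that 0.5555 is treated as the decimal 0.5555,
    # not as its nearest binary floating-point approximation.
    rounded = Decimal(str(rate)).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
    # Strip trailing zeros to keep the header short (an assumption of this sketch).
    return format(rounded.normalize(), "f")
```

Built-in `round()` is avoided here because it rounds ties to even and operates on the binary float value, so it does not reliably round half away from zero.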

For a downstream agent, if `tracestate` is not found or does not contain an "elastic"
entry with a "w" attribute, then it must assume a sample weight of 1 just as the server
does for backwards compatibility.
For a downstream agent, if `tracestate` is not found or does not contain an `elastic` entry with an `s` attribute,
then it must assume a sampling rate of 1 just as the server does for backwards compatibility.
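
A downstream agent's fallback behaviour could look roughly like the sketch below. The function name `sample_rate_from_tracestate` is hypothetical. The separator between attributes inside the `elastic` entry (`;` here) and the fallback to 1 for an unparsable value are assumptions of this sketch, since this section does not specify them.

```python
from typing import Optional


def sample_rate_from_tracestate(tracestate: Optional[str], attr_sep: str = ";") -> float:
    """Extract the `s` attribute from the `elastic` tracestate entry, defaulting to 1."""
    if not tracestate:
        return 1.0
    for member in tracestate.split(","):
        key, eq, value = member.strip().partition("=")
        if key != "elastic" or not eq:
            continue
        for attr in value.split(attr_sep):  # attr_sep is an assumed separator
            name, colon, raw = attr.partition(":")
            if name == "s" and colon:
                try:
                    return float(raw)
                except ValueError:
                    return 1.0  # assumed fallback for a malformed value
    return 1.0
```

For the header shown earlier, `sample_rate_from_tracestate("elastic=s:0.1,othervendor=<opaque>")` yields 0.1, while a missing header or a header without an `elastic` entry yields 1.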
