Document tracer configuration options (#225)
* rough pass at beginning to document tracer configuration

* finish up configuration.md, add sampling.md

* fix link

* spelling typos

* brain damage

* slightly better wording

* address review comments
dgoffredo authored Apr 26, 2022
1 parent 80979a8 commit dbcee78
Showing 3 changed files with 382 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -13,6 +13,8 @@ Usage docs are on the main Datadog website:

For some quick-start examples, see the [examples](examples/) folder.

For detailed information about this library's configuration, see [configuration.md][2].

## Contributing

Before considering contributions to the project, please take a moment to read our brief [contribution guidelines](CONTRIBUTING.md).
@@ -117,3 +119,4 @@ test/integration/run_integration_tests_local.sh
```

[1]: https://github.com/DataDog/dd-opentracing-cpp/issues/170
[2]: doc/configuration.md
255 changes: 255 additions & 0 deletions doc/configuration.md
@@ -0,0 +1,255 @@
Configuration
=============
The Datadog tracer's configuration can be specified in multiple ways:

- programmatically in C++ code via the `TracerOptions` object defined in
[datadog/opentracing.h][1],
- process-wide via setting values for certain [environment variables][4],
- [dynamically][3] by reading JSON-formatted text, as is done in the [nginx
plugin][2].

Most options support all three methods of configuration.

Environment variables override any corresponding configuration in
`TracerOptions` or loaded from JSON.
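
The precedence rule can be pictured with a minimal sketch. This is not the library's implementation: `resolve_setting` is a hypothetical helper, and treating a variable that is set but empty as "set" is an assumption made here for simplicity.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of the library): returns the value of the
// environment variable `env_name` if it is set, otherwise `configured`, which
// stands for a value taken from `TracerOptions` or from JSON.
// Assumption: a variable that is set but empty still counts as set.
std::string resolve_setting(const std::string& configured,
                            const char* env_name) {
  const char* from_env = std::getenv(env_name);
  return from_env != nullptr ? std::string(from_env) : configured;
}
```

For example, `resolve_setting(options_host, "DD_AGENT_HOST")` would return the configured host only when `DD_AGENT_HOST` is absent from the environment (`options_host` is an illustrative variable name).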

Options
-------
### Agent Host
The name of the host at which the Datadog Agent can be contacted, or the host's
IP address.

- **TracerOptions member**: `std::string agent_host`
- **JSON property**: `"agent_host"` _(string)_
- **Environment variable**: `DD_AGENT_HOST`
- **Default value**: `"localhost"`

### Agent Port
The port on which the Datadog Agent is listening.

- **TracerOptions member**: `uint32_t agent_port`
- **JSON property**: `"agent_port"` _(integer)_
- **Environment variable**: `DD_TRACE_AGENT_PORT`
- **Default value**: `8126`

### Agent URL
As an alternative to specifying a host and port separately, a URL may be
specified indicating where the Datadog Agent can be contacted. Both TCP
and Unix domain sockets are supported. For more information about using a
Unix domain socket, see the [relevant example][5].

If the Agent URL is specified, then it overrides the Agent host and Agent port
settings.

The following forms are supported:

- `http://host` (TCP)
- `http://host:port` (TCP)
- `https://host` (TCP)
- `https://host:port` (TCP)
- `unix://path` (Unix domain socket)
- `path` (Unix domain socket)

- **TracerOptions member**: `std::string agent_url`
- **JSON property**: `"agent_url"` _(string)_
- **Environment variable**: `DD_TRACE_AGENT_URL`
- **Default value**: `""`
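
As a rough illustration of how the supported forms map to a transport, consider the sketch below. The names `AgentTransport` and `classify_agent_url` are hypothetical and not part of the library, whose own parsing and validation may differ. The sketch assumes a non-empty URL, since an empty value means that no URL was specified.

```cpp
#include <string>

// Illustrative only: these names are not part of the library.
enum class AgentTransport { Tcp, UnixDomainSocket };

// Returns true if `text` begins with `prefix`.
static bool starts_with(const std::string& text, const std::string& prefix) {
  return text.compare(0, prefix.size(), prefix) == 0;
}

// Classifies a non-empty agent URL according to the forms listed above.
AgentTransport classify_agent_url(const std::string& url) {
  if (starts_with(url, "http://") || starts_with(url, "https://")) {
    return AgentTransport::Tcp;
  }
  // `unix://path` and a bare `path` both denote a Unix domain socket.
  return AgentTransport::UnixDomainSocket;
}
```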

### Service Name
The default service name to associate with spans produced by the tracer.
Service name can be overridden programmatically on a per-span basis by setting
a value for the `datadog::tags::service_name` tag.

- **TracerOptions member**: `std::string service`
- **JSON property**: `"service"` _(string)_
- **Environment variable**: `DD_SERVICE`
- **Required**

### Service Type
The default "service type" to associate with spans produced by the tracer.

Service type is used in multiple places throughout Datadog to distinguish
different categories of instrumented service from each other. For example,
it is used in the following ways:

- to identify whether the service's spans need to be obfuscated
- to control display of the service in the Datadog UI.

Example values for service type are `web`, `db`, and `lambda`.

- **TracerOptions member**: `std::string type`
- **JSON property**: `"type"` _(string)_
- **Default value**: `"web"`

### Environment
The default release environment in which the service is running, e.g. "prod,"
"dev," or "staging."

Environment is one of the core properties associated with a service, together
with its name and version. See [Unified Service Tagging][9].

- **TracerOptions member**: `std::string environment`
- **JSON property**: `"environment"` _(string)_
- **Environment variable**: `DD_ENV`
- **Default value**: `""`

### Sample Rate
The default probability that a trace beginning at this tracer will be sampled
for ingestion.

For more information about the configuration of trace sampling, see
[sampling.md][6].

- **TracerOptions member**: `double sample_rate`
- **JSON property**: `"sample_rate"` _(number)_
- **Environment variable**: `DD_TRACE_SAMPLE_RATE`

### Sampling Rules
Sampling rules allow for fine-grained control over the rate at which traces
beginning at this tracer will be sampled for ingestion. Sampling rules are
specified as a JSON array of objects.

For more information about the configuration of trace sampling, see
[sampling.md][6].

- **TracerOptions member**: `std::string sampling_rules`
- **JSON property**: `"sampling_rules"` _(array of objects)_
- **Environment variable**: `DD_TRACE_SAMPLING_RULES` _(JSON)_
- **Default value**: `[]`

### Trace Flushing Period
How often a batch of finished traces is sent to the Datadog Agent.

- **TracerOptions member**: `int64_t write_period_ms` _(milliseconds)_
- **Default value**: `1000` _(milliseconds)_

### Operation Name
The default operation name to associate with spans produced by the tracer.

A span's operation name (sometimes just called "name" or "operation") indicates
which of a service's functions the span represents.

Operation name is often fixed for a given service, e.g. the "nginx" service
entry spans might always have operation name "handle.request".

Operation name is not to be confused with a span's associated resource, also
known as endpoint. Resource (endpoint) contains information about the
particular request, whereas operation name is more like a subcategory of the
service name.

- **TracerOptions member**: `std::string operation_name_override`
- **JSON property**: `"operation_name_override"` _(string)_
- **Default value**: `""`

### Trace Context Extraction Styles
When one service calls another along a distributed trace, information about the
trace, such as the trace ID, the parent span ID, and the sampling decision,
must be propagated in the call.

Different tracing systems have different standards for how trace context is
propagated, e.g. which HTTP request headers are used.

The Datadog C++ tracer supports two styles of trace context propagation. The
default style, `Datadog`, decodes trace information from multiple `X-Datadog-*`
request headers. For compatibility with [other tracing systems][7], another
style, `B3`, is also supported. The `B3` style decodes trace information from
multiple `X-B3-*` request headers.

The trace context extraction styles setting indicates which styles the tracer
will consider when extracting trace context from a request. At least one style
must be specified, but multiple may be specified. If multiple styles are
specified, then trace context must be successfully extractable in at least one
of the styles, and if trace context can be extracted in both styles, the two
extracted contexts must agree.

- **TracerOptions member**: `std::set<PropagationStyle> extract`
- **JSON property**: `"propagation_style_extract"` _(array of strings)_
- **Environment variable**: `DD_PROPAGATION_STYLE_EXTRACT` _(JSON)_
- **Default value**: `["Datadog"]`

### Trace Context Injection Styles
Trace context injection styles are analogous to trace context extraction styles
(see the previous section), except that rather than indicating which trace
context encodings are supported when _extracting_ trace context, trace context
injection styles indicate which trace context encoding(s) will be used when
_injecting_ context into a request to the next service along a trace.

Note that even if the `B3` injection style is used, the tracer still may inject
Datadog-specific trace context, such as in the `X-Datadog-Origin` request
header.

- **TracerOptions member**: `std::set<PropagationStyle> inject`
- **JSON property**: `"propagation_style_inject"` _(array of strings)_
- **Environment variable**: `DD_PROPAGATION_STYLE_INJECT` _(JSON)_
- **Default value**: `["Datadog"]`

### Host Name Reporting
If `true`, the tracer will look up its host's name on the network using the
[gethostname][8] function and send it to the Datadog backend in a reserved span
tag.

- **TracerOptions member**: `bool report_hostname`
- **JSON property**: `"dd.trace.report-hostname"` _(boolean)_
- **Environment variable**: `DD_TRACE_REPORT_HOSTNAME`
- **Default value**: `false`

### Span Tags
Tags to add to every span produced by the tracer.

When specified as `std::map<std::string, std::string> tags`, each entry in the
map is a (key, value) pair, where the key is the name of the span tag, and the
value is its value. The value is a string.

When specified as the `DD_TAGS` environment variable, tags are formatted as a
comma-separated list of `key:value` pairs (the key and value are separated by a
colon).

- **TracerOptions member**: `std::map<std::string, std::string> tags`
- **JSON property**: `"tags"` _(object)_
- **Environment variable**: `DD_TAGS` _(format: `"name:value,name:value,..."`)_
- **Default value**: `{}`
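
The `DD_TAGS` format can be illustrated with a small parser. This is a sketch under stated assumptions, not the library's code: `parse_tags` is a hypothetical function, each entry is split at its first colon, and entries without a colon are skipped. The library's treatment of whitespace and malformed entries may differ.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser for the `DD_TAGS` format "name:value,name:value,...".
// Assumptions: entries are separated by commas, each entry is split at its
// first colon, and entries without a colon are skipped.
std::map<std::string, std::string> parse_tags(const std::string& text) {
  std::map<std::string, std::string> tags;
  std::istringstream entries(text);
  std::string entry;
  while (std::getline(entries, entry, ',')) {
    const std::string::size_type colon = entry.find(':');
    if (colon == std::string::npos) {
      continue;  // no "name:value" separator in this entry
    }
    tags[entry.substr(0, colon)] = entry.substr(colon + 1);
  }
  return tags;
}
```

The resulting map has the same shape as the `tags` member of `TracerOptions`: one string value per tag name.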

### Application Version
The version of the application that is being instrumented.

If set, the application version is sent to the Datadog backend as the `version`
tag on the first span that the tracer produces in every trace.

- **TracerOptions member**: `std::string version`
- **JSON property**: `"version"` _(string)_
- **Environment variable**: `DD_VERSION`
- **Default value**: `""`

### Logging Function
The function used by the library to log diagnostics.

The provided function takes two arguments:

- `LogLevel level` is the severity of the diagnostic: `debug`, `info`, or
`error`.
- `::opentracing::string_view message` is the diagnostic message itself.

- **TracerOptions member**: `std::function<void(LogLevel, ::opentracing::string_view)> log_func`
- **Default value**: _(prints to `std::cerr`)_

### Limit Traces Sampled Per Second
The maximum number of traces per second that may be sampled on account of
either sampling rules or `DD_TRACE_SAMPLE_RATE`.

For more information about the configuration of trace sampling, see
[sampling.md][6].

- **TracerOptions member**: `double sampling_limit_per_second`
- **JSON property**: `"sampling_limit_per_second"` _(number)_
- **Environment variable**: `DD_TRACE_RATE_LIMIT`
- **Default value**: `100`

[1]: /include/datadog/opentracing.h
[2]: https://docs.datadoghq.com/tracing/setup_overview/proxy_setup/?tab=nginx#nginx-configuration
[3]: https://docs.datadoghq.com/tracing/setup_overview/setup/cpp/?tab=containers#dynamic-loading
[4]: https://docs.datadoghq.com/tracing/setup_overview/setup/cpp/?tab=containers#environment-variables
[5]: /examples/cpp-tracing/unix-domain-socket
[6]: sampling.md
[7]: https://github.com/openzipkin/b3-propagation
[8]: https://pubs.opengroup.org/onlinepubs/9699919799/
[9]: https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging
124 changes: 124 additions & 0 deletions doc/sampling.md
@@ -0,0 +1,124 @@
Configuring Trace Sampling
==========================
If instrumented services are producing a higher volume of tracing data than is
desired, then the services can be configured to send tracing data for only a
subset of processed requests. This is called trace sampling.

By default, the rate at which instrumented services sample traces is governed by
the Datadog Agent, which dynamically adjusts the sampling rates of its clients
in order to reach a [configured target number][1] of traces per second.

For fine-grained control over trace sampling, instrumented services can be
configured with _sampling rules_. What follows is a description of how
trace sampling may be configured in the Datadog C++ tracing library.

Sampling Rules
--------------
It is the _first_ service in a trace (the "root service") that determines
whether the trace will be sent to Datadog. Subsequent services in the trace
follow whichever decision was made by the root service.

The root service may define rules that assign different sampling rates to
different kinds of traces. In these rules, traces are distinguished by the
"service" and "operation name" associated with the root span. Typically, the
root span of a service is always associated with the same "service" and
"operation name." However, a process that hosts multiple services may produce
root spans with a different "service" for different requests.

For example, consider the following array of rules:
```json
[
{"service": "usersvc", "name": "healthcheck", "sample_rate": 0.0},
{"service": "usersvc", "sample_rate": 0.5},
{"service": "authsvc", "sample_rate": 1.0},
{"sample_rate": 0.1}
]
```
These rules stipulate the following trace sampling behavior:

- `usersvc` requests whose operation name is `healthcheck` are never sampled.
- Other `usersvc` requests are sampled 50% of the time.
- `authsvc` requests are sampled 100% of the time.
- All other requests are sampled 10% of the time.

`sample_rate` is a probability. Its minimum value is zero, indicating "never,"
and its maximum value is one, indicating "always."

Note that the sampling behavior stipulated by sampling rules is relevant only
if the tracer being configured is the _first_ in the trace.

When a trace is created, its root span is evaluated against each sampling rule
in order. The first rule that matches determines the probability that the
trace will be sampled. If no rule matches, then the trace is subject to the
sampling rates governed by the Datadog Agent, as explained above.
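
The first-match evaluation can be sketched as follows. `SamplingRule`, `first_match`, and `example_rules` are hypothetical names, not the library's own types, and the sketch assumes that a rule's `service` and `name` are compared for exact equality. An empty string stands for an absent property, which matches anything.

```cpp
#include <string>
#include <vector>

// Illustrative stand-in for one sampling rule; not the library's own type.
// An empty `service` or `name` means the property is absent (matches any).
struct SamplingRule {
  std::string service;
  std::string name;
  double sample_rate;
};

// Returns the first rule matching the root span's service and operation name,
// or nullptr if no rule matches (in which case the Agent's rates apply).
const SamplingRule* first_match(const std::vector<SamplingRule>& rules,
                                const std::string& service,
                                const std::string& name) {
  for (const SamplingRule& rule : rules) {
    const bool service_ok = rule.service.empty() || rule.service == service;
    const bool name_ok = rule.name.empty() || rule.name == name;
    if (service_ok && name_ok) {
      return &rule;
    }
  }
  return nullptr;
}

// The example rules from above, in the same order.
std::vector<SamplingRule> example_rules() {
  return {
      {"usersvc", "healthcheck", 0.0},
      {"usersvc", "", 0.5},
      {"authsvc", "", 1.0},
      {"", "", 0.1},
  };
}
```

Because `first_match` returns a pointer into `rules`, the vector must outlive the result. Order matters: if the `usersvc` rule without a `name` came first, the `healthcheck` rule would never be reached.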

Sampling rules can be configured programmatically in `std::string
TracerOptions::sampling_rules` or via the environment variable
`DD_TRACE_SAMPLING_RULES`. In either case, the rules are expressed as a JSON
array of objects. Each object supports the following properties:
```
[{
"service": <the root span's service name, or any if absent>,
"name": <the root span's operation name, or any if absent>,
"sample_rate": <the probability of sampling the trace, or 1.0 if absent>
}, ...]
```

`DD_TRACE_SAMPLE_RATE`
----------------------
Setting a (numeric) value for the `DD_TRACE_SAMPLE_RATE` environment variable
effectively appends a sampling rule to the tracer's array of sampling rules:
```
[
...,
{"sample_rate": $DD_TRACE_SAMPLE_RATE}
]
```
Now there is a sampling rule that matches _any_ trace, and so traces that do
not match an earlier sampling rule are subject to the configured sampling rate.

Note that using `DD_TRACE_SAMPLE_RATE` means that the Datadog Agent no longer
governs the sampling rate of any traces produced by the tracer. The implicit
"catch-all" rule, with the configured sampling rate, always takes precedence
over the Agent-based fallback.
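
In code, the effect is roughly that of appending a catch-all rule after all explicitly configured rules. The sketch below uses a hypothetical `SamplingRule` struct and `with_default_rate` function, with empty strings standing for absent properties. It illustrates the described behavior and is not the library's implementation.

```cpp
#include <string>
#include <vector>

// Illustrative stand-in for one sampling rule; not the library's own type.
// An empty `service` or `name` means the property is absent (matches any).
struct SamplingRule {
  std::string service;
  std::string name;
  double sample_rate;
};

// Returns a copy of `rules` with a catch-all rule appended. The catch-all
// comes last, so explicitly configured rules still take precedence.
std::vector<SamplingRule> with_default_rate(std::vector<SamplingRule> rules,
                                            double sample_rate) {
  rules.push_back(SamplingRule{"", "", sample_rate});
  return rules;
}
```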

`double TracerOptions::sample_rate`
-----------------------------------
This configuration option has the same meaning as the `DD_TRACE_SAMPLE_RATE`
environment variable. Note that the environment variable overrides the
`TracerOptions` field if both are specified.

`DD_TRACE_RATE_LIMIT`
---------------------
Sampling rules (and, by extension, `DD_TRACE_SAMPLE_RATE`) specify the
_probability_ that a trace will be sampled, but they do not specify the maximum
number of traces that may be produced by the tracer in a given time period.

`DD_TRACE_RATE_LIMIT` is the maximum number of traces, per second, that may be
sampled by the tracer on account of sampling rules or `DD_TRACE_SAMPLE_RATE`.
The limit applies globally across all applicable traces, i.e. there is not a
separate limit for each sampling rule.

`DD_TRACE_RATE_LIMIT` is a floating point number, but is usually specified as an integer, e.g.
```shell
export DD_TRACE_RATE_LIMIT=200
```
for a limit of 200 traces per second.

If this limit is not configured, its default value is 100 traces per second.

Note that this limit applies separately to each tracer. If the instrumented
service spawns multiple processes, then each process contains its own tracer,
and each tracer is separately subject to the configured rate limit. For
example, if [nginx][2] is configured with `DD_TRACE_RATE_LIMIT=200` and also
spawns eight worker processes, then the actual limit overall is `200 * 8 =
1600` traces per second.
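
The arithmetic generalizes to any number of processes. `overall_rate_limit` below is a hypothetical helper written for this illustration. It gives an upper bound, assuming every process runs exactly one tracer with the same configured limit.

```cpp
// Hypothetical helper: upper bound on the number of traces sampled per second
// across all processes of a service, given that each process has its own
// tracer and each tracer applies the limit independently.
double overall_rate_limit(double limit_per_tracer, unsigned process_count) {
  return limit_per_tracer * process_count;
}
```

To target a given overall limit, divide it by the number of processes and configure each tracer with the result.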

`double TracerOptions::sampling_limit_per_second`
-------------------------------------------------
This configuration option has the same meaning as the `DD_TRACE_RATE_LIMIT`
environment variable. Note that the environment variable overrides the
`TracerOptions` field if both are specified.

[1]: https://docs.datadoghq.com/tracing/trace_ingestion/mechanisms/?tab=environmentvariables#in-the-agent
[2]: https://docs.datadoghq.com/tracing/setup_overview/proxy_setup/?tab=nginx
