From dbcee78a34a6c8f628a65837d0ab35a4f130f57c Mon Sep 17 00:00:00 2001
From: David Goffredo
Date: Tue, 26 Apr 2022 10:38:13 -0400
Subject: [PATCH] Document tracer configuration options (#225)

* rough pass at beginning to document tracer configuration
* finish up configuration.md, add sampling.md
* fix link
* spelling typos
* brain damage
* slightly better wording
* address review comments
---
 README.md            |   3 +
 doc/configuration.md | 255 +++++++++++++++++++++++++++++++++++++++++++
 doc/sampling.md      | 124 +++++++++++++++++++++
 3 files changed, 382 insertions(+)
 create mode 100644 doc/configuration.md
 create mode 100644 doc/sampling.md

diff --git a/README.md b/README.md
index f305ec1a..52e79c4f 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,8 @@ Usage docs are on the main Datadog website:
 
 For some quick-start examples, see the [examples](examples/) folder.
 
+For detailed information about this library's configuration, see [configuration.md][2].
+
 ## Contributing
 
 Before considering contributions to the project, please take a moment to read our brief [contribution guidelines](CONTRIBUTING.md).
@@ -117,3 +119,4 @@ test/integration/run_integration_tests_local.sh
 ```
 
 [1]: https://github.com/DataDog/dd-opentracing-cpp/issues/170
+[2]: doc/configuration.md
diff --git a/doc/configuration.md b/doc/configuration.md
new file mode 100644
index 00000000..12dc11b4
--- /dev/null
+++ b/doc/configuration.md
@@ -0,0 +1,255 @@
+Configuration
+=============
+The Datadog tracer's configuration can be specified in multiple ways:
+
+- programmatically in C++ code via the `TracerOptions` object defined in
+  [datadog/opentracing.h][1],
+- process-wide via setting values for certain [environment variables][4],
+- [dynamically][3] by reading JSON-formatted text, as is done in the [nginx
+  plugin][2].
+
+Most options support all three methods of configuration.
+
+Environment variables override any corresponding configuration in
+`TracerOptions` or loaded from JSON.
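The override rule above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `ExampleOptions` and `effective_agent_host` are hypothetical names invented for this example, and treating an empty environment value as "unset" is an assumption.

```cpp
#include <string>

// Hypothetical stand-in for the single TracerOptions field used in this sketch.
struct ExampleOptions {
  std::string agent_host = "localhost";
};

// Returns the environment value when it is present and non-empty (assumption:
// an empty value counts as unset); otherwise returns the programmatic value.
std::string effective_agent_host(const ExampleOptions& options,
                                 const char* env_value) {
  if (env_value != nullptr && *env_value != '\0') {
    return env_value;
  }
  return options.agent_host;
}
```

A caller would pass the result of `std::getenv("DD_AGENT_HOST")` (from `<cstdlib>`) as `env_value`, so that the environment takes precedence whenever it is set.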
+
+Options
+-------
+### Agent Host
+The name of the host at which the Datadog Agent can be contacted, or the host's
+IP address.
+
+- **TracerOptions member**: `std::string agent_host`
+- **JSON property**: `"agent_host"` _(string)_
+- **Environment variable**: `DD_AGENT_HOST`
+- **Default value**: `"localhost"`
+
+### Agent Port
+The port on which the Datadog Agent is listening.
+
+- **TracerOptions member**: `uint32_t agent_port`
+- **JSON property**: `"agent_port"` _(integer)_
+- **Environment variable**: `DD_TRACE_AGENT_PORT`
+- **Default value**: `8126`
+
+### Agent URL
+As an alternative to specifying a host and port separately, a URL may be
+specified indicating where the Datadog Agent can be contacted. Both TCP
+and Unix domain sockets are supported. For more information about using a
+Unix domain socket, see the [relevant example][5].
+
+If the Agent URL is specified, then it overrides the Agent host and Agent port
+settings.
+
+The following forms are supported:
+
+- `http://host` (TCP)
+- `http://host:port` (TCP)
+- `https://host` (TCP)
+- `https://host:port` (TCP)
+- `unix://path` (Unix domain socket)
+- `path` (Unix domain socket)
+
+- **TracerOptions member**: `std::string agent_url`
+- **JSON property**: `"agent_url"` _(string)_
+- **Environment variable**: `DD_TRACE_AGENT_URL`
+- **Default value**: `""`
+
+### Service Name
+The default service name to associate with spans produced by the tracer.
+Service name can be overridden programmatically on a per-span basis by setting
+a value for the `datadog::tags::service_name` tag.
+
+- **TracerOptions member**: `std::string service`
+- **JSON property**: `"service"` _(string)_
+- **Environment variable**: `DD_SERVICE`
+- **Required**
+
+### Service Type
+The default "service type" to associate with spans produced by the tracer.
+
+Service type is used in multiple places throughout Datadog to distinguish
+different categories of instrumented service from each other. For example,
+it is used in the following ways:
+
+- to identify whether the service's spans need to be obfuscated
+- to control display of the service in the Datadog UI.
+
+Example values for service type are `web`, `db`, and `lambda`.
+
+- **TracerOptions member**: `std::string type`
+- **JSON property**: `"type"` _(string)_
+- **Default value**: `"web"`
+
+### Environment
+The default release environment in which the service is running, e.g. "prod,"
+"dev," or "staging."
+
+Environment is one of the core properties associated with a service, together
+with its name and version. See [Unified Service Tagging][9].
+
+- **TracerOptions member**: `std::string environment`
+- **JSON property**: `"environment"` _(string)_
+- **Environment variable**: `DD_ENV`
+- **Default value**: `""`
+
+### Sample Rate
+The default probability that a trace beginning at this tracer will be sampled
+for ingestion.
+
+For more information about the configuration of trace sampling, see
+[sampling.md][6].
+
+- **TracerOptions member**: `double sample_rate`
+- **JSON property**: `"sample_rate"` _(number)_
+- **Environment variable**: `DD_TRACE_SAMPLE_RATE`
+
+### Sampling Rules
+Sampling rules allow for fine-grained control over the rate at which traces
+beginning at this tracer will be sampled for ingestion. Sampling rules are
+specified as a JSON array of objects.
+
+For more information about the configuration of trace sampling, see
+[sampling.md][6].
+
+- **TracerOptions member**: `std::string sampling_rules`
+- **JSON property**: `"sampling_rules"` _(array of objects)_
+- **Environment variable**: `DD_TRACE_SAMPLING_RULES` _(JSON)_
+- **Default value**: `[]`
+
+### Trace Flushing Period
+How often a batch of finished traces is sent to the Datadog Agent.
+
+- **TracerOptions member**: `int64_t write_period_ms` _(milliseconds)_
+- **Default value**: `1000` _(milliseconds)_
+
+### Operation Name
+The default operation name to associate with spans produced by the tracer.
+
+A span's operation name (sometimes just called "name" or "operation") indicates
+which of a service's functions the span represents.
+
+Operation name is often fixed for a given service, e.g. the "nginx" service
+entry spans might always have operation name "handle.request".
+
+Operation name is not to be confused with a span's associated resource, also
+known as endpoint. Resource (endpoint) contains information about the
+particular request, whereas operation name is more like a subcategory of the
+service name.
+
+- **TracerOptions member**: `std::string operation_name_override`
+- **JSON property**: `"operation_name_override"` _(string)_
+- **Default value**: `""`
+
+### Trace Context Extraction Styles
+When one service calls another along a distributed trace, information about the
+trace must be propagated in the call; information such as the trace ID, the
+parent span ID, and the sampling decision.
+
+Different tracing systems have different standards for how trace context is
+propagated, e.g. which HTTP request headers are used.
+
+The Datadog C++ tracer supports two styles of trace context propagation. The
+default style, `Datadog`, decodes trace information from multiple `X-Datadog-*`
+request headers. For compatibility with [other tracing systems][7], another
+style, `B3`, is also supported. The `B3` style decodes trace information from
+multiple `X-B3-*` request headers.
+
+The trace context extraction styles setting indicates which styles the tracer
+will consider when extracting trace context from a request. At least one style
+must be specified, but multiple may be specified. If multiple styles are
+specified, then trace context must be successfully extractable in at least one
+of the styles, and if trace context can be extracted in both styles, the two
+extracted contexts must agree.
+
+- **TracerOptions member**: `std::set<PropagationStyle> extract`
+- **JSON property**: `"propagation_style_extract"` _(array of string)_
+- **Environment variable**: `DD_PROPAGATION_STYLE_EXTRACT` _(JSON)_
+- **Default value**: `["Datadog"]`
+
+### Trace Context Injection Styles
+Trace context injection styles are analogous to trace context extraction styles
+(see the previous section), except that rather than indicating which trace
+context encodings are supported when _extracting_ trace context, trace context
+injection styles indicate which trace context encoding(s) will be used when
+_injecting_ context into a request to the next service along a trace.
+
+Note that even if the `B3` injection style is used, the tracer still may inject
+Datadog-specific trace context, such as in the `X-Datadog-Origin` request
+header.
+
+- **TracerOptions member**: `std::set<PropagationStyle> inject`
+- **JSON property**: `"propagation_style_inject"` _(array of string)_
+- **Environment variable**: `DD_PROPAGATION_STYLE_INJECT` _(JSON)_
+- **Default value**: `["Datadog"]`
+
+### Host Name Reporting
+If `true`, the tracer will look up its host's name on the network using the
+[gethostname][8] function and send it to the Datadog backend in a reserved span
+tag.
+
+- **TracerOptions member**: `bool report_hostname`
+- **JSON property**: `"dd.trace.report-hostname"` _(boolean)_
+- **Environment variable**: `DD_TRACE_REPORT_HOSTNAME`
+- **Default value**: `false`
+
+### Span Tags
+Tags to add to every span produced by the tracer.
+
+When specified as `std::map<std::string, std::string> tags`, each entry in the
+map is a (key, value) pair, where the key is the name of the span tag, and the
+value is its value. The value is a string.
+
+When specified as the `DD_TAGS` environment variable, tags are formatted as a
+comma-separated list of `key:value` pairs (the key and value are separated by a
+colon).
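To make the `DD_TAGS` format described above concrete, here is a small sketch that turns a `key:value,key:value` string into the same kind of map used by the `tags` member. The function `parse_tags` is a hypothetical helper written for this example, not the library's parser. Skipping malformed entries and splitting on the first colon are assumptions of the sketch.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses "key:value,key:value" into a map of tag name to tag value.
// Assumptions of this sketch: entries with no colon or an empty key are
// skipped, and each entry is split at its first colon, so a value may itself
// contain colons. A repeated key keeps the last value seen.
std::map<std::string, std::string> parse_tags(const std::string& text) {
  std::map<std::string, std::string> tags;
  std::istringstream stream(text);
  std::string entry;
  while (std::getline(stream, entry, ',')) {
    const std::string::size_type colon = entry.find(':');
    if (colon == std::string::npos || colon == 0) {
      continue;  // malformed entry: ignore it
    }
    tags[entry.substr(0, colon)] = entry.substr(colon + 1);
  }
  return tags;
}
```

For instance, `parse_tags("team:apm,region:us-east-1")` yields a map with the two keys `team` and `region`.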
+
+- **TracerOptions member**: `std::map<std::string, std::string> tags`
+- **JSON property**: `"tags"` _(object)_
+- **Environment variable**: `DD_TAGS` _(format: `"name:value,name:value,..."`)_
+- **Default value**: `{}`
+
+### Application Version
+The version of the application that is being instrumented.
+
+If set, the application version is sent to the Datadog backend as the `version`
+tag on the first span that the tracer produces in every trace.
+
+- **TracerOptions member**: `std::string version`
+- **JSON property**: `"version"` _(string)_
+- **Environment variable**: `DD_VERSION`
+- **Default value**: `""`
+
+### Logging Function
+The function used by the library to log diagnostics.
+
+The provided function takes two arguments:
+
+- `LogLevel level` is the severity of the diagnostic: `debug`, `info`, or
+  `error`.
+- `::opentracing::string_view message` is the diagnostic message itself.
+
+- **TracerOptions member**: `std::function<void(LogLevel, ::opentracing::string_view)> log_func`
+- **Default value**: _(prints to `std::cerr`)_
+
+### Limit Traces Sampled Per Second
+The maximum number of traces per second that may be sampled on account of
+either sampling rules or `DD_TRACE_SAMPLE_RATE`.
+
+For more information about the configuration of trace sampling, see
+[sampling.md][6].
+
+- **TracerOptions member**: `double sampling_limit_per_second`
+- **JSON property**: `"sampling_limit_per_second"` _(number)_
+- **Environment variable**: `DD_TRACE_RATE_LIMIT`
+- **Default value**: `100`
+
+[1]: /include/datadog/opentracing.h
+[2]: https://docs.datadoghq.com/tracing/setup_overview/proxy_setup/?tab=nginx#nginx-configuration
+[3]: https://docs.datadoghq.com/tracing/setup_overview/setup/cpp/?tab=containers#dynamic-loading
+[4]: https://docs.datadoghq.com/tracing/setup_overview/setup/cpp/?tab=containers#environment-variables
+[5]: /examples/cpp-tracing/unix-domain-socket
+[6]: sampling.md
+[7]: https://github.com/openzipkin/b3-propagation
+[8]: https://pubs.opengroup.org/onlinepubs/9699919799/
+[9]: https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging
diff --git a/doc/sampling.md b/doc/sampling.md
new file mode 100644
index 00000000..9e44352a
--- /dev/null
+++ b/doc/sampling.md
@@ -0,0 +1,124 @@
+Configuring Trace Sampling
+==========================
+If instrumented services are producing a higher volume of tracing data than is
+desired, then the services can be configured to send tracing data for only a
+subset of processed requests. This is called trace sampling.
+
+By default, the rate at which instrumented services sample traces is governed by
+the Datadog Agent, which dynamically adjusts the sampling rates of its clients
+in order to reach a [configured target number][1] of traces per second.
+
+For fine-grained control over trace sampling, instrumented services can be
+configured with _sampling rules_. What follows is a description of how
+trace sampling may be configured in the Datadog C++ tracing library.
+
+Sampling Rules
+--------------
+It is the _first_ service in a trace (the "root service") that determines
+whether the trace will be sent to Datadog. Subsequent services in the trace
+follow whichever decision was made by the root service.
+
+The root service may define rules that assign different sampling rates to
+different kinds of traces. In these rules, traces are distinguished by the
+"service" and "operation name" associated with the root span. Typically, the
+root span of a service is always associated with the same "service" and
+"operation name." However, services acting as hosts to multiple services may
+produce different "service" spans for different requests.
+
+For example, consider the following array of rules:
+```json
+[
+  {"service": "usersvc", "name": "healthcheck", "sample_rate": 0.0},
+  {"service": "usersvc", "sample_rate": 0.5},
+  {"service": "authsvc", "sample_rate": 1.0},
+  {"sample_rate": 0.1}
+]
+```
+These rules stipulate the following trace sampling behavior:
+
+- `usersvc` requests whose operation name is `healthcheck` are never sampled.
+- Other `usersvc` requests are sampled 50% of the time.
+- `authsvc` requests are sampled 100% of the time.
+- All other requests are sampled 10% of the time.
+
+`sample_rate` is a probability. Its minimum value is zero, indicating "never,"
+and its maximum value is one, indicating "always."
+
+Note that the sampling behavior stipulated by sampling rules is relevant only
+if the tracer being configured is the _first_ in the trace.
+
+When a trace is created, its root span is evaluated against each sampling rule
+in order. The first rule that matches determines the probability that the
+trace will be sampled. If no rule matches, then the trace is subject to the
+sampling rates governed by the Datadog Agent, as explained above.
+
+Sampling rules can be configured programmatically in `std::string
+TracerOptions::sampling_rules` or via the environment variable
+`DD_TRACE_SAMPLING_RULES`. In either case, the rules are expressed as a JSON
+array of objects. Each object supports the following properties:
+```
+[{
+  "service": ,
+  "name": ,
+  "sample_rate":
+}, ...]
+```
+
+`DD_TRACE_SAMPLE_RATE`
+----------------------
+Setting a (numeric) value for the `DD_TRACE_SAMPLE_RATE` environment variable
+effectively appends a sampling rule to the tracer's array of sampling rules:
+```
+[
+  ...,
+  {"sample_rate": $DD_TRACE_SAMPLE_RATE}
+]
+```
+Now there is a sampling rule that matches _any_ trace, and so traces that do
+not match an earlier sampling rule are subject to the configured sampling rate.
+
+Note that using `DD_TRACE_SAMPLE_RATE` means that the Datadog Agent no longer
+governs the sampling rate of any traces produced by the tracer. The implicit
+"catch-all" rule, with the configured sampling rate, always takes precedence
+over the Agent-based fallback.
+
+`double TracerOptions::sample_rate`
+-----------------------------------
+This configuration option has the same meaning as the `DD_TRACE_SAMPLE_RATE`
+environment variable. Note that the environment variable overrides the
+`TracerOptions` field if both are specified.
+
+`DD_TRACE_RATE_LIMIT`
+---------------------
+Sampling rules (and, by extension, `DD_TRACE_SAMPLE_RATE`) specify the
+_probability_ that a trace will be sampled, but they do not specify the maximum
+number of traces that may be produced by the tracer in a given time period.
+
+`DD_TRACE_RATE_LIMIT` is the maximum number of traces, per second, that may be
+sampled by the tracer on account of sampling rules or `DD_TRACE_SAMPLE_RATE`.
+The limit applies globally across all applicable traces, i.e. there is not a
+separate limit for each sampling rule.
+
+`DD_TRACE_RATE_LIMIT` is a floating point number, but is usually specified as an integer, e.g.
+```shell
+export DD_TRACE_RATE_LIMIT=200
+```
+for a limit of 200 traces per second.
+
+If this limit is not configured, its default value is 100 traces per second.
+
+Note that this limit applies separately to each tracer. If the instrumented
+service spawns multiple processes, then each process contains its own tracer,
+and each tracer is separately subject to the configured rate limit. For
+example, if [nginx][2] is configured with `DD_TRACE_RATE_LIMIT=200` and also
+spawns eight worker processes, then the actual limit overall is `200 * 8 =
+1600` traces per second.
+
+`double TracerOptions::sampling_limit_per_second`
+-------------------------------------------------
+This configuration option has the same meaning as the `DD_TRACE_RATE_LIMIT`
+environment variable. Note that the environment variable overrides the
+`TracerOptions` field if both are specified.
+
+[1]: https://docs.datadoghq.com/tracing/trace_ingestion/mechanisms/?tab=environmentvariables#in-the-agent
+[2]: https://docs.datadoghq.com/tracing/setup_overview/proxy_setup/?tab=nginx
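The first-match evaluation described under "Sampling Rules" can be sketched as follows. This is an illustration under stated assumptions, not the tracer's actual code: `SamplingRule` and `match_sample_rate` are hypothetical names, and matching is done by exact string comparison, which may be simpler than what the library supports. The sketch requires C++17 for `std::optional`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one sampling rule. A field left unset
// matches any value, mirroring a JSON rule that omits the property.
struct SamplingRule {
  std::optional<std::string> service;
  std::optional<std::string> name;
  double sample_rate;
};

// Walks the rules in order and returns the sample rate of the first rule
// whose set fields all equal the root span's service and operation name.
// Returns std::nullopt when no rule matches, in which case the Agent-governed
// rates would apply.
std::optional<double> match_sample_rate(const std::vector<SamplingRule>& rules,
                                        const std::string& service,
                                        const std::string& name) {
  for (const SamplingRule& rule : rules) {
    if (rule.service && *rule.service != service) continue;
    if (rule.name && *rule.name != name) continue;
    return rule.sample_rate;
  }
  return std::nullopt;
}
```

With the four example rules from the "Sampling Rules" section, a `usersvc` root span named `healthcheck` resolves to `0.0`, any other `usersvc` span to `0.5`, and a span from an unlisted service to `0.1`.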