diff --git a/apollo-router/src/configuration/tests.rs b/apollo-router/src/configuration/tests.rs
index b01b139378..95ddccef4d 100644
--- a/apollo-router/src/configuration/tests.rs
+++ b/apollo-router/src/configuration/tests.rs
@@ -393,8 +393,11 @@ cors:
 
 #[test]
 fn validate_project_config_files() {
+    std::env::set_var("DATADOG_AGENT_HOST", "http://example.com");
+    std::env::set_var("JAEGER_HOST", "http://example.com");
     std::env::set_var("JAEGER_USERNAME", "username");
     std::env::set_var("JAEGER_PASSWORD", "pass");
+    std::env::set_var("ZIPKIN_HOST", "http://example.com");
     std::env::set_var("TEST_CONFIG_ENDPOINT", "http://example.com");
     std::env::set_var("TEST_CONFIG_COLLECTOR_ENDPOINT", "http://example.com");
 
diff --git a/docs/source/configuration/metrics.mdx b/docs/source/configuration/metrics.mdx
index b7c02db97e..20955a18e8 100644
--- a/docs/source/configuration/metrics.mdx
+++ b/docs/source/configuration/metrics.mdx
@@ -56,11 +56,38 @@ apollo_router_http_request_duration_seconds_bucket{le="0.9"} 1
 
 > Note that if you haven't run a query against the router yet, you'll see a blank page because no metrics have been generated!
 
-### Available metrics
+## Datadog via OTLP
+
+To use Datadog, you must configure the Datadog agent to accept OTLP metrics. This can be done by adding the following to your `datadog.yaml`:
+
+```yaml title="datadog.yaml"
+otlp_config:
+  receiver:
+    protocols:
+      grpc:
+        endpoint: :4317
+```
+
+The router must also be configured to send metrics to the Datadog agent:
+
+```yaml title="router.yaml"
+telemetry:
+  metrics:
+    otlp:
+      enabled: true
+      # Temporality MUST be set to delta. Failure to do this will result in incorrect metrics.
+      temporality: delta
+      # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:4317)
+      endpoint: "${env.DATADOG_AGENT_HOST}:4317"
+```
+
+See [Datadog Agent configuration](https://docs.datadoghq.com/opentelemetry/otlp_ingest_in_the_agent/?tab=host) for more details.
+
+## Available metrics
 
 The following metrics are available for Prometheus and OpenTelemetry. Attributes are listed where applicable.
 
-#### HTTP
+### HTTP
 
 - `apollo_router_http_request_duration_seconds_bucket` - HTTP router request duration
 - `apollo_router_http_request_duration_seconds_bucket` - HTTP subgraph request duration, attributes:
@@ -71,12 +98,12 @@ The following metrics are available for Prometheus and OpenTelemetry. Attributes
   - `subgraph`: The subgraph being queried
   - `status` : If the retry was aborted (`aborted`)
 
-#### Session
+### Session
 
 - `apollo_router_session_count_total` - Number of currently connected clients
 - `apollo_router_session_count_active` - Number of in-flight GraphQL requests
 
-#### Cache
+### Cache
 
 - `apollo_router_cache_size` — Number of entries in the cache
 - `apollo_router_cache_hit_count` - Number of cache hits
@@ -89,7 +116,7 @@ All cache metrics listed above have the following attributes:
 - `kind`: the cache being queried (`apq`, `query planner`, `introspection`)
 - `storage`: The backend storage of the cache (`memory`, `redis`)
 
-#### Coprocessor
+### Coprocessor
 
 - `apollo_router_operations_coprocessor_total` - Total operations with coprocessors enabled.
 - `apollo_router_operations_coprocessor.duration` - Time spent waiting for the coprocessor to answer, in seconds.
@@ -99,14 +126,14 @@ The coprocessor operations metric has the following attributes:
 - `coprocessor.stage`: string (`RouterRequest`, `RouterResponse`, `SubgraphRequest`, `SubgraphResponse`)
 - `coprocessor.succeeded`: bool
 
-#### Performance
+### Performance
 
 - `apollo_router_processing_time` - Time spent processing a request (outside of waiting for external or subgraph requests) in seconds.
 - `apollo_router_query_planning_time` - Time spent planning queries in seconds.
 - `apollo_router_query_planning_warmup_duration` - Time spent planning queries in seconds.
 - `apollo_router_schema_load_duration` - Time spent loading the schema in seconds.
 
-#### Uplink
+### Uplink
 
 - `apollo_router_uplink_fetch_duration_seconds_bucket` - Uplink request duration, attributes:
 
@@ -122,13 +149,13 @@ The coprocessor operations metric has the following attributes:
 
 Note that the initial call to uplink during router startup will not be reflected in metrics.
 
-#### Subscription
+### Subscription
 
 - `apollo_router_opened_subscriptions` - Number of different opened subscriptions (not the number of clients with an opened subscriptions in case it's deduplicated)
 - `apollo_router_deduplicated_subscriptions_total` - Number of subscriptions that has been deduplicated
 - `apollo_router_skipped_event_count` - Number of subscription events that has been skipped because too many events have been received from the subgraph but not yet sent to the client.
 
-#### Batching
+### Batching
 
 - `apollo_router.operations.batching` - A counter of the number of query batches received by the router.
 - `apollo_router.operations.batching.size` - A histogram tracking the number of queries contained within a query batch.
diff --git a/docs/source/configuration/tracing.mdx b/docs/source/configuration/tracing.mdx
index 0681c2298f..05bf3419de 100644
--- a/docs/source/configuration/tracing.mdx
+++ b/docs/source/configuration/tracing.mdx
@@ -149,7 +149,7 @@ telemetry:
 
 You will need to experiment to find the setting that are appropriate for your use case.
 
-## Using Datadog
+## Using Datadog (native)
 
 The Apollo Router can be configured to connect to either the default agent address or a URL.
 
@@ -158,8 +158,8 @@ telemetry:
   tracing:
     datadog:
       enabled: true
-      # Either 'default' or a URL (example: 'http://127.0.0.1:8126')
-      endpoint: default
+      # Optional endpoint, either 'default' or a URL (Defaults to http://localhost:8126/v0.4/traces)
+      endpoint: "http://${env.DATADOG_AGENT_HOST}:8126/v0.4/traces"
 ```
 
 [Given that there are some incompatibilities](https://docs.rs/opentelemetry-datadog/latest/opentelemetry_datadog/#quirks)
@@ -171,7 +171,6 @@ telemetry:
   tracing:
     datadog:
       enabled: true
-      endpoint: default
       enable_span_mapping: true
 ```
 
@@ -211,7 +210,9 @@ Instead when `enable_span_mapping` is set to `true` the following trace will be
 ```
 
 
-## Using Jaeger
+## Using Jaeger (native)
+
+> :warning: [Jaeger native is deprecated](https://opentelemetry.io/blog/2022/jaeger-native-otlp/) and will be removed in a future release of the Apollo Router. Use [OTLP](#jaeger) instead.
 
 The Apollo Router can be configured to export tracing data to Jaeger either via an agent or http collector.
 
@@ -226,14 +227,12 @@ telemetry:
       enabled: true
       # Optional agent configuration,
       agent:
-        # Optional endpoint, either 'default' or a socket address (Defaults to 127.0.0.1:6831)
-        endpoint: docker_jaeger:14268
+        # Optional endpoint, either 'default' or a socket address (Defaults to 127.0.0.1:6832)
+        endpoint: "${env.JAEGER_HOST}:6832"
 ```
 
 ### Collector config
 
-If you're using Kubernetes, you can inject your secrets into configuration via environment variables:
-
 ```yaml title="router.yaml"
 telemetry:
   tracing:
@@ -242,16 +241,18 @@ telemetry:
       # Optional collector configuration,
       collector:
         # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:14268/api/traces)
-        endpoint: "http://my-jaeger-collector"
+        endpoint: "http://${env.JAEGER_HOST}:14268/api/traces"
         username: "${env.JAEGER_USERNAME}"
         password: "${env.JAEGER_PASSWORD}"
 ```
 
-## OpenTelemetry Collector via OTLP
+## OTLP
 
-[OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) is a horizontally scalable collector that you can use to receive, process, and export your telemetry data in a pluggable way.
-
-If you find that the built-in telemetry features of the Apollo Router are missing some desired functionality (e.g., [exporting to Kafka](https://opentelemetry.io/docs/collector/configuration/#exporters)), then it's worth considering this option.
+OTLP (OpenTelemetry Protocol) is the native protocol for OpenTelemetry. It can be used to export traces to a variety of backends including:
+* OpenTelemetry Collector
+* Datadog
+* Honeycomb
+* Lightstep
 
 ```yaml title="router.yaml"
 telemetry:
@@ -282,6 +283,57 @@ telemetry:
 
 Remember that `file.` and `env.` prefixes can be used for expansion in config yaml. e.g. `${file.ca.txt}`.
 
+### Datadog
+
+To use Datadog, you must configure the Datadog agent to accept OTLP traces. This can be done by adding the following to your `datadog.yaml`:
+
+```yaml title="datadog.yaml"
+otlp_config:
+  receiver:
+    protocols:
+      grpc:
+        endpoint: :4317
+```
+
+The router must also be configured to send traces to the Datadog agent:
+
+```yaml title="router.yaml"
+telemetry:
+  tracing:
+    otlp:
+      enabled: true
+
+      # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:4317)
+      endpoint: "${env.DATADOG_AGENT_HOST}:4317"
+```
+
+See [Datadog Agent configuration](https://docs.datadoghq.com/opentelemetry/otlp_ingest_in_the_agent/?tab=host) for more details.
+
+### Jaeger
+
+Since 1.35.0, [Jaeger supports native OTLP ingestion](https://medium.com/jaegertracing/introducing-native-support-for-opentelemetry-in-jaeger-eb661be8183c).
+
+When running Jaeger in Docker, ensure that port 4317 is exposed and that `COLLECTOR_OTLP_ENABLED` is set to `true`:
+```bash
+docker run --name jaeger \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 \
+  -p 4317:4317 \
+  -p 4318:4318 \
+  jaegertracing/all-in-one:1.35
+```
+
+Then configure the router to send traces via OTLP:
+```yaml title="router.yaml"
+telemetry:
+  tracing:
+    otlp:
+      enabled: true
+
+      # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:4317)
+      endpoint: "http://${env.JAEGER_HOST}:4317"
+```
+
 ## Using Zipkin
 
 The Apollo Router can be configured to export tracing data to either the default collector address or a URL:
@@ -293,5 +345,5 @@ telemetry:
       enabled: true
 
       # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:9411/api/v2/spans)
-      endpoint: http://my_zipkin_collector.dev
+      endpoint: "http://${env.ZIPKIN_HOST}:9411/api/v2/spans"
 ```
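
Note on the `${env.…}` placeholders used in the updated examples: the sketch below illustrates how such a placeholder might combine with the text around it, assuming plain textual substitution. The `expand_env` helper and its error strings are hypothetical and written only for this illustration; the router's actual expansion code is not part of this diff and may behave differently.

```rust
use std::collections::HashMap;

/// Replaces every `${env.NAME}` placeholder in `template` with the value
/// found in `vars`. Hypothetical helper for illustration only; this is an
/// assumption about plain textual substitution, not the router's code.
fn expand_env(template: &str, vars: &HashMap<&str, &str>) -> Result<String, String> {
    const PREFIX: &str = "${env.";
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find(PREFIX) {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + PREFIX.len()..];
        // The placeholder name runs up to the next closing brace.
        let end = after
            .find('}')
            .ok_or_else(|| "unterminated placeholder".to_string())?;
        let name = &after[..end];
        let value = vars
            .get(name)
            .ok_or_else(|| format!("missing variable {name}"))?;
        out.push_str(value);
        rest = &after[end + 1..];
    }
    // Copy whatever literal text follows the last placeholder.
    out.push_str(rest);
    Ok(out)
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("ZIPKIN_HOST", "example.com");
    let url = expand_env("http://${env.ZIPKIN_HOST}:9411/api/v2/spans", &vars).unwrap();
    println!("{url}");
}
```

Under that assumption, a variable value that already includes a scheme (such as `http://example.com`) would yield `http://http://example.com:…` when the template also supplies `http://`, so bare host names are the safer input for templates written this way.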