From a308a89cdefd33604ac86a10a358263dfdfaec35 Mon Sep 17 00:00:00 2001
From: Dan Nelson
Date: Tue, 23 Jul 2024 11:52:33 -0500
Subject: [PATCH 1/6] Update names of collector detailed metrics

---
 .../en/docs/collector/internal-telemetry.md | 44 +++++++++----------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md
index b86e113405f2..ade0a634e692 100644
--- a/content/en/docs/collector/internal-telemetry.md
+++ b/content/en/docs/collector/internal-telemetry.md
@@ -237,28 +237,28 @@ categorized by instrumentation type.
 
 #### Additional `detailed`-level metrics
 
-| Metric name                       | Description                                                                               | Type      |
-| --------------------------------- | ----------------------------------------------------------------------------------------- | --------- |
-| `http_client_active_requests`     | Number of active HTTP client requests.                                                    | Counter   |
-| `http_client_connection_duration` | Measures the duration of the successfully established outbound HTTP connections.          | Histogram |
-| `http_client_open_connections`    | Number of outbound HTTP connections that are active or idle on the client.                | Counter   |
-| `http_client_request_body_size`   | Measures the size of HTTP client request bodies.                                          | Histogram |
-| `http_client_request_duration`    | Measures the duration of HTTP client requests.                                            | Histogram |
-| `http_client_response_body_size`  | Measures the size of HTTP client response bodies.                                         | Histogram |
-| `http_server_active_requests`     | Number of active HTTP server requests.                                                    | Counter   |
-| `http_server_request_body_size`   | Measures the size of HTTP server request bodies.                                          | Histogram |
-| `http_server_request_duration`    | Measures the duration of HTTP server requests.                                            | Histogram |
-| `http_server_response_body_size`  | Measures the size of HTTP server response bodies.                                         | Histogram |
-| `rpc_client_duration`             | Measures the duration of outbound RPC.                                                    | Histogram |
-| `rpc_client_request_size`         | Measures the size of RPC request messages (uncompressed).                                 | Histogram |
-| `rpc_client_requests_per_rpc`     | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
-| `rpc_client_response_size`        | Measures the size of RPC response messages (uncompressed).                                | Histogram |
-| `rpc_client_responses_per_rpc`    | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs.     | Histogram |
-| `rpc_server_duration`             | Measures the duration of inbound RPC.                                                     | Histogram |
-| `rpc_server_request_size`         | Measures the size of RPC request messages (uncompressed).                                 | Histogram |
-| `rpc_server_requests_per_rpc`     | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
-| `rpc_server_response_size`        | Measures the size of RPC response messages (uncompressed).                                | Histogram |
-| `rpc_server_responses_per_rpc`    | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs.     | Histogram |
+| Metric name                               | Description                                                                               | Type      |
+| ----------------------------------------- | ----------------------------------------------------------------------------------------- | --------- |
+| `otelcol_http_client_active_requests`     | Number of active HTTP client requests.                                                    | Counter   |
+| `otelcol_http_client_connection_duration` | Measures the duration of the successfully established outbound HTTP connections.          | Histogram |
+| `otelcol_http_client_open_connections`    | Number of outbound HTTP connections that are active or idle on the client.                | Counter   |
+| `otelcol_http_client_request_body_size`   | Measures the size of HTTP client request bodies.                                          | Histogram |
+| `otelcol_http_client_request_duration`    | Measures the duration of HTTP client requests.                                            | Histogram |
+| `otelcol_http_client_response_body_size`  | Measures the size of HTTP client response bodies.                                         | Histogram |
+| `otelcol_http_server_active_requests`     | Number of active HTTP server requests.                                                    | Counter   |
+| `otelcol_http_server_request_body_size`   | Measures the size of HTTP server request bodies.                                          | Histogram |
+| `otelcol_http_server_request_duration`    | Measures the duration of HTTP server requests.                                            | Histogram |
+| `otelcol_http_server_response_body_size`  | Measures the size of HTTP server response bodies.                                         | Histogram |
+| `otelcol_rpc_client_duration`             | Measures the duration of outbound RPC.                                                    | Histogram |
+| `otelcol_rpc_client_request_size`         | Measures the size of RPC request messages (uncompressed).                                 | Histogram |
+| `otelcol_rpc_client_requests_per_rpc`     | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
+| `otelcol_rpc_client_response_size`        | Measures the size of RPC response messages (uncompressed).                                | Histogram |
+| `otelcol_rpc_client_responses_per_rpc`    | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs.     | Histogram |
+| `otelcol_rpc_server_duration`             | Measures the duration of inbound RPC.                                                     | Histogram |
+| `otelcol_rpc_server_request_size`         | Measures the size of RPC request messages (uncompressed).                                 | Histogram |
+| `otelcol_rpc_server_requests_per_rpc`     | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
+| `otelcol_rpc_server_response_size`        | Measures the size of RPC response messages (uncompressed).                                | Histogram |
+| `otelcol_rpc_server_responses_per_rpc`    | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs.     | Histogram |
 
 ### Events observable with internal logs
 

From c0b09bc1cadd69570bcacd96bfa5c0770605e0df Mon Sep 17 00:00:00 2001
From: Dan Nelson
Date: Tue, 23 Jul 2024 11:53:11 -0500
Subject: [PATCH 2/6] Include log_records in critical monitoring

---
 .../en/docs/collector/internal-telemetry.md | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md
index ade0a634e692..639fc1fd0faa 100644
--- a/content/en/docs/collector/internal-telemetry.md
+++ b/content/en/docs/collector/internal-telemetry.md
@@ -283,7 +283,8 @@ own telemetry.
 
 #### Data loss
 
-Use the rate of `otelcol_processor_dropped_spans > 0` and
+Use the rate of `otelcol_processor_dropped_log_records > 0`,
+`otelcol_processor_dropped_spans > 0`, and
 `otelcol_processor_dropped_metric_points > 0` to detect data loss. Depending on
 your project's requirements, select a narrow time window before alerting begins
 to avoid notifications for small losses that are within the desired reliability
@@ -317,12 +318,13 @@ logs for messages such as `Dropping data because sending_queue is full`.
 
 #### Receive failures
 
-Sustained rates of `otelcol_receiver_refused_spans` and
-`otelcol_receiver_refused_metric_points` indicate that too many errors were
-returned to clients. Depending on the deployment and the clients' resilience,
-this might indicate clients' data loss.
+Sustained rates of `otelcol_receiver_refused_log_records`,
+`otelcol_receiver_refused_spans`, and `otelcol_receiver_refused_metric_points`
+indicate that too many errors were returned to clients. Depending on the
+deployment and the clients' resilience, this might indicate clients' data loss.
 
-Sustained rates of `otelcol_exporter_send_failed_spans` and
+Sustained rates of `otelcol_exporter_send_failed_log_records`,
+`otelcol_exporter_send_failed_spans` and
 `otelcol_exporter_send_failed_metric_points` indicate that the Collector is not
 able to export data as expected. These metrics do not inherently imply data loss
 since there could be retries. But a high rate of failures could indicate issues
@@ -330,6 +332,7 @@ with the network or backend receiving the data.
 
 #### Data flow
 
-You can monitor data ingress with the `otelcol_receiver_accepted_spans` and
-`otelcol_receiver_accepted_metric_points` metrics and data egress with the
+You can monitor data ingress with the `otelcol_receiver_accepted_log_records`,
+`otelcol_receiver_accepted_spans`, and `otelcol_receiver_accepted_metric_points`
+metrics and data egress with the `otelcol_exporter_sent_log_records`,
 `otelcol_exporter_sent_spans` and `otelcol_exporter_sent_metric_points` metrics.

From 4bd946ca481a225505f57258355c0a3b824340c2 Mon Sep 17 00:00:00 2001
From: Dan Nelson <55757989+danelson@users.noreply.github.com>
Date: Thu, 1 Aug 2024 10:47:10 -0500
Subject: [PATCH 3/6] Oxford comma

Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
---
 content/en/docs/collector/internal-telemetry.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md
index 639fc1fd0faa..4a2aef546852 100644
--- a/content/en/docs/collector/internal-telemetry.md
+++ b/content/en/docs/collector/internal-telemetry.md
@@ -335,4 +335,4 @@ with the network or backend receiving the data.
 You can monitor data ingress with the `otelcol_receiver_accepted_log_records`,
 `otelcol_receiver_accepted_spans`, and `otelcol_receiver_accepted_metric_points`
 metrics and data egress with the `otelcol_exporter_sent_log_records`,
-`otelcol_exporter_sent_spans` and `otelcol_exporter_sent_metric_points` metrics.
+`otelcol_exporter_sent_spans`, and `otelcol_exporter_sent_metric_points` metrics.

From bab9b276bff0e263fa9ac990ad5f0602ff90961d Mon Sep 17 00:00:00 2001
From: Dan Nelson <55757989+danelson@users.noreply.github.com>
Date: Thu, 1 Aug 2024 10:47:52 -0500
Subject: [PATCH 4/6] Oxford comma

Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
---
 content/en/docs/collector/internal-telemetry.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md
index 4a2aef546852..edc6069ebab5 100644
--- a/content/en/docs/collector/internal-telemetry.md
+++ b/content/en/docs/collector/internal-telemetry.md
@@ -324,7 +324,7 @@ indicate that too many errors were returned to clients. Depending on the
 deployment and the clients' resilience, this might indicate clients' data loss.
 
 Sustained rates of `otelcol_exporter_send_failed_log_records`,
-`otelcol_exporter_send_failed_spans` and
+`otelcol_exporter_send_failed_spans`, and
 `otelcol_exporter_send_failed_metric_points` indicate that the Collector is not
 able to export data as expected. These metrics do not inherently imply data loss
 since there could be retries. But a high rate of failures could indicate issues

From 92eff9cb58424dbf8948176b960ac4adfd31ab0b Mon Sep 17 00:00:00 2001
From: Dan Nelson
Date: Thu, 1 Aug 2024 15:47:08 -0500
Subject: [PATCH 5/6] Revert prefix for instrumentation library metrics

---
 .../en/docs/collector/internal-telemetry.md | 44 +++++++++----------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md
index edc6069ebab5..6d23e36d731d 100644
--- a/content/en/docs/collector/internal-telemetry.md
+++ b/content/en/docs/collector/internal-telemetry.md
@@ -237,28 +237,28 @@ categorized by instrumentation type.
 
 #### Additional `detailed`-level metrics
 
-| Metric name                               | Description                                                                               | Type      |
-| ----------------------------------------- | ----------------------------------------------------------------------------------------- | --------- |
-| `otelcol_http_client_active_requests`     | Number of active HTTP client requests.                                                    | Counter   |
-| `otelcol_http_client_connection_duration` | Measures the duration of the successfully established outbound HTTP connections.          | Histogram |
-| `otelcol_http_client_open_connections`    | Number of outbound HTTP connections that are active or idle on the client.                | Counter   |
-| `otelcol_http_client_request_body_size`   | Measures the size of HTTP client request bodies.                                          | Histogram |
-| `otelcol_http_client_request_duration`    | Measures the duration of HTTP client requests.                                            | Histogram |
-| `otelcol_http_client_response_body_size`  | Measures the size of HTTP client response bodies.                                         | Histogram |
-| `otelcol_http_server_active_requests`     | Number of active HTTP server requests.                                                    | Counter   |
-| `otelcol_http_server_request_body_size`   | Measures the size of HTTP server request bodies.                                          | Histogram |
-| `otelcol_http_server_request_duration`    | Measures the duration of HTTP server requests.                                            | Histogram |
-| `otelcol_http_server_response_body_size`  | Measures the size of HTTP server response bodies.                                         | Histogram |
-| `otelcol_rpc_client_duration`             | Measures the duration of outbound RPC.                                                    | Histogram |
-| `otelcol_rpc_client_request_size`         | Measures the size of RPC request messages (uncompressed).                                 | Histogram |
-| `otelcol_rpc_client_requests_per_rpc`     | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
-| `otelcol_rpc_client_response_size`        | Measures the size of RPC response messages (uncompressed).                                | Histogram |
-| `otelcol_rpc_client_responses_per_rpc`    | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs.     | Histogram |
-| `otelcol_rpc_server_duration`             | Measures the duration of inbound RPC.                                                     | Histogram |
-| `otelcol_rpc_server_request_size`         | Measures the size of RPC request messages (uncompressed).                                 | Histogram |
-| `otelcol_rpc_server_requests_per_rpc`     | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
-| `otelcol_rpc_server_response_size`        | Measures the size of RPC response messages (uncompressed).                                | Histogram |
-| `otelcol_rpc_server_responses_per_rpc`    | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs.     | Histogram |
+| Metric name                       | Description                                                                               | Type      |
+| --------------------------------- | ----------------------------------------------------------------------------------------- | --------- |
+| `http_client_active_requests`     | Number of active HTTP client requests.                                                    | Counter   |
+| `http_client_connection_duration` | Measures the duration of the successfully established outbound HTTP connections.          | Histogram |
+| `http_client_open_connections`    | Number of outbound HTTP connections that are active or idle on the client.                | Counter   |
+| `http_client_request_body_size`   | Measures the size of HTTP client request bodies.                                          | Histogram |
+| `http_client_request_duration`    | Measures the duration of HTTP client requests.                                            | Histogram |
+| `http_client_response_body_size`  | Measures the size of HTTP client response bodies.                                         | Histogram |
+| `http_server_active_requests`     | Number of active HTTP server requests.                                                    | Counter   |
+| `http_server_request_body_size`   | Measures the size of HTTP server request bodies.                                          | Histogram |
+| `http_server_request_duration`    | Measures the duration of HTTP server requests.                                            | Histogram |
+| `http_server_response_body_size`  | Measures the size of HTTP server response bodies.                                         | Histogram |
+| `rpc_client_duration`             | Measures the duration of outbound RPC.                                                    | Histogram |
+| `rpc_client_request_size`         | Measures the size of RPC request messages (uncompressed).                                 | Histogram |
+| `rpc_client_requests_per_rpc`     | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
+| `rpc_client_response_size`        | Measures the size of RPC response messages (uncompressed).                                | Histogram |
+| `rpc_client_responses_per_rpc`    | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs.     | Histogram |
+| `rpc_server_duration`             | Measures the duration of inbound RPC.                                                     | Histogram |
+| `rpc_server_request_size`         | Measures the size of RPC request messages (uncompressed).                                 | Histogram |
+| `rpc_server_requests_per_rpc`     | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
+| `rpc_server_response_size`        | Measures the size of RPC response messages (uncompressed).                                | Histogram |
+| `rpc_server_responses_per_rpc`    | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs.     | Histogram |
 
 ### Events observable with internal logs
 

From 25316e8667065457a0411df49e963d29fedfbe71 Mon Sep 17 00:00:00 2001
From: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
Date: Thu, 5 Sep 2024 19:00:56 +0000
Subject: [PATCH 6/6] Results from /fix:format

---
 content/en/docs/collector/internal-telemetry.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/en/docs/collector/internal-telemetry.md b/content/en/docs/collector/internal-telemetry.md
index 964c31520df2..d25cad225132 100644
--- a/content/en/docs/collector/internal-telemetry.md
+++ b/content/en/docs/collector/internal-telemetry.md
@@ -335,4 +335,5 @@ with the network or backend receiving the data.
 You can monitor data ingress with the `otelcol_receiver_accepted_log_records`,
 `otelcol_receiver_accepted_spans`, and `otelcol_receiver_accepted_metric_points`
 metrics and data egress with the `otelcol_exporter_sent_log_records`,
-`otelcol_exporter_sent_spans`, and `otelcol_exporter_sent_metric_points` metrics.
+`otelcol_exporter_sent_spans`, and `otelcol_exporter_sent_metric_points`
+metrics.
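For reviewers, here is a minimal Python sketch of the "Data loss" check that patch 2 extends to log records. It computes the per-second rate of each `otelcol_processor_dropped_*` counter over a window and flags any counter whose rate is above zero. The sketch assumes the counter samples were already scraped as (timestamp, cumulative value) pairs. The helper names `per_second_rate` and `data_loss_signals`, the `window_s` parameter, and the sample data are hypothetical and are not part of the Collector or this patch. Reset handling only approximates PromQL's `rate()` because it does no extrapolation.

```python
"""Sketch: flag data loss from the Collector's dropped-data counters.

Assumption: samples are (unix_seconds, cumulative_value) pairs that were
already scraped from the Collector's internal metrics; scraping is out of scope.
"""
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative counter value)

# Counter names taken from the "Data loss" hunk of patch 2.
DROPPED_COUNTERS = (
    "otelcol_processor_dropped_log_records",
    "otelcol_processor_dropped_spans",
    "otelcol_processor_dropped_metric_points",
)


def per_second_rate(samples: List[Sample], window_s: float) -> float:
    """Average per-second increase of a cumulative counter over the last window_s seconds.

    If the value goes down (a counter reset), the post-reset value is counted
    as the increase for that step, similar in spirit to PromQL's rate().
    """
    if not samples:
        return 0.0
    end = samples[-1][0]
    recent = [s for s in samples if s[0] >= end - window_s]
    if len(recent) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(recent, recent[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = recent[-1][0] - recent[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def data_loss_signals(series: Dict[str, List[Sample]], window_s: float) -> List[str]:
    """Return the dropped-data counters whose rate over the window is greater than 0."""
    return [
        name
        for name in DROPPED_COUNTERS
        if per_second_rate(series.get(name, []), window_s) > 0
    ]


if __name__ == "__main__":
    # Hypothetical scrape results taken 15 s apart.
    series = {
        "otelcol_processor_dropped_spans": [(0, 10), (15, 10), (30, 10), (45, 10)],
        "otelcol_processor_dropped_log_records": [(0, 0), (15, 0), (30, 5), (45, 9)],
    }
    print(data_loss_signals(series, window_s=30))  # ['otelcol_processor_dropped_log_records']
```

A real deployment would normally express this as an alerting rule in the monitoring backend rather than as a script. The sketch only makes the `rate > 0` condition concrete and shows where the time window the text mentions fits in.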