Collector internal telemetry updates #4867

Merged
merged 8 commits into from
Sep 7, 2024
Changes from 6 commits
21 changes: 12 additions & 9 deletions content/en/docs/collector/internal-telemetry.md
@@ -283,7 +283,8 @@ own telemetry.

#### Data loss

-Use the rate of `otelcol_processor_dropped_spans > 0` and
+Use the rate of `otelcol_processor_dropped_log_records > 0`,
+`otelcol_processor_dropped_spans > 0`, and
`otelcol_processor_dropped_metric_points > 0` to detect data loss. Depending on
your project's requirements, select a narrow time window before alerting begins
to avoid notifications for small losses that are within the desired reliability
@@ -317,19 +318,21 @@ logs for messages such as `Dropping data because sending_queue is full`.

#### Receive failures

-Sustained rates of `otelcol_receiver_refused_spans` and
-`otelcol_receiver_refused_metric_points` indicate that too many errors were
-returned to clients. Depending on the deployment and the clients' resilience,
-this might indicate clients' data loss.
+Sustained rates of `otelcol_receiver_refused_log_records`,
+`otelcol_receiver_refused_spans`, and `otelcol_receiver_refused_metric_points`
+indicate that too many errors were returned to clients. Depending on the
+deployment and the clients' resilience, this might indicate clients' data loss.

-Sustained rates of `otelcol_exporter_send_failed_spans` and
+Sustained rates of `otelcol_exporter_send_failed_log_records`,
+`otelcol_exporter_send_failed_spans`, and
`otelcol_exporter_send_failed_metric_points` indicate that the Collector is not
able to export data as expected. These metrics do not inherently imply data loss
since there could be retries. But a high rate of failures could indicate issues
with the network or backend receiving the data.

#### Data flow

-You can monitor data ingress with the `otelcol_receiver_accepted_spans` and
-`otelcol_receiver_accepted_metric_points` metrics and data egress with the
-`otelcol_exporter_sent_spans` and `otelcol_exporter_sent_metric_points` metrics.
+You can monitor data ingress with the `otelcol_receiver_accepted_log_records`,
+`otelcol_receiver_accepted_spans`, and `otelcol_receiver_accepted_metric_points`
+metrics and data egress with the `otelcol_exporter_sent_log_records`,
+`otelcol_exporter_sent_spans`, and `otelcol_exporter_sent_metric_points` metrics.
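
The changed paragraphs all describe the same pattern: take the rate of a cumulative counter (for example `otelcol_processor_dropped_log_records`) and alert only when it stays above zero for some time. A minimal Python sketch of that logic follows, assuming the counter has already been scraped into timestamped samples. The names `Sample`, `rate_per_second`, and `sustained_above` are hypothetical helpers for illustration and are not part of the Collector or of this PR; in practice a monitoring backend's query language would usually do this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scrape of a cumulative counter (hypothetical helper type)."""
    timestamp: float  # seconds since an arbitrary epoch
    value: float      # cumulative count at that time


def rate_per_second(older: Sample, newer: Sample) -> float:
    """Average per-second increase of the counter between two scrapes.

    A decrease is treated as a counter reset (for example, a Collector
    restart), in which case the newer value is taken as the increase.
    """
    elapsed = newer.timestamp - older.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = newer.value - older.value
    if increase < 0:
        increase = newer.value
    return increase / elapsed


def sustained_above(samples: list[Sample], threshold: float,
                    hold_seconds: float) -> bool:
    """True if every scrape interval covering the trailing hold_seconds
    has a rate strictly above threshold.

    Returns False when there is not enough history to cover the window.
    """
    if len(samples) < 2:
        return False
    cutoff = samples[-1].timestamp - hold_seconds
    if samples[0].timestamp > cutoff:
        return False
    # Walk consecutive pairs from newest to oldest.
    for older, newer in zip(reversed(samples[:-1]), reversed(samples[1:])):
        if rate_per_second(older, newer) <= threshold:
            return False
        if older.timestamp <= cutoff:
            return True
    return False


# Hypothetical scrapes of otelcol_processor_dropped_log_records, 30 s apart.
steady_loss = [Sample(0, 0), Sample(30, 5), Sample(60, 9), Sample(90, 12)]
brief_blip = [Sample(0, 0), Sample(30, 0), Sample(60, 0), Sample(90, 2)]

alert_on_steady_loss = sustained_above(steady_loss, 0.0, 60)
alert_on_brief_blip = sustained_above(brief_blip, 0.0, 60)
```

With these made-up samples, the steady loss would trigger an alert while the single late increase would not, which is the effect the "Data loss" paragraph describes when it suggests a time window before alerting begins. The same helpers could be pointed at the `otelcol_receiver_refused_*` or `otelcol_exporter_send_failed_*` counters, with a threshold and window chosen to match the project's reliability requirements.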