
Collector internal telemetry updates #4867

Merged · 8 commits · Sep 7, 2024

Conversation

@danelson (Contributor) commented Jul 23, 2024

  • Internal collector detailed metrics were missing the otelcol_ prefix.
  • log_records were missing from the critical monitoring sections (I assume this is not by design)

@danelson requested a review from a team July 23, 2024 16:56
@svrnm requested review from a team and mx-psi and removed request for a team July 23, 2024 17:19
@danelson (Contributor, Author) commented Jul 23, 2024

I just came across open-telemetry/opentelemetry-collector#9315 and open-telemetry/opentelemetry-collector#9759 so now I am not sure this is correct. Perhaps it is because I am consuming the metrics on localhost:8888/metrics. Anyway, hopefully a maintainer can clarify.

@mx-psi requested a review from codeboten July 24, 2024 08:40
@mx-psi previously approved these changes Jul 24, 2024

@codeboten (Contributor) left a comment


Thanks @danelson. It's a bit confusing, but the metrics listed in the table will not be prefixed, because they're emitted by instrumentation libraries rather than by collector components.

Comment on lines 242 to 261
| Metric | Description | Type |
| --- | --- | --- |
| `otelcol_http_client_active_requests` | Number of active HTTP client requests. | Counter |
| `otelcol_http_client_connection_duration` | Measures the duration of the successfully established outbound HTTP connections. | Histogram |
| `otelcol_http_client_open_connections` | Number of outbound HTTP connections that are active or idle on the client. | Counter |
| `otelcol_http_client_request_body_size` | Measures the size of HTTP client request bodies. | Histogram |
| `otelcol_http_client_request_duration` | Measures the duration of HTTP client requests. | Histogram |
| `otelcol_http_client_response_body_size` | Measures the size of HTTP client response bodies. | Histogram |
| `otelcol_http_server_active_requests` | Number of active HTTP server requests. | Counter |
| `otelcol_http_server_request_body_size` | Measures the size of HTTP server request bodies. | Histogram |
| `otelcol_http_server_request_duration` | Measures the duration of HTTP server requests. | Histogram |
| `otelcol_http_server_response_body_size` | Measures the size of HTTP server response bodies. | Histogram |
| `otelcol_rpc_client_duration` | Measures the duration of outbound RPC. | Histogram |
| `otelcol_rpc_client_request_size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `otelcol_rpc_client_requests_per_rpc` | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_client_response_size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `otelcol_rpc_client_responses_per_rpc` | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_server_duration` | Measures the duration of inbound RPC. | Histogram |
| `otelcol_rpc_server_request_size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `otelcol_rpc_server_requests_per_rpc` | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_server_response_size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `otelcol_rpc_server_responses_per_rpc` | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
@codeboten (Contributor)

These metrics are generated by the underlying instrumentation library, not by collector components. This means that the prefix will not be present here.

@danelson (Contributor, Author) commented Aug 1, 2024

@codeboten are you saying to revert this? Also, can you clarify how this affects consuming the data when scraping the metrics with the prometheus receiver? I am running 0.104.0 with the config below, and I see metrics with names like otelcol_http_server_response_size (note that it is not response_body_size either).

[screenshot of the scraped metric names]

OTel config
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
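  # Scrapes the collector's own internal telemetry endpoint (localhost:8888/metrics)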
  prometheus/collector:
    config:
      scrape_configs:
        - job_name: "internal"
          scrape_interval: 10s
          static_configs:
            - targets:
                - "localhost:8888"

processors:
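  # Keeps only the http_server metrics (regex below) so the debug output stays focused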
  filter/collector:
    error_mode: ignore
    metrics:
      include:
        match_type: regexp
        metric_names:
          - .*http_server.*

exporters:
  debug:
    verbosity: detailed

service:
  telemetry:
    metrics:
      level: detailed
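      # the detailed level also enables the HTTP/gRPC instrumentation-library metrics discussed above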
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [prometheus/collector]
      processors: [filter/collector]
      exporters: [debug]

@codeboten (Contributor)
Right, until v0.106.1, the prometheus exporter configured inside the collector was setting a prometheus namespace of otelcol to prefix all metrics exported via prometheus. This was inconsistent with the metrics exported via other exporters (OTLP, console). This was addressed by prefixing all collector component generated metrics manually with otelcol_ to provide a consistent metric name for all exporters. Note that when I'm using the term exporters here, I mean the exporters configured inside the Collector for the use of the OTel Go SDK.

This means that all metrics generated by instrumentation libraries will match the names those libraries intended, as in the example below: http_server_response_size used to be prefixed with otelcol_ and will now look like this:

http_server_response_size{http_method="POST",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="4318",net_protocol_name="http",net_protocol_version="1.1",service_instance_id="aa3d8988-fdf1-4023-8fff-193877983817",service_name="otelcontribcol",service_version="0.106.1-dev"} 2
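For anyone who prefers to keep the old otelcol_-prefixed names for these instrumentation-library metrics after upgrading, one option is to rename them at scrape time. The snippet below is only an illustrative sketch, not part of this PR: it assumes the prometheus receiver scrape config shown earlier and uses standard Prometheus metric_relabel_configs to prepend the prefix to the http_* and rpc_* series.

receivers:
  prometheus/collector:
    config:
      scrape_configs:
        - job_name: "internal"
          scrape_interval: 10s
          static_configs:
            - targets: ["localhost:8888"]
          # Illustrative only: re-add the otelcol_ prefix to metrics coming
          # from the HTTP/gRPC instrumentation libraries.
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(http_.*|rpc_.*)"
              target_label: __name__
              replacement: "otelcol_${1}"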

@danelson (Contributor, Author)

Thank you. The disconnect I was having was between collector-generated and instrumentation-library-generated metrics.

@mx-psi dismissed their stale review July 24, 2024 15:37: "See Alex's comment"

@tiffany76 (Contributor) left a comment


Two small changes, along with Alex's note. Thanks!

Two review comments on content/en/docs/collector/internal-telemetry.md (outdated, resolved)
@svrnm (Member) commented Aug 1, 2024

@danelson please take a look at the requested changes, thanks!

Co-authored-by: Tiffany Hrabusa <[email protected]>
@opentelemetrybot requested review from a team and djaglowski and removed request for a team August 1, 2024 15:47
Co-authored-by: Tiffany Hrabusa <[email protected]>
@opentelemetrybot requested a review from a team August 1, 2024 15:48
@danelson requested review from codeboten and tiffany76 August 1, 2024 20:48
@danelson (Contributor, Author) commented Aug 1, 2024

I think this should be good now. Thank you for the feedback.

@tiffany76 (Contributor)

Thanks for the changes, @danelson!

Also, I created #4933 to make sure we clarify the change in naming conventions introduced in v0.106.1.

@cartermp (Contributor) commented Aug 3, 2024

@codeboten PTAL!

@cartermp added the sig-approval-missing (Co-owning SIG didn't provide an approval) label Aug 3, 2024
@codeboten (Contributor) left a comment


Thanks @danelson!

@opentelemetrybot requested a review from a team September 5, 2024 18:54
@theletterf (Member)

/fix:format

@opentelemetrybot (Collaborator)

fix:format was successful.

IMPORTANT: (RE-)RUN /fix:all to ensure that there are no remaining check issues.

@tiffany76 (Contributor)

/fix:all

@opentelemetrybot (Collaborator)

fix:all was successful.

IMPORTANT: (RE-)RUN /fix:all to ensure that there are no remaining check issues.

@cartermp added this pull request to the merge queue Sep 7, 2024
Merged via the queue into open-telemetry:main with commit e4f6838 Sep 7, 2024
17 checks passed
michael2893 pushed a commit to michael2893/opentelemetry.io that referenced this pull request Sep 8, 2024
Co-authored-by: Tiffany Hrabusa <[email protected]>
Co-authored-by: Alex Boten <[email protected]>
Co-authored-by: opentelemetrybot <[email protected]>
Co-authored-by: Phillip Carter <[email protected]>

Labels: sig:collector, sig-approval-missing (Co-owning SIG didn't provide an approval)
Projects: Status: Done
8 participants