otelcol_exporter_enqueue_failed_spans available in 0.87.0, but not 0.88 #9456

Closed
kriskalish opened this issue Feb 2, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments

@kriskalish

Describe the bug
When using the sample config, I don't see the metric otelcol_exporter_enqueue_failed_spans exported to the logs in version 0.88.0, but I do see it in 0.87.0.

Steps to reproduce

  1. Run the sample config with version 0.87.0. Wait 10+ seconds for the metrics to be scraped and logged, then exit. Copy the output into a text editor and search for "enqueue_failed". See that it appears.

Sample from logs:

Metric #5
Descriptor:
     -> Name: otelcol_exporter_enqueue_failed_spans
     -> Description: Number of spans failed to be added to the sending queue.
     -> Unit: 
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> exporter: Str(logging)
     -> service_instance_id: Str(9338ca73-219e-4206-861d-dacc7266367b)
     -> service_name: Str(otelcol-contrib)
     -> service_version: Str(0.87.0)
StartTimestamp: 2024-02-02 20:41:24.026 +0000 UTC
Timestamp: 2024-02-02 20:41:24.026 +0000 UTC
Value: 0.000000
  2. Run the sample config with version 0.88.0. Wait 10+ seconds for the metrics to be scraped and logged, then exit. Copy the output into a text editor and search for "enqueue_failed". See that it does not appear.

What did you expect to see?
I expected the metric to continue coming through, even after upgrading.

What did you see instead?
The metric is missing in the log statements.

What version did you use?
otelcol-contrib_0.88.0_darwin_arm64 and otelcol-contrib_0.87.0_darwin_arm64

What config did you use?

receivers:
  otlp:
    protocols:
      http:

  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

processors:
  batch/traces:

exporters:
  logging:
    verbosity: detailed

  otlphttp/test:
    compression: none
    endpoint: 'https://webhook.site/2840fe0b-08fe-4997-bf5a-7edb12008259'
    sending_queue:
      queue_size: 5


service:
  pipelines:

    traces/all:
      receivers: [otlp]
      processors: [batch/traces]
      exporters: [otlphttp/test]

    metrics/self:
      receivers: [prometheus]
      processors: []
      exporters: [logging]
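
For reference, one way to check whether the metric is being emitted at all, independent of the logging exporter's output, is to read the collector's internal Prometheus endpoint directly (the same 0.0.0.0:8888 target this config scrapes). A minimal sketch, assuming the standard /metrics path on the default telemetry port:

// check_metrics.go: scrape the collector's internal Prometheus endpoint and
// print any enqueue_failed series. The address and path are assumptions
// based on the collector's default telemetry settings.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://localhost:8888/metrics")
	if err != nil {
		log.Fatalf("scrape failed: %v", err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		if line := scanner.Text(); strings.Contains(line, "enqueue_failed") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatalf("reading response failed: %v", err)
	}
}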

Environment
OS: macOS Sonoma 14.2.1

Additional context

I found a handful of potentially related issues, but I wasn't able to piece together a cohesive picture of how they relate to this behavior:

This issue seems to report the opposite of what I'm observing: #8673

This reply on an issue (#7454 (comment)) seems to suggest there is a feature flag that makes the collector emit its internal metrics in OTel format, but I couldn't find any documentation on how to set it up.

@kriskalish added the bug label Feb 2, 2024
@dmitryax (Member) commented Feb 5, 2024

Hey @kriskalish. I believe the metric is now only reported if it goes above 0. Can you try to ensure any requests get rejected due to queue overflow?
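
One way to trigger that rejection is to post spans to the OTLP HTTP receiver faster than the exporter can drain its queue. A minimal sketch, not part of the original thread; the default localhost:4318 endpoint, the OTLP/JSON payload shape, the IDs, and the request count are all assumptions for illustration:

// flood_spans.go: post a burst of minimal OTLP/JSON spans to the collector so
// the otlphttp exporter's small sending queue can overflow. Port 4318, the
// payload, and the loop count are illustrative assumptions.
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"
)

const span = `{"resourceSpans":[{"scopeSpans":[{"spans":[{
  "traceId":"5b8efff798038103d269b633813fc60c",
  "spanId":"eee19b7ec3c1b174",
  "name":"queue-overflow-test",
  "kind":1,
  "startTimeUnixNano":"1700000000000000000",
  "endTimeUnixNano":"1700000001000000000"}]}]}]}`

func main() {
	for i := 0; i < 200; i++ {
		resp, err := http.Post("http://localhost:4318/v1/traces",
			"application/json", strings.NewReader(span))
		if err != nil {
			log.Fatalf("request %d failed: %v", i, err)
		}
		resp.Body.Close()
		fmt.Printf("request %d: %s\n", i, resp.Status)
	}
}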

@kriskalish (Author)

I did spend some time trying that with a mock HTTP endpoint that always returns a 429 status code and a small queue size, but I wasn't able to get the metric to go non-zero on either version. Let me take another attempt at it.
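
A mock endpoint of that kind can be as small as a handler that always answers 429. A minimal sketch; the port is an arbitrary choice, and the otlphttp/test exporter's endpoint would have to point at it (e.g. http://localhost:9999) instead of webhook.site:

// mock_429.go: a stand-in for the downstream trace endpoint that always
// answers 429 Too Many Requests, so every export attempt fails and must be
// retried. The port is an arbitrary choice for illustration.
package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		log.Printf("%s %s -> 429", r.Method, r.URL.Path)
		http.Error(w, "simulated throttling", http.StatusTooManyRequests)
	})
	log.Fatal(http.ListenAndServe(":9999", nil))
}

With retries enabled and a tiny queue, every export attempt against this endpoint fails, which should eventually push otelcol_exporter_enqueue_failed_spans above zero.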

@kriskalish (Author)

I see what happened on Friday after repeating my test today.

0.87.0 never captured a non-zero value for otelcol_exporter_enqueue_failed_spans (see the attached logs), and I had (incorrectly) assumed that meant 0.88.0 would not capture the metric either.

Attachment: otelcol-contrib_0.87.0_darwin_arm64_logs.txt

However, I ran the same test setup against 0.88.0 and it does capture the metric. The logs are attached below for reference.

Attachment: otelcol-contrib_0.88.0_darwin_arm64_logs.txt

Given that 0.87.0 is water under the bridge at this point, I will close the bug. Thanks @dmitryax.

Here is the configuration I used to perform the test:

receivers:
  otlp:
    protocols:
      http:

  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

processors:
  batch/traces:

exporters:
  logging:
    verbosity: detailed

  otlphttp/test:
    compression: none
    endpoint: 'https://webhook.site/0a2f447d-a0e1-4a9d-82bd-971320c762c6'
    sending_queue:
      enabled: true
      queue_size: 1
      num_consumers: 1
    retry_on_failure:
      enabled: true
      initial_interval: 60s
      randomization_factor: 0.7
      multiplier: 1.3
      max_interval: 120s
      max_elapsed_time: 10m

service:
  pipelines:

    traces/all:
      receivers: [otlp]
      processors: [batch/traces]
      exporters: [otlphttp/test]

    metrics/self:
      receivers: [prometheus]
      processors: []
      exporters: [logging]
