health_check with check_collector_pipeline enabled returns HTTP 500 error #6710

Closed
david-perez-martin opened this issue Dec 8, 2022 · 2 comments
Labels
bug Something isn't working collector-telemetry healthchecker and other telemetry collection issues

Comments

@david-perez-martin
Describe the bug
When I start the collector, with no data being processed, the health_check endpoint starts returning a 500 error after a minute or so.

Steps to reproduce
Run ./otelcol-contrib with the health_check extension enabled (config below), then call the health endpoint with curl in a loop. After a minute or so the responses change to 500.

Nothing appears in the logs.

What did you expect to see?
A 200 response.

What did you see instead?
curl -v http://localhost:8433/health/status

* Trying 127.0.0.1:8433...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8433 (#0)
> GET /health/status HTTP/1.1
> Host: localhost:8433
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Date: Thu, 08 Dec 2022 17:42:30 GMT
< Content-Length: 0

What version did you use?
0.61

What config did you use?
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  logging:
    logLevel: debug
  jaeger:
    endpoint: xxxxxxxxxx:443
    tls:
      cert_file: /sources/open-telemetry/otelcol-bin/mydomain.com.crt.hdp
      key_file: /sources/open-telemetry/otelcol-bin/mydomain.com.key.hdp

extensions:
  health_check:
    endpoint: "0.0.0.0:8433"
    path: "/health/status"
    check_collector_pipeline:
      enabled: true
      interval: "5m"
      exporter_failure_threshold: 5

service:
  extensions: [health_check]
  telemetry:
    logs:
      level: debug
      initial_fields:
        service: local-ubuntu
    metrics:
      level: basic
      address: 0.0.0.0:8080
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]

Environment
OS: Ubuntu on WSL2 (local), and our k8s environment, where we use health_check as the liveness probe.

Additional context
We use a jaeger exporter with TLS and certificate configuration.

@david-perez-martin david-perez-martin added the bug Something isn't working label Dec 8, 2022
@jpkrohling jpkrohling added the collector-telemetry healthchecker and other telemetry collection issues label Jan 10, 2023
@haoqixu
Member

haoqixu commented Jan 11, 2023

This is caused by the same bug as open-telemetry/opentelemetry-collector-contrib#11780.

The healthcheckextension's internal queue, exporterFailureQueue, keeps growing during the first interval and is only emptied by rotate once the interval has elapsed.

@jpkrohling
Member

Closing as it's a duplicate of the issue @haoqixu linked.