health_check with check_collector_pipeline enabled returns HTTP 500 error #6710

david-perez-martin · 2022-12-08T18:25:57Z

Describe the bug
When I start the collector and no data is being processed, the health_check endpoint starts returning a 500 error after a minute or so.

Steps to reproduce
Execute ./otelcol-contrib with the health_check extension enabled, run a loop of curl commands and you will see the error.

Nothing appears in the logs.
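
The curl loop described above can be approximated with a small polling script. This is only a sketch: the function names `fetch_status` and `poll_health` are made up for illustration, and the URL in the usage note is assumed from the endpoint and path in the config of this report.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def poll_health(url, attempts, delay_s, fetch=fetch_status, sleep=time.sleep):
    """Poll url `attempts` times and return the list of status codes seen."""
    codes = []
    for i in range(attempts):
        codes.append(fetch(url))
        if i < attempts - 1:
            sleep(delay_s)  # wait between requests, but not after the last one
    return codes
```

Calling `poll_health("http://localhost:8433/health/status", attempts=30, delay_s=5)` against a running collector should show where the responses flip from 200 to 500. A connection failure (collector not running) raises `urllib.error.URLError` rather than returning a code.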

What did you expect to see?
A 200 response.

What did you see instead?
curl -v http://localhost:8433/health/status

*   Trying 127.0.0.1:8433...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8433 (#0)
> GET /health/status HTTP/1.1
> Host: localhost:8433
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Date: Thu, 08 Dec 2022 17:42:30 GMT
< Content-Length: 0

What version did you use?
0.61

What config did you use?
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  logging:
    logLevel: debug
  jaeger:
    endpoint: xxxxxxxxxx:443
    tls:
      cert_file: /sources/open-telemetry/otelcol-bin/mydomain.com.crt.hdp
      key_file: /sources/open-telemetry/otelcol-bin/mydomain.com.key.hdp

extensions:
  health_check:
    endpoint: "0.0.0.0:8433"
    path: "/health/status"
    check_collector_pipeline:
      enabled: true
      interval: "5m"
      exporter_failure_threshold: 5

service:
  extensions: [health_check]
  telemetry:
    logs:
      level: debug
      initial_fields:
        service: local-ubuntu
    metrics:
      level: basic
      address: 0.0.0.0:8080
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]

Environment
OS: Ubuntu on my local WSL2, and our k8s environment (we use health_check as the liveness probe)

Additional context
We use a jaeger exporter with TLS and certificate configuration.

haoqixu · 2023-01-11T06:51:27Z

This is caused by the same bug as open-telemetry/opentelemetry-collector-contrib#11780.

The internal queue exporterFailureQueue of the healthcheckextension keeps growing during the first interval and is only emptied by rotate once the interval has elapsed.
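
To make that behavior concrete, here is a simplified model of a time-windowed failure queue. It is not the extension's actual code: the class `FailureWindow`, its method names, the sliding-window pruning, and the `>=` threshold comparison are all assumptions made for illustration. Why entries get recorded in the first place is covered in the linked issue; the sketch only shows how a queue that is pruned by age can cross the threshold well before any entry is old enough to be dropped.

```python
from collections import deque


class FailureWindow:
    """Hypothetical model of a failure queue pruned by age."""

    def __init__(self, interval_s: float, threshold: int) -> None:
        self.interval_s = interval_s
        self.threshold = threshold
        self._events = deque()  # timestamps (seconds) of recorded events

    def record(self, now: float) -> None:
        # Every recorded event is appended, whatever caused it.
        self._events.append(now)

    def rotate(self, now: float) -> None:
        # Drop events that are older than the interval.
        while self._events and now - self._events[0] > self.interval_s:
            self._events.popleft()

    def healthy(self, now: float) -> bool:
        self.rotate(now)
        # Assumed rule: unhealthy once the count reaches the threshold.
        return len(self._events) < self.threshold


# Values mirror the config in this report: interval "5m", threshold 5.
window = FailureWindow(interval_s=300, threshold=5)
for t in range(0, 60, 10):  # six events recorded in the first minute
    window.record(t)

status_at_60s = 200 if window.healthy(60) else 500
status_at_400s = 200 if window.healthy(400) else 500
print(status_at_60s, status_at_400s)
```

In this model the check reports 500 at the one-minute mark because nothing can be pruned until entries are older than the 5-minute interval, and it recovers to 200 once rotation has dropped them.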

jpkrohling · 2023-01-12T17:31:18Z

Closing as it's a duplicate of the issue @haoqixu linked.

david-perez-martin added the bug label Dec 8, 2022

jpkrohling added the collector-telemetry label Jan 10, 2023

jpkrohling closed this as completed Jan 12, 2023

haoqixu mentioned this issue Nov 7, 2023

REQUEST: New membership for @haoqixu open-telemetry/community#1781

