500 error when scraping metrics from otel-collector pod when loadbalancing exporter is used #30477
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Some things I tried:
I think this is more likely to be something with the Prometheus receiver/exporter than with the load balancing, given that this seems to be about the component's own metrics rather than the load-balanced telemetry.
Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Pinging code owners for receiver/prometheus: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Getting the same error with this loadbalancer configuration.
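For context, a loadbalancing exporter configuration generally has two parts: a protocol section (the wrapped OTLP exporter settings) and a resolver. The sketch below is illustrative only, not the configuration from this report; the backend hostnames are placeholders.

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        # these settings are passed through to the underlying OTLP exporter
        timeout: 1s
        tls:
          insecure: true
    resolver:
      # static resolver with a fixed list of backends (placeholder addresses)
      static:
        hostnames:
          - backend-1.example:4317
          - backend-2.example:4317
```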
Linking a couple of similar-sounding issues here:
Based on comments in #30697, I could reproduce this:
I attached pod logs showing what happens, if they are of any help.
Edit: I tried with 0.91.0 and couldn't reproduce. Pod logs from that version:
Edit 2: Using 0.92.0 but disabling
Edit 3: Managed to reproduce the issue with the DNS resolver as well.
From the collector config, it doesn't look like you are actually using the prometheus receiver or prometheus exporter in a pipeline?
No, I'm not. This can be reproduced easily without those, just by using the loadbalancing exporter.
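To illustrate, a minimal sketch of the kind of standalone setup that exercises only the loadbalancing exporter plus the collector's own metrics endpoint might look like the following. The receiver choice, pipeline, and resolver hostname are assumptions for illustration, not the reporter's exact configuration.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      # DNS resolver pointed at a headless service (placeholder hostname)
      dns:
        hostname: otelcol-backends.example.svc.cluster.local
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
  telemetry:
    metrics:
      # the collector's own metrics endpoint that Prometheus scrapes
      address: 0.0.0.0:8888
```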
Pinging code owners for exporter/loadbalancing: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Trying to gather my thoughts a bit here, so please forgive me if you find this messy.
@juissi-t, some information regarding your questions above:
With more debugging, this appears to trace back to the recordWithOtel code in query_sender.go from the otel-collector repo, introduced in a recent release. @dmitryax, would you be able to shed some light?
Feels like it's a duplicate of #16826.
@jpkrohling, could you provide more insight on how this is connected to #16826?
This issue also happens with
Note, you can't turn this off with
@jpkrohling @open-telemetry/collector-contrib-maintainer, can you assign this issue to @Juliaj to investigate and file a PR? Thanks!
@juissi-t, would you be able to test the repro with the current commits from this repository in your environment? I synced the recent commits from the OTel Collector Git repository and this repository to build an image. I am not able to repro with the steps above. Just wondering whether you could help verify.
Yes, I can deploy the image to our development environment to check. Please let me know where I can get the image from.
Edit: Managed to build the image myself. Can't reproduce the issue anymore.
@juissi-t, can you confirm that this issue is not reproducible with v0.94.0, which was just released yesterday?
@codeboten, @juissi-t, I verified that with v0.94.0 this issue was not reproducible in our setup.
I'm closing; feel free to reopen if this is still an issue.
Component(s)
exporter/loadbalancing
What happened?
Description
I enabled the loadbalancing exporter on our collector pods. After a while (~1 hour), Prometheus fails to scrape metrics from the pods that have the exporter configured. Below is an error message from one pod.
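For reference, the collector serves its own metrics on its internal telemetry endpoint (port 8888 by default), and it is that endpoint which Prometheus scrapes on each pod. A minimal sketch of such a scrape job is below; the job name and target address are placeholders, and the actual setup may use Kubernetes service discovery instead.

```yaml
# prometheus.yml (illustrative; the target address is a placeholder)
scrape_configs:
  - job_name: otel-collector-self-metrics
    static_configs:
      - targets: ['otel-collector.monitoring.svc.cluster.local:8888']
```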
Steps to Reproduce
Expected Result
Actual Result
Collector version
0.92.0
Environment information
Environment
OS: EKS 1.26 Bottlerocket
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response