
Prometheus Exporter: Problem when scraping metrics from multiple targets that emits metrics of the same name and labelkeys #1076

Closed
jhengy opened this issue Jun 4, 2020 · 5 comments · Fixed by #1096
Labels
bug Something isn't working

Comments

jhengy commented Jun 4, 2020

Describe the bug
If the otel-collector/agent scrapes two endpoints that emit metrics with the same name (even with different label values), it seems to export metrics from only one of the endpoints, seemingly at random.

Steps to reproduce

  1. Modify the demo example as in this commit:
  • create a cloned metrics-load-generator2 service that emits exactly the same metrics as metrics-load-generator, differing only in the value of the source label (see the sketch after this list)
  2. Use an older otel-collector image (from 28 May 2020) because of a problem encountered with the latest otel-collector docker image; details can be found in this issue
  3. Run the modified demo example:
  • cd examples/demo
  • docker-compose up
  • curl localhost:8889/metrics
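
For reference, here is a minimal sketch of what such a cloned generator could look like. It assumes the demo records its measures through OpenCensus stats and tags (as the exported metric and label names suggest); the identifiers are illustrative rather than the demo's actual ones, and view registration plus exporter setup are omitted.

package main

import (
	"context"
	"log"
	"time"

	"go.opencensus.io/stats"
	"go.opencensus.io/tag"
)

var (
	keySource = tag.MustNewKey("source")
	latencyMs = stats.Float64("opdemo/latency", "The various latencies of the methods", "ms")
)

func main() {
	// The only difference from metrics-load-generator: the clone tags its
	// measurements with source=source2 instead of source=source1, so the
	// metric names and label keys are identical across both targets.
	ctx, err := tag.New(context.Background(), tag.Upsert(keySource, "source2"))
	if err != nil {
		log.Fatal(err)
	}
	for {
		stats.Record(ctx, latencyMs.M(42.0))
		time.Sleep(time.Second)
	}
}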

What did you expect to see?
Metrics from both metrics-load-generator (source=source1) and metrics-load-generator2 (source=source2).

What did you see instead?
At any given time, only metrics from one of the sources are visible.
Sometimes the output looks like this (only metrics from the metrics-load-generator service):

# HELP promexample_opdemo_latency The various latencies of the methods
# TYPE promexample_opdemo_latency histogram
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="10"} 86
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="50"} 448
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="100"} 783
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="200"} 802
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="400"} 846
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="800"} 939
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="1000"} 973
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="1400"} 1000
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="2000"} 1007
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="5000"} 1040
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="10000"} 1088
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="15000"} 1122
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="+Inf"} 1132
promexample_opdemo_latency_sum{client="cli",label1="value1",method="repl",source="source1"} 1.2317093098059976e+06
promexample_opdemo_latency_count{client="cli",label1="value1",method="repl",source="source1"} 1132
# HELP promexample_opdemo_line_counts The counts of the lines in
# TYPE promexample_opdemo_line_counts counter
promexample_opdemo_line_counts{client="cli",label1="value1",method="repl",source="source1"} 3424
# HELP promexample_opdemo_line_lengths The lengths of the various lines in
# TYPE promexample_opdemo_line_lengths histogram
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="10"} 27
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="20"} 61
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="50"} 155
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="100"} 324
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="150"} 481
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="200"} 662
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="500"} 1669
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="800"} 2722
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="+Inf"} 3424
promexample_opdemo_line_lengths_sum{client="cli",label1="value1",method="repl",source="source1"} 1.7351559999999993e+06
promexample_opdemo_line_lengths_count{client="cli",label1="value1",method="repl",source="source1"} 3424
# HELP promexample_opdemo_process_counts The various counts
# TYPE promexample_opdemo_process_counts counter
promexample_opdemo_process_counts{client="cli",label1="value1",method="repl",source="source1"} 1132

and sometimes like this (only metrics from the metrics-load-generator2 service):

# HELP promexample_opdemo_latency The various latencies of the methods
# TYPE promexample_opdemo_latency histogram
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="10"} 100
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="50"} 526
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="100"} 937
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="200"} 960
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="400"} 1013
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="800"} 1122
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="1000"} 1171
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="1400"} 1206
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="2000"} 1214
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="5000"} 1257
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="10000"} 1308
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="15000"} 1343
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="+Inf"} 1352
promexample_opdemo_latency_sum{client="cli",label1="value1",method="repl",source="source2"} 1.3510892105500018e+06
promexample_opdemo_latency_count{client="cli",label1="value1",method="repl",source="source2"} 1352
# HELP promexample_opdemo_line_counts The counts of the lines in
# TYPE promexample_opdemo_line_counts counter
promexample_opdemo_line_counts{client="cli",label1="value1",method="repl",source="source2"} 4113
# HELP promexample_opdemo_line_lengths The lengths of the various lines in
# TYPE promexample_opdemo_line_lengths histogram
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="10"} 38
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="20"} 92
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="50"} 211
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="100"} 419
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="150"} 626
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="200"} 814
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="500"} 2025
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="800"} 3270
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="+Inf"} 4113
promexample_opdemo_line_lengths_sum{client="cli",label1="value1",method="repl",source="source2"} 2.0698130000000026e+06
promexample_opdemo_line_lengths_count{client="cli",label1="value1",method="repl",source="source2"} 4113
# HELP promexample_opdemo_process_counts The various counts
# TYPE promexample_opdemo_process_counts counter
promexample_opdemo_process_counts{client="cli",label1="value1",method="repl",source="source2"} 1352

What version did you use?
Version: otel/opentelemetry-collector-dev:ffd2a1eb98980702fd642d8a9e4c270934e82e67

What config did you use?
Config: (e.g. the yaml config file)

Environment
OS: Ubuntu 18.04

@jhengy jhengy added the bug Something isn't working label Jun 4, 2020
@jhengy jhengy changed the title Problem when scraping metrics from multiple targets which emits equally named metrics Problem when scraping metrics from multiple targets that expose equally named metrics Jun 4, 2020
liamawhite (Contributor) commented Jun 8, 2020

I have a branch with a repro here. Running curl localhost:8889/metrics switches between showing source1- and source2-labelled metrics. See gist.

This must be happening somewhere after the receiver, since @jhengy saw this issue with the Prometheus receiver while the demo uses the OC receiver.

This seems like a pretty critical bug?

liamawhite (Contributor) commented:
Speculative fix: orijtech/prometheus-go-metrics-exporter#10
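
For context, a minimal sketch (not the exporter's actual code) of the kind of keying problem such a fix would have to address: if incoming metrics are cached under a key derived only from the metric name and label keys, the two targets collide and the last write wins, whereas including the label values in the key keeps both series. The helper names are illustrative.

package main

import (
	"fmt"
	"sort"
	"strings"
)

type point struct {
	name   string
	labels map[string]string
	value  float64
}

// keyByNameAndLabelKeys ignores label values, so points that differ only in
// the value of "source" map to the same cache entry.
func keyByNameAndLabelKeys(p point) string {
	keys := make([]string, 0, len(p.labels))
	for k := range p.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return p.name + "|" + strings.Join(keys, ",")
}

// keyByNameAndLabels also includes the label values, keeping the series distinct.
func keyByNameAndLabels(p point) string {
	pairs := make([]string, 0, len(p.labels))
	for k, v := range p.labels {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return p.name + "|" + strings.Join(pairs, ",")
}

func main() {
	points := []point{
		{"opdemo_line_counts", map[string]string{"source": "source1"}, 3424},
		{"opdemo_line_counts", map[string]string{"source": "source2"}, 4113},
	}

	collides, distinct := map[string]point{}, map[string]point{}
	for _, p := range points {
		collides[keyByNameAndLabelKeys(p)] = p // second target overwrites the first
		distinct[keyByNameAndLabels(p)] = p    // both targets are kept
	}
	fmt.Println(len(collides), len(distinct)) // prints: 1 2
}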

@jhengy jhengy changed the title Problem when scraping metrics from multiple targets that expose equally named metrics Problem when scraping metrics from multiple targets that emits metrics of the same name and labelkeys Jun 9, 2020
jhengy (Author) commented Jun 9, 2020

A possible workaround for this situation is to add distinct dummy labels to the metrics emitted by the different targets so that they can be told apart. After making this commit, the otel-collector exports the identically named metrics from both generators.
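
A minimal sketch of that workaround, under the same OpenCensus assumption as the earlier sketch (identifiers illustrative): the cloned generator tags its measurements with an extra tag key that the first generator does not use, so the label-key sets of the two targets no longer match. The curl output below shows the resulting metrics.

package main

import (
	"context"
	"log"
	"time"

	"go.opencensus.io/stats"
	"go.opencensus.io/tag"
)

var (
	keySource    = tag.MustNewKey("source")
	keyGenerator = tag.MustNewKey("generator2") // the first generator uses a "generator1" key instead
	latencyMs    = stats.Float64("opdemo/latency", "The various latencies of the methods", "ms")
)

func main() {
	// The dummy tag makes the label-key set of this target unique, which is
	// what lets the exporter keep the two targets' series apart.
	ctx, err := tag.New(context.Background(),
		tag.Upsert(keySource, "source2"),
		tag.Upsert(keyGenerator, "generator2"),
	)
	if err != nil {
		log.Fatal(err)
	}
	for {
		stats.Record(ctx, latencyMs.M(42.0))
		time.Sleep(time.Second)
	}
}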

curl localhost:8889/metrics:

# HELP promexample_opdemo_latency The various latencies of the methods
# TYPE promexample_opdemo_latency histogram
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="10"} 46
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="50"} 209
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="100"} 374
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="200"} 377
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="400"} 397
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="800"} 429
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="1000"} 442
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="1400"} 449
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="2000"} 451
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="5000"} 458
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="10000"} 477
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="15000"} 486
promexample_opdemo_latency_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="+Inf"} 488
promexample_opdemo_latency_sum{client="cli",generator1="generator1",label1="value1",method="repl",source="source1"} 368460.11517499975
promexample_opdemo_latency_count{client="cli",generator1="generator1",label1="value1",method="repl",source="source1"} 488
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="10"} 23
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="50"} 182
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="100"} 333
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="200"} 342
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="400"} 354
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="800"} 393
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="1000"} 404
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="1400"} 416
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="2000"} 418
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="5000"} 425
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="10000"} 437
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="15000"} 452
promexample_opdemo_latency_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="+Inf"} 452
promexample_opdemo_latency_sum{client="cli",generator2="generator2",label1="value1",method="repl",source="source2"} 371420.13484
promexample_opdemo_latency_count{client="cli",generator2="generator2",label1="value1",method="repl",source="source2"} 452
# HELP promexample_opdemo_line_counts The counts of the lines in
# TYPE promexample_opdemo_line_counts counter
promexample_opdemo_line_counts{client="cli",generator1="generator1",label1="value1",method="repl",source="source1"} 1351
promexample_opdemo_line_counts{client="cli",generator2="generator2",label1="value1",method="repl",source="source2"} 1418
# HELP promexample_opdemo_line_lengths The lengths of the various lines in
# TYPE promexample_opdemo_line_lengths histogram
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="10"} 16
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="20"} 32
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="50"} 68
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="100"} 140
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="150"} 205
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="200"} 265
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="500"} 672
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="800"} 1071
promexample_opdemo_line_lengths_bucket{client="cli",generator1="generator1",label1="value1",method="repl",source="source1",le="+Inf"} 1351
promexample_opdemo_line_lengths_sum{client="cli",generator1="generator1",label1="value1",method="repl",source="source1"} 680982.0000000001
promexample_opdemo_line_lengths_count{client="cli",generator1="generator1",label1="value1",method="repl",source="source1"} 1351
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="10"} 11
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="20"} 21
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="50"} 65
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="100"} 137
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="150"} 206
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="200"} 266
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="500"} 699
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="800"} 1138
promexample_opdemo_line_lengths_bucket{client="cli",generator2="generator2",label1="value1",method="repl",source="source2",le="+Inf"} 1418
promexample_opdemo_line_lengths_sum{client="cli",generator2="generator2",label1="value1",method="repl",source="source2"} 712477.0000000003
promexample_opdemo_line_lengths_count{client="cli",generator2="generator2",label1="value1",method="repl",source="source2"} 1418
# HELP promexample_opdemo_process_counts The various counts
# TYPE promexample_opdemo_process_counts counter
promexample_opdemo_process_counts{client="cli",generator1="generator1",label1="value1",method="repl",source="source1"} 488
promexample_opdemo_process_counts{client="cli",generator2="generator2",label1="value1",method="repl",source="source2"} 452

arpitjindal97 commented:
This issue is still not resolved and is reproducible with the latest version.

@nilebox nilebox changed the title Problem when scraping metrics from multiple targets that emits metrics of the same name and labelkeys Prometheus Exporter: Problem when scraping metrics from multiple targets that emits metrics of the same name and labelkeys Sep 22, 2020
nilebox (Member) commented Sep 22, 2020

This issue is still not resolved and is reproducible with the latest version.

@arpitjindal97 the issue in the Prometheus Exporter was fixed by #1096.
