
Handle overlapping metrics from different jobs in prometheus exporter #1096

Merged
5 commits merged into master on Jun 16, 2020

Conversation

liamawhite
Contributor

@liamawhite liamawhite commented Jun 9, 2020

Resolves #1076.

Signed-off-by: Liam White [email protected]

@linux-foundation-easycla

linux-foundation-easycla bot commented Jun 9, 2020

CLA Check
The committers are authorized under a signed CLA.

@codecov

codecov bot commented Jun 9, 2020

Codecov Report

Merging #1096 into master will increase coverage by 0.07%.
The diff coverage is 100.00%.


```
@@            Coverage Diff             @@
##           master    #1096      +/-   ##
==========================================
+ Coverage   86.43%   86.51%   +0.07%
==========================================
  Files         198      198
  Lines       14172    14189      +17
==========================================
+ Hits        12250    12275      +25
+ Misses       1469     1464       -5
+ Partials      453      450       -3
```
| Impacted Files | Coverage Δ |
|---|---|
| exporter/prometheusexporter/prometheus.go | 92.00% <100.00%> (+17.00%) ⬆️ |
| service/builder/exporters_builder.go | 71.42% <0.00%> (+1.42%) ⬆️ |
| translator/internaldata/resource_to_oc.go | 76.47% <0.00%> (+2.94%) ⬆️ |
| exporter/opencensusexporter/opencensus.go | 49.45% <0.00%> (+4.39%) ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a71185a...56fd173.

@liamawhite
Contributor Author

Test failure looks unrelated?

Member

@asuresh4 asuresh4 left a comment

The test failure looks unrelated. Unfortunately, I do not have permissions to re-trigger the test. You could push an amended commit to re-trigger it, or the code owner merging the PR can re-trigger it. But the changes LGTM.

@liamawhite
Contributor Author

Did a rebase 🤞.

@tigrannajaryan tigrannajaryan changed the title Handle overlapping metrics from different jobs in prom exporter Handle overlapping metrics from different jobs in prometheus exporter Jun 11, 2020
```go
buf.WriteString(metric.GetMetricDescriptor().GetName())
labelKeys := metric.GetMetricDescriptor().GetLabelKeys()
for _, labelKey := range labelKeys {
	buf.WriteString("-" + labelKey.Key)
}
```
Member

Are dashes allowed in metric names and label keys? If yes, then this can result in the same metric signature for different metrics. The following two different metrics produce the same signature (see the sketch after the list):

1. Name: abc, label key: d-ef → signature abc-d-ef
2. Name: abc-d, label key: ef → signature abc-d-ef
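
A minimal, runnable sketch of the collision (the `buildSignature` helper and the metric names here are illustrative, not the exporter's actual code):

```go
package main

import (
	"bytes"
	"fmt"
)

// buildSignature joins the metric name and its label keys with dashes,
// mirroring the dash-separated scheme quoted above (illustrative only).
func buildSignature(name string, labelKeys []string) string {
	var buf bytes.Buffer
	buf.WriteString(name)
	for _, k := range labelKeys {
		buf.WriteString("-" + k)
	}
	return buf.String()
}

func main() {
	a := buildSignature("abc", []string{"d-ef"}) // name "abc", label key "d-ef"
	b := buildSignature("abc-d", []string{"ef"}) // name "abc-d", label key "ef"
	fmt.Println(a, a == b)                       // "abc-d-ef true": two different metrics, one signature
}
```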

Contributor Author

@liamawhite liamawhite Jun 11, 2020

I think it's underscores only? This signature function is a copy-paste from the exporter. I guess it doesn't hurt to change it, though; would you like me to do so?
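
For reference, a collision-free variant would be a small change; one option (a sketch only, not what this PR does) is to join the parts with a byte that can never appear in valid Prometheus metric names or label keys:

```go
package main

import "bytes"

// signatureSafe separates the metric name and each label key with a NUL byte.
// NUL cannot occur in valid Prometheus metric names or label keys, so distinct
// (name, label keys) combinations can never share a signature.
func signatureSafe(name string, labelKeys []string) string {
	var buf bytes.Buffer
	buf.WriteString(name)
	for _, k := range labelKeys {
		buf.WriteByte(0)
		buf.WriteString(k)
	}
	return buf.String()
}
```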

Member

This doc seems to indicate that dashes aren't allowed in metric names or labels in Prometheus.

Contributor Author

@tigrannajaryan I think we're good to merge then 🙂

@tigrannajaryan tigrannajaryan merged commit f134f50 into open-telemetry:master Jun 16, 2020
wyTrivail pushed a commit to mxiamxia/opentelemetry-collector that referenced this pull request Jul 13, 2020
…open-telemetry#1096)

This commit fixes the following bug:

**The bug**
If the otel-collector/agent scrapes two endpoints that emit metrics with the same name (even with different labels), it appears to export metrics from only one of the endpoints, chosen seemingly at random.

**Steps to reproduce**
1. Modify the [demo example](https://github.com/open-telemetry/opentelemetry-collector/tree/master/examples/demo) as in [this commit](jhengy@933679d):
  - create a cloned `metrics-load-generator2` that emits exactly the same metrics as `metrics-load-generator`, except for the `source` label
2. Use [an older image](jhengy@bad29d9) (from 28 May 2020) because of a problem with the latest otel-collector Docker image; details can be found in [this issue](open-telemetry#1075)
3. Run the modified demo example:
- `cd examples/demo`
- `docker-compose up`
- `curl localhost:8889/metrics`

**What did you expect to see?**
Metrics from both metrics-load-generator (source=source1) and metrics-load-generator2 (source=source2).

**What did you see instead?**
At any time, metrics from only one of the sources are visible.
Sometimes only metrics from the metrics-load-generator service appear:
```
# HELP promexample_opdemo_latency The various latencies of the methods
# TYPE promexample_opdemo_latency histogram
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="10"} 86
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="50"} 448
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="100"} 783
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="200"} 802
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="400"} 846
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="800"} 939
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="1000"} 973
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="1400"} 1000
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="2000"} 1007
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="5000"} 1040
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="10000"} 1088
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="15000"} 1122
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source1",le="+Inf"} 1132
promexample_opdemo_latency_sum{client="cli",label1="value1",method="repl",source="source1"} 1.2317093098059976e+06
promexample_opdemo_latency_count{client="cli",label1="value1",method="repl",source="source1"} 1132
# HELP promexample_opdemo_line_counts The counts of the lines in
# TYPE promexample_opdemo_line_counts counter
promexample_opdemo_line_counts{client="cli",label1="value1",method="repl",source="source1"} 3424
# HELP promexample_opdemo_line_lengths The lengths of the various lines in
# TYPE promexample_opdemo_line_lengths histogram
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="10"} 27
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="20"} 61
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="50"} 155
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="100"} 324
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="150"} 481
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="200"} 662
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="500"} 1669
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="800"} 2722
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source1",le="+Inf"} 3424
promexample_opdemo_line_lengths_sum{client="cli",label1="value1",method="repl",source="source1"} 1.7351559999999993e+06
promexample_opdemo_line_lengths_count{client="cli",label1="value1",method="repl",source="source1"} 3424
# HELP promexample_opdemo_process_counts The various counts
# TYPE promexample_opdemo_process_counts counter
promexample_opdemo_process_counts{client="cli",label1="value1",method="repl",source="source1"} 1132
``` 
Sometimes only metrics from the metrics-load-generator2 service appear:
```
# HELP promexample_opdemo_latency The various latencies of the methods
# TYPE promexample_opdemo_latency histogram
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="10"} 100
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="50"} 526
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="100"} 937
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="200"} 960
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="400"} 1013
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="800"} 1122
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="1000"} 1171
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="1400"} 1206
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="2000"} 1214
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="5000"} 1257
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="10000"} 1308
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="15000"} 1343
promexample_opdemo_latency_bucket{client="cli",label1="value1",method="repl",source="source2",le="+Inf"} 1352
promexample_opdemo_latency_sum{client="cli",label1="value1",method="repl",source="source2"} 1.3510892105500018e+06
promexample_opdemo_latency_count{client="cli",label1="value1",method="repl",source="source2"} 1352
# HELP promexample_opdemo_line_counts The counts of the lines in
# TYPE promexample_opdemo_line_counts counter
promexample_opdemo_line_counts{client="cli",label1="value1",method="repl",source="source2"} 4113
# HELP promexample_opdemo_line_lengths The lengths of the various lines in
# TYPE promexample_opdemo_line_lengths histogram
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="10"} 38
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="20"} 92
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="50"} 211
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="100"} 419
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="150"} 626
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="200"} 814
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="500"} 2025
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="800"} 3270
promexample_opdemo_line_lengths_bucket{client="cli",label1="value1",method="repl",source="source2",le="+Inf"} 4113
promexample_opdemo_line_lengths_sum{client="cli",label1="value1",method="repl",source="source2"} 2.0698130000000026e+06
promexample_opdemo_line_lengths_count{client="cli",label1="value1",method="repl",source="source2"} 4113
# HELP promexample_opdemo_process_counts The various counts
# TYPE promexample_opdemo_process_counts counter
promexample_opdemo_process_counts{client="cli",label1="value1",method="repl",source="source2"} 1352
```
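
The outputs above show one job's batch replacing the other's. A minimal sketch of the merge idea behind the fix, assuming simplified stand-in types and an `accumulate` helper (the real exporter works on OpenCensus metric protos, so every name below is illustrative): metrics arriving from different jobs are accumulated under one descriptor signature, with their label values keeping the series apart, instead of the last batch overwriting earlier ones.

```go
package main

import "fmt"

// timeSeries stands in for a metric's data points; the label values
// (e.g. source=source1 vs source=source2) keep the series distinct.
type timeSeries struct {
	labelValues []string
	value       float64
}

// metric is a simplified stand-in for an OpenCensus metric proto.
type metric struct {
	name      string
	labelKeys []string
	series    []timeSeries
}

// signature mirrors the dash-joined name + label-key scheme from the PR.
func signature(m metric) string {
	sig := m.name
	for _, k := range m.labelKeys {
		sig += "-" + k
	}
	return sig
}

// accumulate merges batches from different scrape jobs: series for an
// already-seen signature are appended rather than overwritten.
func accumulate(registry map[string]*metric, batch []metric) {
	for _, m := range batch {
		sig := signature(m)
		if existing, ok := registry[sig]; ok {
			existing.series = append(existing.series, m.series...)
		} else {
			cp := m
			registry[sig] = &cp
		}
	}
}

func main() {
	registry := map[string]*metric{}
	job1 := []metric{{name: "opdemo_line_counts", labelKeys: []string{"source"},
		series: []timeSeries{{labelValues: []string{"source1"}, value: 3424}}}}
	job2 := []metric{{name: "opdemo_line_counts", labelKeys: []string{"source"},
		series: []timeSeries{{labelValues: []string{"source2"}, value: 4113}}}}
	accumulate(registry, job1)
	accumulate(registry, job2)
	// Both sources are now exposed instead of one replacing the other.
	for sig, m := range registry {
		fmt.Println(sig, len(m.series)) // opdemo_line_counts-source 2
	}
}
```

With both series kept under the same signature, a single scrape of the `/metrics` endpoint exposes the source="source1" and source="source2" points side by side, matching the expected behaviour described above.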
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this pull request Nov 11, 2021
…/zipkin (open-telemetry#1096)

* Bump google.golang.org/grpc in /exporters/trace/zipkin

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.31.0 to 1.31.1.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.31.0...v1.31.1)

Signed-off-by: dependabot[bot] <[email protected]>

* Auto-fix go.sum changes in dependent modules

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
swiatekm pushed a commit to swiatekm/opentelemetry-collector that referenced this pull request Oct 9, 2024
* fix(opentelemetry-operator): allow fullnameOverride

Signed-off-by: skuethe <[email protected]>

* docs(opentelemetry-operator): bump to new Chart version

Signed-off-by: skuethe <[email protected]>

* chore(opentelemetry-operator): bump version

Signed-off-by: skuethe <[email protected]>

---------

Signed-off-by: skuethe <[email protected]>
Co-authored-by: Tyler Helmuth <[email protected]>