
[Prometheus Receiver] Incorrect start_timestamp of Summary and Histogram metrics after a failed scrape #6360

Closed
PaurushGarg opened this issue Nov 17, 2021 · 3 comments
Assignees
Labels
comp:prometheus Prometheus related issues

Comments

@PaurushGarg
Member

PaurushGarg commented Nov 17, 2021

Describe the bug

We want to ensure OpenMetrics / Prometheus compatibility in the OpenTelemetry Collector. We have been building compatibility tests to verify that the OpenMetrics spec is fully supported by the OpenTelemetry Collector Prometheus receiver and PRW (Prometheus remote write) exporter, as well as by Prometheus itself.

To verify that the Prometheus receiver performs Prometheus-to-OTLP data transformations as expected, issues #6000 and #6151 add a test that validates Prometheus core metrics. This test showed that after a failed scrape, the start_timestamp of histogram and summary metrics is not the same as the start_timestamp from the first scrape. The OTLP format uses start_timestamp to record when the metric collection system started.

Steps to reproduce

  • Run func TestEndToEnd(t *testing.T)

  • The validate loop is currently skipped in the tests. Re-enable it by removing or commenting out the following lines in func testEndToEnd(...) (lines 1442-1445):

    if true {
        t.Log(`Skipping the "up" metric checks as they seem to be spuriously failing after staleness marker insertions`)
        return
    }

  • Note: the test fails in getValidScrapes because of staleness; inspect the metrics to see the timestamps.

What did you expect to see?

Referring to the table below, the start_timestamp values of the Histogram and Summary series in Target1 Scrape2 should be the same as in Target1 Scrape1, regardless of the failed scrape between them.

What did you see instead?

Because of the failed scrape between Target1 Scrape1 and Target1 Scrape2, their start_timestamp values differ for histogram and summary metrics.

Table: start_timestamps observed:

All values are timestamp:{seconds, nanos}.

| Series | Target1 Scrape1 | Target1 Failed Scrape | Target1 Scrape2 | Target1 Failed Scrape | 4 nos of: 5 default metrics with up 0 |
|---|---|---|---|---|---|
| Gauge series start_timestamp | Point Only | Point Only | Point Only | Point Only | |
| Counter series start_timestamp | 1634681667, 427000000 | 1634681667, 427000000 | 1634681667, 427000000 | 1634681667, 427000000 | |
| Histogram series start_timestamp | 1634681667, 427000000 | | 1634681667, 427000000 | 1634681670, 431000000 | |
| Summary series start_timestamp | 1634681667, 427000000 | 1634681668, 432000000 | 1634681668, 432000000 | 1634681670, 431000000 | |
| Points timestamp | 1634681667, 427000000 | 1634681668, 432000000 | 1634681669, 431000000 | 1634681670, 431000000 | |
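
The {seconds, nanos} pairs in the table are easier to compare once converted to wall-clock times. The following is a small Go sketch for doing that; stamp, toTime and gapMillis are hypothetical helper names, not part of the receiver or the test. For the two distinct Summary start_timestamp values in the table, gapMillis reports 1005 ms, which is roughly one scrape later than the first scrape.

```go
package main

import (
	"fmt"
	"time"
)

// stamp mirrors the {seconds, nanos} pairs shown in the table.
// It is a hypothetical helper type used only for this illustration.
type stamp struct {
	seconds int64
	nanos   int64
}

// toTime converts a {seconds, nanos} pair to a UTC time.Time.
func toTime(s stamp) time.Time {
	return time.Unix(s.seconds, s.nanos).UTC()
}

// gapMillis returns the difference b - a in milliseconds.
func gapMillis(a, b stamp) int64 {
	return toTime(b).Sub(toTime(a)).Milliseconds()
}

func main() {
	// Summary start_timestamp at Target1 Scrape1 and at Target1 Scrape2 (from the table).
	first := stamp{seconds: 1634681667, nanos: 427000000}
	later := stamp{seconds: 1634681668, nanos: 432000000}

	fmt.Println("Scrape1 start:", toTime(first).Format(time.RFC3339Nano))
	fmt.Println("Scrape2 start:", toTime(later).Format(time.RFC3339Nano))
	fmt.Println("gap (ms):", gapMillis(first, later))
}
```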

Possible Solution

Modify the existing adjustMetricTimeseries logic to ensure that:

  • start_timestamp is reset if the current scrape has a lower value than the previous scrape, even when a failed scrape occurred in between.
  • start_timestamp is not reset if the current scrape has a higher value than the previous scrape, even when a failed scrape occurred in between.
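
The two rules above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the receiver's actual adjustMetricTimeseries code: the point and seriesState types, the adjust method, and the sample values are all hypothetical, and a failed scrape is assumed to produce no point for the series, so the stored state is left untouched.

```go
package main

import "fmt"

// point is a hypothetical, simplified cumulative data point (for example a
// histogram or summary count) observed at one successful scrape.
type point struct {
	timestampMs int64   // scrape time in milliseconds since the Unix epoch
	value       float64 // cumulative value at that scrape
}

// seriesState is hypothetical per-series state kept across scrapes.
// It is not touched by failed scrapes, so it survives them.
type seriesState struct {
	startMs   int64   // start timestamp currently assigned to the series
	lastValue float64 // value seen at the last successful scrape
	seen      bool    // whether any successful scrape has been recorded
}

// adjust returns the start timestamp to use for p and updates the state.
// The start timestamp is reset only on the first point or when the value
// dropped below the last successfully scraped value.
func (s *seriesState) adjust(p point) int64 {
	if !s.seen || p.value < s.lastValue {
		s.startMs = p.timestampMs
		s.seen = true
	}
	s.lastValue = p.value
	return s.startMs
}

func main() {
	var s seriesState
	// A nil entry stands for a failed scrape. The first and third timestamps
	// come from the table; the values and the last entry are made up.
	scrapes := []*point{
		{timestampMs: 1634681667427, value: 10},
		nil,
		{timestampMs: 1634681669431, value: 12},
		nil,
		{timestampMs: 1634681671431, value: 3}, // hypothetical value drop
	}
	for i, p := range scrapes {
		if p == nil {
			fmt.Printf("scrape %d: failed, state kept\n", i+1)
			continue
		}
		fmt.Printf("scrape %d: value=%v start=%d\n", i+1, p.value, s.adjust(*p))
	}
}
```

Keeping the last successful value in the state, rather than comparing only against the immediately preceding scrape, is what lets the comparison skip over failed scrapes.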

A complete solution also depends on the resolution of issue #6400.

What version did you use?

Collector Contrib: v0.37.1

Additional context
Related to: open-telemetry/prometheus-interoperability-spec#57
Issue: #6000 #6400
cc: @alolita @Aneurysm9

@PaurushGarg
Member Author

@alolita please assign this issue to me. I would like to work on this one.

@alolita alolita added the comp:prometheus Prometheus related issues label Nov 17, 2021
@PaurushGarg PaurushGarg changed the title [Prometheus reciever] Incorrect start_timestamp of Summary and Histogram metrics after a failed scrape [Prometheus receiver] Incorrect start_timestamp of Summary and Histogram metrics after a failed scrape Nov 17, 2021
@PaurushGarg PaurushGarg changed the title [Prometheus receiver] Incorrect start_timestamp of Summary and Histogram metrics after a failed scrape [Prometheus Receiver] Incorrect start_timestamp of Summary and Histogram metrics after a failed scrape Nov 18, 2021
@gouthamve
Member

@PaurushGarg Can this be closed as it looks like #6696 fixes it?

@PaurushGarg
Member Author

Yes. This issue has been fixed and can be closed now. cc @Aneurysm9

povilasv referenced this issue in coralogix/opentelemetry-collector-contrib Dec 19, 2022