Describe the bug
We want to ensure OpenMetrics / Prometheus compatibility in the OpenTelemetry Collector. We have been building compatibility tests to verify the OpenMetrics spec is fully supported on both the OpenTelemetry Collector Prometheus receiver and PRW exporter as well as in Prometheus itself.
The Prometheus receiver should assign a metric the staleNaN value if the metric is missing from the current scrape but was present in the previous one. However, the metric builder currently does not assign the staleNaN values that the Prometheus scrape loop passes for failed scrapes to histogram and summary values.
Histogram and summary count values are int64, and casting non-numerical float64 values (staleNaN, normalNaN, and +-Inf) to int64 assigns minInt64 (-9223372036854775808) to the count.
Steps to reproduce
The validation loop is currently skipped in the tests. Re-enable it by removing or commenting out the following lines in func testEndToEnd(...) (lines 1442-1445):
if true {
	t.Log(`Skipping the "up" metric checks as they seem to be spuriously failing after staleness marker insertions`)
	return
}
Note: the test fails in getValidScrapes due to staleness. Inspect the metrics and find the first failed scrape: the histogram and summary count values in the failed scrapes are minInt64 (-9223372036854775808) instead of non-numerical values (staleNaN, normalNaN, and +-Inf).
What did you see instead?
Scraping an endpoint that exposes histogram/summary metrics, with a failed scrape in between, produces the following graph in the Prometheus web UI. The histogram/summary count value is plotted as the peak in the graph below:
However, if the same data is passed directly to the Prometheus server, the Prometheus web UI produces the following graph:
Possible Solution
Since the histogram/summary count value (int64) cannot be assigned float64 values, one possible solution is to use the OTLP format directly in the Prometheus receiver's metricbuilder and set the data point flag (MetricDataPointFlagNoRecordedValue) on the metric as a staleness marker. See linked issue: #6400
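The flag-based approach could look roughly like the sketch below. It is self-contained and does not use the collector's pdata API: `dataPointFlags`, `histogramPoint`, `markStale`, and `isStale` are hypothetical stand-ins introduced for illustration. The only detail taken from OTLP is that the "no recorded value" flag is bit 0 (value 1) of the data point flags mask.

```go
package main

import "fmt"

// dataPointFlags is a hypothetical stand-in for the OTLP data point
// flags bit mask.
type dataPointFlags uint32

// flagNoRecordedValue mirrors FLAG_NO_RECORDED_VALUE in the OTLP
// metrics proto, which occupies bit 0 of the mask.
const flagNoRecordedValue dataPointFlags = 1

// histogramPoint is a simplified, hypothetical histogram data point.
// Count is an integer, so it cannot carry a NaN staleness marker.
type histogramPoint struct {
	Count int64
	Sum   float64
	Flags dataPointFlags
}

// markStale records staleness out of band by setting the flag, rather
// than trying to encode a NaN in the integer count.
func markStale(p *histogramPoint) {
	p.Count = 0
	p.Sum = 0
	p.Flags |= flagNoRecordedValue
}

// isStale reports whether the point carries the staleness flag.
func isStale(p histogramPoint) bool {
	return p.Flags&flagNoRecordedValue != 0
}

func main() {
	p := histogramPoint{Count: 5, Sum: 1.5}
	fmt.Println("stale before:", isStale(p))

	// Assumed to be called when the scrape loop reports a stale value.
	markStale(&p)
	fmt.Println("stale after:", isStale(p))
}
```

An exporter could then check the flag and emit the staleNaN value on the Prometheus side, so the count field never has to hold a non-numerical value.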
@PaurushGarg I agree, the prometheus receiver should not set that "NaN" value but instead should use the OTLP native no-value present for that (for all metrics not just for histograms).
@bogdandrutu thanks. Is there a tracking issue for Prometheus Receiver to directly use OTLP format in metric builder? If not, do we need to create one?
What version did you use?
Collector-Contrib: v0.37.1
Additional context
Related to open-telemetry/prometheus-interoperability-spec#57
Linked issues: #6400, #6000, #6087
cc @alolita @Aneurysm9