Prometheus parser incorrectly classifies counters as histograms #39705
Comments
Thanks @gpop63 for looking into the issue. Engaging code owners, @elastic/obs-cloud-monitoring, for validation.
Hey @agithomas, thanks for the heads up. We are sunsetting elastic/obs-cloud-monitoring, so the reference handle for our team is now @elastic/obs-ds-hosted-services.
I have worked a bit with how to parse metrics. I believe this should be the PR that made the latest big change there: #36669. I think this is happening because we expect the metrics to be like this:
And according to the metrics you put in the description, it is like this:
I believe it should not be a very hard change. Right now, we have unit tests for all possibilities, so we should create some for multiple metrics like you specified to test your change.
@gpop63 This has been backported to 8.14.
As we are not planning to release a new 8.13 version, I'm not sure it would be useful.
@pierrehilbert The customer who met this issue and raised https://github.com/elastic/sdh-beats/issues/4742 is planning the upgrade.
Backporting to 8.13 won't imply having a new release for 8.13: as I mentioned in previous comment, we are not planning to release a new 8.13 version.
Thanks @pierrehilbert for the explanation.
Overview
While attempting to ingest nginx stream metrics using the `prometheus` module, we've noticed that certain metrics such as `nginx_sts_upstream_connects_total` and `nginx_sts_server_connects_total` are not being ingested.

This issue is not present in metricbeat version `7.17.x`, where all metrics are ingested correctly. However, in versions `8.7` and later, these metrics are being skipped.

The issue seems to be rooted in the text parser logic in `metricbeat/helper/prometheus/textparse.go`, specifically within the `ParseMetricFamilies` function. When processing a group of metrics, the metric type variable is set to the latest metric type in the group, even if their types might differ.

After the loop completes for a group of metrics, `mt` is set to the last metric type in the group, which in this case is `histogram`. This is problematic because, for example, `nginx_sts_server_connects_total` is actually a `counter`. This causes an error later in the code when we validate whether the metric name contains a suffix specific to a histogram: the check fails and these metrics are skipped.

How to replicate
(We ran into this recently with nginx, which is why the setup below uses it; the issue could probably be replicated much more easily.)
Prerequisites:
nginx
nginx-module-sts
nginx-module-stream-sts
nginx-module-vts (not sure if this one is needed)

Make sure to use the below `nginx.conf`, which will be in `/usr/local/nginx/conf/nginx.conf`, before starting `nginx`.

nginx.conf

Then start `nginx` and make some dummy requests.

Metricbeat Prometheus config
Possible solution
Did a quick test to see if it works and it does but not sure what the impact would be. Needs more testing.
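
For readers who want to see the failure mode in isolation, here is a minimal, self-contained Go sketch. It is not the code from `textparse.go` and not the patch tested above: the `entry` type, both `classify*` functions, and the histogram name `example_session_seconds` are hypothetical, invented only for illustration. It assumes the scrape lists several `# TYPE` declarations before the samples they describe, and contrasts a single "last type wins" variable with a per-name lookup.

```go
package main

import "fmt"

// entry is a hypothetical, simplified stand-in for one parsed line of a
// Prometheus text scrape: either a "# TYPE" declaration or a sample.
type entry struct {
	isType bool   // true for a "# TYPE <name> <typ>" line
	name   string // metric name
	typ    string // declared type; only meaningful when isType is true
}

// classifyLastWins mimics the behaviour described in this issue: a single
// variable holds the most recently seen type, so every sample that follows
// a batch of TYPE lines is tagged with the last declaration.
func classifyLastWins(entries []entry) map[string]string {
	out := map[string]string{}
	mt := "unknown"
	for _, e := range entries {
		if e.isType {
			mt = e.typ // overwrites whatever was declared before
			continue
		}
		out[e.name] = mt
	}
	return out
}

// classifyPerName remembers the declared type for each metric name and
// looks it up per sample. Histogram series suffixes (_bucket, _sum,
// _count) are deliberately not handled in this sketch.
func classifyPerName(entries []entry) map[string]string {
	declared := map[string]string{}
	out := map[string]string{}
	for _, e := range entries {
		if e.isType {
			declared[e.name] = e.typ
			continue
		}
		if t, ok := declared[e.name]; ok {
			out[e.name] = t
		} else {
			out[e.name] = "unknown"
		}
	}
	return out
}

// sampleEntries models a scrape where two TYPE lines precede the samples.
func sampleEntries() []entry {
	return []entry{
		{isType: true, name: "nginx_sts_server_connects_total", typ: "counter"},
		{isType: true, name: "example_session_seconds", typ: "histogram"}, // hypothetical name
		{name: "nginx_sts_server_connects_total"},
	}
}

func main() {
	const name = "nginx_sts_server_connects_total"
	fmt.Println("last wins:", classifyLastWins(sampleEntries())[name])
	fmt.Println("per name: ", classifyPerName(sampleEntries())[name])
}
```

With the last-wins approach the counter is tagged as a histogram, which is the situation where a later histogram-suffix check would reject it; the per-name lookup keeps it a counter. Whether a per-name map is the right shape for the real parser is an open question and would need the extra testing mentioned above.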
cc @agithomas