Ever-increasing timeseries count #904
Replies: 7 comments 5 replies
-
You are not violating any Prometheus rules by doing this; in fact, there are entire projects such as mtail and grok_exporter that create metrics from logs. My guess is that there is some sort of condition where there is a high-cardinality label that comes from …
-
Thanks for taking a stab at this. I guessed so, but couldn't find any shenanigans from the … If I look at the number of series in the head block, it keeps increasing. I was expecting it to increase until Prometheus "sees" all the metrics and then stabilise around a particular value (only churning if there are changes in tags: version, etc.), but it keeps on increasing.
-
@csmarchbanks I had one more question. Some of the tags I noticed are not high cardinality in, say, the last 2 days, but have high churn, so over a month they might have ~1000 unique values. I feel that will cause the Prometheus active series count to go up steadily. Is it possible to make Prometheus forget series that it has not seen in the last X hours? I am using Prometheus in agent mode to only remote-write to Grafana. I am using …
-
Hi @csmarchbanks
Does this sound okay? I would appreciate it if you could enlighten me with a better approach.
-
I ran this locally and the POC looks good. Will try this out in the actual cluster. @csmarchbanks
-
Hi @csmarchbanks, I have been running the system with Soln 1 (custom registry, delete + create periodically) and it has been performing well, removing stale metrics and freeing up memory. I am planning to give Soln 2 a try as well, since that would solve the problem without resetting the active time series. I see the custom collector classes here also allow setting a timestamp when adding a metric, which can come in super handy: I would just need to compare it against the current time and delete the samples that are super old. I still need to figure out how to access/delete the samples, but it can look something like this:
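A hypothetical sketch of that idea (the class name, the `inc` interface, and the six-hour default are all invented; `CounterMetricFamily` is the real prometheus_client custom-collector building block, but the pruning logic is only an illustration of comparing stored timestamps against the current time):

```python
import threading
import time

from prometheus_client.core import CounterMetricFamily


class ExpiringLogCollector:
    """Custom collector that forgets label sets not updated recently.

    Hypothetical sketch: names and structure are illustrative, not the
    actual code from this thread.
    """

    def __init__(self, max_age_seconds=6 * 3600):
        self._max_age = max_age_seconds
        self._lock = threading.Lock()
        # tuple of label values -> [count, last_seen_timestamp]
        self._samples = {}

    def inc(self, label_values, amount=1.0):
        now = time.time()
        with self._lock:
            entry = self._samples.setdefault(tuple(label_values), [0.0, now])
            entry[0] += amount
            entry[1] = now

    def collect(self):
        cutoff = time.time() - self._max_age
        family = CounterMetricFamily(
            "log_events", "Events parsed from log lines", labels=["event"]
        )
        with self._lock:
            # Prune samples that are "super old" before exposing the rest.
            stale = [k for k, (_, ts) in self._samples.items() if ts < cutoff]
            for key in stale:
                del self._samples[key]
            for key, (count, _) in self._samples.items():
                family.add_metric(list(key), count)
        yield family
```

The collector would then be registered with `REGISTRY.register(...)` so pruning happens on every scrape.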
Does this look okay'ish to you?
-
Thanks @csmarchbanks. I did end up using the regular Counter/Histogram classes with a separate dict per metric, with key = tuple of label values and value = last-seen timestamp, plus a cron job that iterates over that dict and calls metric.remove(label_value_tuple). It seems to be working, and this approach seems better to me since it doesn't reset ALL the timeseries in the system. I think I will go ahead with this. Thanks for your help here.
-
I am using this Prometheus Python client to convert logs into Prometheus metrics, and I see that the number of timeseries keeps monotonically increasing.
My code roughly looks like this:
Am I violating any Prometheus rule by keeping a local mapping of metrics and dynamically creating them by parsing log lines?