How to reduce the memory usage of prometheus golang client? #920
-
After I removed the Prometheus Go client and stopped collecting metrics, the service's memory usage dropped from 600MB to 300MB. Is there a better expiration strategy for cached metrics?
Replies: 3 comments
-
There is nothing really expiring in the instrumentation library. Any metric is intentionally exposed over the lifetime of the binary. If you see the instrumentation code taking a lot of memory, perhaps you have created metric vectors with very high cardinality. That would be the first thing to look for. If you have elements in a metric vector that only get an update once or twice and then just sit around using up memory, you might actually have more of an event-logging use case.
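For illustration, here is a minimal sketch of what the cardinality difference looks like in code (metric and label names are made up). Each distinct label-value combination becomes a child metric that lives for the lifetime of the process, so an unbounded label like a user ID keeps growing memory, while a small fixed label set stays bounded:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Anti-pattern: "user_id" is unbounded, so the vector grows with every
	// distinct user ever seen and that memory is never released.
	requestsByUser = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "app_requests_by_user_total",
		Help: "Requests handled, partitioned by user.",
	}, []string{"user_id"})

	// Bounded alternative: a small, fixed set of label values keeps the
	// vector (and its memory) bounded.
	requestsByStatus = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "app_requests_by_status_total",
		Help: "Requests handled, partitioned by status class.",
	}, []string{"status_class"}) // e.g. "2xx", "4xx", "5xx"
)

func handleRequest(userID, statusClass string) {
	requestsByUser.WithLabelValues(userID).Inc()        // grows without bound
	requestsByStatus.WithLabelValues(statusClass).Inc() // stays bounded
}

func main() {
	handleRequest("user-42", "2xx")
}
```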
-
Is there any strong / technical reason to do that, or could it be revised? I'm asking because I still see valid reasons to have an expiry mechanism. There can be relatively high-cardinality metrics with ephemeral label values that end up being obsolete after some time; I can think of pods in Kubernetes, for instance, when metrics are exposed by something that centralizes data rather than by the pod itself. If I'm correct, there is the option to use a custom collector and constant metrics to avoid filling the internal metrics registry (see the sketch after this reply). However, there are some issues with that approach.
So, what about another approach that would be introducing an (optional) expiry TTL associated with metrics+labels? (A metric with a given label set would be deleted from the registry if it isn't updated after X seconds.) Is it something we can discuss?
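For reference, the custom-collector-plus-constant-metrics approach mentioned above could look roughly like the sketch below (the pod-mirroring scenario and all names are illustrative, not from this thread). The collector rebuilds its metrics from the current state on every scrape, so series for pods that no longer exist simply stop being exposed rather than accumulating in a metric vector:

```go
package main

import (
	"log"
	"net/http"
	"sync"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// podCollector mirrors the state of some external pods. It exposes one gauge
// per currently known pod, built fresh on every scrape.
type podCollector struct {
	mu     sync.Mutex
	podsUp map[string]float64 // pod name -> up value, refreshed elsewhere
	upDesc *prometheus.Desc
}

func newPodCollector() *podCollector {
	return &podCollector{
		podsUp: make(map[string]float64),
		upDesc: prometheus.NewDesc(
			"mirrored_pod_up",
			"Whether a mirrored pod is up (1) or down (0).",
			[]string{"pod"}, nil,
		),
	}
}

func (c *podCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.upDesc
}

func (c *podCollector) Collect(ch chan<- prometheus.Metric) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for pod, up := range c.podsUp {
		// Const metrics are created on the fly for each scrape and are not
		// retained by the registry afterwards.
		ch <- prometheus.MustNewConstMetric(c.upDesc, prometheus.GaugeValue, up, pod)
	}
}

func main() {
	c := newPodCollector()
	c.podsUp["pod-a"] = 1 // placeholder data; a real mirror would refresh this
	prometheus.MustRegister(c)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9100", nil))
}
```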
-
The strong technical reason is that disappearing metrics create complicated situations when PromQL acts on them. Complex PromQL expressions are already overwhelming for many. Adding even more obstacles by letting metrics disappear on a regular basis is really not a good idea.

Having said that, there are use cases, and as you hinted at yourself, that's mostly if you instrument a binary that mirrors metrics of something else. Especially if it mirrors multiple "somethings" and then one of them disappears, it is fair game to let those metrics disappear. Those mirroring use cases are usually done with custom collectors that create const metrics on each scrape.

Having said that (2nd order "having said that"), there are a few use cases (a niche use case within a class of niche use cases) where you register collectors that you don't want to collect anymore. For those, there is the `Unregister` method on the registry.

Having said all of that (3rd order), I think all of these metric disappearances should happen in a deliberate, defined way. Letting metrics disappear after a TTL has the smell of later disaster. However, you are free to implement it on top of the primitives in this library. I just don't think we should offer a first-class feature for an anti-pattern within a niche use case within a class of niche use cases.
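As a rough illustration of what "implement it on top of the primitives in this library" could look like (this is not a library feature; the wrapper type, the TTL handling, and all names below are made up), one can track the last update per label set and delete stale children with `DeleteLabelValues`:

```go
package main

import (
	"strings"
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// expiringGaugeVec wraps a GaugeVec and deletes any label set that has not
// been updated within ttl. Built only on public primitives:
// WithLabelValues and DeleteLabelValues.
type expiringGaugeVec struct {
	mu       sync.Mutex
	vec      *prometheus.GaugeVec
	lastSeen map[string]labelEntry
	ttl      time.Duration
}

type labelEntry struct {
	labels []string
	seen   time.Time
}

func newExpiringGaugeVec(vec *prometheus.GaugeVec, ttl time.Duration) *expiringGaugeVec {
	e := &expiringGaugeVec{vec: vec, lastSeen: make(map[string]labelEntry), ttl: ttl}
	go e.sweep()
	return e
}

func (e *expiringGaugeVec) Set(value float64, labelValues ...string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	key := strings.Join(labelValues, "\xff")
	e.lastSeen[key] = labelEntry{labels: labelValues, seen: time.Now()}
	e.vec.WithLabelValues(labelValues...).Set(value)
}

// sweep periodically removes label sets that have gone stale.
func (e *expiringGaugeVec) sweep() {
	ticker := time.NewTicker(e.ttl)
	defer ticker.Stop()
	for range ticker.C {
		e.mu.Lock()
		for key, entry := range e.lastSeen {
			if time.Since(entry.seen) > e.ttl {
				e.vec.DeleteLabelValues(entry.labels...)
				delete(e.lastSeen, key)
			}
		}
		e.mu.Unlock()
	}
}

func main() {
	podMemory := promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "mirrored_pod_memory_bytes",
		Help: "Memory usage of mirrored pods.",
	}, []string{"pod"})

	expiring := newExpiringGaugeVec(podMemory, 5*time.Minute)
	expiring.Set(123e6, "pod-a") // series disappears if not updated within the TTL

	select {} // keep the process alive so the sweeper keeps running (demo only)
}
```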