How to reduce the memory usage of prometheus golang client? #920
-
After I removed the Prometheus Go client and stopped collecting metrics, the service's memory usage dropped from 600MB to 300MB. Is there a better expiration strategy for cached metrics?
Replies: 3 comments
-
There is nothing really expiring in the instrumentation library. Any metric is intentionally exposed over the lifetime of the binary. If you see the instrumentation code taking a lot of memory, perhaps you have created metric vectors with very high cardinality. That would be the first thing to look for. If you have elements in a metric vector that only get an update once or twice and then just sit around using up memory, you might actually have more of an event-logging use case.
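For illustration, here is a minimal sketch of what the cardinality difference looks like in code (metric and label names are made up). Each distinct label-value combination becomes a child metric that lives for the lifetime of the process, so an unbounded label like a user ID keeps growing memory, while a small fixed label set stays bounded:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Anti-pattern: "user_id" is unbounded, so the vector grows with every
	// distinct user ever seen and that memory is never released.
	requestsByUser = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "app_requests_by_user_total",
		Help: "Requests handled, partitioned by user.",
	}, []string{"user_id"})

	// Bounded alternative: a small, fixed set of label values keeps the
	// vector (and its memory) bounded.
	requestsByStatus = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "app_requests_by_status_total",
		Help: "Requests handled, partitioned by status class.",
	}, []string{"status_class"}) // e.g. "2xx", "4xx", "5xx"
)

func handleRequest(userID, statusClass string) {
	requestsByUser.WithLabelValues(userID).Inc()        // grows without bound
	requestsByStatus.WithLabelValues(statusClass).Inc() // stays bounded
}

func main() {
	handleRequest("user-42", "2xx")
}
```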
-
Is there any strong / technical reason to do that, or could it be revised? I'm asking because I still see valid reasons to have an expiry mechanism. There can be relatively high-cardinality metrics with ephemeral label values that end up being obsolete after some time; I can think of pods in Kubernetes, for instance, when metrics are exposed by something that centralizes data rather than by the pod itself. If I'm correct, there is the option to use a custom collector and constant metrics to avoid filling the internal metrics registry (see the sketch after this reply). However, there are some issues with that approach.
So, what about another approach that would be introducing an (optional) expiry TTL associated with metrics+labels? (A metric with a given label set would be deleted from the registry if it isn't updated after X seconds.) Is it something we can discuss?
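For reference, the custom-collector-plus-constant-metrics approach mentioned above could look roughly like the sketch below (the pod-mirroring scenario and all names are illustrative, not from this thread). The collector rebuilds its metrics from the current state on every scrape, so series for pods that no longer exist simply stop being exposed rather than accumulating in a metric vector:

```go
package main

import (
	"log"
	"net/http"
	"sync"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// podCollector mirrors the state of some external pods. It exposes one gauge
// per currently known pod, built fresh on every scrape.
type podCollector struct {
	mu     sync.Mutex
	podsUp map[string]float64 // pod name -> up value, refreshed elsewhere
	upDesc *prometheus.Desc
}

func newPodCollector() *podCollector {
	return &podCollector{
		podsUp: make(map[string]float64),
		upDesc: prometheus.NewDesc(
			"mirrored_pod_up",
			"Whether a mirrored pod is up (1) or down (0).",
			[]string{"pod"}, nil,
		),
	}
}

func (c *podCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.upDesc
}

func (c *podCollector) Collect(ch chan<- prometheus.Metric) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for pod, up := range c.podsUp {
		// Const metrics are created on the fly for each scrape and are not
		// retained by the registry afterwards.
		ch <- prometheus.MustNewConstMetric(c.upDesc, prometheus.GaugeValue, up, pod)
	}
}

func main() {
	c := newPodCollector()
	c.podsUp["pod-a"] = 1 // placeholder data; a real mirror would refresh this
	prometheus.MustRegister(c)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9100", nil))
}
```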
-
The strong technical reason is that disappearing metrics create complicated situations when PromQL acts on them. Complex PromQL expressions are already overwhelming for many. Adding even more obstacles by letting metrics disappear on a regular basis is really not a good idea.

Having said that, there are use cases, and as you hinted at yourself, that's mostly if you instrument a binary that mirrors metrics of something else. Especially if it mirrors multiple "somethings" and then one of them disappears, it is fair game to let those metrics disappear. Those mirroring use cases are usually done with custom collectors that create const metrics on each scrape.

Having said that (2nd order "having said that"), there are a few use cases (a niche use case within a class of niche use cases) where you register collectors that you don't want to collect anymore. For those, there is the `Unregister` method on the registry.

Having said all of that (3rd order), I think all of these metric disappearances should happen in a deliberate, defined way. Letting metrics disappear after a TTL has the smell of later disaster. However, you are free to implement it on top of the primitives in this library. I just don't think we should offer a first-class feature for an anti-pattern within a niche use case within a class of niche use cases.
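As a rough illustration of what "implement it on top of the primitives in this library" could look like (this is not a library feature; the wrapper type, the TTL handling, and all names below are made up), one can track the last update per label set and delete stale children with `DeleteLabelValues`:

```go
package main

import (
	"strings"
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// expiringGaugeVec wraps a GaugeVec and deletes any label set that has not
// been updated within ttl. Built only on public primitives:
// WithLabelValues and DeleteLabelValues.
type expiringGaugeVec struct {
	mu       sync.Mutex
	vec      *prometheus.GaugeVec
	lastSeen map[string]labelEntry
	ttl      time.Duration
}

type labelEntry struct {
	labels []string
	seen   time.Time
}

func newExpiringGaugeVec(vec *prometheus.GaugeVec, ttl time.Duration) *expiringGaugeVec {
	e := &expiringGaugeVec{vec: vec, lastSeen: make(map[string]labelEntry), ttl: ttl}
	go e.sweep()
	return e
}

func (e *expiringGaugeVec) Set(value float64, labelValues ...string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	key := strings.Join(labelValues, "\xff")
	e.lastSeen[key] = labelEntry{labels: labelValues, seen: time.Now()}
	e.vec.WithLabelValues(labelValues...).Set(value)
}

// sweep periodically removes label sets that have gone stale.
func (e *expiringGaugeVec) sweep() {
	ticker := time.NewTicker(e.ttl)
	defer ticker.Stop()
	for range ticker.C {
		e.mu.Lock()
		for key, entry := range e.lastSeen {
			if time.Since(entry.seen) > e.ttl {
				e.vec.DeleteLabelValues(entry.labels...)
				delete(e.lastSeen, key)
			}
		}
		e.mu.Unlock()
	}
}

func main() {
	podMemory := promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "mirrored_pod_memory_bytes",
		Help: "Memory usage of mirrored pods.",
	}, []string{"pod"})

	expiring := newExpiringGaugeVec(podMemory, 5*time.Minute)
	expiring.Set(123e6, "pod-a") // series disappears if not updated within the TTL

	select {} // keep the process alive so the sweeper keeps running (demo only)
}
```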