Metricsregistry #120
Conversation
For the generic statsd case we have some issues with the restrictions placed on us by the default golang client registry (discussion: https://groups.google.com/forum/#!topic/prometheus-developers/Q2pRR-UlHI0). This commit implements a simplistic metrics registry that puts no restrictions on the registration or deregistration of metrics (since they are all done programmatically).
Previously, all the exporter metrics were mixed with the statsd metrics. This changes that so that the exporter's own metrics are served on /metrics (by default) and the statsd metrics on /statsd (by default). Fixes prometheus#80
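For illustration, a minimal sketch (not the actual changeset) of what serving the two endpoints from separate registries could look like; the listen address and the use of two plain `prometheus.Registry` instances are assumptions for the example, since the PR's own registry only needs to act as a `prometheus.Gatherer`:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Registry for the exporter's own metrics (build info, event counters, ...).
	exporterRegistry := prometheus.NewRegistry()

	// Separate registry (or any prometheus.Gatherer, such as the custom
	// registry proposed in this PR) for the metrics translated from statsd.
	statsdRegistry := prometheus.NewRegistry()

	// Exporter metrics on /metrics, statsd-derived metrics on /statsd.
	http.Handle("/metrics", promhttp.HandlerFor(exporterRegistry, promhttp.HandlerOpts{}))
	http.Handle("/statsd", promhttp.HandlerFor(statsdRegistry, promhttp.HandlerOpts{}))

	log.Fatal(http.ListenAndServe(":9102", nil))
}
```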
I still think having a full-on registry is a very heavy-handed and low-level approach. I'm not sure it's a price I want to pay for supporting this case, which can also be solved by coupling the application and exporter lifecycles.
If we do want to do it, I'd rather keep track of things separately and then generate ConstMetrics like in many other exporters. However, I don't know how we could do that with summaries and histograms.
If the client library were to allow modifying an existing metric to extend the label set, we could do that (after #119) when we observe a wider label set for the first time, and pad narrower label sets on observation. @beorn7 is this something you would be willing to support in general? You mentioned at some point that you wanted to support some solution for this in the library, what form would that take? It would prevent having partial copies of the library code in multiple projects. @beorn7 let's discuss options in the coming week?
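For reference, a rough sketch of the "track values separately, then generate ConstMetrics" pattern mentioned above, as many exporters do it. The collector type and its internal map are assumptions for illustration, and this is exactly the part that does not extend to summaries and histograms:

```go
package collector

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// statsdCollector tracks plain counter values itself and only converts them
// into Prometheus metrics at scrape time.
type statsdCollector struct {
	mtx      sync.RWMutex
	counters map[string]float64 // keyed by metric name; labels elided for brevity
}

func (c *statsdCollector) Describe(ch chan<- *prometheus.Desc) {
	// Intentionally loose: descriptors are generated on the fly in Collect.
	// (Depending on the client_golang version, this may require registering
	// the collector as "unchecked" or using prometheus.DescribeByCollect.)
}

func (c *statsdCollector) Collect(ch chan<- prometheus.Metric) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	for name, v := range c.counters {
		desc := prometheus.NewDesc(name, "counter translated from statsd", nil, nil)
		ch <- prometheus.MustNewConstMetric(desc, prometheus.CounterValue, v)
	}
}
```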
In registry.go:

    // Scatter.
    // (Collectors could be complex and slow, so we call them all at once.)
    wg.Add(len(r.metricsByID))
    go func() {
        wg.Wait()
        close(metricChan)
    }()
    for _, metric := range r.metricsByID {
        go func(collector prometheus.Collector) {
As pointed out in #76 (comment), this causes a huge spike in memory consumption because it spawns a goroutine for every single time series.
prometheus/client_golang#370 is a potential fix for that.
If we end up needing to continue with the registry we can either optimize or implement a more naive one. I'd be more than happy to not "build our own", but the response last time this was discussed upstream was basically "try it yourself and if we like it we might merge it".
Re-implementing the registry to create inconsistent metrics sounds like a really bad idea, to be honest.
So we just discussed this separately – the linked issues don't quite cut it for the statsd exporter. They work well with custom collectors that generate ConstMetrics, but this exporter doesn't. It could work somewhat sensibly for counters and gauges, but we'd have to recreate the quantile calculation from scratch for summaries, and the bucketing for histograms.

Another idea that we've kicked around is tracking the labels for a given metric name (this would be easy on top of #119), and when they change, unregistering the metric and registering a completely new one. That's not allowed at the moment anyway, but could be without breaking the semantics too much. On top of that, we could fill out with empty labels as needed anyway. This would mean that at the transition, we would throw away all previous observations (unless we dig into the protobuf to rescue them and copy them over into the new metrics somehow). There's no really pretty solution yet …
More thoughts:

The "unregister" hack could be implemented more cleanly (and without changing the semantics of Unregister) by using a local (not the global) registry and simply registering all metrics (the newly labeled ones and the old ones that haven't changed) with the new registry. Then throw the old HTTP handler and the old registry away and wire up the new registry.

WRT throwing away the old observations: with summaries, you want a decay time anyway (by default 10m), so assuming label changes are rare, that's not a big deal. For histograms, throwing away the old observations is essentially a counter reset and will be dealt with by Prometheus, i.e. you'll only lose about half a scrape interval's worth of event counts.

I'd say let's go with the "throw away on label change" approach, i.e. whenever a previously unaccounted label shows up, recreate all affected metrics with a zero count in a new registry, register all unaffected metrics with the new registry, rewire the HTTP handler to the new registry, and throw away the old one. Does that make sense?
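A sketch of how that registry swap could look, leaving out the locking around the metric map. All names here (exporter, liveMetrics, onWiderLabelSet) are made up for the illustration and are not part of this changeset:

```go
package registryswap

import (
	"net/http"
	"sync/atomic"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

type exporter struct {
	liveMetrics map[string]prometheus.Collector // all currently exposed metrics, by name
	handler     atomic.Value                    // holds the current http.Handler
}

// onWiderLabelSet is called the first time an event for name arrives with a
// previously unseen, wider label set. freshMetric is the recreated metric with
// the new label set and a zero count.
func (e *exporter) onWiderLabelSet(name string, freshMetric prometheus.Collector) {
	// Build a fresh local (not the global) registry.
	reg := prometheus.NewRegistry()

	// Replace the affected metric, then register everything – the recreated
	// metric and all unaffected ones – with the new registry.
	e.liveMetrics[name] = freshMetric
	for _, c := range e.liveMetrics {
		reg.MustRegister(c)
	}

	// Rewire the HTTP handler to the new registry; the old registry and its
	// accumulated observations are simply dropped.
	e.handler.Store(promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
}

func (e *exporter) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	e.handler.Load().(http.Handler).ServeHTTP(w, r)
}
```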
Could there be any strange side effects from not resetting all metrics at the same time? Say, for request totals and errors?
Technically no, but I can imagine that causing artifacts. On the other hand, applications should not be changing labels at runtime.
The statsd exporter is not, by itself, an application – I would still recommend coupling its lifecycle to that of the application, but I also recognize that this does not make it a drop-in replacement for statsd (where a central deployment is much more common).

Now, with the scheme @beorn7 proposed, we would re-register and reset the metric that changed, but we would not have a mechanism to do that to all metrics exposed by one application at the same time (because we don't know what these all are). I'm not saying it's a bad idea, just writing down thoughts about the limitations.

Whatever we choose to do, I'd want to document the different choices and tradeoffs really well. We could even make the metrics re-registering an optional feature (and drop the event otherwise).
To quote from my Prometheus proverb collection: "First do …"
@jacksontj is this (wiping the whole metric and recreating it with the new label set) a solution that would fit your needs?
How would that work out during a deployment, where some instances send the new label set and others still the old one? If I understand the proposal correctly, the exporter would constantly throw away the existing metric and create a new one.
I wouldn't do this for shrinking label sets; those I would just pad out indefinitely. So it would only happen once per metric name, the first time an event with the wider label set is seen. Renaming a label would result in a "wider" label set that includes both.
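As a tiny sketch of the padding idea – the per-metric-name knownLabelNames bookkeeping is assumed (e.g. on top of #119):

```go
package labels

import "github.com/prometheus/client_golang/prometheus"

// padLabels fills in an explicit empty value for every label name we have seen
// for this metric but that is missing from the incoming event.
func padLabels(eventLabels map[string]string, knownLabelNames []string) prometheus.Labels {
	padded := prometheus.Labels{}
	for _, name := range knownLabelNames {
		padded[name] = eventLabels[name] // a missing key yields "", i.e. an empty label value
	}
	return padded
}
```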
@matthiasr that would mean that the metrics sent from clients to this exporter would have labels added with empty values. IMO that's not workable, as it means I can no longer query based on the existence of label keys – since they are being mutated before storage. The intention of this exporter is to be (as much as possible) a drop-in replacement for a statsd server/aggregator/relay/etc. With that in mind we should avoid mutating the metrics as much as possible – we want to store exactly what we were sent.
Fundamentally, that's an impossible task. The data models are not the same, so we can't store exactly what we get sent – only Graphite can. The Prometheus server does not distinguish between empty and non-present labels, so you can't query based on label key presence anyway – the query language does not allow that. The only way to test for "label is not present" is to match the label against the empty string.
I'm going to close this – the concrete changeset is very old, and I believe with prometheus/client_golang#425 we can solve this without a whole new registry. The concrete problem in #63 was fixed from another angle.
Rebase of #74, fixes #63
TL;DR: the statsd exporter currently doesn't allow for duplicate/conflicting tag sets, which is a problem for the statsd use case. The exporter interface has no such restriction; it's simply an issue with the Go Prometheus client. This can be fixed by implementing our own registry, which is what this PR does.
Apologies for the long delay; other things came up as higher priority. So here is a rebased version against current master to start the conversation from.
As far as the outstanding comments from the previous PR:
"Please update the help string for both flags to make the difference very very clear."
-- Updated with the most recent commit on the branch.

"What happens when these are exceeded? What happens for someone with very many metrics?"
-- This is a copy/pasted hard-coded value from the prometheus client_golang; when the channel hits capacity it will block collection until it has been consumed (effectively a memory limiter) – roughly as sketched below.

"that PR is closed. what's the future here?"
-- We'll have to maintain this registry long-term. If we'd prefer, I can pull the registry into a separate package (a directory in the same repo).

cc: @matthiasr
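For context, the channel-as-memory-limiter mechanism referred to above looks roughly like this. The constant value and the surrounding function are a simplified sketch, not the code in this PR:

```go
package registry

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// capMetricChan is illustrative; the PR copies the hard-coded value from client_golang.
const capMetricChan = 1000

// gather fans out to all collectors but bounds the number of in-flight metrics:
// once the buffer is full, collectors block until the consumer catches up.
func gather(collectors []prometheus.Collector) <-chan prometheus.Metric {
	metricChan := make(chan prometheus.Metric, capMetricChan)

	var wg sync.WaitGroup
	wg.Add(len(collectors))
	for _, c := range collectors {
		go func(c prometheus.Collector) {
			defer wg.Done()
			c.Collect(metricChan)
		}(c)
	}

	// Close the channel once every collector has finished.
	go func() {
		wg.Wait()
		close(metricChan)
	}()

	return metricChan
}
```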