Handling of conflicting metric values #63
Guys, any news here?
What behavior would you expect here? We can't register both conflicting metrics and export them. Just logging the error and continuing would result in silently ignoring all conflicting metrics.
I hit this problem as a report that “Prometheus is down” when some developers were rolling out a new build with different help text. Exiting just meant that it was in a loop with Upstart restarting the service, so I'm not sure that was substantially better than an ERROR log message.
@acdha So you would prefer writing out log lines and ignoring all metrics with the new signature until someone manually restarts the statsd_exporter in that case?
I guess it's a judgement call whether you think it's better to be disruptive, forcing people to actually notice the problem, or to allow other applications to continue sending stats without interruption.
In my case, running an instance shared by several teams, it would have been preferable if only the project which changed their stats experienced a gap, but there is an argument that it'd also be acceptable to simply say “monitor process flapping better”.
I guess I wouldn't share a statsd exporter between teams for such reasons. We generally tend to prefer failing hard and early, as everything else usually makes debugging very difficult. That's why I'm a bit hesitant to implement a solution which will silently ignore metrics. In general, I'd recommend using direct instrumentation with our client libraries instead of relying on the statsd_exporter so much.
Yeah, in this case it was a shared instance among developers working on the same project - the person who was working on an update was trying to figure out why he was getting error messages from the statsd client when it terminated prematurely.
Given we ignore a lot of metrics already in statsd_exporter, I'd personally be fine accepting such a pull request. Changing the behavior would require changing the registration code (lines 72 to 87 in 663b6a1) to use log.Errorf instead of log.Fatalf and continue with the next metric.
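A minimal sketch of the kind of change discussed above (the package and function names are hypothetical, not the exporter's actual code, and the standard-library logger stands in for the prometheus log package):

package exporter

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
)

// registerOrSkip illustrates the proposed behavior: when registration
// fails because of a conflicting metric, log the error and report the
// failure so the event loop can continue with the next metric, instead
// of calling log.Fatalf and terminating the whole process.
func registerOrSkip(c prometheus.Collector, name string) bool {
	if err := prometheus.Register(c); err != nil {
		log.Printf("ERROR: cannot register metric %q: %v, skipping", name, err)
		return false
	}
	return true
}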
Also, it would be good to have an exporter-internal error counter so you can monitor for the problem.
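One way such a self-metric could look (the metric name here is an assumption for illustration, not an existing statsd_exporter metric):

package exporter

import "github.com/prometheus/client_golang/prometheus"

// conflictingEventsTotal counts statsd events that were dropped because
// their metric conflicted with an already registered one, so operators
// can alert on the condition instead of discovering it via a crash.
var conflictingEventsTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "statsd_exporter_events_conflict_total",
	Help: "Number of events dropped due to conflicting metric registrations.",
})

func init() {
	prometheus.MustRegister(conflictingEventsTotal)
}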
Guys, my question is probably not related to statsd_exporter, but once we aligned the metric labels so that statsd_exporter does not crash, how do we restart Prometheus so that it recalculates the metrics?
I just opened a PR to fix this (#72). Although this fixes the immediate issue of "exporter dies", it doesn't solve the longer-term issue of "you have to restart the exporter". The most common case where this would be an issue is the following: an app exists and is emitting metrics. A new release of the app goes out which adds/removes some tags -- at this point those metrics are "broken" until the exporter is restarted. In this situation the "old" metrics are no longer being emitted, and as such we could remove them (given some TTL). Because of this I was thinking of adding a feature to basically TTL out metrics that haven't been emitted for a while if there is a new metric being emitted. Alternatively there could be some API call to "unregister" a metric -- but that seems fairly clunky (and not very "statsd-esque"). I figured I'd float the idea here first -- as it is related to this larger issue. If one of those (or some other option) is wanted I'll open a separate PR for the feature -- so we can get the fix for the crashing in quicker.
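A rough sketch of the TTL idea under discussion (all names are hypothetical and this is not how statsd_exporter actually implements anything): track when each mapped metric last received a sample and unregister collectors that have gone quiet for longer than the TTL, freeing the name for a newer, possibly conflicting, version.

package exporter

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// ttlRegistry remembers when each metric last received a sample so that
// stale collectors can be unregistered.
type ttlRegistry struct {
	mu       sync.Mutex
	ttl      time.Duration
	lastSeen map[string]time.Time
	metrics  map[string]prometheus.Collector
}

// touch records that a sample for name was just processed.
func (r *ttlRegistry) touch(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.lastSeen[name] = time.Now()
}

// sweep unregisters and forgets metrics that have not been updated
// within the TTL; it would be called periodically from the event loop.
func (r *ttlRegistry) sweep() {
	r.mu.Lock()
	defer r.mu.Unlock()
	for name, seen := range r.lastSeen {
		if time.Since(seen) > r.ttl {
			prometheus.Unregister(r.metrics[name])
			delete(r.metrics, name)
			delete(r.lastSeen, name)
		}
	}
}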
Thanks a lot @jacksontj. I merged your contribution. For the discussion of how to handle such conflicting metrics in general, I'd recommend writing to prometheus-developers@.
Just for linkage (if anyone is interested), here is the thread on the developer list: https://groups.google.com/forum/#!topic/prometheus-developers/Q2pRR-UlHI0
This patch simply moves the error message from a log.Fatalf() to a log.Errorf() to continue on. Fixes prometheus#63
I closed #74 because it had gone stale, but I am going to reopen this issue to track the underlying reasoning.
x-ref: more discussion in #120
The StatsD -> Graphite pipeline supports multiple metric types on the same name nicely (via type namespacing in the Graphite backend). Further, there are things out there which send e.g. counters and timers on the same name:
So what do you think about the idea of extending the mapper with a type match, so that we can map differently typed metrics of the same name to different Prometheus metric names?
@tobiashenkel that would be the ideal solution, I guess. @matthiasr would you like that solution? Any hints on how to implement it?
Sounds reasonable. How would you represent this in the configuration? Can you mock up a mapping config snippet?
What about something like this?
mappings:
- match: test.timing.*.*.*
  match_metric_type: counter|gauge|timer
  name: "my_timer"
  labels:
    provider: "$2"
    outcome: "$3"
    job: "${1}_server"
Or just the option to specify the type at the end of the match attribute, `match: test.timing.*|c`?
Both would work; one is nearer to the actual metric line, the other more comprehensive in the config file. I'd be fine with either way.
I like the explicit match_metric_type better. It requires less knowledge about the details of the protocol – someone using a client library has probably never seen `|c` before. I would be happy to accept a PR for this!
I don't know if it's worth creating a new issue, so I'll try here first. Any suggestions on how to handle such cases?
Ideally this should be three new issues 😂
We had some developers on a new project starting to integrate statsd support in their application. statsd_exporter 0.3.0 is crashing constantly with this error message:
If I'm reading that correctly, this is due to them having two versions of the app sending inconsistent metric values, but while an error in this situation seems reasonable, it actually causes the statsd_exporter process to crash.
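For illustration, a small hypothetical reproduction (made-up metric name and help strings) of the kind of conflict described: the client library refuses to register a second collector with the same name but different metadata, and the exporter turned that error into a fatal exit.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Two app versions describing the same metric with different help text.
	v1 := prometheus.NewCounter(prometheus.CounterOpts{Name: "app_requests_total", Help: "Requests handled."})
	v2 := prometheus.NewCounter(prometheus.CounterOpts{Name: "app_requests_total", Help: "Number of requests handled."})

	prometheus.MustRegister(v1)
	if err := prometheus.Register(v2); err != nil {
		// An error of this kind is what the exporter hit; before the fix
		// it called log.Fatalf at this point and the process exited.
		fmt.Println("conflict:", err)
	}
}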