-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure matcher values are present when parsing matchers from strings #2968
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we fail? Or just it should match the empty value?
I pondered a bit on that as that was my initial approach, but it felt like allowing it might have an unwanted side effect (and we fail on a lot of cases) - I have no good reference points and couldn't find any in the code. That being said, I'm happy to rely on your expertise and experience for this one - do you think we should fall back to |
Empty labels are valid in Prometheus and alertmanager. cc @beorn7 WDYT here? |
Sorry, I could have been clearer that's what I meant with my comment e.g. you made a mistake while typing and silenced an alert with empty value label. |
I think it should be possible to create a silence for an empty label, which is equivalent to an non-existing label, or in other words, it gives you a straightforward way of silencing all alerts that do not have that label (with a non-empty value). If we forbid Maybe we could allow |
yeah, "" will be dropped by shell for most users anyway. I suggest we consider it as an empty value. |
not feeling strongly on this one but as the parser is already quite liberal (example), I would ok with |
I thought that's what I was doing with this PR, but after a close second look at the problem (not just the solution), I realise what's currently going on. If you look closely at the command executed as part of #2936 it's: ./amtool --alertmanager.url=http://127.0.0.1:9093 alert add foo= Which I believe it's a mistake. It doesn't include the alertname as the first argument - I think this should have resulted in an error so that the user is aware. Alertname must be presentNow, for this case, I propose the following: Allow the first argument to be used as a whole Case #1: ✅
Case #2: ✅
Case #3: ❌
Case #4: ❌
Case #5: ❌ ./amtool --alertmanager.url=http://127.0.0.1:9093 foo=AlwaysFiring alertname=NeverFiring
error: unable to determine alertname Empty value labelsNow, this brings us to the second question... The consensus I take from this discussion seems to be that: Case #1: ✅
Case #2: ✅
Josh's opinionPersonally, I would prefer to error on If we agree with the above, I'm happy to take it for another spin. |
Why would alertname need to be present? I can be relabeled away by Prometheus alert_relabel_config .. |
After Julien's comment, I took another holistic look and these are my findings:
To top the above, it seems like users have also reported confusion on different behaviour between silence matchers vs alert matchers with #2610. This is because the "filters" params on that endpoints treats it like any other matcher, however, silences when set have additional validation on them. I'm struggling to come up with suggestions on how to best proceed forward that a) don't incur a breaking change and b) solve the problem for both adding alerts and creating silences - any context to share here? Concretely, I'm struggling with:
tl;dr: Does it feel like we're at a point where a silence matcher has different semantics from a label matcher even if they have the same implementation (even if theoretically they're to match just key/value pairs)? I can see the appeal of separating the matching logic and documenting/validating the semantics of them concretely so that the CLIs, APIs and UI benefit from a consistent implementation. We don't have to turn this into a long discussion async, I'll add this to the WG so that we can discuss it. |
WRT the WRT at least one matcher that doesn't match the empty string: I'm not sure what the actual reason for this is, but I assume it's an efficiency concern. Any matcher matching the empty string essentially requires the equivalent of a "full table scan". If there is at least one matcher not matching the empty string, at least that "full table scan" is limited to whatever that one "proper" matcher returns rather than all alerts. To further speculate, I assume that the filter parameter on the API endpoint is more liberal because it's a user initiated action. If it is too expensive, well, then it times out eventually, but that's it. However, silences have to be checked continuously. If that's too expensive, the AM is permanently overloaded. (Again, this is all educated guessing. My knowledge of the AM internals is limited.) I do think that the preview should implement the same restrictions as the silence itself (cf. #2610). (The reason for the different behavior can be explained by my speculation above: One-time user initiated action vs. permanently running queries. Still, of course, we should do the same evaluation for the former in this case.) WRT allowing empty labels for adding alerts: While an empty label is the same as a non-existing label, it is still valid and has well defined semantics throughout the Prometheus ecosystem. It would be really confusing if the only place where you ban this kind of redundancy would be the API for adding alerts. In summary, I don't think that the matchers have different semantics for silences vs. alerts. I would even say that what we have is all rooted in the intention to keep the semantics the same. The one thing that sticks out (at least one matcher not matching the empty string) isn't really semantics of the matchers themselves but a higher level query requirement (which exists in the same way in Prometheus itself). |
895b1b0
to
6dd17d8
Compare
After recently experiencing a panic in one of our largest environments, I realised this is not worth blocking on some sort of theoretical consistency across silences and alerts APIs. The general comments have a tendency on As an example, in the case of the reported issue:
instead of a panic. I think we should move forward and merge this as this problem has already been reported several times on multiple fronts. |
Fixes #2936 Signed-off-by: gotjosh <[email protected]>
Signed-off-by: gotjosh <[email protected]>
6dd17d8
to
bd89550
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes #2936
Signed-off-by: gotjosh [email protected]