Ensure matcher values are present when parsing matchers from strings #2968

gotjosh · 2022-06-28T10:51:40Z

Signed-off-by: gotjosh [email protected]

roidelapluie

Should we fail? Or just it should match the empty value?

gotjosh · 2022-06-28T11:26:26Z

Should we fail? Or just it should match the empty value?

I pondered a bit on that as that was my initial approach, but it felt like allowing it might have an unwanted side effect (and we fail on a lot of cases) - I have no good reference points and couldn't find any in the code.

That being said, I'm happy to rely on your expertise and experience for this one - do you think we should fall back to ""?

roidelapluie · 2022-06-28T11:28:34Z

Empty labels are valid in Prometheus and alertmanager.

cc @beorn7 WDYT here?

gotjosh · 2022-06-28T11:30:47Z

Empty labels are valid in Prometheus and alertmanager.

Sorry, I could have been clearer that's what I meant with my comment e.g. you made a mistake while typing and silenced an alert with empty value label.

beorn7 · 2022-06-28T12:19:52Z

I think it should be possible to create a silence for an empty label, which is equivalent to an non-existing label, or in other words, it gives you a straightforward way of silencing all alerts that do not have that label (with a non-empty value).

If we forbid foo="", it becomes kind of tedious to formulate that, e.g. foo!~".+".

Maybe we could allow foo="" and error on foo=? But this feels a bit inconsistent.

roidelapluie · 2022-06-28T12:25:42Z

yeah, "" will be dropped by shell for most users anyway. I suggest we consider it as an empty value.

simonpasquier · 2022-06-28T12:28:54Z

Should we fail? Or just it should match the empty value?

not feeling strongly on this one but as the parser is already quite liberal (example), I would ok with foo= being equivalent to foo="".

gotjosh · 2022-06-28T18:09:37Z

Maybe we could allow foo="" and error on foo=? But this feels a bit inconsistent.

I thought that's what I was doing with this PR, but after a close second look at the problem (not just the solution), I realise what's currently going on.

If you look closely at the command executed as part of #2936 it's:

./amtool --alertmanager.url=http://127.0.0.1:9093 alert add foo=

Which I believe it's a mistake. It doesn't include the alertname as the first argument - I think this should have resulted in an error so that the user is aware.

Alertname must be present

Now, for this case, I propose the following: Allow the first argument to be used as a whole string if it doesn't contain the model.AlertNameLabel as part of the string or we're unable to parse the label throw an error.

Case #1: ✅

./amtool --alertmanager.url=http://127.0.0.1:9093 alertname=AlwaysFiring
# results in alertname=AlwaysFiring

Case #2: ✅

./amtool --alertmanager.url=http://127.0.0.1:9093 AlwaysFiring
# results in alertname=AlwaysFiring

Case #3: ❌

./amtool --alertmanager.url=http://127.0.0.1:9093 alertname=
error: unable to determine alertname

Case #4: ❌

./amtool --alertmanager.url=http://127.0.0.1:9093 foo=
error: unable to determine alertname

Case #5: ❌

./amtool --alertmanager.url=http://127.0.0.1:9093 foo=AlwaysFiring alertname=NeverFiring
error: unable to determine alertname

Empty value labels

Now, this brings us to the second question... The consensus I take from this discussion seems to be that: foo= is the same as foo="". I think we're able to keep that behaviour for labels that are not the alertname, see below.

Case #1: ✅

./amtool --alertmanager.url=http://127.0.0.1:9093 AlwaysFiring foo=

Case #2: ✅

./amtool --alertmanager.url=http://127.0.0.1:9093 AlwaysFiring foo=""

Josh's opinion

Personally, I would prefer to error on foo= but as @simonpasquier pointed out - the parser is already quite flexible with this so I'm happy to accommodate. I'll have to look at a few more places where we use ParseMatchers to make sure I'm not breaking something but overall sounds doable.

If we agree with the above, I'm happy to take it for another spin.

roidelapluie · 2022-06-30T17:28:11Z

Why would alertname need to be present? I can be relabeled away by Prometheus alert_relabel_config ..

gotjosh · 2022-07-01T16:30:31Z

After Julien's comment, I took another holistic look and these are my findings:

alertname as a label is not required by the API (I thought it was - my bad 🤦)
Empty labels don't make it to the alertmanager, they're removed by the add alerts API endpoint
Silences allow empty matchers meaning foo="" is a valid matcher as long as there is at least one non-empty matcher. See here

To top the above, it seems like users have also reported confusion on different behaviour between silence matchers vs alert matchers with #2610. This is because the "filters" params on that endpoints treats it like any other matcher, however, silences when set have additional validation on them.

I'm struggling to come up with suggestions on how to best proceed forward that a) don't incur a breaking change and b) solve the problem for both adding alerts and creating silences - any context to share here?

Concretely, I'm struggling with:

I see no reason why the CLI should allow empty labels for adding alerts if they're going to end up being removed anyways. However, I see the appeal for silences.
If alertname is not required why do we have special handling for it in the CLI? IMO this is better suited for a CLI flag that when specified it helps you create the alertname e.g. ./amtool add alert -n AlwaysFiring foo=bar. As opposed to inferring that an error in parsing the first label means you're trying to set the alertname directly.

tl;dr: Does it feel like we're at a point where a silence matcher has different semantics from a label matcher even if they have the same implementation (even if theoretically they're to match just key/value pairs)? I can see the appeal of separating the matching logic and documenting/validating the semantics of them concretely so that the CLIs, APIs and UI benefit from a consistent implementation.

We don't have to turn this into a long discussion async, I'll add this to the WG so that we can discuss it.

beorn7 · 2022-07-07T12:38:36Z

WRT the amtool behavior: I agree that the special handling of the alertname feels weird. I guess that has historically evolved.

WRT at least one matcher that doesn't match the empty string: I'm not sure what the actual reason for this is, but I assume it's an efficiency concern. Any matcher matching the empty string essentially requires the equivalent of a "full table scan". If there is at least one matcher not matching the empty string, at least that "full table scan" is limited to whatever that one "proper" matcher returns rather than all alerts. To further speculate, I assume that the filter parameter on the API endpoint is more liberal because it's a user initiated action. If it is too expensive, well, then it times out eventually, but that's it. However, silences have to be checked continuously. If that's too expensive, the AM is permanently overloaded. (Again, this is all educated guessing. My knowledge of the AM internals is limited.) I do think that the preview should implement the same restrictions as the silence itself (cf. #2610). (The reason for the different behavior can be explained by my speculation above: One-time user initiated action vs. permanently running queries. Still, of course, we should do the same evaluation for the former in this case.)

WRT allowing empty labels for adding alerts: While an empty label is the same as a non-existing label, it is still valid and has well defined semantics throughout the Prometheus ecosystem. It would be really confusing if the only place where you ban this kind of redundancy would be the API for adding alerts.

In summary, I don't think that the matchers have different semantics for silences vs. alerts. I would even say that what we have is all rooted in the intention to keep the semantics the same. The one thing that sticks out (at least one matcher not matching the empty string) isn't really semantics of the matchers themselves but a higher level query requirement (which exists in the same way in Prometheus itself).

gotjosh · 2022-09-30T11:32:39Z

After recently experiencing a panic in one of our largest environments, I realised this is not worth blocking on some sort of theoretical consistency across silences and alerts APIs.

The general comments have a tendency on foo= being equal to foo="" and that's what this PR does. Effectively avoiding panic and providing the user with a consistent error message.

As an example, in the case of the reported issue:

 ~> ./amtool --alertmanager.url=http://127.0.0.1:9093 alert add foo=
amtool: error: [POST /alerts][400] postAlertsBadRequest  at least one label pair required

instead of a panic.

I think we should move forward and merge this as this problem has already been reported several times on multiple fronts.

Fixes #2936 Signed-off-by: gotjosh <[email protected]>

Signed-off-by: gotjosh <[email protected]>

roidelapluie

LGTM

gotjosh requested review from roidelapluie and simonpasquier June 28, 2022 10:51

roidelapluie reviewed Jun 28, 2022

View reviewed changes

LeviHarrison mentioned this pull request Aug 4, 2022

Alertmanager crashes when using 3rd party plugin grafana/mimir#2651

Closed

gotjosh force-pushed the parse-matchers-empty-value branch from 895b1b0 to 6dd17d8 Compare September 30, 2022 10:51

gotjosh added 2 commits September 30, 2022 12:36

Ensure matcher values are present when parsing matchers from strings

05f073f

Fixes #2936 Signed-off-by: gotjosh <[email protected]>

Take another approach

bd89550

Signed-off-by: gotjosh <[email protected]>

gotjosh force-pushed the parse-matchers-empty-value branch from 6dd17d8 to bd89550 Compare September 30, 2022 11:36

roidelapluie approved these changes Sep 30, 2022

View reviewed changes

gotjosh merged commit 5455296 into main Sep 30, 2022

gotjosh deleted the parse-matchers-empty-value branch September 30, 2022 11:51

gotjosh mentioned this pull request Sep 30, 2022

Alertmanager: Update the prometheus/alertmanager to its latest version grafana/mimir#3097

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure matcher values are present when parsing matchers from strings #2968

Ensure matcher values are present when parsing matchers from strings #2968

gotjosh commented Jun 28, 2022

roidelapluie left a comment

gotjosh commented Jun 28, 2022

roidelapluie commented Jun 28, 2022

gotjosh commented Jun 28, 2022 •

edited

Loading

beorn7 commented Jun 28, 2022

roidelapluie commented Jun 28, 2022

simonpasquier commented Jun 28, 2022 •

edited

Loading

gotjosh commented Jun 28, 2022 •

edited

Loading

roidelapluie commented Jun 30, 2022

gotjosh commented Jul 1, 2022

beorn7 commented Jul 7, 2022

gotjosh commented Sep 30, 2022

roidelapluie left a comment

Ensure matcher values are present when parsing matchers from strings #2968

Ensure matcher values are present when parsing matchers from strings #2968

Conversation

gotjosh commented Jun 28, 2022

roidelapluie left a comment

Choose a reason for hiding this comment

gotjosh commented Jun 28, 2022

roidelapluie commented Jun 28, 2022

gotjosh commented Jun 28, 2022 • edited Loading

beorn7 commented Jun 28, 2022

roidelapluie commented Jun 28, 2022

simonpasquier commented Jun 28, 2022 • edited Loading

gotjosh commented Jun 28, 2022 • edited Loading

Alertname must be present

Empty value labels

Josh's opinion

roidelapluie commented Jun 30, 2022

gotjosh commented Jul 1, 2022

beorn7 commented Jul 7, 2022

gotjosh commented Sep 30, 2022

roidelapluie left a comment

Choose a reason for hiding this comment

gotjosh commented Jun 28, 2022 •

edited

Loading

simonpasquier commented Jun 28, 2022 •

edited

Loading

gotjosh commented Jun 28, 2022 •

edited

Loading