Ruler: In stateless mode, ALERTS and ALERTS_FOR_STATE metrics are not being written #5431
Comments
I think a (potentially) important part I left out is that the mentioned rule using |
I think this PR may be related: #5230 @yeya24 do you know if PR #5230 will cause the stateless Ruler to produce the |
Not related I think but it depends on the alert compliance test framework maybe. #5230 will help the stateless ruler restore the |
Thank you for the clarification @yeya24 |
My apologies, so I was mistaken, I cannot query for
Because we're specifying agent DB as the primary queryable in the fanout storage, calling I assume then we could remove agent DB from the fanout queryable, but it looks like @yeya24 has other plans in #5230 which also seem to fix this (but I'd need to review it more closely). Either way, it would be nice to add some test cases for |
I still think that's a separate thing as this error happens only in the restore phase. It only happens at the first evaluation and has nothing to do with the compliance test unless the test requires alerts state restoration. For writing |
We can leave aside the compliance test and focus on why If I'm reading the Prom rules manager code correctly: Does that make sense? |
Thanks for the investigation, you are right. I didn't notice that an unrestored rule will prevent appending new alerts and alert state samples. |
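The behaviour described above can be sketched as a small Go model. This is a simplification under stated assumptions, not the actual Prometheus rules manager code: `AlertingRule`, `Sample` and the `restored` flag are hypothetical stand-ins, and the only point is that the ALERTS and ALERTS_FOR_STATE samples are gated on the rule having been restored.

```go
package main

import "fmt"

// Sample is a hypothetical stand-in for a sample produced by a rule evaluation.
type Sample struct {
	Metric string
	Value  float64
}

// AlertingRule is a simplified, hypothetical model of an alerting rule.
type AlertingRule struct {
	Name     string
	restored bool // set only after the alert state restore step succeeded
}

// Eval returns the samples to append for one evaluation in which the alert
// is active. Assumption: nothing is emitted while the rule is unrestored.
func (r *AlertingRule) Eval(activeAt float64) []Sample {
	var out []Sample
	if r.restored {
		out = append(out,
			Sample{Metric: "ALERTS", Value: 1},
			Sample{Metric: "ALERTS_FOR_STATE", Value: activeAt},
		)
	}
	return out
}

func main() {
	rule := &AlertingRule{Name: "ExampleAlert"}

	// The restore step never succeeded, so no alert samples are written.
	fmt.Println("unrestored:", len(rule.Eval(1650000000)), "samples")

	// Once the rule is marked as restored, both series are written.
	rule.restored = true
	fmt.Println("restored:", len(rule.Eval(1650000000)), "samples")
}
```

In this model, a restore step that fails on every attempt leaves `restored` false forever, which would explain why neither series ever shows up.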
dupe of #5219 ? or at least same topic / same fix needed :) |
Hello 👋 Looks like there was no activity on this issue for the last two months. |
I will close this one as the feature was merged into main already |
Thanos, Prometheus and Golang version used:
Thanos - latest main (93e7ced)
What happened:
As part of testing Thanos for Prometheus Alert Generator Compliance (PR: #5315), I realized that the test suite fails when running the ruler in stateless mode. I traced this to a particular rule that contains the ALERTS metric in its expression. This rule never seemed to get into the pending / firing state (according to the rules web UI), although it was supposed to (in contrast, with the TSDB ruler this works as expected). I assume that the rule is never correctly evaluated. This is despite the fact that when I queried for the same expression in the querier, I could see correct results, and that the ALERTS metrics are available for querying.
What you expected to happen:
I expect all alerts to be evaluated correctly even in stateless mode.
How to reproduce it (as minimally and precisely as possible):
You can fire up the compatibility test locally (visit the PR - #5315 or if merged look for the test in codebase).
The setup consists of:
Full logs to relevant components:
Even with debug log level, I did not see any relevant output in ruler.
Anything else we need to know:
A few more details are included in this comment: #5315 (comment)