Convert alerts to use task manager intervals #46001
Comments
Pinging @elastic/kibana-stack-services |
Updating issue: This portion will require #45152. Once completed, alerting should use the task manager's interval property as well as the update API whenever the interval changes. |
I've begun investigating the changes needed in
|
Luckily we don't need to support migrations yet (ex: 7.5 -> 7.6) as we currently don't support existing alerts in pre 7.6 releases. |
Ahh, so we're fine with just breaking existing alerts when moving from 7.5 to 7.6? |
@mikecote I don't think we should close this: long term, I don't think we want to maintain two separate interval implementations. But I no longer think this is higher priority than the migration to the Kibana Platform. Shall we move this to 7.7, below the migration to the platform? |
@gmmorris good point, we should re-triage this issue at the next sync. I guess it can be 7.7 or back into "code debt" column. |
This has come up again in #63188 (comment) so we should probably reprioritise this. Looking into the above issue we've realised that Alerting needs quite a bit of what Task Manager offers out of the box in terms of retries for scheduled tasks, but it isn't a 1-to-1 fit and we'll likely need to make some changes to TM to properly support Alerting's needs. Whoever picks up this issue, please read through the comments here: #63188 (comment) In addition, something worth noting: In the above section you'll note that Task Manager uses a different path when a task has a This will have to be rethought, and is likely wrong for TM as a whole, not just Alerting. |
Adding a note that we should have a discussion on this issue before starting any development. Just to review what level of effort will be required, what approach and alternatives we have, what problem(s) we're solving, etc. |
The work on TM observability (#77456) has brought to my attention that we can't visualize the "stress" that alerting puts on TM until this issue is done. Luckily this issue is prioritised, so we'll have this pre GA, but I wanted to mention this so that we understand just how valuable this issue is. 👍 |
Also worth noting - TM doesn't support |
I think some of the early problems with attempting this were OCC issues, where a task that was running would / could revert the updated schedule. There are probably some lessons learned from updating alerts w/ OCC logic that could be applied here (cc @pmuellr) |
This issue should solve test 1 and 3 from #53650 (comment).
Otherwise, separate issues should be opened for those, as I believe they are the reason this issue is in To-Do for GA. |
We might be able to sidestep this by allowing partial updates of certain parts of the task, but there's a danger here that it might still cause a failure of a running task if we don't limit which fields it touches 🤔 |
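To make the OCC concern and the partial-update idea concrete, here is a minimal sketch. Everything in it (`TaskDoc`, `InMemoryTaskStore`, `updateFull`, `updateStateOnly`) is a hypothetical in-memory model invented for illustration, not Task Manager's actual store or API. It assumes a version counter stands in for whatever concurrency token the real store uses.

```typescript
// Hypothetical in-memory model, for illustration only; not Task Manager's real types.
interface TaskDoc {
  id: string;
  version: number; // stand-in for the store's concurrency token
  schedule: { interval: string };
  state: Record<string, unknown>;
}

type UpdateResult = { ok: true; doc: TaskDoc } | { ok: false; reason: 'conflict' };

class InMemoryTaskStore {
  private docs: Record<string, TaskDoc> = {};

  create(id: string, interval: string): TaskDoc {
    this.docs[id] = { id, version: 1, schedule: { interval }, state: {} };
    return this.get(id);
  }

  // Returns a copy so callers cannot mutate the stored document directly.
  get(id: string): TaskDoc {
    const doc = this.docs[id];
    if (!doc) throw new Error(`task ${id} not found`);
    return { ...doc, schedule: { ...doc.schedule }, state: { ...doc.state } };
  }

  // Full-document write guarded by OCC: rejected if the caller holds a stale version.
  updateFull(doc: TaskDoc): UpdateResult {
    const current = this.get(doc.id);
    if (current.version !== doc.version) return { ok: false, reason: 'conflict' };
    this.docs[doc.id] = { ...doc, version: doc.version + 1 };
    return { ok: true, doc: this.get(doc.id) };
  }

  // Partial write limited to `state`, so it cannot overwrite the schedule.
  updateStateOnly(id: string, state: Record<string, unknown>): TaskDoc {
    const current = this.get(id);
    this.docs[id] = { ...current, state, version: current.version + 1 };
    return this.get(id);
  }
}

function demoScheduleUpdateDuringRun() {
  const store = new InMemoryTaskStore();
  store.create('alert-1', '1m');

  // The runner reads the task before executing it.
  const claimed = store.get('alert-1');

  // While the task runs, the alert's interval is changed.
  const edited = store.updateFull({ ...store.get('alert-1'), schedule: { interval: '10s' } });

  // The runner's full write from its stale copy would carry the old '1m' schedule;
  // the version check rejects it instead of silently reverting the change.
  const staleWrite = store.updateFull({ ...claimed, state: { runs: 1 } });

  // A state-only write succeeds and leaves the new schedule in place.
  const after = store.updateStateOnly('alert-1', { runs: 1 });

  return {
    scheduleUpdated: edited.ok,
    staleWriteRejected: !staleWrite.ok,
    interval: after.schedule.interval,
    runs: after.state['runs'],
  };
}
```

The trade-off the sketch surfaces: a rejected full write means the running task fails to persist its result unless it retries, while a partial write avoids the conflict only as long as the set of fields it may touch stays narrow.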
One important note about this issue - if we do not migrate Alerting to use the Task Manager intervals, the alerting tasks will not appear to Task Manager as recurring tasks, but rather as single-execution tasks. This means that some of the observability added by #77456 won't be as useful for diagnosing alerting problems as it would be otherwise. We could address this by providing these stats in the Alerting health endpoint, but that's far from ideal. Another thought is that we could possibly have an |
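A rough sketch of why this matters for observability, under stated assumptions: the `ScheduledTask` shape, `intervalToMs`, and `summarizeLoad` below are hypothetical names made up for this example, and the `"<number><s|m|h>"` interval format is an assumption rather than Task Manager's actual parser. The point is only that a task without a manager-visible interval cannot contribute to any cadence-based estimate.

```typescript
// Hypothetical shapes for illustration only; not Task Manager's real types.
interface ScheduledTask {
  id: string;
  taskType: string;
  // Present only when the manager itself knows the task's cadence.
  schedule?: { interval: string };
}

// Parse a simple interval such as "30s", "5m" or "1h" into milliseconds.
// The accepted format is an assumption made for this sketch.
function intervalToMs(interval: string): number {
  const match = /^(\d+)([smh])$/.exec(interval);
  if (!match) throw new Error(`unsupported interval: ${interval}`);
  const amount = Number(match[1]);
  switch (match[2]) {
    case 's':
      return amount * 1000;
    case 'm':
      return amount * 60000;
    default:
      return amount * 3600000; // 'h'
  }
}

// Count recurring vs. single-execution tasks and estimate the expected
// executions per minute from the intervals the manager can see.
function summarizeLoad(tasks: ScheduledTask[]) {
  let recurring = 0;
  let singleExecution = 0;
  let executionsPerMinute = 0;
  for (const task of tasks) {
    if (task.schedule) {
      recurring += 1;
      executionsPerMinute += 60000 / intervalToMs(task.schedule.interval);
    } else {
      // A task that reschedules itself looks like a one-off here,
      // so its cadence is invisible to this estimate.
      singleExecution += 1;
    }
  }
  return { recurring, singleExecution, executionsPerMinute };
}

// Example: the same alert, with and without a manager-visible interval.
const withoutInterval = summarizeLoad([{ id: 'alert-1', taskType: 'alerting:example' }]);
const withInterval = summarizeLoad([
  { id: 'alert-1', taskType: 'alerting:example', schedule: { interval: '10s' } },
]);
```

In this model `withoutInterval` reports zero expected executions per minute even though the alert runs repeatedly, while `withInterval` accounts for it.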
Side note: this issue should be paired with another person due to the potential complexity it will have. |
I've been spiking this locally and it seems this isn't nearly as complicated as we originally thought. We can break this down into 2 steps:
Spiked here: #80149 |
This portion will require #45152. Once completed, alerting should use the task manager's interval property as well as the update API whenever the interval changes.