Alert related task manager tasks should clear out when deleting a space #50248

Closed
mikecote opened this issue Nov 12, 2019 · 6 comments
Labels
Feature:Alerting, Team:ResponseOps, technical debt

Comments

@mikecote
Contributor

No description provided.

@elasticmachine
Contributor

Pinging @elastic/kibana-stack-services (Team:Stack Services)

@bmcconaghy added the Team:ResponseOps label and removed the Team:Stack Services label Dec 12, 2019
@mikecote
Contributor Author

Today, a log entry like the following is written, and Task Manager marks the task as failed (and stops rescheduling it).

 server    log   [09:32:24.930] [error][plugins][taskManager] Task alerting:.index-threshold "c8f19c60-413d-11eb-8008-dfda2ee83934" failed: Error: Saved object [alert/c8374d10-413d-11eb-8008-dfda2ee83934] not found

The main problem was wasted cycles in Task Manager after deleting a space, as the tasks would keep retrying indefinitely. With the logic we have today, the problem seems to be gone: the task remains in an error state instead of disappearing. Does that sound about right, or should we delete the task instead of marking it as failed?
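
The behavior described above (a task whose alert saved object is gone gets marked failed and is never rescheduled, instead of retrying forever) can be modeled roughly as below. This is a minimal, self-contained sketch, not Task Manager's actual code: `TaskInstance`, `runTask`, `savedObjectNotFound`, `isNotFound` and the `loadAlert` callback are all hypothetical names chosen for illustration.

```typescript
// Hypothetical, simplified task record; NOT Task Manager's real schema.
interface TaskInstance {
  id: string;
  status: "idle" | "failed";
  runAt: Date | null; // null = not scheduled to run again
  lastError?: string;
}

// Hypothetical "saved object not found" error, tagged with a flag so it can be
// recognized without relying on Error subclassing.
interface NotFoundError extends Error {
  notFound: true;
}

function savedObjectNotFound(type: string, id: string): NotFoundError {
  const err = new Error(`Saved object [${type}/${id}] not found`) as NotFoundError;
  err.notFound = true;
  return err;
}

function isNotFound(err: unknown): err is NotFoundError {
  return err instanceof Error && (err as Partial<NotFoundError>).notFound === true;
}

// Runs one task. If the alert loads, the task is rescheduled `intervalMs`
// later. If the alert is gone (e.g. its space was deleted), the task is marked
// failed and runAt is cleared, so it is never claimed again. Any other error
// is rethrown unchanged.
function runTask(
  task: TaskInstance,
  alertId: string,
  loadAlert: (id: string) => void,
  now: Date,
  intervalMs: number,
): TaskInstance {
  try {
    loadAlert(alertId);
  } catch (err) {
    if (isNotFound(err)) {
      return { ...task, status: "failed", runAt: null, lastError: err.message };
    }
    throw err;
  }
  return { ...task, status: "idle", runAt: new Date(now.getTime() + intervalMs) };
}
```

The key design point is that the not-found case ends the task's schedule rather than scheduling a retry; the failed task stays on disk, which is what the rest of the thread discusses.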

@pmuellr
Member

pmuellr commented Jan 13, 2021

I'm hesitant to delete the tasks in case, say, the space was "accidentally" deleted; in that situation, once you re-create the space, I think the alerts would "still be there". Of course, you have the opposite problem as well: let's say you delete a space you're no longer using, and then a few months later create a new space with the same name and BOOM! You now see a bunch of alerts appear out of nowhere!

How do other apps that use space-specific saved objects deal with this?

It almost seems like what you'd want to do is mark these alerts as "orphans", change TM to ignore orphans, and provide some API/UI to deal with the orphans: import them into a space, or delete them. But that sounds like a lot of work.

We could just mark them as orphans and let TM ignore them - then they're just eating disk space. But we'd have them still there in case of "accidents".
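
The "orphan" idea can be sketched as a handful of pure functions over task records. This is only an illustration of the proposal under stated assumptions: the `OrphanableTask` shape (with a `spaceId` and an `orphaned` flag) and the functions `markOrphans`, `claimDueTasks`, `adoptOrphans` and `purgeOrphans` are hypothetical and do not correspond to any existing Task Manager API.

```typescript
// Hypothetical task shape for illustrating the "orphan" idea; NOT Task Manager's schema.
interface OrphanableTask {
  id: string;
  spaceId: string;
  orphaned: boolean;
  runAt: number; // epoch millis
}

// Flag (rather than delete) every task whose space no longer exists, so the
// tasks survive an "accidental" space deletion.
function markOrphans(tasks: OrphanableTask[], existingSpaces: ReadonlyArray<string>): OrphanableTask[] {
  return tasks.map((t) =>
    existingSpaces.indexOf(t.spaceId) === -1 ? { ...t, orphaned: true } : t,
  );
}

// Claim only due, non-orphaned tasks; orphans just sit on disk.
function claimDueTasks(tasks: OrphanableTask[], now: number): OrphanableTask[] {
  return tasks.filter((t) => !t.orphaned && t.runAt <= now);
}

// "Import them into a space": re-home every orphan into `targetSpace`.
function adoptOrphans(tasks: OrphanableTask[], targetSpace: string): OrphanableTask[] {
  return tasks.map((t) => (t.orphaned ? { ...t, spaceId: targetSpace, orphaned: false } : t));
}

// "Or delete them": drop every orphan.
function purgeOrphans(tasks: OrphanableTask[]): OrphanableTask[] {
  return tasks.filter((t) => !t.orphaned);
}
```

With only `markOrphans` and `claimDueTasks`, you get the cheaper variant from the comment above (orphans only cost disk space); `adoptOrphans` and `purgeOrphans` stand in for the API/UI that would make the full version "a lot of work".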

@mikecote
Contributor Author

From what I recall, the saved objects (SOs) within that space are deleted along with the space (https://github.com/elastic/kibana/blob/master/x-pack/plugins/spaces/server/spaces_client/spaces_client.ts#L94).

@pmuellr
Member

pmuellr commented Jan 13, 2021

Ah! Re-reading this and looking at the dates :-), I guess the suggestion is that this is basically working today: the task isn't deleted, just marked as failed, and won't be run anymore. That feels right. I do worry a bit about TM keeping failed tasks around forever, but I think this case is small potatoes compared to failed action executions, where we've seen TM indices with hundreds of thousands of failed tasks.

At some point, we should have some way of cleaning these up, but until then, seems like the current behavior is fine.
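
One possible shape for that eventual cleanup is a retention-based sweep that removes failed tasks only after they have sat untouched for a while. This is a hypothetical sketch, not a planned or existing feature: `StoredTask`, `sweepFailedTasks` and the `retentionMs` parameter are illustrative names, and a real implementation would run against the Task Manager index rather than an in-memory array.

```typescript
// Hypothetical stored-task record; NOT Task Manager's real schema.
interface StoredTask {
  id: string;
  status: "idle" | "running" | "failed";
  updatedAt: number; // epoch millis of the last status change
}

interface SweepResult {
  kept: StoredTask[];
  removedIds: string[];
}

// Remove failed tasks whose last update is older than `retentionMs`; keep
// everything else, including recently failed tasks someone may still inspect.
function sweepFailedTasks(tasks: StoredTask[], now: number, retentionMs: number): SweepResult {
  const result: SweepResult = { kept: [], removedIds: [] };
  for (const task of tasks) {
    const expired = task.status === "failed" && now - task.updatedAt > retentionMs;
    if (expired) {
      result.removedIds.push(task.id);
    } else {
      result.kept.push(task);
    }
  }
  return result;
}
```

A retention window keeps recently failed tasks around for debugging (and for the "accidental deletion" case) while bounding how many failed tasks can accumulate.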

@YulNaumenko added the technical debt label Mar 11, 2021
@ymao1
Contributor

ymao1 commented Mar 18, 2021

Closing in favor of #79977

@ymao1 closed this as completed Mar 18, 2021
@kobelb added the needs-team label Jan 31, 2022
@botelastic bot removed the needs-team label Jan 31, 2022