[Security Solution] Sync rule statuses of Detection Engine and Alerting Framework #106482

Open
banderror opened this issue Jul 21, 2021 · 3 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Rule Monitoring Security Solution Detection Rule Monitoring area impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.

Comments

@banderror
Contributor

Summary

The Alerting Framework uses the following rule execution statuses: Succeeded, Failed, Running, and Not started.

Additionally, in Security Solution we have custom execution statuses that are displayed in the Rule Management table and Rule Details page:

  • succeeded
  • failed
  • going to run (this one actually means “running”)
  • warning (previously known as partial failure)

These differ from the Alerting Framework statuses in two ways:

  • We have no “not started” / “idle” status and never show one; instead, we always show the most recent execution status (that is, the result or outcome of the last run).
  • We can transition from status A to status B (e.g. from going to run to warning) in the middle of a rule execution, without waiting for it to finish, and we can do so multiple times (e.g. going to run -> warning -> failed).
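To make the second difference concrete, here is a minimal TypeScript sketch of a status log that accepts several transitions during a single execution and always reports the latest one. `SecurityStatus` and `SecurityStatusLog` are hypothetical names used for illustration only; they are not the actual Detection Engine types.

```typescript
// Hypothetical model of the custom statuses; not the real Security Solution types.
type SecurityStatus = 'succeeded' | 'failed' | 'going to run' | 'warning';

class SecurityStatusLog {
  private history: SecurityStatus[] = [];

  // May be called any number of times while the rule is still executing.
  record(status: SecurityStatus): void {
    this.history.push(status);
  }

  // There is no "not started" value: the most recently recorded status is
  // reported, or undefined if nothing has ever been recorded.
  current(): SecurityStatus | undefined {
    return this.history[this.history.length - 1];
  }

  // Returns a copy of every transition recorded so far, oldest first.
  transitions(): SecurityStatus[] {
    return this.history.slice();
  }
}

// One execution that changes status twice before it finishes.
const statusLog = new SecurityStatusLog();
statusLog.record('going to run');
statusLog.record('warning');
statusLog.record('failed');
```

After these three calls, `statusLog.current()` returns `'failed'`, and that value would keep being shown until the next execution records something new.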

As implemented today, a rule can be succeeded at the framework level but failed at the Security level.

This can happen because of the way error handling is implemented in the Security rule type executor. Most of the logic sits inside a try-catch, so if an exception is thrown within the try block, our status becomes “failed” while the framework-level status stays “succeeded”. Some errors are not handled via exceptions at all, and the result in that case can be the same. Finally, some parts of the executor are outside this try-catch, so if an exception is thrown there, the rule becomes failed at the framework level, but we don’t write our own status. I believe this last case is handled in our routes, where we merge our own statuses with the framework statuses to get the final, correct status of each rule.
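The three cases can be reproduced with a small simulation. This is only a sketch under assumptions: `runUnderFramework` is a hypothetical stand-in for the framework that treats a thrown exception as its only failure signal, the executors are kept synchronous for brevity, and none of the names come from Kibana.

```typescript
// Hypothetical simulation; names and shapes are assumptions, not Kibana APIs.
type Outcome = 'succeeded' | 'failed';

interface RunResult {
  frameworkStatus: Outcome;
  securityStatus: Outcome | undefined; // undefined: the executor never wrote one
}

// Stand-in for the framework: it only learns about a failure if the executor throws.
function runUnderFramework(
  executor: (writeSecurityStatus: (s: Outcome) => void) => void
): RunResult {
  const result: RunResult = { frameworkStatus: 'succeeded', securityStatus: undefined };
  try {
    executor((s) => {
      result.securityStatus = s;
    });
  } catch (err) {
    result.frameworkStatus = 'failed';
  }
  return result;
}

// Case 1: the exception is caught by the executor's own try-catch.
const caughtInside = runUnderFramework((write) => {
  try {
    throw new Error('search failed');
  } catch (err) {
    write('failed'); // custom status only; nothing reaches the framework
  }
});

// Case 2: the error is reported as data rather than thrown.
const reportedAsData = runUnderFramework((write) => {
  const searchErrors = ['shard failure'];
  write(searchErrors.length > 0 ? 'failed' : 'succeeded');
});

// Case 3: the exception is thrown outside the try-catch.
const thrownOutside = runUnderFramework(() => {
  throw new Error('setup failed');
});
```

In this model, cases 1 and 2 end with the framework status `'succeeded'` and the Security status `'failed'`, while case 3 ends with the framework status `'failed'` and no Security status written at all.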

This seems to be fixable from the Security Solution side. We could throw or re-throw exceptions from the executor whenever we decide to mark the rule as failed. This would keep Security Solution's custom statuses in sync with the framework statuses.
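A rough sketch of that proposal, again under assumptions: `runRule` is a hypothetical stand-in for the framework (a thrown exception is the only failure it sees), the executors are synchronous for brevity, and all names are illustrative rather than taken from the real executor.

```typescript
// Hypothetical sketch of the proposed fix; not the actual executor code.
type Status = 'succeeded' | 'failed';

interface Statuses {
  framework: Status;
  security: Status | undefined;
}

// Stand-in for the framework: a thrown exception marks the rule as failed.
function runRule(executor: (writeStatus: (s: Status) => void) => void): Statuses {
  const statuses: Statuses = { framework: 'succeeded', security: undefined };
  try {
    executor((s) => {
      statuses.security = s;
    });
  } catch (err) {
    statuses.framework = 'failed';
  }
  return statuses;
}

// Caught exception: write the custom status, then re-throw.
function rethrowingExecutor(writeStatus: (s: Status) => void): void {
  try {
    throw new Error('search failed'); // simulated failure in the rule logic
  } catch (err) {
    writeStatus('failed');
    throw err; // lets the framework record the failure as well
  }
}

// Error reported as data: write the custom status, then throw a new error.
function throwingOnReportedErrors(writeStatus: (s: Status) => void): void {
  const searchErrors = ['shard failure'];
  if (searchErrors.length > 0) {
    writeStatus('failed');
    throw new Error(searchErrors.join('; '));
  }
  writeStatus('succeeded');
}

const afterRethrow = runRule(rethrowingExecutor);
const afterReportedErrors = runRule(throwingOnReportedErrors);
```

With both executors, the framework status and the Security status end up as `'failed'` together, which is the synchronization the proposal is after.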

@banderror banderror added Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. labels Jul 21, 2021
@elasticmachine
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror
Contributor Author

cc @gmmorris @mikecote @kobelb

@peluja1012 peluja1012 added Theme: rac label obsolete Feature:Rule Management Security Solution Detection Rule Management area Team:Detection Rule Management Security Detection Rule Management Team labels Sep 15, 2021
@banderror banderror added Feature:Rule Monitoring Security Solution Detection Rule Monitoring area 8.3 candidate and removed Feature:Rule Management Security Solution Detection Rule Management area labels Apr 27, 2022
@banderror banderror changed the title [Security Solution][Detections] Sync rule statuses of Detection Engine and Alerting Framework [Security Solution] Sync rule statuses of Detection Engine and Alerting Framework Nov 24, 2022
@banderror banderror added bug Fixes for quality problems that affect the customer experience impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed Theme: rac label obsolete 8.7 candidate labels Feb 22, 2023

4 participants