[SIEM][Detection Engine] Meta issue for alerting needs #50222

FrankHassanabad · 2019-11-11T22:03:47Z

This is a meta ticket tracking ad-hoc requirements and feature requests from the detection engine underneath SIEM back to the alerting/actions plugins.

Below are the top asks from our side, along with use cases, possible solutions, and workarounds. They are ordered by our best guess at the priority in which we would like to have them.

Update 3/30/21, Frank Hassanabad: Moved items that have been fixed lower in priority and pushed up the higher-priority items. For any questions about this ticket, please consult @spong or @peluja1012, as they are more in tune with the recent needs of the detection engine.

Make rules sortable, filterable, and aggregatable

Issue:
#50213

Use Case:
As a rule user, I need to sort, filter, and sometimes aggregate on rule types. For example, I need to sort my rule types on severity, or I need to filter them by severity.

Technical solution:
The current alerting/actions plugins do not allow mapping down to the level of alerting/actions parameters. Therefore we cannot use the saved objects API with KQL mixed with "order by". If that were changed and we were given mapping abilities for the alerting/actions parameters, that would solve this. Alternatively, a plain API (even a slower one, like a table scan) that abstracts this away so we can work natively with the alerting/actions objects would mean we don't have to write our own hand-rolled solutions.

Current workaround:
Nothing written, but we should be able to use the KQL order by and tags, once that is checked in, to get part of the way there. However, for alerting/actions params storage we cannot easily sort/filter on those without using a table-scan technique where we do a "find" on a page of results and go through each and every one.
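A minimal sketch of the table-scan technique described in this workaround, assuming a paged "find" call is available. The `Rule`, `FindResult`, and `FindPage` shapes and the `severity` parameter are hypothetical stand-ins for illustration, not the real alerting client types.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical shapes for illustration; not the real alerting client types.
interface Rule {
  id: string;
  params: { severity: Severity };
}

interface FindResult {
  total: number;
  data: Rule[];
}

// Stand-in for a paged "find" call; the name and signature are assumptions.
type FindPage = (page: number, perPage: number) => Promise<FindResult>;

const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Table scan: request page after page until every rule has been collected.
async function findAllRules(findPage: FindPage, perPage = 100): Promise<Rule[]> {
  const all: Rule[] = [];
  for (let page = 1; ; page += 1) {
    const { total, data } = await findPage(page, perPage);
    all.push(...data);
    if (data.length === 0 || all.length >= total) {
      return all;
    }
  }
}

// Filter and sort in memory, because the params cannot be queried directly.
function filterAndSortBySeverity(rules: Rule[], minimum: Severity): Rule[] {
  return rules
    .filter((rule) => SEVERITY_RANK[rule.params.severity] >= SEVERITY_RANK[minimum])
    .sort((a, b) => SEVERITY_RANK[b.params.severity] - SEVERITY_RANK[a.params.severity]);
}
```

The cost grows with the total number of rules rather than with the size of the page being displayed, which is why a mapped, queryable field would be preferable.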

Make alerts capable of being run ad-hoc/immediately

Issue:
#50215

Use case:
As a rule user, I need to be able to immediately run a rule from time to time. A rule could have failed multiple times during a day due to timeout or networking issues, or the rule could have had a bug or mistake in it, in which case I need to modify and then re-run the rule. As a rule user, I sometimes need to create a rule which runs every 5 minutes against data from 5 minutes ago, but I need to run that rule immediately.

Technical solution:
An API from the alerting team that provides this capability.

Current workaround:
Nothing written, but we can create an ad-hoc hidden duplicate alert temporarily, which runs and then deletes itself at the end of the run. This would be an additional parameter to the rule and incurs technical debt to be removed later.

Bulk create, read, update, delete for alert client

Issue:
#53144

Use Case:
As a rule user, I need to perform bulk actions such as enabling a lot of rules at once, or disabling or deleting them.

Technical solution:
Add a bulk action capability to the API.

Current workaround:
Call them one at a time in a forEach loop
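A minimal sketch of the one-at-a-time workaround. The `RuleAction` type is a hypothetical wrapper around a single enable, disable, or delete call. One design note: `Array.prototype.forEach` does not wait for async callbacks, so this sketch uses `for...of` with `await` to keep the calls sequential and to collect failures per rule.

```typescript
// Hypothetical per-rule operation, e.g. a wrapper around one enable/disable/delete call.
type RuleAction = (id: string) => Promise<void>;

interface BulkResult {
  succeeded: string[];
  failed: Array<{ id: string; message: string }>;
}

// Applies the action to each rule in turn; one failure does not stop the rest.
async function applyOneAtATime(ids: string[], action: RuleAction): Promise<BulkResult> {
  const result: BulkResult = { succeeded: [], failed: [] };
  for (const id of ids) {
    try {
      await action(id);
      result.succeeded.push(id);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      result.failed.push({ id, message });
    }
  }
  return result;
}
```

Each rule still costs one round trip, which is the overhead a real bulk API would remove.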

Post our own rule id

Issue:
#50210

Use case:
As a rule user, I want to package and identify my own rules without having duplicates existing in the system. As a rule user, I want to be able to share these rules with multiple Kibana systems under different divisions and different companies as well as update them among the different companies/divisions. As a rule user, I will want to release updates to these rules across companies. As a rule user, I want to be able to export and import rules without having duplicates showing up within the system. These imports and exports can be across divisions and companies as well.

Technical solution:
We need the ability to POST our own _id fields for our rule_id. This is important for packaging rules up and distributing them without ending up with duplicate rule_ids. Currently we store rule_id as a parameter of the rule, which is a slow mechanism to look up. We can use tags to speed this up, but errors or bugs can and will lead to duplicate rule_ids.

Current workaround:
We post our own rule parameter in the alerting parameters, and we can sometimes get duplicates due to socket timeouts, uncaught promises/unhandled promise rejections, or bugs. We will have to write code to find duplicates and remove them, or allow the user to remove them when they inevitably happen.
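A minimal sketch of the duplicate-finding code mentioned in this workaround. The `StoredRule` shape and the `ruleId` field name are hypothetical, and keeping the first copy seen is an arbitrary policy chosen for illustration.

```typescript
// Hypothetical shape: "id" is assigned by the system, "ruleId" is our own rule_id.
interface StoredRule {
  id: string;
  params: { ruleId: string };
}

// Groups rules by our own rule_id, preserving the order in which they were seen.
function groupByRuleId(rules: StoredRule[]): Map<string, StoredRule[]> {
  const groups = new Map<string, StoredRule[]>();
  for (const rule of rules) {
    const group = groups.get(rule.params.ruleId) ?? [];
    group.push(rule);
    groups.set(rule.params.ruleId, group);
  }
  return groups;
}

// Returns the extra copies for each rule_id, keeping the first one seen.
function findDuplicateRules(rules: StoredRule[]): StoredRule[] {
  const duplicates: StoredRule[] = [];
  groupByRuleId(rules).forEach((group) => {
    duplicates.push(...group.slice(1));
  });
  return duplicates;
}
```

The returned list could either be deleted automatically or shown to the user for confirmation.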

Migration hooks

Issue:
#50216

Use case:
As a rule user, I will upgrade my system and expect not to have to do manual maintenance on rules or see rules suddenly stop operating.

Technical solution:
We need a way to "hook into" migration code which can run on the alerting/actions side and/or our SIEM side to provide ways to fix data bugs and/or add features such as migrations as we progress the system.

Current workaround:

Update 3/30/21: We can manually write migrations directly in the alerting project and do pull requests there.

We have to manually clean up mistakes from older systems. Luckily this is our first cut at these systems, so we do not have legacy data at the moment, but after our first release we will inevitably have user-generated data which will need to be upgraded.
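One way to picture a migration hook is as an ordered list of functions that each upgrade a stored rule document to a newer schema version. The sketch below is hypothetical throughout: `RuleDoc`, `schemaVersion`, and `Migration` are illustrative names, not the alerting plugin's actual migration API.

```typescript
// Hypothetical document shape with an explicit schema version.
interface RuleDoc {
  schemaVersion: number;
  params: Record<string, unknown>;
}

// Hypothetical hook: upgrades a document to "toVersion".
interface Migration {
  toVersion: number;
  migrate: (doc: RuleDoc) => RuleDoc;
}

// Applies, in ascending order, every migration newer than the document's version.
function applyMigrations(doc: RuleDoc, migrations: Migration[]): RuleDoc {
  return [...migrations]
    .sort((a, b) => a.toVersion - b.toVersion)
    .filter((migration) => migration.toVersion > doc.schemaVersion)
    .reduce<RuleDoc>(
      (current, migration) => ({ ...migration.migrate(current), schemaVersion: migration.toVersion }),
      doc
    );
}
```

Documents already at the latest version pass through unchanged, so the function is safe to run on every upgrade.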

Control compression and timeout (Update 3/3/2021: We need async calls)

Issue:
#50212
#50217

Updated Use case (3/3/2021):
We might need to enable large volumes of rule runs, but we cap things at a maximum of 100 signals at the moment. If we increase this, we will need to push data more efficiently. However, we do need longer rule run times, and either we need something async or we need longer timeouts from Task Manager and when calling into Elasticsearch.

Use case:
As a rule user, I sometimes need to trigger large volumes of signals over a longer period of time, or even over a short period of time with a very active rule. When triggering that rule, I would like the engine to run as quickly as possible, and right now the rate is very slow because the data is not compressed. Sometimes I encounter timeouts and would like to control the timeouts as well, for wildcard or complex rules which might be run less frequently.

Technical solution:
We need to enable compression/gzip with the callCluster API (from Kibana -> Elasticsearch). This should be comparable to elasticsearch-js compression. This is needed when large amounts of JSON are transmitted due to large numbers of signals. It looks like we are using the legacy client, but we would like an upgrade to the newer system. We also need to get to a per-connection timeout model so we can configure different rules with different timeouts.

Note: This one might be out of scope for the alerting team, and we might need to negotiate it with the platform team. However, we do not want to have to manage the API keys and other complexities, as we really enjoy the callCluster API :-)

Current workaround:
We might be able to send in an HTTP header flag to turn on compression, and we might be able to configure the timeouts system-wide.

Enable alerting/actions plugin by default ✅

Issue:
~~#50209~~

Use case:
As a rule user, I will need rules always enabled and running.

Technical solution:
Turn on the alerting/actions flag by default and add them as requirements to the SIEM plugin. Note that alerting/actions requires TLS by default. The SIEM application will only be able to run signals in TLS mode because of the requirements of API keys for alerting/actions framework. Reference issue discussing this: #34339

Current workaround:
None. Each developer turns it on ad hoc in the codebase and then doesn't check it in, so as not to break people who have not enabled alerting/actions in their kibana.dev.yml.

Delete old API keys when rules are deleted ✅

Issue:
#45144

Use case:
As a rule user, I can spend lots of time creating rules in a playground environment and/or spend a lot of days/weeks creating real world rules which could be deleted later. This is particularly the case for security where different techniques and tactics are part of the evolving landscape. As I delete older rules, I would like to also remove older API keys as that leaves a large surface area per compliance models at companies.

Technical solution:
When you delete a rule, the underlying API key is guaranteed to eventually be deleted within a set timeframe.

Current workaround:
None, as we can't distinguish which API key belongs to which rule. If users ask us which ones they can delete over time, we will not have a good answer on how to distinguish them, so I do not think a workaround is possible.

Pass down space id and other parameters ✅

Issue:
#50522

Use case:
As a rule user I want to copy rules from space to space and have the new space automatically provision the new space index and begin pushing signals to the new space suffixed data index. Right now, it will continue to point to the old index until someone from support gives me a script to update the outputIndex since it is hard coded at creation time of the rule.

Technical solution:
Pass down the space id and any other useful parameters to the executor so the executor can detect a space change and try to either push errors or compensate by auto creating the index.

Current workaround:
We hard code the space id on rule creation in the outputIndex field and if we move rules around we have to update the rules manually.
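A minimal sketch of how an executor could use a passed-down space id to detect that a rule has moved between spaces. Everything here is hypothetical: the `ExecutorInput` shape, the assumption that the framework supplies `spaceId`, and the `<base>-<spaceId>` naming scheme are illustrative only.

```typescript
// Hypothetical naming scheme for a space-suffixed index: "<base>-<spaceId>".
function buildOutputIndex(baseIndex: string, spaceId: string): string {
  return `${baseIndex}-${spaceId}`;
}

interface ExecutorInput {
  spaceId: string; // assumed to be passed down by the framework
  params: { outputIndex: string }; // hard coded at rule creation time
}

interface IndexDecision {
  index: string; // index the executor should write signals to
  spaceChanged: boolean; // true when the stored outputIndex points at another space
}

// Compares the stored outputIndex with the one expected for the current space.
function resolveOutputIndex(input: ExecutorInput, baseIndex: string): IndexDecision {
  const expected = buildOutputIndex(baseIndex, input.spaceId);
  return { index: expected, spaceChanged: expected !== input.params.outputIndex };
}
```

When `spaceChanged` is true, the executor could either report an error or create the new index before writing, which are the two options named in the technical solution.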


elasticmachine · 2019-11-11T22:03:49Z

Pinging @elastic/siem (Team:SIEM)

FrankHassanabad · 2019-12-12T16:49:16Z

Some notes about the tags we're using on SIEM:

#52838

In this PR, I am using alert.tags both for user-entered tags and for internal tags that I want fast lookups on, such as our rule_id, which can optionally be set so that we can update rules between customer sites (basically an extern_id). It looks to be working really well for fast lookups of our rule_id, and I don't think I need to muck with direct access to the saved object's _id.

One thing I want to point out in that PR is that I filter out any internal tags before returning them to the user, so they only see the tags they added:

tags.filter(tag => !tag.startsWith(INTERNAL_IDENTIFIER));

I could almost benefit from two different sets of tags on alerts: one for the user to enter data against (the UI), and a second for internal structures only. For now, though, using an internal identifier and filtering seems to work out well. The only caveat is that searches including my identifier will come back as positive hits, but that should be rare as I begin the internal tags with __internal.
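A minimal sketch of the internal-tag approach described above. The `__internal` prefix and the filter come from this comment; the `__internal_rule_id:<id>` tag format and the helper names are assumptions made for illustration.

```typescript
const INTERNAL_IDENTIFIER = "__internal";
// Assumed tag format for carrying our rule_id alongside user tags.
const INTERNAL_RULE_ID_PREFIX = `${INTERNAL_IDENTIFIER}_rule_id:`;

// Stores the rule_id as an extra internal tag next to the user's own tags.
function addRuleIdTag(userTags: string[], ruleId: string): string[] {
  return [...userTags, `${INTERNAL_RULE_ID_PREFIX}${ruleId}`];
}

// Reads the rule_id back out of the stored tags, if present.
function getRuleIdFromTags(tags: string[]): string | undefined {
  const tag = tags.find((candidate) => candidate.startsWith(INTERNAL_RULE_ID_PREFIX));
  return tag === undefined ? undefined : tag.slice(INTERNAL_RULE_ID_PREFIX.length);
}

// Hides internal tags before returning tags to the user.
function removeInternalTags(tags: string[]): string[] {
  return tags.filter((tag) => !tag.startsWith(INTERNAL_IDENTIFIER));
}
```

A user tag that happens to start with `__internal` would be hidden too, which is the same caveat noted above for searches.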

The other thing of note in that PR is that there doesn't seem to be any aggregate functionality on attributes of saved objects, so I have to do a slow lookup of all unique tags for the UI by paging through every rule and building the unique set of tags. The UI is going to use this to present the user with the set of all tags they entered, without duplicates, and allow them to select one and then filter on only that one.
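A minimal sketch of that slow lookup, accumulating tags page by page so that only the set of unique tags is held in memory. The `TaggedRule` shape and the `FindTaggedPage` signature are hypothetical stand-ins for a paged "find" call.

```typescript
const INTERNAL_IDENTIFIER = "__internal";

interface TaggedRule {
  tags: string[];
}

// Stand-in for a paged "find" call; the name and signature are assumptions.
type FindTaggedPage = (
  page: number,
  perPage: number
) => Promise<{ total: number; data: TaggedRule[] }>;

// Adds the user-visible tags from one page of rules to the running set.
function addPageTags(seen: Set<string>, rules: TaggedRule[]): void {
  for (const rule of rules) {
    for (const tag of rule.tags) {
      if (!tag.startsWith(INTERNAL_IDENTIFIER)) {
        seen.add(tag);
      }
    }
  }
}

// Pages through every rule and returns the sorted set of unique user tags.
async function findUniqueTags(findPage: FindTaggedPage, perPage = 100): Promise<string[]> {
  const seen = new Set<string>();
  let fetched = 0;
  for (let page = 1; ; page += 1) {
    const { total, data } = await findPage(page, perPage);
    addPageTags(seen, data);
    fetched += data.length;
    if (data.length === 0 || fetched >= total) {
      return Array.from(seen).sort();
    }
  }
}
```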

FrankHassanabad · 2019-12-18T18:20:12Z

Changed the ordering so that deleting API keys is the highest priority and a blocker for us. We just need some way to delete those API keys, since they are generated on enable/disable.

FrankHassanabad · 2019-12-18T20:15:37Z

New issue and re-arrangement today. I added this one to the list above:

Pass down space id and other parameters
Issue:
#50522

Use case:
As a rule user I want to copy rules from space to space and have the new space automatically provision the new space index and begin pushing signals to the new space suffixed data index. Right now, it will continue to point to the old index until someone from support gives me a script to update the outputIndex since it is hard coded at creation time of the rule.

Technical solution:
Pass down the space id and any other useful parameters to the executor so the executor can detect a space change and try to either push errors or compensate by auto creating the index.

Current workaround:
We hard code the space id on rule creation in the outputIndex field and if we move rules around we have to update the rules manually.

spong · 2020-05-05T23:34:32Z

Adding #62532 as a reference, as this issue highlights the use case for the ability to refresh API tokens before the executor runs to ensure the most recent roles/permissions are available.

FrankHassanabad · 2020-07-29T19:59:03Z

"Reviewed by Frank Hassanabad on 7/29/2020, still valid as of this date"

The only note is that some of the tech debt or workarounds might make it difficult to implement just this one part, which is:

Post our own rule id

So I will move that more towards the bottom as far as requirements go.

mikecote · 2020-10-20T14:13:25Z

@FrankHassanabad the new Elasticsearch client is available for alert types to use. It can be accessed under services.scopedClusterClient (#80794). I figured I'd let you know in case that helps with controlling compression and timeouts?

gmmorris · 2021-03-30T19:48:55Z

I gave these a pass to identify what can be unblocked here.

I have a couple of notes worth considering:

Make alerts capable of being run ad-hoc/immediately

Issue:
#50215

Use case:
As a rule user, I need to be able to immediately run a rule from time to time. A rule could have failed multiple times during a day due to timeout issues or networking issues or the rule could have had a bug or mistake in it in which case I need to modify and then re-run the rule. As a rule user, I need to sometimes create a rule which needs to run every 5 minutes against data 5 minutes ago but need to run that rule immediately.

Technical solution:
API from the alerting team which provides the capability.

Current workaround:
Nothing written, but we can create an ad-hoc hidden duplicate alert temporarily, which runs and then deletes itself at the end of the run. This would be an additional parameter to the rule and incurs technical debt to be removed later.

Regarding this requirement: this API is available on Task Manager, so we actually have a workaround for this.
By fetching the Rule you can get the Task ID.
Using this Task ID you can call the runNow API on Task Manager.

That will tell Task Manager to run the alert immediately, as if it had been scheduled to run now.
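A minimal sketch of that workaround, written against small stand-in interfaces rather than the real clients. `runNow` is the API named above; the `RuleLookup` and `TaskRunner` interfaces and the `scheduledTaskId` field name are assumptions for illustration.

```typescript
// Hypothetical minimal shapes; the real clients have many more members.
interface RuleLookup {
  get(params: { id: string }): Promise<{ id: string; scheduledTaskId?: string }>;
}

interface TaskRunner {
  runNow(taskId: string): Promise<{ id: string }>;
}

// Fetches the rule, reads its task id, and asks the task runner to run it now.
async function runRuleNow(
  rules: RuleLookup,
  taskManager: TaskRunner,
  ruleId: string
): Promise<string> {
  const rule = await rules.get({ id: ruleId });
  if (!rule.scheduledTaskId) {
    // Assumption: a rule without a scheduled task cannot be run this way.
    throw new Error(`Rule ${ruleId} has no scheduled task`);
  }
  const { id } = await taskManager.runNow(rule.scheduledTaskId);
  return id;
}
```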

Post our own rule id

Issue:
#50210

Use case:
As a rule user, I want to package and identify my own rules without having duplicates existing in the system. As a rule user, I want to be able to share these rules with multiple Kibana systems under different divisions and different companies as well as update them among the different companies/divisions. As a rule user, I will want to release updates to these rules across companies. As a rule user, I want to be able to export and import rules without having duplicates showing up within the system. These imports and exports can be across divisions and companies as well.

Technical solution:
We need the ability to POST our own _id fields for our rule_id. This is important for packaging rules up and distributing them and not having any duplicate rule_id's. Currently we are using rule_id as a parameter to the rules which is a slow mechanism. We can use tags to speed this up but errors or bugs can and will lead to duplicate rule_id's.

Current workaround:
We post our own rule parameter in the alerting parameters and we can sometimes get duplicates showing up due to socket timeouts or uncaught promises/unhandled promise rejections as well as bugs. We will have to write code to interpret and find duplicates and remove those rule duplicates or allow the user to remove them when they inevitably happen.

We delivered the ability to use specific IDs in 7.12, but the requirement is that these IDs be uuids.
I discussed the need for predictable IDs with @mikecote and it turns out this is blocked on security concerns related to ESO.

The Security team don't want ESOs to have hard coded IDs, as that makes it easier to identify the encrypted document and could be considered a security hole.
I suggest touching base with @elastic/kibana-security to see if some kind of security exemption can be baked into ESO specifically for the securitySolution.

You could in theory, obviously, use hard coded UUIDs, but that defeats the purpose of the security baked into ESOs.

elasticmachine · 2022-02-23T20:04:56Z

Pinging @elastic/security-detections-response (Team:Detections and Resp)

FrankHassanabad · 2023-02-08T01:11:55Z

lol...bye bye ticket! :-)

FrankHassanabad added the Team:SIEM label Nov 11, 2019

spong mentioned this issue Nov 12, 2019

[SIEM] [Detection Engine] [Meta] Create Detection Engine UI #50405

Closed

20 tasks

mchopda added the NeededFor:SIEM label Feb 5, 2020

spong mentioned this issue Feb 15, 2020

[SIEM] Sorting the table in the manage detection rule #57422

Open

mikecote mentioned this issue May 6, 2020

Make alert params searchable #50213

Open

rylnd mentioned this issue May 7, 2020

[SIEM][Detections] Restrict ML rule modification to ML Admins #65583

Merged

4 tasks

mikecote mentioned this issue Jul 17, 2020

Dependencies on Kibana Alerting #67992

Open

59 tasks

MindyRS added the Team: SecuritySolution label Oct 27, 2020

mikecote mentioned this issue Mar 5, 2021

Ability to turn compression/gzip on for callCluster in alert and action executors #50212

Closed

banderror mentioned this issue Mar 22, 2021

[Discuss] [Security Solution] [Alerting] HTTP route RFC for unified rule management #95060

Open

dontcallmesherryli added the dependencies label Apr 1, 2021

dontcallmesherryli added the Theme: rac label Apr 12, 2021

peluja1012 added the Team:Detection Alerts, Feature:Rule Management, and Team:Detection Rule Management labels Sep 15, 2021

MindyRS added the Team:Detections and Resp label Feb 23, 2022

FrankHassanabad closed this as not planned Feb 8, 2023
