
[Security Solution][Detections Engine] Update search_after to use PIT to work with tie breakers and issues in @timestamp fields and overrides #103944

Closed
FrankHassanabad opened this issue Jun 30, 2021 · 5 comments
Labels: 8.2 candidate (considered, but not committed, for 8.2 release), SecuritySolution:QAAssist (part of QA testing process for release), Team:Detections and Resp (Security Detection Response Team), Team:Security Solution Platform (Security Solution Platform Team), technical debt (improvement of the software architecture and operational architecture)

Comments

FrankHassanabad (Contributor) commented Jun 30, 2021

Agents can produce multiple documents with the same @timestamp, event.ingested, or other sort field, since multiple events can occur within the same second, millisecond, or nanosecond. IoCs and lists have the same potential issue: people can add multiple list items with exactly the same timestamp. Without a tie-breaker, paging over these duplicates with search_after can skip documents and cause missed alerts/signals.

We also have the problem that a list can be mutated while a rule is running, or events can change during the window we are trying to detect over.

Luckily, Elasticsearch has implemented a new feature called Point in Time (PIT) that should solve these use cases nicely:
https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after
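To illustrate why the tie-breaker matters, here is a minimal, self-contained sketch (not detection engine code; the `Doc` shape and both paging helpers are hypothetical) that simulates search_after semantics over documents sharing the same timestamp. Paging on timestamp alone skips the tied documents; adding a secondary sort key (as PIT's implicit `_shard_doc` tie-breaker does) keeps them:

```typescript
interface Doc {
  id: string;
  timestamp: number; // e.g. epoch millis of @timestamp
}

// Multiple events within the same millisecond produce duplicate timestamps.
const docs: Doc[] = [
  { id: 'a', timestamp: 1 },
  { id: 'b', timestamp: 2 },
  { id: 'c', timestamp: 2 },
  { id: 'd', timestamp: 2 },
  { id: 'e', timestamp: 3 },
];

// Page by timestamp only: the strictly-greater search_after cursor skips ties.
function pageByTimestampOnly(source: Doc[], pageSize: number): Doc[] {
  const out: Doc[] = [];
  let after = -Infinity;
  for (;;) {
    const page = source.filter((d) => d.timestamp > after).slice(0, pageSize);
    if (page.length === 0) return out;
    out.push(...page);
    after = page[page.length - 1].timestamp;
  }
}

// Page by (timestamp, id): the id acts as a tie-breaker, so no doc is skipped.
function pageWithTieBreaker(source: Doc[], pageSize: number): Doc[] {
  const out: Doc[] = [];
  let after: [number, string] = [-Infinity, ''];
  for (;;) {
    const page = source
      .filter((d) => d.timestamp > after[0] || (d.timestamp === after[0] && d.id > after[1]))
      .slice(0, pageSize);
    if (page.length === 0) return out;
    out.push(...page);
    const last = page[page.length - 1];
    after = [last.timestamp, last.id];
  }
}
```

With a page size of 2, the timestamp-only version returns `a, b, e` (missing `c` and `d`), while the tie-breaker version returns all five documents.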

We just have to implement and use this within the detection engine and anywhere else we use search_after, such as IoC lists.

The unknowns: if we have large volumes of data, long look-back times, or rules that run a very long time and hold a PIT open with a very large keep-alive, will that block ingest or slow things down, and would we have to "bend" this part a bit? We should attempt to implement and test this, and use it everywhere for users.

FrankHassanabad added the bug and Team:Detections and Resp labels on Jun 30, 2021
elasticmachine (Contributor):

Pinging @elastic/security-detections-response (Team:Detections and Resp)

andrewkroh added a commit to andrewkroh/kibana that referenced this issue Jul 1, 2021
The `event.ingested` field is added to all documents ingested via Fleet plus Agent. By removing the subseconds we get better compression of the values in Elasticsearch.

The primary user of `event.ingested` today is the Security Detection Engine, as a tie-breaker in search_after, but once it moves to using the point-in-time API the need for precision will be lessened, because PIT has an implicit tie-breaker.

Relates elastic#103944
Relates elastic/beats#22388
andrewkroh (Member):

On a related note, in #104044 I'm removing the sub-seconds from event.ingested for Fleet integrations.

andrewkroh added a commit that referenced this issue Aug 3, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this issue Aug 3, 2021
kibanamachine added a commit that referenced this issue Aug 3, 2021
streamich pushed a commit to vadimkibana/kibana that referenced this issue Aug 8, 2021
rylnd (Contributor) commented Aug 31, 2021

@MikePaquette RE your performance concerns about PIT (Keeping point in time alive):

Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments. Once the smaller segments are no longer needed they are deleted. However, open point-in-times prevent the old segments from being deleted since they are still in use.

So it sounds like the impact here would be on storage and open handles. Reasonably small values (1m in general, rule.interval for rule executions) should minimize the impact here.

peluja1012 added the Team:Security Solution Platform and technical debt labels on Sep 15, 2021
spong removed their assignment on Nov 10, 2021
peluja1012 added the 8.2 candidate label on Jan 19, 2022
yctercero (Contributor):

Ticket by Core tracking these changes: #93770

FrankHassanabad added a commit that referenced this issue Feb 15, 2022
… from saved objects to exception lists (#125182)

## Summary

Exposes the functionality of
* search_after
* point in time (pit)

From saved objects to the exception lists. This _DOES NOT_ expose these to the REST API just yet. Rather, it exposes them at the API level to start with, and changes code that had hard limits of 10k and other bounded loops. I batch 1k items at a time, which seemed like a decent guess, and I see other parts of the code changed to the same value. It's easy to change the 1k if we find we need to throttle back more as we get feedback from others.

See this PR where `PIT` and `search_after` were first introduced: #89915
See these 2 issues where we should be using more paging and PIT (Point in Time) with search_after: #93770 #103944

The new methods added to the `exception_list_client.ts` client class are:
* openPointInTime
* closePointInTime
* findExceptionListItemPointInTimeFinder
* findExceptionListPointInTimeFinder
* findExceptionListsItemPointInTimeFinder
* findValueListExceptionListItemsPointInTimeFinder

The areas of functionality that have been changed:
* Exception list exports
* Deletion of lists
* Getting exception list items when generating signals

Note that we currently loop over saved objects in our own ways throughout the codebase, such as the older way below, which works but was limited to 10k saved objects and did not use point in time (PIT).

Older way example (deprecated):
```ts
  let page = 1;
  let ids: string[] = [];
  let foundExceptionListItems = await findExceptionListItem({
    filter: undefined,
    listId,
    namespaceType,
    page,
    perPage: PER_PAGE,
    pit: undefined,
    savedObjectsClient,
    searchAfter: undefined,
    sortField: 'tie_breaker_id',
    sortOrder: 'desc',
  });
  while (foundExceptionListItems != null && foundExceptionListItems.data.length > 0) {
    ids = [
      ...ids,
      ...foundExceptionListItems.data.map((exceptionListItem) => exceptionListItem.id),
    ];
    page += 1;
    foundExceptionListItems = await findExceptionListItem({
      filter: undefined,
      listId,
      namespaceType,
      page,
      perPage: PER_PAGE,
      pit: undefined,
      savedObjectsClient,
      searchAfter: undefined,
      sortField: 'tie_breaker_id',
      sortOrder: 'desc',
    });
  }
  return ids;
```

But now that is replaced with this newer way using PIT:
```ts
  // Stream the results from the Point In Time (PIT) finder into this array
  let ids: string[] = [];
  const executeFunctionOnStream = (response: FoundExceptionListItemSchema): void => {
    const responseIds = response.data.map((exceptionListItem) => exceptionListItem.id);
    ids = [...ids, ...responseIds];
  };

  await findExceptionListItemPointInTimeFinder({
    executeFunctionOnStream,
    filter: undefined,
    listId,
    maxSize: undefined, // NOTE: This is unbounded when it is "undefined"
    namespaceType,
    perPage: 1_000,
    savedObjectsClient,
    sortField: 'tie_breaker_id',
    sortOrder: 'desc',
  });
  return ids;
```
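To make the streaming pattern above concrete, here is a minimal, self-contained sketch of how a point-in-time finder of this shape could work internally. This is not Kibana's implementation: `pointInTimeFinder`, `FetchPage`, and `Page` are hypothetical names, and `fetchPage` stands in for a saved-objects find call that carries the `pit` id and `searchAfter` cursor between pages:

```typescript
interface Page<T> {
  data: T[];
  // Sort values of the last hit, to be passed as search_after for the next page.
  searchAfter: unknown[] | undefined;
}

type FetchPage<T> = (searchAfter: unknown[] | undefined, perPage: number) => Promise<Page<T>>;

async function pointInTimeFinder<T>({
  fetchPage,
  perPage,
  maxSize,
  executeFunctionOnStream,
}: {
  fetchPage: FetchPage<T>;
  perPage: number;
  maxSize: number | undefined; // undefined => unbounded, mirroring the API above
  executeFunctionOnStream: (data: T[]) => void;
}): Promise<void> {
  let seen = 0;
  let searchAfter: unknown[] | undefined;
  for (;;) {
    const page = await fetchPage(searchAfter, perPage);
    if (page.data.length === 0) return;
    // Trim the final page if a maxSize bound was given.
    const remaining = maxSize === undefined ? page.data.length : maxSize - seen;
    const data = page.data.slice(0, remaining);
    executeFunctionOnStream(data);
    seen += data.length;
    if ((maxSize !== undefined && seen >= maxSize) || page.searchAfter === undefined) return;
    searchAfter = page.searchAfter;
  }
}
```

The design keeps memory bounded to one page at a time: the caller accumulates only what it needs in the stream callback, rather than asking for 10k results in a single request.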

We also have areas of code with perPage set to 10k, or to a constant representing 10k; this PR removes that in most (but not all) areas:
```ts
      const items = await client.findExceptionListsItem({
        listId: listIds,
        namespaceType: namespaceTypes,
        page: 1,
        pit: undefined,
        perPage: MAX_EXCEPTION_LIST_SIZE, // <--- Really bad to send in 10k per page at a time
        searchAfter: undefined,
        filter: [],
        sortOrder: undefined,
        sortField: undefined,
      });
```

That is now:
```ts
      // Stream the results from the Point In Time (PIT) finder into this array
      let items: ExceptionListItemSchema[] = [];
      const executeFunctionOnStream = (response: FoundExceptionListItemSchema): void => {
        items = [...items, ...response.data];
      };

      await client.findExceptionListsItemPointInTimeFinder({
        executeFunctionOnStream,
        listId: listIds,
        namespaceType: namespaceTypes,
        perPage: 1_000,
        filter: [],
        maxSize: undefined, // NOTE: This is unbounded when it is "undefined"
        sortOrder: undefined,
        sortField: undefined,
      });
```

Leftover areas will be handled in separate PRs because they fall under other teams' code ownership.

### Checklist
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
yctercero added the SecuritySolution:QAAssist label on Feb 28, 2022
yctercero (Contributor):

@MadameSheema this one will need a QA check, but I'm working on breaking it out into individual tickets for each section that needs updating, so it'll be clearer exactly what to test. Adding the SecuritySolutionPlatform:QAAssist tag here so that I don't forget.

peluja1012 removed the bug label on Apr 4, 2022
FrankHassanabad closed this as not planned on Feb 8, 2023