[Security Solution][Detections Engine] Update search_after to use PIT to work with tie breakers and issues in @timestamp fields and overrides #103944
Pinging @elastic/security-detections-response (Team:Detections and Resp)
The `event.ingested` field is added to all documents ingested via Fleet and Agent. By removing the subseconds we get better compression of the values in Elasticsearch. The primary user of `event.ingested` today is the Security Detection Engine, as a tie-breaker in search_after, but once it moves to using the point-in-time API the need for precision will be lessened because PIT has an implicit tie-breaker. Relates elastic#103944 Relates elastic/beats#22388
On a related note, in #104044 I'm removing the sub-seconds from …
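As a rough illustration of what dropping sub-second precision means (this is a hypothetical helper, not the actual Fleet ingest pipeline code, and the function name is invented):

```ts
// Hypothetical helper: truncate an ISO-8601 timestamp to whole-second
// precision, as described for `event.ingested`. Coarser values repeat more
// often across documents, which lets Elasticsearch compress the field better.
function truncateToSeconds(isoTimestamp: string): string {
  const date = new Date(isoTimestamp);
  date.setUTCMilliseconds(0);
  // toISOString() always emits milliseconds, so strip the trailing ".000".
  return date.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

console.log(truncateToSeconds('2021-06-30T12:34:56.789Z')); // 2021-06-30T12:34:56Z
```

Once PIT supplies the implicit tie-breaker, this loss of precision no longer risks ties being unresolvable during paging.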
@MikePaquette RE your performance concerns about PIT (Keeping point in time alive):
So it sounds like the impact here would be on storage and open handles. Reasonably small values (…
Ticket by core tracking these changes: #93770
… from saved objects to exception lists (#125182)

## Summary

Exposes the functionality of

* search_after
* point in time (PIT)

from saved objects to the exception lists. This _DOES NOT_ expose these to the REST API just yet. Rather, this exposes them at the API level to start with and changes code that had hard limits of 10k and other limited loops. I use batching of 1k at a time, as I thought that would be a decent batch size and I see other parts of the code changed to it. It's easy to change the 1k if we find we need to throttle back more as we get feedback from others.

See this PR where `PIT` and `search_after` were first introduced: #89915

See these 2 issues where we should be using more paging and PIT (Point in Time) with search_after: #93770 #103944

The new methods added to the `exception_list_client.ts` client class are:

* openPointInTime
* closePointInTime
* findExceptionListItemPointInTimeFinder
* findExceptionListPointInTimeFinder
* findExceptionListsItemPointInTimeFinder
* findValueListExceptionListItemsPointInTimeFinder

The areas of functionality that have been changed:

* Exception list exports
* Deletion of lists
* Getting exception list items when generating signals

Note that currently we use our own ways of looping over the saved objects, which you can see in the codebase, such as the older way below, which does work but had a limitation of 10k against saved objects and did not use point in time (PIT).

Older way example (deprecated):

```ts
let page = 1;
let ids: string[] = [];
let foundExceptionListItems = await findExceptionListItem({
  filter: undefined,
  listId,
  namespaceType,
  page,
  perPage: PER_PAGE,
  pit: undefined,
  savedObjectsClient,
  searchAfter: undefined,
  sortField: 'tie_breaker_id',
  sortOrder: 'desc',
});
while (foundExceptionListItems != null && foundExceptionListItems.data.length > 0) {
  ids = [
    ...ids,
    ...foundExceptionListItems.data.map((exceptionListItem) => exceptionListItem.id),
  ];
  page += 1;
  foundExceptionListItems = await findExceptionListItem({
    filter: undefined,
    listId,
    namespaceType,
    page,
    perPage: PER_PAGE,
    pit: undefined,
    savedObjectsClient,
    searchAfter: undefined,
    sortField: 'tie_breaker_id',
    sortOrder: 'desc',
  });
}
return ids;
```

But now that is replaced with this newer way using PIT:

```ts
// Stream the results from the Point In Time (PIT) finder into this array
let ids: string[] = [];
const executeFunctionOnStream = (response: FoundExceptionListItemSchema): void => {
  const responseIds = response.data.map((exceptionListItem) => exceptionListItem.id);
  ids = [...ids, ...responseIds];
};
await findExceptionListItemPointInTimeFinder({
  executeFunctionOnStream,
  filter: undefined,
  listId,
  maxSize: undefined, // NOTE: This is unbounded when it is "undefined"
  namespaceType,
  perPage: 1_000,
  savedObjectsClient,
  sortField: 'tie_breaker_id',
  sortOrder: 'desc',
});
return ids;
```

We also have areas of code that have perPage listed at 10k, or a constant that represents 10k, which this removes in most areas (but not all):

```ts
const items = await client.findExceptionListsItem({
  listId: listIds,
  namespaceType: namespaceTypes,
  page: 1,
  pit: undefined,
  perPage: MAX_EXCEPTION_LIST_SIZE, // <--- Really bad to send in 10k per page at a time
  searchAfter: undefined,
  filter: [],
  sortOrder: undefined,
  sortField: undefined,
});
```

That is now:

```ts
// Stream the results from the Point In Time (PIT) finder into this array
let items: ExceptionListItemSchema[] = [];
const executeFunctionOnStream = (response: FoundExceptionListItemSchema): void => {
  items = [...items, ...response.data];
};
await client.findExceptionListsItemPointInTimeFinder({
  executeFunctionOnStream,
  listId: listIds,
  namespaceType: namespaceTypes,
  perPage: 1_000,
  filter: [],
  maxSize: undefined, // NOTE: This is unbounded when it is "undefined"
  sortOrder: undefined,
  sortField: undefined,
});
```

Leftover areas will be handled in separate PRs because they are in other people's code ownership areas.
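The streaming pattern can be seen in isolation with a minimal, self-contained sketch. This is illustrative only: the `Page` type and `findPage` callback are invented for the example and do not match the real saved-objects client signatures, and the real Kibana finders are async while this sketch is synchronous and in-memory to stay small:

```ts
// Illustrative only: a paginator that streams each batch to a callback instead
// of accumulating one giant result set, mirroring the `executeFunctionOnStream`
// pattern used by the point-in-time finders.
interface Page<T> {
  data: T[];
  // Sort value(s) of the last hit on the page, used as the search_after cursor.
  searchAfter: number[] | undefined;
}

function streamAllPages<T>(
  findPage: (searchAfter: number[] | undefined, perPage: number) => Page<T>,
  executeFunctionOnStream: (page: Page<T>) => void,
  perPage: number
): void {
  let searchAfter: number[] | undefined = undefined;
  for (;;) {
    const page = findPage(searchAfter, perPage);
    if (page.data.length === 0) break; // no more results
    executeFunctionOnStream(page);
    searchAfter = page.searchAfter;
  }
}

// Usage against a fake in-memory "index" of 25 items, fetched 10 at a time:
const allItems = Array.from({ length: 25 }, (_, i) => i);
const received: number[] = [];
streamAllPages<number>(
  (searchAfter, perPage) => {
    const start = searchAfter === undefined ? 0 : searchAfter[0] + 1;
    const data = allItems.slice(start, start + perPage);
    return { data, searchAfter: data.length > 0 ? [data[data.length - 1]] : undefined };
  },
  (page) => received.push(...page.data),
  10
);
console.log(received.length); // 25
```

Because each batch is handed to the callback and then dropped, memory stays bounded by `perPage` rather than by the total result count, which is the point of replacing the 10k-per-page calls.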
### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
@MadameSheema this one will need QA check but I'm working on breaking this out into individual tickets for each section that needs updating so that it'll be more clear exactly what to test. Adding …
Agents can have multiple documents with the same `@timestamp`, `event.ingested`, or other fields, since multiple events can happen within the same granularity of a second, millisecond, nanosecond, etc. IoCs and lists have the same potential use case, where people can add multiple list items with the exact same timestamps. This can cause an issue where, when we use `search_after`, we do not have a tie-breaker, which can lead to missed alerts/signals. Also, we have the problem where mutations to a list can happen while the rule is running, or events are changed during the time window we are trying to detect within.

Luckily, Elasticsearch has implemented a new feature for this called PIT (Point In Time), which should solve these use cases nicely:
https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after

We just have to implement and use this within the detection engine and anywhere else, such as IoC lists, where we are using `search_after`.

Unknowns are whether, if we have large volumes of data, large look-back times, or very long-running searches and we set the PIT keep-alive to a very large value, that is going to block ingest, slow things down, or force us to "bend" this part a bit. But we should attempt to implement and test this and use it everywhere for users.
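The missed-document failure mode can be shown with a pure simulation (no Elasticsearch involved; the `Doc` shape and field names are invented for the example). When several documents share the same timestamp and a page boundary falls inside that group, `search_after` on the timestamp alone skips the rest of the group, while adding a unique tie-breaker field to the sort recovers every document:

```ts
// Pure simulation of search_after paging semantics over an in-memory array.
interface Doc { timestamp: number; tieBreakerId: number; }

// Lexicographic "strictly greater" over the sort-value tuples, which is how
// search_after decides where to resume.
function tupleGreater(a: number[], b: number[]): boolean {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return false;
}

// Page through all docs, size at a time, resuming from the last page's
// sort values, and return everything collected.
function pageAll(docs: Doc[], sortKeys: (keyof Doc)[], size: number): Doc[] {
  const sorted = [...docs].sort((a, b) => {
    for (const key of sortKeys) {
      if (a[key] !== b[key]) return a[key] - b[key];
    }
    return 0;
  });
  const out: Doc[] = [];
  let cursor: number[] | undefined;
  for (;;) {
    const page = sorted
      .filter((d) => cursor === undefined || tupleGreater(sortKeys.map((k) => d[k]), cursor))
      .slice(0, size);
    if (page.length === 0) break;
    out.push(...page);
    const last = page[page.length - 1];
    cursor = sortKeys.map((k) => last[k]);
  }
  return out;
}

// Three docs share timestamp 1; page size 2 puts a boundary inside the group.
const docs: Doc[] = [
  { timestamp: 1, tieBreakerId: 1 },
  { timestamp: 1, tieBreakerId: 2 },
  { timestamp: 1, tieBreakerId: 3 },
  { timestamp: 2, tieBreakerId: 4 },
];

console.log(pageAll(docs, ['timestamp'], 2).length);                  // 3 (one doc missed)
console.log(pageAll(docs, ['timestamp', 'tieBreakerId'], 2).length);  // 4 (all docs found)
```

This is the gap PIT closes without us maintaining explicit tie-breaker fields: searches against a point in time add an implicit `_shard_doc` tie-breaker to the sort, so ties on `@timestamp` or `event.ingested` can no longer cause skips.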