[ML] Datafeed with aggregations that filter can go into an infinite loop #104699

droberts195 · 2024-01-24T13:17:15Z

When the ability to use aggregations was first added to datafeeds, there was an assumption that the datafeed's query would be used to filter the input while its aggregations would simply group and summarise the input. Over time, users have adopted increasingly complex aggregations that also filter data within the aggregation itself. Examples of sub-aggregations that could do this are filter and bucket_selector.

Datafeeds have functionality to quickly skip periods of time when there is no data at all. They do this by adding a simple aggregation that gets the min and max timestamp onto the query from the datafeed config, and running this over the period from the last seen data to the datafeed's configured end time.
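As a rough illustration of the kind of search described above, the sketch below builds a request body that wraps the datafeed's query in a time range and asks only for the min and max timestamp. The function name, the aggregation names earliest_time and latest_time, the example query and the @timestamp field are assumptions made for illustration; the actual request the datafeed sends may differ. Note that the datafeed's own aggregations play no part in this search, which is what makes the mismatch described next possible.

```python
def build_data_summary_search(datafeed_query, time_field, start_ms, end_ms):
    """Build a search body that only reports the min and max timestamp.

    Hypothetical helper: the datafeed's query is kept, but its own
    aggregations are NOT included, so filtering done inside those
    aggregations is invisible to this search.
    """
    return {
        "size": 0,  # no hits needed, only the two aggregations
        "query": {
            "bool": {
                "filter": [
                    datafeed_query,
                    {"range": {time_field: {"gte": start_ms, "lt": end_ms,
                                            "format": "epoch_millis"}}},
                ]
            }
        },
        "aggs": {
            "earliest_time": {"min": {"field": time_field}},
            "latest_time": {"max": {"field": time_field}},
        },
    }


# Example: summarise one hour of data for an assumed datafeed query.
body = build_data_summary_search(
    {"term": {"status": "error"}}, "@timestamp", 0, 3_600_000
)
```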

Unfortunately, the combination of these two things can cause a datafeed to go into an infinite loop. This happens if the following conditions all hold:

The datafeed is using aggregations
The datafeed has chunking enabled
The datafeed's aggregations filter data to a greater extent than its query
There is a particular chunk of time (as defined by the chunking_config) where data exists within the first bucket_span of that chunk of time that matches the query, but after the aggregations have been applied the entire chunk of time returns empty aggregations

In this scenario the datafeed will "skip" time back to the start of the current chunk, and hence go into an infinite loop.
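The conditions above can be reproduced with a small simulation. This is a deliberately simplified model, not the Elasticsearch implementation: the constants, the in-memory DOCS list and all function names are hypothetical, and a search cap stands in for what would otherwise be an endless loop.

```python
BUCKET_SPAN = 10   # assumed bucket span, in arbitrary time units
CHUNK = 100        # assumed chunking span

# Each document matches the datafeed query. The flag says whether it also
# survives the filtering done inside the datafeed's aggregations.
DOCS = [(205, False), (450, True)]


def earliest_matching_query(start, end):
    """Simple min-timestamp summary: sees every doc matching the query."""
    times = [t for t, _ in DOCS if start <= t < end]
    return min(times) if times else None


def aggregated_buckets(start, end):
    """Buckets produced by the datafeed's own (filtering) aggregations."""
    return sorted({t // BUCKET_SPAN * BUCKET_SPAN
                   for t, keep in DOCS if keep and start <= t < end})


def run_datafeed(end, max_searches=20):
    """Return the chunk start times visited; capped to expose the loop."""
    visited = []
    search_from = 0
    while search_from < end and len(visited) < max_searches:
        earliest = earliest_matching_query(search_from, end)
        if earliest is None:
            break  # nothing left that matches the query
        chunk_start = earliest // BUCKET_SPAN * BUCKET_SPAN
        visited.append(chunk_start)
        if aggregated_buckets(chunk_start, min(chunk_start + CHUNK, end)):
            search_from = chunk_start + CHUNK  # data processed, move on
        else:
            # Empty aggregations are treated as "no data", so the datafeed
            # tries to skip ahead, but the summary finds the same document
            # again and lands back on the start of the current chunk.
            search_from = chunk_start
    return visited


visited = run_datafeed(end=500)
```

In this model the document at time 205 matches the query but is filtered out by the aggregations, so every search restarts at chunk start 200 until the cap is reached, and the document at 450 is never processed.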

elasticsearchmachine · 2024-01-24T13:17:38Z

Pinging @elastic/ml-core (Team:ML)

When advancing a datafeed's search interval past a period with no data, always advance by at least one time chunk. This avoids a problem where the simple aggregation used to advance time might think there is data while the datafeed's own aggregation has filtered it all out. Prior to this change, this could cause the datafeed to go into an infinite loop. After this change the worst that can happen is that we step slowly through a period where filtering inside the datafeed's aggregation is causing empty buckets. Fixes elastic#104699
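The rule in this description can be sketched as follows. This is a minimal model under stated assumptions rather than the merged code: advance_search_start and walk_empty_period are hypothetical names, and raw_times stands for timestamps of documents that match the query but are all filtered out by the datafeed's aggregations.

```python
def advance_search_start(current_start, chunk_span, earliest_data_time):
    """Move to the earliest data, but never by less than one chunk."""
    return max(earliest_data_time, current_start + chunk_span)


def walk_empty_period(raw_times, start, end, chunk_span):
    """Return the search start times used while every chunk comes back empty."""
    starts = []
    current = start
    while current < end:
        starts.append(current)
        # The simple summary still "sees" documents the aggregations drop.
        later = [t for t in raw_times if current <= t < end]
        if not later:
            break  # summary reports no more data, so the walk ends
        current = advance_search_start(current, chunk_span, min(later))
    return starts


starts = walk_empty_period([205, 207, 450], start=200, end=600, chunk_span=100)
```

Because each step moves forward by at least chunk_span, the walk always terminates; the cost is that a long period of filtered-out data is crossed one chunk at a time.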

#104722) (#104727) When advancing a datafeed's search interval past a period with no data, always advance by at least one time chunk. This avoids a problem where the simple aggregation used to advance time might think there is data while the datafeed's own aggregation has filtered it all out. Prior to this change, this could cause the datafeed to go into an infinite loop. After this change the worst that can happen is that we step slowly through a period where filtering inside the datafeed's aggregation is causing empty buckets. Fixes #104699

droberts195 added >bug :ml Machine learning labels Jan 24, 2024

droberts195 self-assigned this Jan 24, 2024

elasticsearchmachine added the Team:ML label Jan 24, 2024

droberts195 mentioned this issue Jan 24, 2024

[ML] Avoid possible datafeed infinite loop with filtering aggregations #104722

Merged

droberts195 closed this as completed in #104722 Jan 24, 2024
