[ML] Avoid possible datafeed infinite loop with filtering aggregations #104722

droberts195 · 2024-01-24T17:14:07Z

When advancing a datafeed's search interval past a period with no data, always advance by at least one time chunk. This avoids a problem where the simple aggregation used to advance time might think there is data while the datafeed's own aggregation has filtered it all out. Prior to this change, this could cause the datafeed to go into an infinite loop. After this change the worst that can happen is that we step slowly through a period where filtering inside the datafeed's aggregation is causing empty buckets.

Fixes #104699
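
To make the failure mode concrete, here is a small, self-contained sketch of the loop behaviour described above. It is not the code in `ChunkedDataExtractor`: the class `ChunkAdvanceSketch`, the method names, the `alwaysAdvance` flag and the document timestamps are all hypothetical. It assumes every document in the range is rejected by the datafeed's own aggregation, so each chunk comes back empty, while the simple (unfiltered) search still reports data at the current start time.

```java
import java.util.OptionalLong;

class ChunkAdvanceSketch {

    // Hypothetical timestamps of documents that exist but are all rejected
    // by the datafeed's filter. Sorted in ascending order.
    static final long[] FILTERED_OUT_DOCS = {0, 10, 20, 30};

    // Stand-in for the simple search used to advance time: the earliest
    // document at or after `from`, ignoring the datafeed's filter.
    static OptionalLong earliestUnfiltered(long from) {
        for (long t : FILTERED_OUT_DOCS) {
            if (t >= from) {
                return OptionalLong.of(t);
            }
        }
        return OptionalLong.empty();
    }

    // Counts the iterations needed to reach `end`. Returns -1 if the loop
    // has still not finished after maxSteps iterations.
    static int stepsToFinish(long start, long end, long chunkSpan,
                             boolean alwaysAdvance, int maxSteps) {
        long currentStart = start;
        for (int step = 0; step < maxSteps; step++) {
            if (currentStart >= end) {
                return step;
            }
            // Assumption: the filtered aggregation found nothing in
            // [currentStart, currentStart + chunkSpan), so we must move on.
            long searchFrom = alwaysAdvance ? currentStart + chunkSpan : currentStart;
            OptionalLong earliest = earliestUnfiltered(searchFrom);
            currentStart = earliest.isPresent()
                ? Math.max(searchFrom, earliest.getAsLong())
                : end;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(stepsToFinish(0, 40, 10, false, 100));
        System.out.println(stepsToFinish(0, 40, 10, true, 100));
    }
}
```

With these made-up values, the run without the guaranteed step never moves off timestamp 0 and gives up, while the run that always advances by one chunk finishes in four iterations. That matches the trade-off described above: progress is guaranteed, but only one chunk at a time through the filtered-out period.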

elasticsearchmachine · 2024-01-24T17:14:31Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2024-01-24T17:14:31Z

Hi @droberts195, I've created a changelog YAML for you.

jonathan-buttner · 2024-01-24T17:26:52Z

...rc/main/java/org/elasticsearch/xpack/ml/datafeed/extractor/chunked/ChunkedDataExtractor.java

+                // where setUpChunkedSearch() thinks data exists at the current start time
+                // while the datafeed's own aggregation doesn't, at least we'll step forward
+                // a little bit rather than go into an infinite loop.
+                currentStart += chunkSpan;


Just double-checking: adding this wouldn't accidentally skip data that does exist, would it?

droberts195 · 2024-01-24

No. If there were data, the test on line 174 would have passed and we would have returned the data on line 175, exiting the function before this point.

On line 172 the time period we searched was [currentStart, currentStart + chunkSpan), so we should have found the data then. This is why it should be completely safe to step forward by chunkSpan. It might be possible to step forward further, which is what line 197 is trying to do.
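
The half-open interval argument can be checked with a short sketch. The class and method names below are hypothetical and the numbers are arbitrary; the only thing taken from the discussion is that each search covers `[currentStart, currentStart + chunkSpan)` and that the start then moves forward by `chunkSpan`.

```java
class HalfOpenChunksSketch {

    // True if timestamp t lies in the half-open interval
    // [chunkStart, chunkStart + chunkSpan).
    static boolean inChunk(long t, long chunkStart, long chunkSpan) {
        return t >= chunkStart && t < chunkStart + chunkSpan;
    }

    // Counts how many chunks contain t when stepping from start
    // towards end in increments of exactly chunkSpan.
    static int chunksContaining(long t, long start, long end, long chunkSpan) {
        int count = 0;
        for (long chunkStart = start; chunkStart < end; chunkStart += chunkSpan) {
            if (inChunk(t, chunkStart, chunkSpan)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Every timestamp in [0, 40) should be covered by exactly one chunk.
        for (long t = 0; t < 40; t++) {
            if (chunksContaining(t, 0, 40, 10) != 1) {
                throw new AssertionError("timestamp " + t + " not covered exactly once");
            }
        }
        System.out.println("every timestamp covered exactly once");
    }
}
```

Because the upper bound of one chunk is excluded and becomes the inclusive lower bound of the next, stepping by `chunkSpan` leaves no gap and no overlap, so a timestamp that was searched and found empty is never needed again.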

elasticsearchmachine · 2024-01-24T20:45:05Z

💚 Backport successful

Status	Branch	Result
✅	8.12

droberts195 added >bug :ml Machine learning auto-backport-and-merge v8.12.1 v8.13.0 labels Jan 24, 2024

elasticsearchmachine added the Team:ML label Jan 24, 2024

Update docs/changelog/104722.yaml

068ef4c

jonathan-buttner approved these changes Jan 24, 2024

droberts195 merged commit 0f17ff4 into elastic:main Jan 24, 2024
15 checks passed

droberts195 deleted the avoid_datafeed_infinite_loop branch January 24, 2024 20:42

droberts195 mentioned this pull request Jan 24, 2024

[8.12] [ML] Avoid possible datafeed infinite loop with filtering aggregations (#104722) #104727

Merged
