Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Avoid possible datafeed infinite loop with filtering aggregations #104722

Merged
merged 2 commits into from
Jan 24, 2024

Conversation

droberts195
Copy link
Contributor

When advancing a datafeed's search interval past a period with no data, always advance by at least one time chunk. This avoids a problem where the simple aggregation used to advance time might think there is data while the datafeed's own aggregation has filtered it all out. Prior to this change, this could cause the datafeed to go into an infinite loop. After this change the worst that can happen is that we step slowly through a period where filtering inside the datafeed's aggregation is causing empty buckets.

Fixes #104699

When advancing a datafeed's search interval past a period with no data,
always advance by at least one time chunk. This avoids a problem where
the simple aggregation used to advance time might think there is data
while the datafeed's own aggregation has filtered it all out. Prior to
this change, this could cause the datafeed to go into an infinite loop.
After this change the worst that can happen is that we step slowly
through a period where filtering inside the datafeed's aggregation is
causing empty buckets.

Fixes elastic#104699
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Hi @droberts195, I've created a changelog YAML for you.

// where setUpChunkedSearch() thinks data exists at the current start time
// while the datafeed's own aggregation doesn't, at least we'll step forward
// a little bit rather than go into an infinite loop.
currentStart += chunkSpan;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking, but adding this wouldn't accidentally skip some data if the data did exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because if there was data then the test on line 174 should have passed and we should have returned the data on line 175, so exited from the function before this point.

On line 172 the time period we searched was [currentStart, currentStart + chunkSpan), so we should have found the data then. This is why it should be completely safe to step forward by chunkSpan. It might be possible to step forward further, which is what line 197 is trying to do.

@droberts195 droberts195 merged commit 0f17ff4 into elastic:main Jan 24, 2024
15 checks passed
@droberts195 droberts195 deleted the avoid_datafeed_infinite_loop branch January 24, 2024 20:42
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.12

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Jan 24, 2024
elastic#104722)

When advancing a datafeed's search interval past a period with no data,
always advance by at least one time chunk. This avoids a problem where
the simple aggregation used to advance time might think there is data
while the datafeed's own aggregation has filtered it all out. Prior to
this change, this could cause the datafeed to go into an infinite loop.
After this change the worst that can happen is that we step slowly
through a period where filtering inside the datafeed's aggregation is
causing empty buckets.

Fixes elastic#104699
elasticsearchmachine pushed a commit that referenced this pull request Jan 24, 2024
#104722) (#104727)

When advancing a datafeed's search interval past a period with no data,
always advance by at least one time chunk. This avoids a problem where
the simple aggregation used to advance time might think there is data
while the datafeed's own aggregation has filtered it all out. Prior to
this change, this could cause the datafeed to go into an infinite loop.
After this change the worst that can happen is that we step slowly
through a period where filtering inside the datafeed's aggregation is
causing empty buckets.

Fixes #104699
henningandersen pushed a commit to henningandersen/elasticsearch that referenced this pull request Jan 25, 2024
elastic#104722)

When advancing a datafeed's search interval past a period with no data,
always advance by at least one time chunk. This avoids a problem where
the simple aggregation used to advance time might think there is data
while the datafeed's own aggregation has filtered it all out. Prior to
this change, this could cause the datafeed to go into an infinite loop.
After this change the worst that can happen is that we step slowly
through a period where filtering inside the datafeed's aggregation is
causing empty buckets.

Fixes elastic#104699
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :ml Machine learning Team:ML Meta label for the ML team v8.12.1 v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Datafeed with aggregations that filter can go into an infinite loop
3 participants