[ML] Datafeed with aggregations that filter can go into an infinite loop #104699
Comments
Pinging @elastic/ml-core (Team:ML)
droberts195
added a commit
to droberts195/elasticsearch
that referenced
this issue
Jan 24, 2024
When advancing a datafeed's search interval past a period with no data, always advance by at least one time chunk. This avoids a problem where the simple aggregation used to advance time might think there is data while the datafeed's own aggregation has filtered it all out. Prior to this change, this could cause the datafeed to go into an infinite loop. After this change the worst that can happen is that we step slowly through a period where filtering inside the datafeed's aggregation is causing empty buckets. Fixes elastic#104699
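The rule in this commit message can be sketched in a few lines of Python. This is a toy model and not the Elasticsearch implementation: the function name `next_search_start`, its parameters, and the use of integer epoch seconds are all assumptions made for illustration.

```python
from typing import Optional


def next_search_start(
    current_start: int,
    chunk_span: int,
    bucket_span: int,
    end: int,
    earliest_query_match: Optional[int],
) -> int:
    """Pick where to resume after a chunk whose aggregations came back empty.

    All times are epoch seconds. `earliest_query_match` is the minimum timestamp
    found by the simple min/max aggregation, or None if the query matched nothing.
    """
    if earliest_query_match is None:
        # No data at all in the remaining period: skip straight to the end.
        return end
    # Align the earliest matching timestamp down to a bucket boundary.
    aligned = (earliest_query_match // bucket_span) * bucket_span
    # Hypothetical guard: never resume earlier than one chunk past the current start.
    return min(end, max(aligned, current_start + chunk_span))
```

With a one-hour chunk and five-minute buckets, a document at t=10 that the datafeed's aggregation filters out no longer pins the search at 0: `next_search_start(0, 3600, 300, 7200, 10)` returns 3600. The cost is the one named in the commit message, which is stepping chunk by chunk through a period where the aggregation keeps producing empty buckets.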
droberts195
added a commit
that referenced
this issue
Jan 24, 2024
#104722)
droberts195
added a commit
to droberts195/elasticsearch
that referenced
this issue
Jan 24, 2024
elastic#104722)
elasticsearchmachine
pushed a commit
that referenced
this issue
Jan 24, 2024
#104722) (#104727)
henningandersen
pushed a commit
to henningandersen/elasticsearch
that referenced
this issue
Jan 25, 2024
elastic#104722)
When the ability to use aggregations was first added to datafeeds there was an assumption that the datafeed's `query` would be used to filter the input while its `aggregations` would simply group and summarise the input. As time has gone by, users are using more and more complex aggregations that also do filtering within the aggregation. Examples of sub-aggregations that could do this are `filter` and `bucket_selector`.

Datafeeds have functionality to quickly skip periods of time when there is no data at all. They do this by adding a simple aggregation that gets the min and max timestamp onto the `query` from the datafeed config, and running this over the period from the last seen data to the datafeed's configured end time.

Unfortunately, the combination of these two things can cause a datafeed to go into an infinite loop. This happens if the following conditions all hold:

1. The datafeed's `aggregations` filter data to a greater extent than its `query`
2. There is a chunk of time (as defined by `chunking_config`) where data exists within the first `bucket_span` of that chunk of time that matches the `query`, but after the `aggregations` have been applied the entire chunk of time returns empty aggregations

In this scenario the datafeed will "skip" time back to the start of the current chunk, and hence go into an infinite loop.
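The two conditions above can be reproduced in a small simulation. This is a toy model in Python, not the datafeed code: `Doc`, `passes_agg_filter`, `resume_after_empty_chunk`, and `visited_chunk_starts` are hypothetical names, times are plain integers, and the skip logic is a simplification of the behaviour described in this issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Doc:
    timestamp: int           # epoch seconds
    passes_agg_filter: bool  # False if filtering inside the aggregation drops it


def earliest_query_match(docs: List[Doc], start: int, end: int) -> Optional[int]:
    """Toy min-timestamp aggregation; every Doc is assumed to match the query."""
    times = [d.timestamp for d in docs if start <= d.timestamp < end]
    return min(times) if times else None


def chunk_has_agg_output(docs: List[Doc], start: int, end: int) -> bool:
    """True if the datafeed's own aggregation would return a non-empty bucket."""
    return any(d.passes_agg_filter for d in docs if start <= d.timestamp < end)


def resume_after_empty_chunk(docs: List[Doc], current: int, end: int, bucket_span: int) -> int:
    """Simplified skip: jump to the bucket holding the earliest query match."""
    earliest = earliest_query_match(docs, current, end)
    if earliest is None:
        return end
    return max(current, (earliest // bucket_span) * bucket_span)


def visited_chunk_starts(docs, start, end, chunk_span, bucket_span, max_steps=5):
    """Record the start of every chunk searched, capped at max_steps iterations."""
    visited = []
    current = start
    while current < end and len(visited) < max_steps:
        visited.append(current)
        chunk_end = min(current + chunk_span, end)
        if chunk_has_agg_output(docs, current, chunk_end):
            current = chunk_end  # normal progress
        else:
            current = resume_after_empty_chunk(docs, current, end, bucket_span)
    return visited


# The doc at t=10 matches the query but is filtered out inside the aggregation.
docs = [Doc(10, passes_agg_filter=False), Doc(5000, passes_agg_filter=True)]
stuck = visited_chunk_starts(docs, start=0, end=7200, chunk_span=3600, bucket_span=300)
```

Here `stuck` is `[0, 0, 0, 0, 0]`: the earliest query match falls in the first bucket of the chunk, so every iteration resumes at the start of the same chunk, and only the `max_steps` cap ends the loop. Without the filtered-out document the same call visits `[0, 4800]` and finishes.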