[ML] Datafeed with aggregations that filter can go into an infinite loop #104699

Closed
droberts195 opened this issue Jan 24, 2024 · 1 comment · Fixed by #104722

@droberts195
Contributor

When the ability to use aggregations was first added to datafeeds, there was an assumption that the datafeed's query would be used to filter the input while its aggregations would simply group and summarise the input. As time has gone by, users have adopted more and more complex aggregations that also filter within the aggregation itself. Examples of sub-aggregations that can do this are filter and bucket_selector.
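To make the distinction concrete, here is a rough sketch of a datafeed config whose aggregations filter more than its query does. The index name, field names (`@timestamp`, `status`, `latency`) and aggregation names are all made up for illustration, and the small `filtering_aggs` helper is not part of Elasticsearch, it just walks the dict to point out which sub-aggregations filter.

```python
# Hypothetical datafeed config: the query matches everything, but the
# aggregations only keep buckets that contain "error" documents.
datafeed_config = {
    "indices": ["my-index"],              # assumed index name
    "query": {"match_all": {}},           # the query itself filters nothing
    "chunking_config": {"mode": "manual", "time_span": "1h"},
    "aggregations": {
        "buckets": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "5m"},
            "aggregations": {
                "@timestamp": {"max": {"field": "@timestamp"}},
                # filter sub-aggregation: narrows the documents inside each bucket
                "errors_only": {
                    "filter": {"term": {"status": "error"}},
                    "aggregations": {"avg_latency": {"avg": {"field": "latency"}}},
                },
                # bucket_selector: drops whole buckets that have no errors
                "enough_errors": {
                    "bucket_selector": {
                        "buckets_path": {"n": "errors_only>_count"},
                        "script": "params.n > 0",
                    }
                },
            },
        }
    },
}

FILTERING_TYPES = {"filter", "filters", "bucket_selector"}


def filtering_aggs(aggs):
    """Return the names of aggregations that filter, searching nested levels."""
    found = []
    for name, body in aggs.items():
        for key, value in body.items():
            if key in FILTERING_TYPES:
                found.append(name)
            elif key in ("aggregations", "aggs"):
                found.extend(filtering_aggs(value))
    return found


print(filtering_aggs(datafeed_config["aggregations"]))
```

With a config like this, a period that contains only non-error documents matches the query but produces no buckets at all.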

Datafeeds have functionality to quickly skip periods of time when there is no data at all. They do this by adding a simple aggregation that gets the min and max timestamp onto the query from the datafeed config, and running this over the period from the last seen data to the datafeed's configured end time.
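As an illustration of that skip search, the sketch below builds a search body from the datafeed's query plus a time range and a min/max pair. The function name and the aggregation names `earliest_time` and `latest_time` are assumptions for this example, not a statement of what the datafeed code sends. The point to notice is that the datafeed's own aggregations play no part in it.

```python
def build_summary_search(datafeed_query, time_field, start_ms, end_ms):
    """Sketch of a search that only asks: where is the first and last data?"""
    return {
        "size": 0,  # no hits needed, only the two aggregated timestamps
        "query": {
            "bool": {
                "filter": [
                    datafeed_query,  # the query from the datafeed config, unchanged
                    {
                        "range": {
                            time_field: {
                                "gte": start_ms,
                                "lt": end_ms,
                                "format": "epoch_millis",
                            }
                        }
                    },
                ]
            }
        },
        "aggs": {
            "earliest_time": {"min": {"field": time_field}},
            "latest_time": {"max": {"field": time_field}},
        },
    }


body = build_summary_search({"match_all": {}}, "@timestamp", 0, 3_600_000)
print(sorted(body["aggs"]))
```

Because only the query is consulted, this search can report data in a period where the datafeed's filtering aggregations would return nothing.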

Unfortunately, the combination of these two things can cause a datafeed to go into an infinite loop. This happens if the following conditions all hold:

  1. The datafeed is using aggregations
  2. The datafeed has chunking enabled
  3. The datafeed's aggregations filter data to a greater extent than its query
  4. There is a particular chunk of time (as defined by the chunking_config) where data exists within the first bucket_span of that chunk of time that matches the query, but after the aggregations have been applied the entire chunk of time returns empty aggregations

In this scenario the datafeed will "skip" time back to the start of the current chunk, and hence go into an infinite loop.
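A toy model of the failing step is sketched below. It is a simplification rather than the actual datafeed code: timestamps are plain integers, `query_match_times` stands for documents that match the query (whether or not the aggregations keep them), and the assumption that the skip target is aligned down to a bucket_span boundary is mine.

```python
def skip_target(query_match_times, chunk_start, end, bucket_span):
    """Where the min/max search says data resumes after an empty chunk.

    Returns None when nothing matches the query in [chunk_start, end).
    """
    candidates = [t for t in query_match_times if chunk_start <= t < end]
    if not candidates:
        return None
    earliest = min(candidates)
    # Assumed for this sketch: align down to the start of the containing bucket.
    return earliest - earliest % bucket_span


chunk_start, end, bucket_span = 3600, 36000, 300

# A document at t=3700 matches the query but is filtered out by the
# aggregations, so the chunk starting at 3600 came back empty.
target = skip_target([3700], chunk_start, end, bucket_span)
print(target == chunk_start)  # the "skip" lands on the chunk we just searched
```

When the returned target equals the start of the chunk that was just searched, the next search is identical to the last one, so the datafeed never moves forward.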

@droberts195 droberts195 added >bug :ml Machine learning labels Jan 24, 2024
@droberts195 droberts195 self-assigned this Jan 24, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Jan 24, 2024
droberts195 added a commit to droberts195/elasticsearch that referenced this issue Jan 24, 2024
When advancing a datafeed's search interval past a period with no data,
always advance by at least one time chunk. This avoids a problem where
the simple aggregation used to advance time might think there is data
while the datafeed's own aggregation has filtered it all out. Prior to
this change, this could cause the datafeed to go into an infinite loop.
After this change the worst that can happen is that we step slowly
through a period where filtering inside the datafeed's aggregation is
causing empty buckets.

Fixes elastic#104699
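The idea in the commit message can be sketched as a clamp on the skip target. This is a simplified model under my own assumptions (integer timestamps, hypothetical function names), not the code from the commit: whatever the min/max search reports, the next search never starts before the end of the chunk that just came back empty.

```python
from typing import List, Optional


def next_chunk_start(summary_start: Optional[int], empty_chunk_end: int) -> Optional[int]:
    """Always advance by at least one chunk past a chunk with no buckets."""
    if summary_start is None:
        return None  # nothing matches the query any more, so stop
    return max(summary_start, empty_chunk_end)


def chunk_starts_visited(query_match_times: List[int], start: int, end: int, chunk: int) -> List[int]:
    """Walk a range in which every chunk's aggregation comes back empty."""
    visited = []
    current: Optional[int] = start
    while current is not None and current < end:
        visited.append(current)
        chunk_end = min(current + chunk, end)
        remaining = [t for t in query_match_times if current <= t < end]
        summary_start = min(remaining) if remaining else None
        current = next_chunk_start(summary_start, chunk_end)
    return visited


# Every chunk has query-matching data near its start that the aggregations
# filter out, so the walk proceeds one chunk at a time and then terminates.
print(chunk_starts_visited([5, 105, 205], start=0, end=300, chunk=100))
```

This is the "step slowly" case from the commit message: each empty chunk costs one search, but every chunk start is visited at most once.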
droberts195 added a commit that referenced this issue Jan 24, 2024
#104722)

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Jan 24, 2024
elastic#104722)

elasticsearchmachine pushed a commit that referenced this issue Jan 24, 2024
#104722) (#104727)

henningandersen pushed a commit to henningandersen/elasticsearch that referenced this issue Jan 25, 2024
elastic#104722)
