[ML] Set data recognizer to ignore frozen indices #117208

Merged: 1 commit merged into elastic:main on Nov 3, 2021

Member

@qn895 qn895 commented Nov 2, 2021

Summary

Addresses #116696. Currently, /api/ml/modules/recognize can take a long time to return results, especially if the index pattern includes indices that are frozen. This PR sets ignore_throttled to true, since the data recognizer does not need to search frozen indices. This should speed up the request.
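
As a rough illustration of the change described above, the sketch below builds search parameters with `ignore_throttled` set explicitly. The interface and the function name `buildRecognizerSearchParams` are hypothetical and are not taken from the Kibana source; only the `ignore_throttled` search parameter itself comes from the PR description.

```typescript
// Hypothetical request shape for illustration; the real recognizer code may differ.
interface RecognizerSearchParams {
  index: string;
  size: number;
  ignore_throttled: boolean;
  body: { query: Record<string, unknown> };
}

// Builds search parameters that ask Elasticsearch to skip throttled
// (frozen) indices when resolving the index pattern.
function buildRecognizerSearchParams(
  indexPattern: string,
  query: Record<string, unknown>
): RecognizerSearchParams {
  return {
    index: indexPattern,
    size: 0, // only match counts matter here, not the documents themselves
    ignore_throttled: true,
    body: { query },
  };
}
```

For example, `buildRecognizerSearchParams('logs-*', { match_all: {} })` would produce a request that carries `ignore_throttled: true` alongside the recognizer query.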

@qn895 qn895 requested a review from a team as a code owner November 2, 2021 18:46
@qn895 qn895 self-assigned this Nov 2, 2021
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @qn895

Contributor

@peteharverson peteharverson left a comment

Code LGTM

Member

@jgowdyelastic jgowdyelastic left a comment

LGTM

@qn895 qn895 merged commit 2f24d14 into elastic:main Nov 3, 2021
@qn895 qn895 added the auto-backport and v8.1.0 labels Nov 4, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 4, 2021
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
8.0

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Nov 4, 2021
@droberts195
Contributor

It's just been pointed out to me that the default for ignore_throttled is true. Therefore we shouldn't need to explicitly set it to true.

> pattern includes indices that are frozen

I suspect that "frozen" in the context that caused the problem to be observed meant frozen tier aka searchable snapshots, and not frozen indices (which is what ignore_throttled relates to).

ignore_throttled and frozen indices are deprecated in 8.0, so unless there is more to this than meets the eye I think this PR should be reverted from both main and 8.0. But hold off doing this until next week to give me time to look more closely at the code.

If the problem really was with frozen tier aka searchable snapshots then the solution is instead to change the data recognizer to only search data from the most recent 30 or 90 days. Ideally this would be the most recent N days defined by the ILM policy that moves data into the frozen tier after N days. However, until there is an API that can answer the question of what that N is, we'll have to go with a hardcoded 30 or 90 based on common practice for when data is moved to the frozen tier.
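
A minimal sketch of the hardcoded time window idea, assuming the recognizer query can be wrapped in a `bool` query with a `range` filter. The function name `restrictToRecentDays`, the `QueryDsl` alias and the `@timestamp` field used in the usage note are illustrative assumptions, not names from the Kibana source.

```typescript
type QueryDsl = Record<string, unknown>;

// Wraps an existing query so that it only matches documents whose time
// field falls within the most recent `days` days (Elasticsearch date math).
function restrictToRecentDays(
  query: QueryDsl,
  timeField: string,
  days: number
): QueryDsl {
  if (!Number.isInteger(days) || days <= 0) {
    throw new RangeError("days must be a positive integer");
  }
  return {
    bool: {
      must: [query],
      // Filter context: no scoring, and the range can be cached.
      filter: [{ range: { [timeField]: { gte: `now-${days}d` } } }],
    },
  };
}
```

Calling `restrictToRecentDays(recognizerQuery, '@timestamp', 90)` would limit the search to roughly the last 90 days. The 30 or 90 would stay hardcoded until the N from the ILM policy can be looked up.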

@droberts195
Contributor

Today I learnt about elastic/elasticsearch#69288.

Frozen tier (aka searchable snapshots) is the modern way to manage old data and is completely different from frozen indices, which are old functionality that's being deprecated in 8.0.

I think instead of setting ignore_throttled you should add a terms query on the _tier field looking for the content, hot or warm tiers. This means the data recognizer will ignore the cold and frozen tiers. We can justify this by saying that if you haven't seen a particular type of data for ages then there's no point setting up ML jobs for it even though you saw it once.
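
A sketch of what such a tier filter could look like, assuming the recognizer query can be wrapped in a `bool` query. The names `restrictToTiers`, `SEARCHED_TIERS` and `QueryDsl` are hypothetical. As far as I know, the `_tier` metadata field matches on the full tier names with a `data_` prefix (`data_content`, `data_hot`, `data_warm`), so those are the values assumed here.

```typescript
type QueryDsl = Record<string, unknown>;

// Assumed tier names to keep; data_cold and data_frozen are left out on purpose.
const SEARCHED_TIERS: readonly string[] = ["data_content", "data_hot", "data_warm"];

// Wraps an existing query so that it only matches documents held in
// indices assigned to one of the given data tiers.
function restrictToTiers(
  query: QueryDsl,
  tiers: readonly string[] = SEARCHED_TIERS
): QueryDsl {
  return {
    bool: {
      must: [query],
      filter: [{ terms: { _tier: [...tiers] } }],
    },
  };
}
```

With the default tiers, `restrictToTiers(recognizerQuery)` would skip the cold and frozen tiers without relying on `ignore_throttled`.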

qn895 added a commit to qn895/kibana that referenced this pull request Nov 5, 2021
qn895 added a commit that referenced this pull request Nov 5, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 5, 2021
kibanamachine added a commit that referenced this pull request Nov 5, 2021
@qn895 qn895 deleted the ml-jobs-ignore-throttled branch November 10, 2021 17:14
Labels
auto-backport, Feature:Anomaly Detection, :ml, release_note:fix, v8.0.0, v8.1.0
6 participants