[ML] [AIOps] Uses standard analyzer in log pattern analysis to ensure filter in Discover matches correct documents #172188

Conversation

jgowdyelastic
Member

@jgowdyelastic jgowdyelastic commented Nov 29, 2023

Fixes #169523

The categorize_text agg uses the ml_standard tokenizer by default, which produces slightly different tokens from the standard tokenizer, the default used for search.
This means the category key (which is composed of these tokens) will occasionally fail to match any documents when it is used as a filter in Discover to find the docs in a category.

This PR ensures the standard tokenizer is always used in the pattern analysis query.

A future enhancement would be to check which analyzer is specified in the mappings for the source field and use that instead of unconditionally using standard. However, as an initial fix, using the standard analyzer is more likely to match the results of the majority of searches.
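
To make the mismatch described above concrete, here is a toy sketch. The two tokenizers are simplified stand-ins (assumptions for illustration only) and do not reproduce the real `ml_standard` or `standard` tokenizers; `categoryKey` and `matches` are hypothetical helpers, and real categorization is considerably more involved. The point is only that a key built with one tokenizer can fail an all-terms match against documents indexed with another.

```typescript
type Tokenizer = (text: string) => string[];

// Stand-in for the tokenizer used at index and search time.
const splitOnWhitespace: Tokenizer = (text) =>
  text.toLowerCase().split(/\s+/).filter(Boolean);

// Stand-in for a categorization tokenizer that splits more aggressively.
const splitOnNonAlphanumeric: Tokenizer = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Toy category key: tokens of the first doc that occur in every doc.
function categoryKey(docs: string[], tokenize: Tokenizer): string {
  const tokenSets = docs.map((d) => new Set(tokenize(d)));
  return tokenize(docs[0] ?? '')
    .filter((t) => tokenSets.every((s) => s.has(t)))
    .join(' ');
}

// AND-style match: every analyzed query term must be an indexed term of the doc.
function matches(doc: string, query: string, searchTokenizer: Tokenizer): boolean {
  const indexed = new Set(searchTokenizer(doc));
  return searchTokenizer(query).every((t) => indexed.has(t));
}

const docs = ['job_42 failed on node-a', 'job_43 failed on node-b'];

// Key built with a different tokenizer than search uses.
const fineKey = categoryKey(docs, splitOnNonAlphanumeric); // "job failed on node"
// Key built with the same tokenizer that search uses.
const coarseKey = categoryKey(docs, splitOnWhitespace); // "failed on"

console.log(matches(docs[0] ?? '', fineKey, splitOnWhitespace)); // false
console.log(matches(docs[0] ?? '', coarseKey, splitOnWhitespace)); // true
```

In this toy setup the first key contains the term `job`, which never appears as an indexed term because the search-side tokenizer keeps `job_42` whole, so the filter matches nothing.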

@jgowdyelastic jgowdyelastic self-assigned this Dec 1, 2023
@jgowdyelastic jgowdyelastic added release_note:fix :ml Feature:ML/AIOps ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis v8.12.0 labels Dec 1, 2023
@jgowdyelastic jgowdyelastic marked this pull request as ready for review December 1, 2023 11:31
@jgowdyelastic jgowdyelastic requested a review from a team as a code owner December 1, 2023 11:31
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@jgowdyelastic jgowdyelastic marked this pull request as draft December 1, 2023 11:33
@jgowdyelastic jgowdyelastic marked this pull request as ready for review December 1, 2023 11:36
Contributor

@peteharverson peteharverson left a comment

Tested and LGTM

Contributor

@droberts195 droberts195 left a comment

LGTM


const categorizationAnalyzer: AggregationsCustomCategorizeTextAnalyzer = {
char_filter: ['first_line_with_letters'],
tokenizer: 'standard',
Contributor

It would be good to add a comment here saying:

  1. This is basically the default categorization analyzer but with the standard tokenizer instead of ml_standard.
  2. The ml_standard tokenizer splits tokens in a way that was observed to give better categories in testing many years ago. However, the downside of these better categories is that searching with the category tokens can fail to find the original documents.
  3. Ideally we'd use the tokenizer from the mappings of the field being categorized, but that's too hard, so using standard is a quick compromise.
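
A sketch of how those three points might read as a code comment, alongside the aggregation that uses the analyzer. The comment wording is a paraphrase of the list above, not the text added in the actual commit. The local `CategorizationAnalyzer` type and the `buildCategorizeTextAgg` helper are hypothetical stand-ins, and any token filters are omitted because the snippet shown in this review is truncated.

```typescript
// Local stand-in for the Elasticsearch client's analyzer type (assumption).
interface CategorizationAnalyzer {
  char_filter: string[];
  tokenizer: string;
}

// Essentially the default categorization analyzer, but with the `standard`
// tokenizer instead of `ml_standard`.
// `ml_standard` was observed to give better categories in testing, but its
// tokens can fail to find the original documents when used in a search.
// Ideally the tokenizer would come from the mappings of the field being
// categorized; `standard` is a quick compromise.
const categorizationAnalyzer: CategorizationAnalyzer = {
  char_filter: ['first_line_with_letters'],
  tokenizer: 'standard',
};

// Hypothetical helper: builds a categorize_text aggregation body that
// overrides the default analyzer. The aggregation name and default size
// are illustrative choices.
function buildCategorizeTextAgg(field: string, size = 1000) {
  return {
    categories: {
      categorize_text: {
        field,
        size,
        categorization_analyzer: categorizationAnalyzer,
      },
    },
  };
}

console.log(JSON.stringify(buildCategorizeTextAgg('message'), null, 2));
```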

Member Author

Comment added in 287e868

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| aiops | 386.6KB | 387.1KB | +473.0B |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @jgowdyelastic

@jgowdyelastic jgowdyelastic merged commit e73d35c into elastic:main Dec 5, 2023
35 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Dec 5, 2023
@jgowdyelastic jgowdyelastic changed the title [ML] [AIOps] Using standard analyser in pattern analysis [ML] [AIOps] Using standard analyzer in pattern analysis Dec 5, 2023
@peteharverson peteharverson changed the title [ML] [AIOps] Using standard analyzer in pattern analysis [ML] [AIOps] Uses standard analyzer in log pattern analysis to ensure filter in Discover matches correct documents Dec 8, 2023
walterra added a commit that referenced this pull request Feb 12, 2024
## Summary

Fixes #176387.

The `standard` analyzer for log pattern analysis introduced in #172188
can return patterns that interfere with identifying significant
patterns across time ranges, for example when a pattern matches different
parts of a date or time. This update allows the analyzer for log rate
analysis to be set to `ml_standard` while keeping `standard` for
log pattern analysis.
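
One way such a switch could look is sketched below. The function name, the flag name, and the local types are assumptions for illustration and are not taken from the commit; token filters are again omitted.

```typescript
type CategorizationTokenizer = 'standard' | 'ml_standard';

// Local stand-in for the analyzer type (assumption).
interface CategorizationAnalyzer {
  char_filter: string[];
  tokenizer: CategorizationTokenizer;
}

// Hypothetical helper: `useStandardTokenizer` is an illustrative flag name.
function getCategorizationAnalyzer(useStandardTokenizer = true): CategorizationAnalyzer {
  return {
    char_filter: ['first_line_with_letters'],
    tokenizer: useStandardTokenizer ? 'standard' : 'ml_standard',
  };
}

// Log pattern analysis keeps `standard` so Discover filters match documents.
const logPatternAnalyzer = getCategorizationAnalyzer(true);
// Log rate analysis opts back in to `ml_standard`.
const logRateAnalyzer = getCategorizationAnalyzer(false);

console.log(logPatternAnalyzer.tokenizer, logRateAnalyzer.tokenizer);
```

Defaulting the flag to `true` keeps the behaviour from #172188 for callers that do not opt out.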

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
Labels
backport:skip This commit does not require backporting Feature:ML/AIOps ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis :ml release_note:fix v8.12.0
Development

Successfully merging this pull request may close these issues.

[ML] AIOPs - Discover filter can fail to match any documents
6 participants