[ML] AIOps: Fix grouping for fields with large arrays. #177438

Merged: walterra merged 11 commits into elastic:main from ml-aiops-fix-arrays on Feb 23, 2024

@walterra walterra commented Feb 21, 2024

Summary

Fixes edge cases for datasets with large arrays within single fields:

  • Deduplicates groups as a final step of creating groups.
  • Limits the number of values used per field to 50 for the frequent_item_sets aggregations.
  • Fixes the should clauses for the query for frequent_item_sets. The previous version of the query could be too narrow for fields with arrays and return no results.
  • For the fallback analysis when either deviation or baseline returns no docs, increases the limit from 10 to 100 docs.
  • Fixes another bug in the grouping of array values: because the field/values of a group were treated as a dictionary/record-like structure, a group could not hold multiple values for a single field. This PR changes the structure to an array of field/value pairs, which supports multiple values per field.
  • On the client side, fixes unique keys for the group item badges if there are multiple items for the same field.
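The difference between the record-like structure and the array of field/value pairs, along with the final deduplication step, can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the names `FieldValuePair`, `Group`, `toRecord`, `groupKey` and `dedupeGroups` are hypothetical, and treating two groups as duplicates when they contain the same set of pairs is an assumption.

```typescript
// Hypothetical types for illustration only; not taken from the Kibana source.
interface FieldValuePair {
  fieldName: string;
  fieldValue: string | number;
}

interface Group {
  items: FieldValuePair[];
  docCount: number;
}

// A record keyed by field name keeps only the last value written per field,
// so a field with several values silently loses all but one of them.
function toRecord(pairs: FieldValuePair[]): Record<string, string | number> {
  const record: Record<string, string | number> = {};
  for (const { fieldName, fieldValue } of pairs) {
    record[fieldName] = fieldValue;
  }
  return record;
}

// Order-independent key for a group, built from all of its field/value pairs.
function groupKey(group: Group): string {
  return group.items
    .map(({ fieldName, fieldValue }) => JSON.stringify([fieldName, fieldValue]))
    .sort()
    .join('|');
}

// Keeps the first occurrence of each distinct set of field/value pairs
// (assumption: groups with identical pairs count as duplicates).
function dedupeGroups(groups: Group[]): Group[] {
  const seen = new Set<string>();
  const result: Group[] = [];
  for (const group of groups) {
    const key = groupKey(group);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(group);
    }
  }
  return result;
}
```

With two pairs for the same field, `toRecord` ends up with a single entry, whereas the `items` array keeps both. A key that combines field name and value, as in `groupKey`, is also one way to keep list keys unique when a field appears more than once.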

Adds API integration tests for a dataset with large arrays. This dataset also triggers slowness of the frequent_item_sets agg and can be used for a performance journey in a follow-up. Without the new limit on how many values per field to use, these new tests would fail because the agg causes a timeout. The assertions for chunk and action lengths were removed because they are flaky for longer running requests (because of how we implemented flush fix and keep alive behavior).
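The per-field limit and the should clauses mentioned in the summary can be illustrated with a small sketch. The actual query and helper functions are not shown in this thread, so everything here is an assumption: `limitValuesPerField`, `buildShouldQuery` and `MAX_VALUES_PER_FIELD` are hypothetical names, and a `bool` query with `term` clauses and `minimum_should_match: 1` is just one plausible shape in which a document matching any retained pair is included.

```typescript
// Hypothetical names for illustration only; not taken from the Kibana source.
interface FieldValuePair {
  fieldName: string;
  fieldValue: string | number;
}

// The limit of 50 values per field is the number given in the PR description.
const MAX_VALUES_PER_FIELD = 50;

// Keeps at most `limit` pairs per field name, preserving the input order.
function limitValuesPerField(
  pairs: FieldValuePair[],
  limit: number = MAX_VALUES_PER_FIELD
): FieldValuePair[] {
  const counts = new Map<string, number>();
  return pairs.filter(({ fieldName }) => {
    const count = counts.get(fieldName) ?? 0;
    if (count >= limit) {
      return false;
    }
    counts.set(fieldName, count + 1);
    return true;
  });
}

interface TermClause {
  term: Record<string, string | number>;
}

interface ShouldQuery {
  bool: { should: TermClause[]; minimum_should_match: number };
}

// Assumed query shape: one term clause per pair; a document only needs to
// match one of them, so array fields with several values are not excluded.
function buildShouldQuery(pairs: FieldValuePair[]): ShouldQuery {
  const should: TermClause[] = pairs.map(({ fieldName, fieldValue }) => ({
    term: { [fieldName]: fieldValue },
  }));
  return { bool: { should, minimum_should_match: 1 } };
}
```

Capping the pairs before building the query bounds the number of clauses and the number of items the aggregation has to consider for a field that holds a very large array.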

Dataset to test behavior: aiops-lra-frequent-items-array.ndjson.zip

Without this PR, the dataset would cause the grouping part of log rate analysis to time out. With this PR, it's still slow for just 18 docs, but it is able to return results.

Video to replicate the test: upload via ML File Upload, adjust the date picker in Log Rate Analysis, then run the analysis.

aiops-log-rate-analysis-arrays-0001.webm

Checklist

@walterra walterra added bug Fixes for quality problems that affect the customer experience release_note:fix :ml v8.13.0 v8.14.0 labels Feb 21, 2024
@walterra walterra self-assigned this Feb 21, 2024
@walterra walterra added the Feature:ML/AIOps ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis label Feb 21, 2024

walterra commented Feb 21, 2024

🔴 50x Flaky Test Runner https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5260 (flaky chunk length assertions)

@walterra walterra force-pushed the ml-aiops-fix-arrays branch from 40bcb26 to 09c29c6 Compare February 21, 2024 14:13

walterra commented Feb 21, 2024

🔴 50x Flaky Test Runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5262 (flaky actions length assertions)

@walterra walterra mentioned this pull request Feb 21, 2024
11 tasks
@walterra walterra force-pushed the ml-aiops-fix-arrays branch from 7e6eed0 to e378c1c Compare February 21, 2024 15:01

walterra commented Feb 21, 2024

@walterra walterra marked this pull request as ready for review February 21, 2024 15:54
@walterra walterra requested a review from a team as a code owner February 21, 2024 15:54
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@walterra walterra changed the title [ML] AIOps: Fix and API integration tests for large arrays for Log Rate Analysis [ML] AIOps: Fix grouping for fields with large arrays. Feb 21, 2024

walterra commented Feb 22, 2024

Another Flaky Test Runner after another bugfix for grouping with arrays of values for a single field.

🟢 25x API / 25x Functional Tests Flaky Test Runner https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5289


@alvarezmelissa87 alvarezmelissa87 left a comment


Code LGTM ⚡

@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #89 / APM API tests correlations/latency.spec.ts trial 8.0.0 "before all" hook: runBefore in "8.0.0"
  • [job] [logs] Jest Tests #6 / CustomFields renders correctly

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| aiops | 400.5KB | 400.5KB | +43.0B |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @walterra


qn895 commented Feb 23, 2024

Code LGTM 🎉

@walterra walterra merged commit 0d19e5e into elastic:main Feb 23, 2024
18 checks passed
@walterra walterra deleted the ml-aiops-fix-arrays branch February 23, 2024 23:22
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Feb 23, 2024
## Summary

Fixes edge cases for datasets with large arrays within single fields:

- Deduplicates groups as a final step of creating groups.
- Limits how many values (50) to use per field for the
`frequent_item_sets` aggregations.
- Fixes the `should` clauses for the query for `frequent_item_sets`. The
previous version of the query could be too narrow for fields with arrays
and return no results.
- For the fallback analysis when either deviation or baseline returns no
docs, increases the limit from 10 to 100 docs.
- It turned out the grouping for array values of fields had another bug:
Because we treated the field/values of a group as a dictionary/record
like structure, this didn't hold multiple values for a single field. The
code was changed in this PR so it is an array of field/value pairs which
now supports multiple values per field.
- On the client side, fixes unique keys for the group item badges if
there are multiple items for the same field.

Adds API integration tests for a dataset with large arrays. This dataset
also triggers slowness of the `frequent_item_sets` agg and can be used
for a performance journey in a follow up. Without the new limit for how
many values per field to use, these new tests would fail because the agg
causes a timeout. The assertions for chunk and action lengths were
removed because they are flaky for longer running requests (because of
how we implemented flush fix and keep alive behavior).

Dataset to test behavior:
[aiops-lra-frequent-items-array.ndjson.zip](https://github.com/elastic/kibana/files/14362105/aiops-lra-frequent-items-array.ndjson.zip)

Without this PR, the dataset would cause the grouping part of log rate
analysis to time out. With this PR, it's still slow for just 18 docs,
but it is able to return results.

Video to replicate the test: upload via ML File Upload, adjust the date
picker in Log Rate Analysis, then run the analysis.

[aiops-log-rate-analysis-arrays-0001.webm](https://github.com/elastic/kibana/assets/230104/5d5ce34b-37ef-4e9f-81ae-f8002c194f88)

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

(cherry picked from commit 0d19e5e)
@kibanamachine
Contributor

💚 All backports created successfully

Status Branch Result
8.13

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Feb 24, 2024
… (#177765)

# Backport

This will backport the following commits from `main` to `8.13`:
- [[ML] AIOps: Fix grouping for fields with large arrays.
(#177438)](#177438)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Walter Rafelsberger <[email protected]>
semd pushed a commit to semd/kibana that referenced this pull request Mar 4, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
Labels
bug Fixes for quality problems that affect the customer experience Feature:ML/AIOps ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis :ml release_note:fix v8.13.0 v8.14.0

6 participants