[ML] AIOps: Fix grouping for fields with large arrays. #177438

walterra · 2024-02-21T13:25:51Z

Summary

Fixes edge cases for datasets with large arrays within single fields:

- Deduplicates groups as a final step of creating groups.
- Limits how many values (50) to use per field for the `frequent_item_sets` aggregations.
- Fixes the `should` clauses for the `frequent_item_sets` query. The previous version of the query could be too narrow for fields with arrays and return no results.
- For the fallback analysis when either deviation or baseline returns no docs, increases the limit from 10 to 100 docs.
- Fixes another bug in the grouping for array values of fields: because the field/values of a group were treated as a dictionary/record-like structure, a group couldn't hold multiple values for a single field. The group is now an array of field/value pairs, which supports multiple values per field.
- On the client side, fixes unique keys for the group item badges if there are multiple items for the same field.
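
The record-versus-array problem, the final deduplication step, and the badge key fix can be illustrated with a small sketch. The type and function names used here (`FieldValuePair`, `ItemGroup`, `groupKey`, `dedupeGroups`, `badgeKey`) are hypothetical and not taken from the Kibana source. The sketch only assumes that a group is a list of field/value pairs.

```typescript
// Hypothetical types for illustration; not the actual Kibana AIOps types.
interface FieldValuePair {
  fieldName: string;
  fieldValue: string | number;
}

interface ItemGroup {
  items: FieldValuePair[];
  docCount: number;
}

// A record keyed by field name keeps only one value per field:
const asRecord: Record<string, string> = {};
asRecord['tags'] = 'a';
asRecord['tags'] = 'b'; // overwrites 'a'

// An array of pairs keeps both values for the same field:
const asPairs: FieldValuePair[] = [
  { fieldName: 'tags', fieldValue: 'a' },
  { fieldName: 'tags', fieldValue: 'b' },
];

// Order-independent key for a group, used to detect duplicates.
function groupKey(group: ItemGroup): string {
  return group.items
    .map((item) => JSON.stringify([item.fieldName, item.fieldValue]))
    .sort()
    .join('|');
}

// Keep the first occurrence of each distinct set of field/value pairs.
function dedupeGroups(groups: ItemGroup[]): ItemGroup[] {
  const seen = new Set<string>();
  return groups.filter((group) => {
    const key = groupKey(group);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Key built from field and value, so two badges for the same field differ.
function badgeKey(item: FieldValuePair): string {
  return `${item.fieldName}:${item.fieldValue}`;
}
```

With a record, a list key derived from the field name alone would also collide as soon as a field contributes two values, which is why the sketch derives the key from both field and value.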

Adds API integration tests for a dataset with large arrays. This dataset also triggers slowness of the `frequent_item_sets` agg and can be used for a performance journey in a follow-up. Without the new limit on how many values per field to use, these new tests would fail because the agg causes a timeout. The assertions for chunk and action lengths were removed because they are flaky for longer-running requests (because of how we implemented the flush fix and keep-alive behavior).
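
To make the per-field value limit and the broadened `should` clauses more concrete, here is a minimal sketch. The names `MAX_VALUES_PER_FIELD`, `limitValuesPerField`, and `buildShouldQuery` are hypothetical, and the query uses plain Elasticsearch query DSL (`bool`, `should`, `term`, `minimum_should_match`) rather than the exact query built in `fetch_frequent_item_sets.ts`. Matching on any single value is an assumption about how a query avoids being too narrow for array fields.

```typescript
// Hypothetical names; the limit of 50 is the value stated in this PR.
const MAX_VALUES_PER_FIELD = 50;

interface FieldValuePair {
  fieldName: string;
  fieldValue: string | number;
}

// Keep at most `limit` values per field, preserving input order.
function limitValuesPerField(
  items: FieldValuePair[],
  limit: number = MAX_VALUES_PER_FIELD
): FieldValuePair[] {
  const counts = new Map<string, number>();
  return items.filter(({ fieldName }) => {
    const count = counts.get(fieldName) ?? 0;
    if (count >= limit) return false;
    counts.set(fieldName, count + 1);
    return true;
  });
}

// Assumption: match documents that contain any of the retained values,
// so a document with an array field only needs one matching value.
function buildShouldQuery(items: FieldValuePair[]) {
  return {
    bool: {
      should: items.map(({ fieldName, fieldValue }) => ({
        term: { [fieldName]: fieldValue },
      })),
      minimum_should_match: 1,
    },
  };
}
```

Capping the values before building the query keeps both the number of `should` clauses and the input to the aggregation bounded, at the cost of ignoring values beyond the cap.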

Dataset to test behavior: [aiops-lra-frequent-items-array.ndjson.zip](https://github.com/elastic/kibana/files/14362105/aiops-lra-frequent-items-array.ndjson.zip)

Without this PR, the dataset would cause the grouping part of log rate analysis to time out. With this PR, the analysis is still slow considering the dataset has just 18 docs, but it is able to return results.

Video to replicate the test: upload via ML File Upload, adjust the date picker in Log Rate Analysis, then run the analysis.

[aiops-log-rate-analysis-arrays-0001.webm](https://github.com/elastic/kibana/assets/230104/5d5ce34b-37ef-4e9f-81ae-f8002c194f88)

Checklist

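- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)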

walterra · 2024-02-21T13:32:24Z

🔴 50x Flaky Test Runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5260 (flaky chunk length assertions)

walterra · 2024-02-21T14:16:16Z

🔴 50x Flaky Test Runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5262 (flaky actions length assertions)

walterra · 2024-02-21T15:03:08Z

🟢 50x Flaky Test Runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5263

elasticmachine · 2024-02-21T15:54:27Z

Pinging @elastic/ml-ui (:ml)

x-pack/test/functional/services/aiops/log_rate_analysis_data_generator.ts

x-pack/plugins/aiops/common/constants.ts

x-pack/plugins/aiops/server/routes/log_rate_analysis/queries/fetch_frequent_item_sets.ts

walterra · 2024-02-22T15:28:38Z

Another Flaky Test Runner run after a further bugfix for grouping with arrays of values for a single field.

🟢 25x API / 25x Functional Tests Flaky Test Runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5289

alvarezmelissa87

Code LGTM ⚡

kibana-ci · 2024-02-23T09:29:43Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 021b102

Failed CI Steps

Test Failures

[job] [logs] FTR Configs #89 / APM API tests correlations/latency.spec.ts trial 8.0.0 "before all" hook: runBefore in "8.0.0"
[job] [logs] Jest Tests #6 / CustomFields renders correctly

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`aiops`	400.5KB	400.5KB	+43.0B

History

💔 Build #195012 failed 156928a
💚 Build #194943 succeeded 6c46d9d
💚 Build #194690 succeeded e378c1c
💔 Build #194647 failed f721ea0446d2c84c4f1d33ec7029af2e5de945b9

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @walterra

qn895 · 2024-02-23T16:53:14Z

Code LGTM 🎉

(cherry picked from commit 0d19e5e)

kibanamachine · 2024-02-23T23:30:14Z

💚 All backports created successfully

Status	Branch	Result
✅	8.13

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

… (#177765)

# Backport

This will backport the following commits from `main` to `8.13`:
- [[ML] AIOps: Fix grouping for fields with large arrays. (#177438)](#177438)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Walter Rafelsberger <[email protected]>

walterra added the bug (Fixes for quality problems that affect the customer experience), release_note:fix, :ml, v8.13.0, and v8.14.0 labels Feb 21, 2024

walterra self-assigned this Feb 21, 2024

walterra added the Feature:ML/AIOps label (ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis) Feb 21, 2024

walterra force-pushed the ml-aiops-fix-arrays branch from 40bcb26 to 09c29c6 February 21, 2024 14:13

walterra mentioned this pull request Feb 21, 2024

[ML] Increase Test Coverage 8.13.0 #173301

Closed

11 tasks

walterra added 4 commits February 21, 2024 15:58

fix and API integration tests for large arrays

8e5b7d7

remove chunk length assertions

3cdc6a0

fix unit test

ea22f5e

remove actions length assertions

e378c1c

walterra force-pushed the ml-aiops-fix-arrays branch from 7e6eed0 to e378c1c February 21, 2024 15:01

walterra marked this pull request as ready for review February 21, 2024 15:54

walterra requested a review from a team as a code owner February 21, 2024 15:54

walterra requested review from alvarezmelissa87 and qn895 February 21, 2024 15:54

fix comment

3c8fa45

qn895 reviewed Feb 21, 2024

x-pack/test/functional/services/aiops/log_rate_analysis_data_generator.ts (outdated, resolved)

qn895 reviewed Feb 21, 2024

x-pack/plugins/aiops/common/constants.ts (outdated, resolved)

qn895 reviewed Feb 21, 2024

x-pack/plugins/aiops/server/routes/log_rate_analysis/queries/fetch_frequent_item_sets.ts (resolved)

walterra changed the title ~~[ML] AIOps: Fix and API integration tests for large arrays for Log Rate Analysis~~ [ML] AIOps: Fix grouping for fields with large arrays. Feb 21, 2024

walterra added 3 commits February 22, 2024 12:54

use 'ignore_unavailable: true' instead of separate index exists check

d3ddcb9

fix comment

4d6c47d

Merge branch 'main' into ml-aiops-fix-arrays

6c46d9d

fix array handling for grouping. changes itemset.set to array

156928a

alvarezmelissa87 approved these changes Feb 22, 2024

walterra added 2 commits February 23, 2024 07:50

Merge branch 'main' into ml-aiops-fix-arrays

11ba151

fix tests

021b102

qn895 approved these changes Feb 23, 2024

walterra merged commit 0d19e5e into elastic:main Feb 23, 2024
18 checks passed

walterra deleted the ml-aiops-fix-arrays branch February 23, 2024 23:22

kibanamachine mentioned this pull request Feb 23, 2024

[8.13] [ML] AIOps: Fix grouping for fields with large arrays. (#177438) #177765

Merged

peteharverson mentioned this pull request Mar 6, 2024

[ML] Increase Test Coverage 8.14.0 #178111

Closed

11 tasks
