Failing test: Serverless Observability Feature Flags API Integration Tests.x-pack/test_serverless/api_integration/test_suites/observability/custom_threshold_rule/group_by_fired·ts - Serverless observability API - feature flags Custom Threshold Rule Custom Threshold rule - GROUP_BY - FIRED Rule creation should set correct action variables #175776

Open
kibanamachine opened this issue Jan 29, 2024 · 4 comments
Labels: failed-test (A test failure on a tracked branch, potentially flaky-test), Team:obs-ux-management (Observability Management User Experience Team)

Comments

@kibanamachine
Contributor

kibanamachine commented Jan 29, 2024

A test failed on a tracked branch

Error: expected undefined to sort of equal 'observability.rules.custom_threshold'
    at Assertion.assert (expect.js:100:11)
    at Assertion.eql (expect.js:244:8)
    at Context.<anonymous> (group_by_fired.ts:253:53)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at Object.apply (wrap_function.js:73:16) {
  actual: undefined,
  expected: 'observability.rules.custom_threshold',
  showDiff: true
}

First failure: CI Build - main

@kibanamachine kibanamachine added the failed-test A test failure on a tracked branch, potentially flaky-test label Jan 29, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Jan 29, 2024
@mistic mistic added the Team:obs-ux-management Observability Management User Experience Team label Jan 29, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 29, 2024
@maryam-saeidi maryam-saeidi self-assigned this Mar 6, 2024
@maryam-saeidi
Member

maryam-saeidi commented Mar 6, 2024

History of this failure

image

Of the 4 instances above, 2 were caused by wrong values in the test, which were fixed in the PR. For the other 2, the root cause reported in the logs is:

index_not_found_exception: no such index [alert-action-threshold]

In both cases, we have the following error log in task manager:

[00:02:02]             │ proc [kibana] [2024-01-29T10:53:05.557+00:00][ERROR][plugins.taskManager] Task actions:.index "7484ee48-6ac2-4c5c-a778-8fef443c4025" 
failed: Error: Saved object [action/6a53161a-9e45-450e-b77a-bc3bfa399b96] not found 
{"tags":["actions:.index","7484ee48-6ac2-4c5c-a778-8fef443c4025","task-run-failed"],
"error":{"stack_trace":"Error: Saved object [action/6a53161a-9e45-450e-b77a-bc3bfa399b96] not found\n    
at Function.createGenericNotFoundError (/var/lib/buildkite-agent/builds/kb-n2-4-spot-17cd2aa3ed32e2f7/elastic/kibana-on-merge/kibana-build-xpack/node_modules/@kbn/core-saved-objects-server/src/saved_objects_error_helpers.js:236:37)\n    at performGet (/var/lib/buildkite-agent/builds/kb-n2-4-spot-17cd2aa3ed32e2f7/elastic/kibana-on-merge/kibana-build-xpack/node_modules/@kbn/core-saved-objects-api-server-internal/src/lib/apis/get.js:73:60)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at SavedObjectsRepository.get (/var/lib/buildkite-agent/builds/kb-n2-4-spot-17cd2aa3ed32e2f7/elastic/kibana-on-merge/kibana-build-xpack/node_modules/@kbn/core-saved-objects-api-server-internal/src/lib/repository.js:218:12)\n    at Object.getDecryptedAsInternalUser (/var/lib/buildkite-agent/builds/kb-n2-4-spot-17cd2aa3ed32e2f7/elastic/kibana-on-merge/kibana-build-xpack/node_modules/@kbn/encrypted-saved-objects-plugin/server/saved_objects/index.js:50:29)\n    at ActionExecutor.getActionInfoInternal (/var/lib/buildkite-agent/builds/kb-n2-4-spot-17cd2aa3ed32e2f7/elastic/kibana-on-merge/kibana-build-xpack/node_modules/@kbn/actions-plugin/server/lib/action_executor.js:377:25)\n    at Object.loadIndirectParams (/var/lib/buildkite-agent/builds/kb-n2-4-spot-17cd2aa3ed32e2f7/elastic/kibana-on-merge/kibana-build-xpack/node_modules/@kbn/actions-plugin/server/lib/task_runner_factory.js:81:30)\n    at TaskManagerRunner.validateIndirectTaskParams (/var/lib/buildkite-agent/builds/kb-n2-4-spot-17cd2aa3ed32e2f7/elastic/kibana-on-merge/kibana-build-xpack/node_modules/@kbn/task-manager-plugin/server/task_running/task_runner.js:419:11)\n    at TaskManagerRunner.run (/var/lib/buildkite-agent/builds/kb-n2-4-spot-17cd2aa3ed32e2f7/elastic/kibana-on-merge/kibana-build-xpack/node_modules/@kbn/task-manager-plugin/server/task_running/task_runner.js:320:34)"},"service":{"node":{"roles":["background_tasks","ui"]}}}

Started a discussion in Slack.

Update

It seems that a document is eventually found, but the information in that document is not what we expected:

debg Found 1 docs, looking for atleast 1.
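
The log line above only reports a document count, so a document whose fields are still missing passes the wait step and then fails the assertion. One possible guard, sketched here under assumptions, is to check the fields the test asserts on before treating the document as ready. The `AlertActionDoc` shape, the `ruleType` and `alertActionGroup` field names, and the `checkDocs` helper are hypothetical and are not taken from the test file:

```typescript
// Hypothetical shape of an indexed action document; field names are illustrative only.
interface AlertActionDoc {
  ruleType?: string;
  alertActionGroup?: string;
}

interface CheckResult {
  ready: boolean;
  reason: string;
}

// Reports ready only when enough docs exist AND every required field is populated
// in each of them, so a count match alone is not enough.
function checkDocs(
  docs: AlertActionDoc[],
  minDocs: number,
  requiredFields: Array<keyof AlertActionDoc>
): CheckResult {
  if (docs.length < minDocs) {
    return {
      ready: false,
      reason: `Found ${docs.length} docs, looking for at least ${minDocs}`,
    };
  }
  for (const field of requiredFields) {
    const missing = docs.filter((doc) => doc[field] === undefined).length;
    if (missing > 0) {
      return { ready: false, reason: `${missing} doc(s) missing field "${field}"` };
    }
  }
  return { ready: true, reason: 'all required fields present' };
}
```

A wait loop could keep polling while `ready` is false and log `reason` on each attempt, which would make a half-populated document visible in the CI output.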

maryam-saeidi added a commit that referenced this issue Mar 19, 2024
…hod from retryService (#178515)

Related to #176401, #175776

## Summary

This PR:

- Improves logging (I've added debug logs to the helpers that make an
API request, such as creating a data view)
- Uses retryService instead of pRetry
- When pRetry throws an error with 10 retries configured, it does not
log the retry attempts, and we end up in the situation that is
mentioned in this [comment, item
3](#176401 (comment))
    
|Before|After|
|---|---|
|![image](https://github.com/elastic/kibana/assets/12370520/576146f2-09da-4221-a570-6d47e047f229)|![image](https://github.com/elastic/kibana/assets/12370520/0a0897a3-0bd3-4d44-9b79-8f99fb580b4a)|
- Attempts to fix flakiness in rate reason message due to having
different data

![image](https://github.com/elastic/kibana/assets/12370520/dff48ac1-a9bf-4b93-addb-fd40acae382e)
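
The logging point in the summary can be illustrated with a minimal sketch of a retry wrapper that logs every failed attempt. This is not the retryService implementation. The `retryWithLogging` name, the `Logger` and `RetryOptions` shapes, and the parameter names are assumptions made for illustration only:

```typescript
// Minimal logger shape (assumption); the real test logger is not shown in this thread.
interface Logger {
  debug(message: string): void;
}

interface RetryOptions {
  retries: number; // maximum number of attempts
  delayMs: number; // pause between attempts
}

// Calls `fn` until it resolves or the attempts run out, logging each failure
// so the attempt history stays visible even when every attempt fails.
async function retryWithLogging<T>(
  name: string,
  fn: () => Promise<T>,
  { retries, delayMs }: RetryOptions,
  logger: Logger
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const message = error instanceof Error ? error.message : String(error);
      logger.debug(`${name}: attempt ${attempt}/${retries} failed: ${message}`);
      if (attempt < retries) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Rethrow the last failure; fall back to a generic error if no attempt was made.
  throw lastError ?? new Error(`${name}: no attempts were made`);
}
```

With a wrapper of this kind, a helper such as creating a data view would leave one debug line per failed attempt in the CI log instead of a single final error.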


### Flaky test runner
#### Current (after adding refresh index and adjusting timeout)
- [25]
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5463
✅
- [200]
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5465
✅

#### Old
- [25]
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5452
✅
- [200]
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5454
[1 Failed : 25 Canceled: 174 Passed ]
  ##### After checking data is generated in metric threshold
- [25]
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5460
✅
- [200]
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5462
[1 Failed : 199 Canceled ]

Inspired by #173998, special
thanks to @jpdjere and @dmlemeshko for their support and knowledge
sharing.
@maryam-saeidi
Member

I made some improvements in this PR and will close this issue, as I didn't see a failure when I ran the test 200 times after the improvements. I've also added some logs, so if it happens again, it will hopefully be easier to investigate.

@kibanamachine
Contributor Author

New failure: CI Build - 8.14
