-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution] Add retryIfConflict
util for 409
conflicts in Integration tests
#174185
Conversation
retryIfConflict
utilretryIfConflict
util for 409
conflicts in Integration tests
Pinging @elastic/security-solution (Team: SecuritySolution) |
Pinging @elastic/security-detections-response (Team:Detections and Resp) |
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think retryIfConflicts
does what it should 🙂
...ecurity_solution_api_integration/test_suites/detections_response/utils/retry_if_conflicts.ts
Outdated
Show resolved
Hide resolved
...ecurity_solution_api_integration/test_suites/detections_response/utils/retry_if_conflicts.ts
Outdated
Show resolved
Hide resolved
...est_suites/detections_response/utils/rules/prebuilt_rules/delete_all_prebuilt_rule_assets.ts
Outdated
Show resolved
Hide resolved
Ok, so I had to rethink this a bit. Responses from the Elasticsearch client vary from one type of operation to another, so there's no export interface DeleteByQueryResponse {
batches?: long;
deleted?: long;
failures?: BulkIndexByScrollFailure[];
noops?: long;
requests_per_second?: float;
// other stuff......
total?: long;
version_conflicts?: long;
} and as you can see from the error in the ticket, the (Also, I cannot get access to anything else apart from the body when doing requests directly to ES in integration tests) So I refactored the code so that this utility is specific to retry DeleteByQuery operations if it hits 409 conflict errors, which seems to be, at least until now, the ES operation where this conflict is happening. I think we can try to generalize this utility, or create other specific version of retries if we need them in the future, and use this as a first step. Let me know what you think. Also, rerunning the Flaky test runner. |
..._license/prebuilt_rules/large_prebuilt_rules_package/install_large_prebuilt_rules_package.ts
Outdated
Show resolved
Hide resolved
for (const failure of operationResult?.failures) { | ||
if (failure.status === 409) { | ||
// if no retries left, throw it | ||
if (retries <= 0) { | ||
logger.error(`${name} conflict, exceeded retries`); | ||
throw new Error(`${name} conflict, exceeded retries`); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had this in the deleteAllPrebuiltRuleAssets
function where we call the delete by query method, we could use the generic retry
function and get rid of this retryIfDeleteByQueryConflicts
.
If you think that retryIfDeleteByQueryConflicts
has enough use cases to exist, maybe we could use the retry
function inside of it to avoid conceptual code duplication between retry
and retryIfDeleteByQueryConflicts
.
...ion_api_integration/test_suites/detections_response/utils/retry_delete_by_query_conflicts.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Flaky Test Runner is happy. Approving the PR to not being a blocker for it - posted only a bunch of minor comments. Thanks for addressing the feedback, LGTM 👍
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @jpdjere |
💔 All backports failed
Manual backportTo create the backport manually run:
Questions ?Please refer to the Backport tool documentation |
… Integration tests (elastic#174185) ## Summary Fixes: elastic#171428 **NOTE: the test where this was reported wasn't skipped, so this PR does not unskip any tests.** However, the Flaky Test Runs help us determine that the issue is no longer reproducible. The `deleteAllPrebuiltRuleAssets` utility reported a `409 Conflict`, presumably from `security-rule` assets that were attempted to be deleted while they were being updated by a parallel process. This PR wraps the `es.deleteByQuery` calls in the utils `deleteAllPrebuiltRuleAssets` and `deleteAllTimelines` with a new `retryIfConflict` helper, that will retry the operation if the ES request fails with a `409`. ## Flaky test run `bundled_prebuilt_rules_package` - **ESS** and **Serverless**: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4790 `large_prebuilt_rules_package` - **ESS** and **Serverless**: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4791 `update_prebuilt_rules_package` - **ESS** and **Serverless**: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4792 `management` - **ESS** and **Serverless**: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4793 ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) (cherry picked from commit b8c7306) # Conflicts: # x-pack/test/security_solution_api_integration/package.json
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync. |
…icts in Integration tests (#174185) (#174762) # Backport This will backport the following commits from `main` to `8.12`: - [[Security Solution] Add `retryIfConflict` util for `409` conflicts in Integration tests (#174185)](#174185) <!--- Backport version: 8.9.8 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Juan Pablo Djeredjian","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-01-11T12:39:45Z","message":"[Security Solution] Add `retryIfConflict` util for `409` conflicts in Integration tests (#174185)\n\n## Summary\r\n\r\nFixes: https://github.com/elastic/kibana/issues/171428\r\n\r\n**NOTE: the test where this was reported wasn't skipped, so this PR does\r\nnot unskip any tests.** However, the Flaky Test Runs help us determine\r\nthat the issue is no longer reproducible.\r\n\r\nThe `deleteAllPrebuiltRuleAssets` utility reported a `409 Conflict`,\r\npresumably from `security-rule` assets that were attempted to be deleted\r\nwhile they were being updated by a parallel process.\r\n\r\nThis PR wraps the `es.deleteByQuery` calls in the utils\r\n`deleteAllPrebuiltRuleAssets` and `deleteAllTimelines` with a new\r\n`retryIfConflict` helper, that will retry the operation if the ES\r\nrequest fails with a `409`.\r\n\r\n## Flaky test run\r\n\r\n`bundled_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4790\r\n\r\n`large_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4791\r\n\r\n`update_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4792\r\n\r\n`management` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4793\r\n\r\n### For maintainers\r\n\r\n- [ ] This was checked for breaking API changes and was [labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"b8c7306d241807b68bedbd477dcec232e203f6ad","branchLabelMapping":{"^v8.13.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["test","release_note:skip","Team:Detections and Resp","Team: SecuritySolution","Team:Detection Rule Management","Feature:Prebuilt Detection Rules","v8.12.0","v8.12.1","v8.13.0"],"number":174185,"url":"https://github.com/elastic/kibana/pull/174185","mergeCommit":{"message":"[Security Solution] Add `retryIfConflict` util for `409` conflicts in Integration tests (#174185)\n\n## Summary\r\n\r\nFixes: https://github.com/elastic/kibana/issues/171428\r\n\r\n**NOTE: the test where this was reported wasn't skipped, so this PR does\r\nnot unskip any tests.** However, the Flaky Test Runs help us determine\r\nthat the issue is no longer reproducible.\r\n\r\nThe `deleteAllPrebuiltRuleAssets` utility reported a `409 Conflict`,\r\npresumably from `security-rule` assets that were attempted to be deleted\r\nwhile they were being updated by a parallel process.\r\n\r\nThis PR wraps the `es.deleteByQuery` calls in the utils\r\n`deleteAllPrebuiltRuleAssets` and `deleteAllTimelines` with a new\r\n`retryIfConflict` helper, that will retry the operation if the ES\r\nrequest fails with a `409`.\r\n\r\n## Flaky test run\r\n\r\n`bundled_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4790\r\n\r\n`large_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4791\r\n\r\n`update_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4792\r\n\r\n`management` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4793\r\n\r\n### For maintainers\r\n\r\n- [ ] This was checked for breaking API changes and was [labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"b8c7306d241807b68bedbd477dcec232e203f6ad"}},"sourceBranch":"main","suggestedTargetBranches":["8.12"],"targetPullRequestStates":[{"branch":"8.12","label":"v8.12.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.13.0","labelRegex":"^v8.13.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/174185","number":174185,"mergeCommit":{"message":"[Security Solution] Add `retryIfConflict` util for `409` conflicts in Integration tests (#174185)\n\n## Summary\r\n\r\nFixes: https://github.com/elastic/kibana/issues/171428\r\n\r\n**NOTE: the test where this was reported wasn't skipped, so this PR does\r\nnot unskip any tests.** However, the Flaky Test Runs help us determine\r\nthat the issue is no longer reproducible.\r\n\r\nThe `deleteAllPrebuiltRuleAssets` utility reported a `409 Conflict`,\r\npresumably from `security-rule` assets that were attempted to be deleted\r\nwhile they were being updated by a parallel process.\r\n\r\nThis PR wraps the `es.deleteByQuery` calls in the utils\r\n`deleteAllPrebuiltRuleAssets` and `deleteAllTimelines` with a new\r\n`retryIfConflict` helper, that will retry the operation if the ES\r\nrequest fails with a `409`.\r\n\r\n## Flaky test run\r\n\r\n`bundled_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4790\r\n\r\n`large_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4791\r\n\r\n`update_prebuilt_rules_package` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4792\r\n\r\n`management` - **ESS** and **Serverless**:\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4793\r\n\r\n### For maintainers\r\n\r\n- [ ] This was checked for breaking API changes and was [labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"b8c7306d241807b68bedbd477dcec232e203f6ad"}}]}] BACKPORT-->
This PR didn't make it into the latest BC for 8.12.0. Updating the labels. |
…sts (#189813) Fixes #176445 ## Summary Flaky test runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6679 This PR fixes the advanced settings API integration tests that seem to be flaky. The reason for the occasional failures is most likely a `version_conflict_engine_exception` which is thrown when another node indexes the same documents. This can happen when we save an advanced setting since the settings API uses saved objects under the hood, and in CI, multiple nodes can try to save an advanced setting simultaneously. The solution in this PR is to retry the request if we encounter a 409 error. This is adapted from the solution in #174185 which resolves a similar failure.
…sts (elastic#189813) Fixes elastic#176445 Flaky test runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6679 This PR fixes the advanced settings API integration tests that seem to be flaky. The reason for the occasional failures is most likely a `version_conflict_engine_exception` which is thrown when another node indexes the same documents. This can happen when we save an advanced setting since the settings API uses saved objects under the hood, and in CI, multiple nodes can try to save an advanced setting simultaneously. The solution in this PR is to retry the request if we encounter a 409 error. This is adapted from the solution in elastic#174185 which resolves a similar failure. (cherry picked from commit cc29eea)
…sts (elastic#189813) Fixes elastic#176445 Flaky test runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6679 This PR fixes the advanced settings API integration tests that seem to be flaky. The reason for the occasional failures is most likely a `version_conflict_engine_exception` which is thrown when another node indexes the same documents. This can happen when we save an advanced setting since the settings API uses saved objects under the hood, and in CI, multiple nodes can try to save an advanced setting simultaneously. The solution in this PR is to retry the request if we encounter a 409 error. This is adapted from the solution in elastic#174185 which resolves a similar failure. (cherry picked from commit cc29eea)
Summary
Fixes: #171428
NOTE: the test where this was reported wasn't skipped, so this PR does not unskip any tests. However, the Flaky Test Runs help us determine that the issue is no longer reproducible.
The
deleteAllPrebuiltRuleAssets
utility reported a409 Conflict
, presumably fromsecurity-rule
assets that were attempted to be deleted while they were being updated by a parallel process.This PR wraps the
es.deleteByQuery
calls in the utilsdeleteAllPrebuiltRuleAssets
anddeleteAllTimelines
with a newretryIfConflict
helper, that will retry the operation if the ES request fails with a409
.Flaky test run
bundled_prebuilt_rules_package
- ESS and Serverless: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4790large_prebuilt_rules_package
- ESS and Serverless: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4791update_prebuilt_rules_package
- ESS and Serverless: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4792management
- ESS and Serverless: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4793For maintainers