[Alerting] Fixing flaky tests #111366

ymao1 · 2021-09-07T11:34:16Z

Resolves #106492
Resolves #111022
Resolves #111001
Resolves #110827
Resolves #110801
Resolves #110789
Resolves #111496

Summary

Started out fixing #106492, a failure in the test "superuser at space1 should handle custom retry logic when appropriate". When I unskipped the test suite, a lot of other tests turned out to be flaky with "expected 204, got 409" errors, so I included changes to fix those in this PR as well.

As @mikecote suggested, switching the alert SO type to multiple-isolated changed the behavior of deleting rules. Now that a rule can have an array of namespaces, deleting that rule may actually just update the SO document to remove the rule's namespace from the namespaces array, which can return a 409. Updated the rules client delete to use the same retry-if-conflicts logic as the `update` function does.
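To illustrate the retry-on-conflict idea, here is a minimal TypeScript sketch. It is not the rules client's actual `retryIfConflicts` helper: the names `ConflictError`, `isConflict`, and `retryOnConflict`, and the default of 2 retries, are assumptions made up for this example.

```typescript
// Hypothetical error type standing in for a 409 response from the saved objects layer.
class ConflictError extends Error {
  readonly statusCode = 409;
}

// Treat any thrown object carrying statusCode 409 as a conflict.
function isConflict(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { statusCode?: number }).statusCode === 409
  );
}

// Run `operation`; on a conflict, try again up to `retries` more times.
// Any other error, or a conflict after the last retry, is rethrown.
async function retryOnConflict<T>(
  operation: () => Promise<T>,
  retries = 2
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!isConflict(err) || attempt >= retries) {
        throw err;
      }
    }
  }
}
```

A call site would wrap the whole operation, for example `await retryOnConflict(() => deleteRule(id))`, where `deleteRule` is a placeholder for whatever performs the delete.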

Flaky test runs:


ymao1 · 2021-09-07T22:53:05Z

x-pack/test/alerting_api_integration/security_and_spaces/tests/alerting/alerts.ts

@@ -502,19 +501,6 @@ instanceStateValue: true
              })
            );

-          // Enqueue non ephemerically so we the latter code can query properly


Removing this is the fix for the flaky test "superuser at space1 should handle custom retry logic when appropriate". Discussed with @chrisronline: this code was added to update the tests when ephemeral actions were enabled. Since the tests run with ephemeral disabled, it should have been removed but was overlooked. After removing it, I no longer see flakiness in the flaky test runner.

ymao1 · 2021-09-07T22:54:35Z

x-pack/plugins/alerting/server/rules_client/rules_client.ts

@@ -672,6 +672,14 @@ export class RulesClient {
  }

  public async delete({ id }: { id: string }) {
+    return await retryIfConflicts(


Note that I will not be backporting this specific change to 7.x since 7.x should still have the SO type as single.

chrisronline · 2021-09-08

Do we need both this change and the code removal below? I thought removing the code below solved the issue?

ymao1 · 2021-09-08

This fixes the "expected 204, got 409" flaky tests. When I unskipped the test suite that was fixed by removing the dead code, I got a bunch of flaky test failures related to the 204/409 issue, so I handled both in the same PR.

chrisronline · 2021-09-08

Is there any concern with this code change for normal use (outside of tests)? Is it possible, or a good idea, to move this retry logic into the test suites themselves, to avoid exposing any new issue by changing the non-test code?

ymao1 · 2021-09-08

This retry logic is already used in the rules client `update` and `updateApiKey` functions, which can both return a conflict if multiple Kibanas are updating a rule at the same time. We have to add it to the `delete` function now for multiple-isolated alert SOs because delete no longer just deletes the SO doc with id `${spaceId}:alert:${alertId}`. Instead, it could be updating an alert SO that is shared between spaces by removing one of the spaces from the namespaces field.

ymao1 · 2021-09-08

To be clear, the "expected 204, got 409" errors are our functional tests correctly telling us that we've introduced a race condition with the change from single to multiple-isolated for the alert SO type. Yay for tests!

chrisronline · 2021-09-08

Thanks for the explanation, makes sense to me!
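The race discussed in this thread can be sketched with a simplified in-memory model. This is only an illustration under stated assumptions: `SharedDoc`, `VersionConflict`, and `removeNamespace` are hypothetical names, and the version check is a stand-in for whatever optimistic concurrency the saved objects layer actually applies. It does not mirror Kibana's real code.

```typescript
// Hypothetical stand-in for a saved object that lists the spaces it belongs to.
interface SharedDoc {
  namespaces: string[];
  version: number;
}

// Hypothetical error standing in for a 409 response.
class VersionConflict extends Error {
  readonly statusCode = 409;
}

// "Delete" the doc from one space. The write succeeds only if the caller's
// `expectedVersion` still matches the stored version.
function removeNamespace(
  doc: SharedDoc,
  space: string,
  expectedVersion: number
): SharedDoc | undefined {
  if (doc.version !== expectedVersion) {
    throw new VersionConflict(
      `expected version ${expectedVersion}, found ${doc.version}`
    );
  }
  const remaining = doc.namespaces.filter((ns) => ns !== space);
  // No namespaces left: the document itself goes away.
  return remaining.length === 0
    ? undefined
    : { namespaces: remaining, version: doc.version + 1 };
}

// Two callers both read the doc at version 1.
const shared: SharedDoc = { namespaces: ['space1', 'space2'], version: 1 };
const staleVersion = shared.version;

// The first caller wins: the delete is really an update that bumps the version.
const afterFirst = removeNamespace(shared, 'space1', staleVersion);

// The second caller still holds version 1, so its write is rejected.
let secondStatus: number | undefined;
try {
  if (afterFirst) {
    removeNamespace(afterFirst, 'space2', staleVersion);
  }
} catch (err) {
  secondStatus = (err as { statusCode?: number }).statusCode;
}
```

In this model the second caller only succeeds after re-reading the document and trying again, which is the situation a retry on conflict is meant to cover.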

elasticmachine · 2021-09-07T22:54:47Z

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)


ymao1 · 2021-09-08T18:10:41Z

@elasticmachine merge upstream

chrisronline

LGTM! I'd suggest rerunning CI on this a few times (once it starts passing) to make sure!

ymao1 · 2021-09-08T18:30:38Z

> LGTM! I'd suggest rerunning CI on this a few times (once it starts passing) to make sure!

I linked to the flaky test runner runs I ran in the PR description. Total of 126 runs on this CI group with no failures 🎉

ymao1 · 2021-09-08T21:46:18Z

@elasticmachine merge upstream

YulNaumenko

LGTM

ymao1 · 2021-09-09T01:33:58Z

@elasticmachine merge upstream

ymao1 · 2021-09-09T11:27:40Z

@elasticmachine merge upstream

kibanamachine · 2021-09-09T13:39:50Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

💚 Build #152091 succeeded d0b1f0c
💔 Build #152062 failed 9342b3d
💔 Build #151943 failed b1eb2f3
💔 Build #151919 failed e18bfce
💚 Build #151664 succeeded da2a69a

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

* Unskipping test
* Retrying deletes
* Unskipping test
* Changing fn signature
* hmm
* Removing unnecessary code
* Unskipping test

Co-authored-by: Kibana Machine <[email protected]>

* [Alerting] Fixing flaky tests (#111366)
* Unskipping test
* Retrying deletes
* Unskipping test
* Changing fn signature
* hmm
* Removing unnecessary code
* Unskipping test

Co-authored-by: Kibana Machine <[email protected]>

* Reverting change to delete function in rules client for 7.x

Co-authored-by: Kibana Machine <[email protected]>

ymao1 added 4 commits September 2, 2021 14:40

Unskipping test

bfeeabe

Retrying deletes

9d763c4

Merge branch 'master' of https://github.com/elastic/kibana into flaky-test-custom-retry

c421156

Unskipping test

0e34292

ymao1 changed the title ~~Flaky test custom retry~~ [Alerting] Fixing flaky test, expect 204 got 409 Sep 7, 2021

Changing fn signature

cf09172

ymao1 mentioned this pull request Sep 7, 2021

Fix flaky alerting API integration tests which occasionally return 409 errors #111384

Closed

ymao1 added 3 commits September 7, 2021 10:45

hmm

a057bca

Removing unnecessary code

e3be6ac

Merge branch 'master' of https://github.com/elastic/kibana into flaky-test-custom-retry

da2a69a

ymao1 changed the title ~~[Alerting] Fixing flaky test, expect 204 got 409~~ [Alerting] Fixing flaky tests Sep 7, 2021

ymao1 commented Sep 7, 2021


ymao1 self-assigned this Sep 7, 2021

ymao1 added Feature:Alerting, release_note:skip, Team:ResponseOps, v7.16.0, v8.0.0 labels Sep 7, 2021

ymao1 commented Sep 7, 2021


ymao1 marked this pull request as ready for review September 7, 2021 22:54

ymao1 requested a review from a team as a code owner September 7, 2021 22:54

ymao1 linked an issue Sep 7, 2021 that may be closed by this pull request

Fix flaky alerting API integration tests which occasionally return 409 errors #111384

Closed

ymao1 added 2 commits September 8, 2021 12:56

Merge branch 'master' of https://github.com/elastic/kibana into flaky-test-custom-retry

de7903d

Unskipping test

e18bfce

chrisronline self-requested a review September 8, 2021 17:23

Merge branch 'master' into flaky-test-custom-retry

b1eb2f3

chrisronline approved these changes Sep 8, 2021


Merge branch 'master' into flaky-test-custom-retry

9342b3d

YulNaumenko approved these changes Sep 8, 2021


Merge branch 'master' into flaky-test-custom-retry

d0b1f0c

Merge branch 'master' into flaky-test-custom-retry

a0de601

ymao1 merged commit 334f129 into elastic:master Sep 9, 2021

ymao1 mentioned this pull request Sep 9, 2021

[7.x] [Alerting] Fixing flaky tests (#111366) #111729

Merged

ymao1 deleted the flaky-test-custom-retry branch September 9, 2021 13:49
