-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Alerting] Fixing flaky tests #111366
[Alerting] Fixing flaky tests #111366
Changes from 11 commits
bfeeabe
9d763c4
c421156
0e34292
cf09172
a057bca
e3be6ac
da2a69a
de7903d
e18bfce
b1eb2f3
9342b3d
d0b1f0c
a0de601
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,8 +39,7 @@ export default function alertTests({ getService }: FtrProviderContext) { | |
const esTestIndexTool = new ESTestIndexTool(es, retry); | ||
const taskManagerUtils = new TaskManagerUtils(es, retry); | ||
|
||
// FLAKY: https://github.com/elastic/kibana/issues/106492 | ||
describe.skip('alerts', () => { | ||
describe('alerts', () => { | ||
const authorizationIndex = '.kibana-test-authorization'; | ||
const objectRemover = new ObjectRemover(supertest); | ||
|
||
|
@@ -502,19 +501,6 @@ instanceStateValue: true | |
}) | ||
); | ||
|
||
// Enqueue non ephemerically so we the latter code can query properly | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removing this is the fix for the |
||
const enqueueResponse = await supertest | ||
.post(`${getUrlPrefix(space.id)}/api/alerts_fixture/${createdAction.id}/enqueue_action`) | ||
.set('kbn-xsrf', 'foo') | ||
.send({ | ||
params: { | ||
reference, | ||
index: ES_TEST_INDEX_NAME, | ||
retryAt: retryDate.getTime(), | ||
}, | ||
}); | ||
expect(enqueueResponse.status).to.eql(204); | ||
|
||
switch (scenario.id) { | ||
case 'no_kibana_privileges at space1': | ||
case 'global_read at space1': | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I will not be backporting this specific change to 7.x since 7.x should still have the SO type as
single
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this change and the removal code below? I thought the removal of the below code solved the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the "expected 204, got 409" flaky tests. When I unskipped the test suite that was fixed by removing the dead code, I got a bunch of flaky test failures related to the 204/409 issue so I handled both in the same PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any concern with this code change for normal use (outside of tests)? Is it possible/a good idea to move this retry logic into the test suites themselves to avoid exposing any new issue by changing the non test code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This retry logic is already used in the rules client update and updateApiKey functions, which both have the possibility of returning a conflict if multiple Kibanas are updating a rule at the same time. We have to add this now to the delete function for
multiple-isolated
alert SOs because delete is not just deleting the SO doc with id${spaceId}:alert:${alertId}
. Instead it could be updating an alert SO that is shared between spaces by removing one of the spaces from thenamespaces
field.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, the "expected 204, got 409" errors are our functional tests correctly telling us that we've introduced a race condition with the change from
single
tomultiple-isolated
for the alert SO type. Yay for tests!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, makes sense to me!