Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] Fixing flaky tests #111366

Merged
merged 14 commits into from
Sep 9, 2021
Merged
8 changes: 8 additions & 0 deletions x-pack/plugins/alerting/server/rules_client/rules_client.ts
Original file line number Diff line number Diff line change
Expand Up @@ -672,6 +672,14 @@ export class RulesClient {
}

public async delete({ id }: { id: string }) {
return await retryIfConflicts(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I will not be backporting this specific change to 7.x since 7.x should still have the SO type as single.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this change and the removal code below? I thought the removal of the below code solved the issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the "expected 204, got 409" flaky tests. When I unskipped the test suite that was fixed by removing the dead code, I got a bunch of flaky test failures related to the 204/409 issue so I handled both in the same PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any concern with this code change for normal use (outside of tests)? Is it possible/a good idea to move this retry logic into the test suites themselves to avoid exposing any new issue by changing the non test code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This retry logic is already used in the rules client update and updateApiKey functions, which both have the possibility of returning a conflict if multiple Kibanas are updating a rule at the same time. We have to add this now to the delete function for multiple-isolated alert SOs because delete is not just deleting the SO doc with id ${spaceId}:alert:${alertId}. Instead it could be updating an alert SO that is shared between spaces by removing one of the spaces from the namespaces field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the "expected 204, got 409" errors are our functional tests correctly telling us that we've introduced a race condition with the change from single to multiple-isolated for the alert SO type. Yay for tests!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, makes sense to me!

this.logger,
`rulesClient.delete('${id}')`,
async () => await this.deleteWithOCC({ id })
);
}

private async deleteWithOCC({ id }: { id: string }) {
let taskIdToRemove: string | undefined | null;
let apiKeyToInvalidate: string | null = null;
let attributes: RawAlert;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,7 @@ export default function alertTests({ getService }: FtrProviderContext) {
const esTestIndexTool = new ESTestIndexTool(es, retry);
const taskManagerUtils = new TaskManagerUtils(es, retry);

// FLAKY: https://github.com/elastic/kibana/issues/106492
describe.skip('alerts', () => {
describe('alerts', () => {
const authorizationIndex = '.kibana-test-authorization';
const objectRemover = new ObjectRemover(supertest);

Expand Down Expand Up @@ -502,19 +501,6 @@ instanceStateValue: true
})
);

// Enqueue non ephemerically so we the latter code can query properly
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this is the fix for the superuser at space1 should handle custom retry logic when appropriate flaky test. Discussed with @chrisronline and this was something that was added to update the tests when ephemeral actions were enabled. Since the tests are running with ephemeral disabled, this should have been removed but was overlooked. After removing, I no longer see flakiness in the flaky test runner.

const enqueueResponse = await supertest
.post(`${getUrlPrefix(space.id)}/api/alerts_fixture/${createdAction.id}/enqueue_action`)
.set('kbn-xsrf', 'foo')
.send({
params: {
reference,
index: ES_TEST_INDEX_NAME,
retryAt: retryDate.getTime(),
},
});
expect(enqueueResponse.status).to.eql(204);

switch (scenario.id) {
case 'no_kibana_privileges at space1':
case 'global_read at space1':
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,7 @@ export default function createDeleteTests({ getService }: FtrProviderContext) {
const retry = getService('retry');
const supertestWithoutAuth = getService('supertestWithoutAuth');

// FLAKY https://github.com/elastic/kibana/issues/111001
describe.skip('delete', () => {
describe('delete', () => {
const objectRemover = new ObjectRemover(supertest);

after(() => objectRemover.removeAll());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,7 @@ export default function createFindTests({ getService }: FtrProviderContext) {
const supertest = getService('supertest');
const supertestWithoutAuth = getService('supertestWithoutAuth');

// FLAKY https://github.com/elastic/kibana/issues/111022
describe.skip('find', () => {
describe('find', () => {
const objectRemover = new ObjectRemover(supertest);

afterEach(() => objectRemover.removeAll());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,7 @@ export default function createGetTests({ getService }: FtrProviderContext) {
const supertest = getService('supertest');
const supertestWithoutAuth = getService('supertestWithoutAuth');

// FLAKY https://github.com/elastic/kibana/issues/111496
describe.skip('get', () => {
describe('get', () => {
const objectRemover = new ObjectRemover(supertest);

afterEach(() => objectRemover.removeAll());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,8 +32,7 @@ export default function createUpdateTests({ getService }: FtrProviderContext) {
.then((response: SupertestResponse) => response.body);
}

// FLAKY: https://github.com/elastic/kibana/issues/110801
describe.skip('update', () => {
describe('update', () => {
const objectRemover = new ObjectRemover(supertest);

after(() => objectRemover.removeAll());
Expand Down