Test migrations success even on temporary ES failures #158995

Bamieh · 2023-06-04T19:36:42Z

[DRAFT, NO NEED FOR REVIEW YET]

This PR adds test cases to check that when Kibana migrations fail due to a temporary problem in Elasticsearch, Kibana can automatically finish the migration when the failure condition is resolved.

To simulate ES errors during migration, I've done the following:

- Proxied the ES client passed to the migrators to simulate errors.
- Introduced a subject that emits the current step in the migrator state machine.
- The test can then run the migrator and instruct it to fail after a certain step.
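
A minimal sketch of the proxy-plus-subject idea described above, not the code in this PR. All names here (`StepSubject`, `createFailingClient`, `recover`) are hypothetical, and the hand-rolled subject only stands in for whatever observable the migrator actually exposes. Once the chosen step is emitted, every client call rejects until the test calls `recover()`:

```typescript
// Hypothetical helpers for illustration only; they are not Kibana's test utilities.

type StepListener = (step: string) => void;

/** Tiny stand-in for a subject that emits migrator step names. */
class StepSubject {
  private listeners: StepListener[] = [];

  subscribe(listener: StepListener): void {
    this.listeners.push(listener);
  }

  next(step: string): void {
    this.listeners.forEach((listener) => listener(step));
  }
}

/**
 * Wraps `client` in a Proxy. After `failAfterStep` has been emitted on `steps`,
 * every method call on the proxied client rejects, simulating an ES outage.
 * Calling `recover()` resolves the simulated failure condition.
 */
function createFailingClient<T extends object>(
  client: T,
  steps: StepSubject,
  failAfterStep: string
): { client: T; recover: () => void } {
  let failing = false;

  steps.subscribe((step) => {
    if (step === failAfterStep) failing = true;
  });

  const proxied = new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      // Non-function properties pass through untouched.
      if (typeof value !== 'function') return value;
      return (...args: unknown[]) =>
        failing
          ? Promise.reject(new Error(`Simulated ES failure in ${String(prop)}`))
          : value.apply(target, args);
    },
  });

  return {
    client: proxied,
    recover: () => {
      failing = false;
    },
  };
}
```

In a test, the migrator would receive the proxied client, the state machine would call `steps.next(...)` on each transition, and the test would call `recover()` before re-running the migration to check that it completes.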

Currently, failures are tested at the following migrator steps:

- Fail ES at the alias change (the final step): `MARK_VERSION_INDEX_READY`.
- Fail while updating the target mappings (the first two modifying operations): `UPDATE_TARGET_MAPPINGS_PROPERTIES` and `UPDATE_TARGET_MAPPINGS_PROPERTIES_WAIT_FOR_TASK`.
- Fail at any step of the clone to the target index: `CLONE_TEMP_TO_TARGET`.

A chaos test: a test case that fails ES at a random migrator step.
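
One way the random step for such a chaos test could be chosen, sketched under assumptions: the step list below contains only the steps named in this description (the real state machine has more), and `seededRandom` and `pickStep` are hypothetical helpers. Seeding the generator is a design choice that lets a failing run be reproduced by logging and reusing the seed.

```typescript
// Only the steps mentioned in this PR description; the real migrator has more.
const CANDIDATE_STEPS: string[] = [
  'MARK_VERSION_INDEX_READY',
  'UPDATE_TARGET_MAPPINGS_PROPERTIES',
  'UPDATE_TARGET_MAPPINGS_PROPERTIES_WAIT_FOR_TASK',
  'CLONE_TEMP_TO_TARGET',
];

/**
 * Small linear congruential generator (hypothetical helper).
 * Returns a function producing numbers in [0, 1), deterministic per seed.
 */
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    // The product stays below 2^53, so the arithmetic is exact before the wrap to 32 bits.
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

/** Picks one step from `steps` using the supplied random source. */
function pickStep(steps: string[], random: () => number): string {
  return steps[Math.floor(random() * steps.length)];
}

// Usage: log the seed so a failing chaos run can be replayed with the same step.
const seed = Date.now();
const failAtStep = pickStep(CANDIDATE_STEPS, seededRandom(seed));
console.log(`chaos test: seed=${seed}, failing ES at step ${failAtStep}`);
```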

Closes #158818

kibana-ci · 2023-06-04T23:32:42Z

💔 Build Failed

Failed CI Steps

Test Failures

[job] [logs] Jest Integration Tests #4 / split .kibana index into multiple system indices failure cases successfully performs the migrations even if a migrator fails

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

id	before	after	diff
`@kbn/core-saved-objects-migration-server-internal`	89	91	+2

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats exports` for more detailed information.

id	before	after	diff
`@kbn/core-saved-objects-migration-server-internal`	45	46	+1

Unknown metric groups

API count

id	before	after	diff
`@kbn/core-saved-objects-migration-server-internal`	123	125	+2

ESLint disabled line counts

id	before	after	diff
`enterpriseSearch`	19	21	+2
`securitySolution`	414	418	+4
total			+6

Total ESLint disabled count

id	before	after	diff
`enterpriseSearch`	20	22	+2
`securitySolution`	498	502	+4
total			+6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

…ency (#166924)

## Summary

Tackles #158818

The goal of the PR is to introduce failures in single migrators at some of the crucial steps of the migration, testing that consistency is maintained and that subsequent migration attempts can successfully complete the upgrade. This is done by _proxying_ the `Client` class, which uses the elasticsearch-js library underneath to perform all calls to ES. Inspired by #158995.

rudolf · 2023-10-12T14:34:24Z

Closing as #158818 was completed in another PR

first pass on code

f1cff83

Bamieh requested a review from a team as a code owner June 4, 2023 19:36

Bamieh changed the title ~~first pass on code~~ Test migrations success even on temporary ES failures Jun 4, 2023

Bamieh marked this pull request as draft June 4, 2023 19:37

Bamieh added the v8.8.1 and release_note:skip labels Jun 4, 2023

Bamieh requested a review from gsoldevila June 4, 2023 19:49

kibanamachine and others added 6 commits June 4, 2023 20:08

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…

ae4cd4c

…-fix'

merge main

a3b67eb

tidy up code

c889987

remove extra code while testing

ad2c0c4

remove argumnets spread

719b351

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…

92b92a3

…-fix'

Bamieh mentioned this pull request Aug 3, 2023

8.8.0 test migrations success even on temporary ES failures #158818

Closed

gsoldevila mentioned this pull request Sep 21, 2023

[Migrations] Ensure individual migrator failures do not break consistency #166924

Merged

rudolf closed this Oct 12, 2023
