v2 migrations should exit process on corrupt saved object document #91465

rudolf · 2021-02-16T11:01:35Z

Summary

Previously, when v2 migrations encountered a corrupt saved object document, they logged an error and kept retrying instead of failing the migration and exiting Kibana.

This PR makes the migration fail (and the Kibana process exit) when a corrupt saved object document is encountered. Corrupt saved object documents are documents that can't be serialized, so Kibana cannot migrate them. Leaving unmigrated documents in the index can cause all sorts of unexpected, hard-to-diagnose issues, such as a migration failing because the unmigrated corrupt document is incompatible with the target mappings.
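As a rough illustration of the behavior change (not the code in this PR), the sketch below throws as soon as a corrupt document is seen instead of passing it through untouched. The `RawDoc` shape, the `isRawSavedObject` check, and the message suffix are assumptions made for the example; only the message prefix appears in this PR's diff.

```typescript
// Hypothetical, simplified document shape; not Kibana's actual types.
interface RawDoc {
  _id: string;
  _source: Record<string, unknown>;
}

// Assumption: a raw document counts as corrupt when `_source.type` is not a
// string or `_id` is not of the form "<type>:<id>".
function isRawSavedObject(doc: RawDoc): boolean {
  const type = doc._source.type;
  return (
    typeof type === 'string' &&
    doc._id.startsWith(`${type}:`) &&
    doc._id.length > type.length + 1
  );
}

// Transforms every document, failing fast on the first corrupt one.
function migrateRawDocs(
  docs: RawDoc[],
  transform: (doc: RawDoc) => RawDoc
): RawDoc[] {
  return docs.map((doc) => {
    if (!isRawSavedObject(doc)) {
      // Previously such a document would have been passed through untouched.
      throw new Error(
        `Unable to migrate the corrupt saved object document with _id: '${doc._id}'.`
      );
    }
    return transform(doc);
  });
}
```

With this shape, a caller can treat the thrown error as fatal rather than retrying the batch.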

Checklist

Delete any items that are not applicable to this PR.

Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios
Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
This was checked for cross-browser compatibility

For maintainers

This was checked for breaking API changes and was labeled appropriately

rudolf · 2021-02-19T08:53:12Z

@elasticmachine merge upstream

rudolf · 2021-02-19T08:57:34Z

src/core/server/saved_objects/migrations/core/migrate_raw_docs.test.ts

@@ -58,12 +58,12 @@ describe('migrateRawDocs', () => {
    expect(transform).toHaveBeenNthCalledWith(2, obj2);
  });

-  test('passes invalid docs through untouched and logs error', async () => {
+  test('throws when encountering a corrupt saved object document', async () => {


This means that v1 migrations will also start failing on corrupt saved objects. When a v1 migration encounters a corrupt saved object, it performs a migration on every restart, which can cause data loss in a multi-instance Kibana setup if only one Kibana instance is restarted at a time. So although this change might block some users from upgrading, deleting the saved object will save them from worse problems.

Bamieh · 2021-02-19

> When a v1 migration encounters a corrupt saved object, it performs a migration on every restart, which can cause data loss in a multi-instance Kibana setup if only one Kibana instance is restarted at a time.

Isn't restarting one instance at a time our recommended way to upgrade Kibana?

rudolf · 2021-02-19

No, all Kibana instances need to be shut down before upgrading; otherwise v1 migrations will cause data loss.

From https://www.elastic.co/guide/en/kibana/current/upgrade.html:

> Shut down all Kibana instances. Running more than one Kibana version against the same Elasticsearch index is unsupported. Upgrading while older Kibana instances are running can cause data loss or upgrade failures.

afharo · 2021-02-19T16:21:05Z

I take it users will have to delete the document via the Cloud Console or by connecting to the ES instance directly, right? How does this work with the system indices limitation?

Should we allow Kibana to start so that users can access the index via the Dev Console? 🤔

pgayvallet · 2021-02-22T10:35:24Z

> Should we allow Kibana to start so that users can access the index via the Dev Console?

We are going to block access to system indices from the Dev Console in the near future, so I don't think this will be an option.

pgayvallet · 2021-02-22T10:46:13Z

src/core/server/saved_objects/migrationsv2/migrations_state_action_machine.ts

    } else {
      logger.error(e);
+
+      dumpExecutionLog(logger, logMessagePrefix, executionLog);
+      if (e.message.startsWith('Unable to migrate the corrupt saved object document')) {


NIT: I would use an error subclass instead of relying on the error message to change the handling behavior.

rudolf · 2021-02-22

Yeah, I was being lazy :(
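The suggestion above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the class name `CorruptSavedObjectError`, its `rawId` field, and the `describeFailure` helper are assumptions made for the example.

```typescript
// Hypothetical error subclass carrying the id of the corrupt document.
class CorruptSavedObjectError extends Error {
  readonly rawId: string;

  constructor(rawId: string) {
    super(`Unable to migrate the corrupt saved object document with _id: '${rawId}'.`);
    this.rawId = rawId;
    this.name = 'CorruptSavedObjectError';
    // Restore the prototype chain so `instanceof` also works when the code
    // is compiled to an ES5 target.
    Object.setPrototypeOf(this, CorruptSavedObjectError.prototype);
  }
}

// Hypothetical handler: branches on the error type, not on message text.
function describeFailure(e: unknown): string {
  if (e instanceof CorruptSavedObjectError) {
    return `fatal: corrupt document ${e.rawId}`;
  }
  return 'fatal: unexpected error';
}
```

Unlike `e.message.startsWith(...)`, the `instanceof` check keeps working if the message wording changes, and it cannot be triggered by an unrelated error that happens to share the same prefix.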

rudolf · 2021-02-22T12:07:58Z

> I take it users will have to delete the document via the Cloud Console or by connecting to the ES instance directly, right? How does this work with the system indices limitation?
> Should we allow Kibana to start so that users can access the index via the Dev Console? 🤔

Yeah, that's a good question.

The existing index won't be a system index, so before upgrading to a Kibana version that uses system indices, a user can delete the corrupt saved object from their "normal" index. However, if they've already started a v2 migration and it failed because of a corrupt saved object, that corrupt saved object will already be inside the system index. So we will either have to provide a way to force a clean migration that deletes the existing indices and starts from scratch, or an option to run migrations so that corrupt documents are deleted automatically. I'm pretty sure that in 99.9% of cases there's nothing valuable inside a corrupt saved object, but just dropping it (even with an error log) feels risky. If it's a runtime option, users can decide which behaviour they want.
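To make the second alternative concrete, here is a minimal sketch of an opt-in discard mode. Everything in it is hypothetical: the `discardCorruptDocs` option, the `selectMigratableDocs` helper, and the log message are assumptions for illustration, and this PR does not add such an option.

```typescript
// Hypothetical, simplified document shape; not Kibana's actual types.
interface RawDoc {
  _id: string;
  _source: Record<string, unknown>;
}

// Hypothetical runtime option; this PR does not define such a setting.
interface CorruptDocOptions {
  discardCorruptDocs: boolean;
}

function selectMigratableDocs(
  docs: RawDoc[],
  isValid: (doc: RawDoc) => boolean,
  options: CorruptDocOptions,
  logError: (message: string) => void
): RawDoc[] {
  const valid: RawDoc[] = [];
  for (const doc of docs) {
    if (isValid(doc)) {
      valid.push(doc);
    } else if (options.discardCorruptDocs) {
      // Opt-in: drop the document, but leave a trace in the logs.
      logError(`Discarding corrupt saved object document with _id: '${doc._id}'.`);
    } else {
      // Default: fail the migration so nothing is dropped silently.
      throw new Error(
        `Unable to migrate the corrupt saved object document with _id: '${doc._id}'.`
      );
    }
  }
  return valid;
}
```

Keeping the failing behaviour as the default means data is only ever dropped when the user has explicitly asked for it.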

kibanamachine · 2021-02-22T15:45:13Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

💚 Build #107929 succeeded 96fa4e8
💚 Build #106763 succeeded 24bc2ee

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Fail migrations if a corrupt saved object is encountered

24bc2ee

kibanamachine and others added 2 commits February 19, 2021 03:53

Merge branch 'master' into migrations-v2-corrupts-docs

d5b7431

Update test description

96fa4e8

rudolf changed the title ~~migrations-v2-corrupts-docs~~ v2 migrations should exit process on corrupt saved object document Feb 19, 2021

rudolf added bug, Feature:Saved Objects, project:ResilientSavedObjectMigrations labels Feb 19, 2021

rudolf added v7.12.0 v8.0.0 labels Feb 19, 2021

rudolf marked this pull request as ready for review February 19, 2021 09:04

rudolf requested a review from a team as a code owner February 19, 2021 09:04

rudolf added release_note:skip release_note:breaking and removed release_note:skip release_note:breaking labels Feb 19, 2021

pgayvallet reviewed Feb 22, 2021

pgayvallet approved these changes Feb 22, 2021

rudolf added 2 commits February 22, 2021 14:05

Merge branch 'master' into migrations-v2-corrupts-docs

61ca6cf

Use an error class instead of string matching

60f198f

rudolf enabled auto-merge (squash) February 22, 2021 13:36

rudolf merged commit dc475c9 into elastic:master Feb 22, 2021

rudolf mentioned this pull request Feb 22, 2021

[7.x] v2 migrations should exit process on corrupt saved object document (#91465) #92273

Merged

rudolf mentioned this pull request Feb 22, 2021

[7.12] v2 migrations should exit process on corrupt saved object document (#91465) #92274

Merged

rudolf deleted the migrations-v2-corrupts-docs branch February 23, 2021 10:51

rudolf mentioned this pull request Mar 22, 2021

Implement SavedObject Migrations v2 #75780

Closed

17 tasks
