v2 migrations should exit process on corrupt saved object document #91465
@elasticmachine merge upstream
@@ -58,12 +58,12 @@ describe('migrateRawDocs', () => {
     expect(transform).toHaveBeenNthCalledWith(2, obj2);
   });
 
-  test('passes invalid docs through untouched and logs error', async () => {
+  test('throws when encountering a corrupt saved object document', async () => {
This means that v1 migrations will also start failing on corrupt saved objects. When v1 migrations encounter a corrupt saved object, they perform a migration on every restart, which can cause data loss in a multi-instance Kibana setup if only one Kibana instance is restarted at a time. So although this change might block some users from upgrading, deleting the corrupt saved object will save them from worse problems.
> When v1 migrations encounter a corrupt saved object it will perform a migration on every restart which can cause data loss in a multi-instance Kibana setup if only one Kibana gets restarted at a time.

Isn't restarting one instance at a time our recommended way to upgrade Kibana?
No, all Kibana instances need to be shut down before upgrading; otherwise, v1 migrations can cause data loss.
From https://www.elastic.co/guide/en/kibana/current/upgrade.html:
> Shut down all Kibana instances. Running more than one Kibana version against the same Elasticsearch index is unsupported. Upgrading while older Kibana instances are running can cause data loss or upgrade failures.
I understand that users will have to delete the document via the Cloud Console or by connecting to the ES instance directly, right? How does this fit with the system indices limitation? Should we allow Kibana to start so that users can access the index via the Dev Console? 🤔
We are going to block access to system indices from the Dev Console in the near future, so I don't think this will be an option.
} else { | ||
logger.error(e); | ||
|
||
dumpExecutionLog(logger, logMessagePrefix, executionLog); | ||
if (e.message.startsWith('Unable to migrate the corrupt saved object document')) { |
Nit: I would use error subclassing instead of relying on the error message to change the handling behavior.
yeah I was being lazy :(
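The nit about error subclassing (addressed by the later commit "Use an error class instead of string matching") can be illustrated with a minimal TypeScript sketch. The class name `CorruptSavedObjectError`, its `rawId` field, and the `isFatalMigrationError` helper are illustrative assumptions, not necessarily what the PR merged.

```typescript
// Hypothetical error class for a document that cannot be migrated.
class CorruptSavedObjectError extends Error {
  readonly rawId: string;

  constructor(rawId: string) {
    super(`Unable to migrate the corrupt saved object document with _id: '${rawId}'.`);
    this.name = 'CorruptSavedObjectError';
    this.rawId = rawId;
    // Restore the prototype chain so `instanceof` keeps working when
    // TypeScript compiles to ES5 targets.
    Object.setPrototypeOf(this, CorruptSavedObjectError.prototype);
  }
}

// Decide how to react to a migration failure based on the error's type,
// not on the wording of its message.
function isFatalMigrationError(e: unknown): boolean {
  return e instanceof CorruptSavedObjectError;
}
```

Because the decision depends on the error's type rather than its text, rewording the message can no longer silently change how the failure is handled: a plain `Error` that happens to carry the same message is not treated as fatal.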
Yeah, that's a good question. The existing index won't be a system index, so before upgrading to a Kibana version that uses system indices, a user can delete the corrupt saved object from their "normal" index. However, if they've already started a v2 migration and it failed because of a corrupt saved object, then that corrupt saved object will already be inside the system index. So we will either have to provide a way to force a clean migration that deletes the existing indices and starts from scratch, or an option to run migrations in a mode that automatically deletes corrupt documents. I'm pretty sure that in 99.9% of cases there's nothing valuable inside a corrupt saved object, but just dropping it (even with an error log) feels risky. If it's a runtime option, users can decide which behaviour they want.
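The runtime option floated in that comment could look roughly like the following TypeScript sketch. Everything here is hypothetical: the `RawDoc` shape, the `discardCorruptObjects` option name, and especially the `isCorrupt` check, which is a simplified stand-in rather than Kibana's real definition of a corrupt document.

```typescript
// Hypothetical, simplified shape of a raw saved object document.
interface RawDoc {
  _id: string;
  _source: { type?: string; [key: string]: unknown };
}

type Transform = (doc: RawDoc) => RawDoc;

interface MigrateOptions {
  // Hypothetical runtime option: false = fail fast, true = log and drop.
  discardCorruptObjects: boolean;
  log: (msg: string) => void;
}

class CorruptSavedObjectError extends Error {
  constructor(rawId: string) {
    super(`Unable to migrate the corrupt saved object document with _id: '${rawId}'.`);
    this.name = 'CorruptSavedObjectError';
    Object.setPrototypeOf(this, CorruptSavedObjectError.prototype);
  }
}

// Simplified corruption check (an assumption): the document must declare a
// type, and its _id must be prefixed with "<type>:".
function isCorrupt(doc: RawDoc): boolean {
  const type = doc._source.type;
  return typeof type !== 'string' || type === '' || !doc._id.startsWith(`${type}:`);
}

function migrateRawDocs(docs: RawDoc[], transform: Transform, options: MigrateOptions): RawDoc[] {
  const migrated: RawDoc[] = [];
  for (const doc of docs) {
    if (isCorrupt(doc)) {
      if (!options.discardCorruptObjects) {
        // Default behaviour: abort the whole migration.
        throw new CorruptSavedObjectError(doc._id);
      }
      // Opt-in behaviour: record what was dropped and skip the document.
      options.log(`Discarding corrupt saved object document with _id: '${doc._id}'`);
      continue;
    }
    migrated.push(transform(doc));
  }
  return migrated;
}
```

Keeping fail-fast as the default preserves the behavior introduced by this PR, so documents are only dropped when an operator explicitly opts in.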
…lastic#91465)
* Fail migrations if a corrupt saved object is encountered
* Update test description
* Use an error class instead of string matching

Co-authored-by: Kibana Machine <[email protected]>
…91465) (#92274)
* Fail migrations if a corrupt saved object is encountered
* Update test description
* Use an error class instead of string matching

Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Kibana Machine <[email protected]>
…91465) (#92273)
* Fail migrations if a corrupt saved object is encountered
* Update test description
* Use an error class instead of string matching

Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Kibana Machine <[email protected]>
Summary
Fixes #65612
Previously, when v2 migrations encountered a corrupt saved object document, they would log an error and keep retrying instead of failing the migration and exiting Kibana.
This PR changes that behavior so that the migration fails (and the Kibana process exits) when a corrupt saved object document is encountered. Corrupt saved object documents are documents that can't be serialized, so Kibana cannot migrate them. Leaving unmigrated documents in the index can cause unexpected, hard-to-diagnose issues, such as a migration failing because an unmigrated corrupt document is incompatible with the target mappings.
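The change from retrying indefinitely to failing fast can be sketched as a retry loop that treats a corrupt document as non-retryable. This is a minimal TypeScript sketch under stated assumptions: `runMigrationWithRetries`, `maxAttempts`, and the error class are hypothetical names, and Kibana's actual v2 migration logic is more involved than a single loop.

```typescript
class CorruptSavedObjectError extends Error {
  constructor(rawId: string) {
    super(`Unable to migrate the corrupt saved object document with _id: '${rawId}'.`);
    this.name = 'CorruptSavedObjectError';
    // Keep `instanceof` reliable when compiling to ES5 targets.
    Object.setPrototypeOf(this, CorruptSavedObjectError.prototype);
  }
}

// Hypothetical driver: transient failures are retried up to `maxAttempts`
// times, but a corrupt document is rethrown immediately because no amount
// of retrying can fix it.
async function runMigrationWithRetries(
  step: () => Promise<void>,
  maxAttempts: number,
  log: (msg: string) => void,
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await step();
      return;
    } catch (e) {
      if (e instanceof CorruptSavedObjectError || attempt >= maxAttempts) {
        // Propagate: the caller is expected to log the error and exit.
        throw e;
      }
      log(`Migration attempt ${attempt} failed, retrying: ${String(e)}`);
    }
  }
}
```

At startup, the caller would let the rejection propagate (for example, logging it and exiting with a non-zero status) rather than looping, which matches the "exit Kibana" behavior described above.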