Improve saved objects migrations failure errors and logs #131359

TinaHeiligers
Contributor

Part of #128585
Resolves #130837

Summary

  • Updates error message links to point to relevant docs sections
  • Improves and uses an error label (unsupported_cluster_routing_allocation) for migration failures caused by cluster routing allocation not being set appropriately.

Risk Matrix

Risk | Probability | Severity | Mitigation/Notes
References to the online documentation in migrations logs become outdated. | Medium | High | Developers need to ensure that the documentation links referenced in saved objects migrations log messages stay up to date whenever those messages change.
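
One possible way to limit this risk is to keep every documentation URL used by migration log messages in a single map, so that a docs restructure needs only one change. The sketch below is illustrative only: `buildMigrationDocLinks`, its `branch` parameter, and the `resolveMigrationFailures` key are hypothetical, while the `routingAllocationDisabled` key and the URL shape are taken from the messages quoted in this PR.

```typescript
// Hypothetical central registry of doc links referenced by migration log messages.
interface MigrationDocLinks {
  resolveMigrationFailures: string;
  routingAllocationDisabled: string;
}

// Builds all links from one base URL so only this function changes if the docs move.
// The function name and the `branch` parameter are assumptions made for illustration.
function buildMigrationDocLinks(branch: string): MigrationDocLinks {
  const base = `https://www.elastic.co/guide/en/kibana/${branch}/resolve-migrations-failures.html`;
  return {
    resolveMigrationFailures: base,
    routingAllocationDisabled: `${base}#routing-allocation-disabled`,
  };
}
```

Log messages would then interpolate a link from this map rather than hard-coding a URL.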

For maintainers

@TinaHeiligers TinaHeiligers added the release_note:skip, backport:skip, project:ResilientSavedObjectMigrations, Feature:Migrations, and v8.3.0 labels May 2, 2022
Contributor Author

@TinaHeiligers TinaHeiligers left a comment

PR comments and self-review

@@ -114,7 +114,8 @@ describe('unsupported_cluster_routing_allocation', () => {
await root.setup();

await expect(root.start()).rejects.toThrowError(
/Unable to complete saved object migrations for the \[\.kibana.*\] index: The elasticsearch cluster has cluster routing allocation incorrectly set for migrations to continue\. To proceed, please remove the cluster routing allocation settings with PUT \/_cluster\/settings {"transient": {"cluster\.routing\.allocation\.enable": null}, "persistent": {"cluster\.routing\.allocation\.enable": null}}/
// eslint-disable-next-line prettier/prettier
Contributor Author

For some reason, I couldn't figure out the right escapes and went with the easy option of disabling prettier here 🤷‍♀️

Member

I don't think you should need to disable prettier for this. I ran eslint locally and it doesn't complain when I remove the eslint-disable-next-line:

node scripts/eslint src/core/server/saved_objects/migrations/integration_tests/cluster_routing_allocation_disabled.test.ts

@@ -149,7 +150,8 @@ describe('unsupported_cluster_routing_allocation', () => {
await root.setup();

await expect(root.start()).rejects.toThrowError(
/Unable to complete saved object migrations for the \[\.kibana.*\] index: The elasticsearch cluster has cluster routing allocation incorrectly set for migrations to continue\. To proceed, please remove the cluster routing allocation settings with PUT \/_cluster\/settings {"transient": {"cluster\.routing\.allocation\.enable": null}, "persistent": {"cluster\.routing\.allocation\.enable": null}}/
// eslint-disable-next-line prettier/prettier
Contributor Author

Open to suggestions on how to escape the chars to override prettier formatting!

Member

I think this should work, but as mentioned above, I'm not sure you need to disable prettier anyway.

/Unable to complete saved object migrations for the \[\.kibana\] index: \[unsupported_cluster_routing_allocation\] The elasticsearch cluster has cluster routing allocation incorrectly set for migrations to continue\. To proceed, please remove the cluster routing allocation settings with PUT \/_cluster\/settings {\"transient\": {\"cluster\.routing\.allocation\.enable\": null}, \"persistent\": {\"cluster\.routing\.allocation\.enable\": null}}\. Refer to https:\/\/www.elastic.co\/guide\/en\/kibana\/master\/resolve-migrations-failures.html#routing-allocation-disabled for more information on how to resolve the issue\./
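
One way to sidestep hand-escaping altogether is to build the pattern from the plain message text. The sketch below is an illustration only, not code from this PR: `escapeRegExp`, `expectedMessage`, and `expectedPattern` are hypothetical names, and the message text is copied from the suggestion above (without the trailing docs link).

```typescript
// Hypothetical helper (not part of this PR): escapes every character that has a
// special meaning in a regular expression so the text is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Plain-text version of the expected failure message, written without escapes.
const expectedMessage =
  'Unable to complete saved object migrations for the [.kibana] index: ' +
  '[unsupported_cluster_routing_allocation] The elasticsearch cluster has cluster routing ' +
  'allocation incorrectly set for migrations to continue. To proceed, please remove the ' +
  'cluster routing allocation settings with PUT /_cluster/settings ' +
  '{"transient": {"cluster.routing.allocation.enable": null}, ' +
  '"persistent": {"cluster.routing.allocation.enable": null}}.';

// The resulting pattern could be passed to toThrowError() instead of a hand-written literal.
const expectedPattern = new RegExp(escapeRegExp(expectedMessage));
```

Jest's `toThrowError` also accepts a plain string and checks that the error message contains it, so a regular expression is only needed when part of the message, such as the index name, has to stay a wildcard.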

/**
* Constructs migration failure message string for doc exceeds max batch size in bytes
*/
export const fatalReasonDocumentExceedsMaxBatchSizeBytes = ({
Contributor Author

We're adding more and more formatting helpers for the final fatal reason that gets logged. It's easier to find (and test) these when they're in the same place.


/**
* Constructs migration failure message and logs message strings when an unsupported cluster routing allocation is configured.
* The full errorMessage is "[unsupported_cluster_routing_allocation] The elasticsearch cluster has cluster routing allocation incorrectly set for migrations to continue."
Contributor Author

More for information and code search than anything else

Member

nit: TBH it might make things more readable to just concat the messages in model.ts instead of doing it inside the function. e.g.

reason: `${left.message} ${formatClusterRoutingAllocationFailureReason(docLink)}`,

Contributor Author

@TinaHeiligers TinaHeiligers May 3, 2022

Formatting inline in the model would be:

`${left.message} To proceed, please remove the cluster routing allocation settings with PUT /_cluster/settings {"transient": {"cluster.routing.allocation.enable": null}, "persistent": {"cluster.routing.allocation.enable": null}}. Refer to ${stateP.migrationDocLinks.routingAllocationDisabled} for more information on how to resolve the issue.`

And that bloats the model even more.

As it is, the model mixes inline formatting with control flow. What I hope we end up doing is extracting the 'cruft' and stripping the model code down to control flow only.
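
To make the trade-off in this thread concrete, here is a minimal sketch of the helper-based approach, under stated assumptions. The name `formatClusterRoutingAllocationFailureReason` comes from the reviewer's suggestion above; its single `docLink` parameter, the `LeftResponse` type, and `toFatalReason` are hypothetical, and the real model.ts code may be shaped differently.

```typescript
// Assumed shape of the failed response handled by the model (hypothetical).
interface LeftResponse {
  type: 'unsupported_cluster_routing_allocation';
  message: string;
}

// Formatting helper kept outside the model so it can be found and tested on its own.
// The wording follows the message quoted in this thread; the signature is an assumption.
const formatClusterRoutingAllocationFailureReason = (docLink: string): string =>
  'To proceed, please remove the cluster routing allocation settings with ' +
  'PUT /_cluster/settings {"transient": {"cluster.routing.allocation.enable": null}, ' +
  '"persistent": {"cluster.routing.allocation.enable": null}}. ' +
  `Refer to ${docLink} for more information on how to resolve the issue.`;

// The model-side code then only decides what to report and delegates the wording.
const toFatalReason = (left: LeftResponse, docLink: string): string =>
  `${left.message} ${formatClusterRoutingAllocationFailureReason(docLink)}`;
```

Either way the long string lives in one place; the open question in the thread is only whether the concatenation with `left.message` happens inside the helper or in the model.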

});
const newState = model(initState, res) as FatalState;

expect(newState.controlState).toEqual('FATAL');
expect(newState.reason).toMatchInlineSnapshot(
`"The elasticsearch cluster has cluster routing allocation incorrectly set for migrations to continue. To proceed, please remove the cluster routing allocation settings with PUT /_cluster/settings {\\"transient\\": {\\"cluster.routing.allocation.enable\\": null}, \\"persistent\\": {\\"cluster.routing.allocation.enable\\": null}}"`
`"[unsupported_cluster_routing_allocation] To proceed, please remove the cluster routing allocation settings with PUT /_cluster/settings {\\"transient\\": {\\"cluster.routing.allocation.enable\\": null}, \\"persistent\\": {\\"cluster.routing.allocation.enable\\": null}}. Refer to routingAllocationDisabled for more information on how to resolve the issue."`
Contributor Author

I'm not using the real, full message in the test input result to keep the snapshot smaller.

@@ -33,17 +38,7 @@ import {
} from './helpers';
import { createBatches } from './create_batches';

const FATAL_REASON_REQUEST_ENTITY_TOO_LARGE = `While indexing a batch of saved objects, Elasticsearch returned a 413 Request Entity Too Large exception. Ensure that the Kibana configuration option 'migrations.maxBatchSizeBytes' is set to a value that is lower than or equal to the Elasticsearch 'http.max_content_length' configuration option.`;
Contributor Author

moved to ./extract_errors for consistency

@TinaHeiligers TinaHeiligers marked this pull request as ready for review May 3, 2022 03:03
@TinaHeiligers TinaHeiligers requested review from a team as code owners May 3, 2022 03:03
Contributor

@gchaps gchaps left a comment

asciidoc changes and doclinks LGTM

Member

@lukeelmers lukeelmers left a comment

Few little nits/suggestions, but overall lgtm!

* Constructs migration failure message and logs message strings when an unsupported cluster routing allocation is configured.
* The full errorMessage is "[unsupported_cluster_routing_allocation] The elasticsearch cluster has cluster routing allocation incorrectly set for migrations to continue."
*/
export const fatalReasonClusterRoutingAllocationUnsupported = ({
Member

naming super-nit: It's a little hard to grok the purpose of these functions from their names. I might've gone with something like formatClusterRoutingAllocationFailureReason, formatClusterRoutingAllocationFailureMessage, and formatDocumentExceedsMaxBatchSizeError. But also naming's hard so 🤷


@TinaHeiligers TinaHeiligers enabled auto-merge (squash) May 3, 2022 23:52
@TinaHeiligers TinaHeiligers merged commit 4b19c72 into elastic:main May 4, 2022
@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
core 288.7KB 288.9KB +205.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@TinaHeiligers TinaHeiligers added the auto-backport label and removed the backport:skip label May 4, 2022
@kibanamachine kibanamachine added the backport:skip label May 4, 2022
@kibanamachine
Contributor

⚪ Backport skipped

The pull request was not backported as there were no branches to backport to. If this is a mistake, please apply the desired version labels or run the backport tool manually.

Manual backport

To create the backport manually run:

node scripts/backport --pr 131359

Questions?

Please refer to the Backport tool documentation

@TinaHeiligers TinaHeiligers removed the auto-backport label May 4, 2022
academo pushed a commit to academo/kibana that referenced this pull request May 4, 2022
academo pushed a commit to academo/kibana that referenced this pull request May 4, 2022
academo added a commit that referenced this pull request May 5, 2022
* Add severity field to create API and migration

* Adds integration test for severity field migration

* remove exclusive test

* Change severity levels

* Update integration tests for post case

* Add more integration tests

* Fix all cases list test

* Fix some server test

* Fix util server test

* Fix client util test

* Convert event log's duration from number to string in Kibana (keep as "long" in Elasticsearch) (#130819)

* Convert event.duration to string in TypeScript, keep as long in Elasticsearch

* Fix jest test

* Fix functional tests

* Add ecsStringOrNumber to event log schema

* Fix jest test

* Add utility functions to event log plugin

* Use new event log utility functions

* PR fixes

Co-authored-by: Kibana Machine <[email protected]>

* filter o11y rule aggregations (#131301)

* [Cloud Posture] Display and save rules per benchmark (#131412)

* Adding aria-label for discover data grid select document checkbox (#131277)

* Update API docs (#130999)

Co-authored-by: Kibana Machine <[email protected]>

* [CI] Use GCS buckets for bazel remote caching (#131345)

* [Actionable Observability] Add license modal to rules table (#131232)

* Add fix license link

* fix localization

* fix CI error

* fix more translation issues

Co-authored-by: Kibana Machine <[email protected]>

* [RAM] Add shareable rule status filter (#130705)

* rule state filter

* turn off experiment

* [CI] Auto-commit changed files from 'node scripts/eslint --no-cache --fix'

* Status filter API call

* Fix tests

* rename state to status, added tests

* Address comments and fix tests

* Revert experiment flag

* Remove unused translations

* Addressed comments

Co-authored-by: kibanamachine <[email protected]>

* [storybook] Watch for changes in packages (#131467)

* [storybook] Watch for changes in packages

* Update default_config.ts

* Improve saved objects migrations failure errors and logs (#131359)

* [Unified observability] Add tour step to guided setup (#131149)

* [Lens] Improved interval input (#131372)

* [Vega] Adjust vega doc for usage of ems files (#130948)

* adjust vega doc

* Update docs/user/dashboard/vega-reference.asciidoc

Co-authored-by: Nick Peihl <[email protected]>

* Update docs/user/dashboard/vega-reference.asciidoc

Co-authored-by: Nick Peihl <[email protected]>

* Update docs/user/dashboard/vega-reference.asciidoc

Co-authored-by: Nick Peihl <[email protected]>

* Update docs/user/dashboard/vega-reference.asciidoc

Co-authored-by: Nick Peihl <[email protected]>

* Update docs/user/dashboard/vega-reference.asciidoc

Co-authored-by: Nick Peihl <[email protected]>

Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Nick Peihl <[email protected]>

* Excess intersections

* Create severity user action

* Add severity to create_case user action

* Fix and add integration tests

* Minor improvements

Co-authored-by: Mike Côté <[email protected]>
Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: mgiota <[email protected]>
Co-authored-by: Jordan <[email protected]>
Co-authored-by: Bhavya RM <[email protected]>
Co-authored-by: Thomas Neirynck <[email protected]>
Co-authored-by: Brian Seeders <[email protected]>
Co-authored-by: Jiawei Wu <[email protected]>
Co-authored-by: Clint Andrew Hall <[email protected]>
Co-authored-by: Christiane (Tina) Heiligers <[email protected]>
Co-authored-by: Alejandro Fernández Gómez <[email protected]>
Co-authored-by: Joe Reuter <[email protected]>
Co-authored-by: Nick Peihl <[email protected]>
Co-authored-by: Christos Nasikas <[email protected]>
@rudolf rudolf deleted the soMigrations-link-to-docs-subsections branch May 5, 2022 11:20
kertal pushed a commit to kertal/kibana that referenced this pull request May 24, 2022
kertal pushed a commit to kertal/kibana that referenced this pull request May 24, 2022
Labels
backport:skip This commit does not require backporting Feature:Migrations project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient release_note:skip Skip the PR/issue when compiling release notes v8.3.0