Moving saved object management to an ES plugin #49764

Closed
tylersmalley opened this issue Oct 30, 2019 · 9 comments
Assignees
Labels
discuss Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team v8.0.0

Comments

@tylersmalley
Contributor

tylersmalley commented Oct 30, 2019

This issue is a work-in-progress and not to be considered complete. Please raise any possible issues or suggestions.

This is part of a stack-wide effort to stop identifying internal indices purely by naming convention (a period prefix). Doing so has multiple benefits, including preventing user mutations to these indices through direct or indirect (index template) means. The general idea being proposed is that each internal index would be represented by an Elasticsearch plugin providing an interface separate from the common ES API. Additionally, the common ES APIs would no longer return any data contained in these indices.

Currently, Kibana manages a single alias backed by a single index for multiple types (kibana/task_manager). When Kibana starts up and determines it needs to run a migration, a new index is created and the data is re-indexed. This is not an Elasticsearch reindex, as there are JavaScript transformations to perform. Once that completes, we update the alias. More information on this process is located here. Going forward, with what we're calling Saved Object Migrations v2, we're looking to avoid these reindexes and maintain only a single index.

The following indices make up the Kibana saved objects:

  • kibana.index defaults to .kibana
  • xpack.task_manager.index defaults to .kibana_task_manager

Each of these indices is suffixed with _{int}, where each migration increments the integer (for example, .kibana_1 then .kibana_2).

The following ES calls are used by the index migrations (elasticsearch-js name references):

We previously used index templates to allow the Kibana index to be re-created automatically. Since migrations were added in 6.5.0, we can safely drop this (a sketch of the cleanup follows the list):

  • cat.templates (name=kibana_index_template*)
  • indices.deleteTemplate
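
A minimal sketch of that cleanup, assuming the 7.x @elastic/elasticsearch client (not Kibana's actual code; error handling elided):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// List any legacy Kibana index templates, then delete them one by one.
// The template name pattern comes from the cat.templates call above.
async function dropLegacyKibanaTemplates() {
  const { body } = await client.cat.templates({
    name: 'kibana_index_template*',
    format: 'json',
  });
  for (const template of body as Array<{ name: string }>) {
    await client.indices.deleteTemplate({ name: template.name });
  }
}
```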

The following calls are used to perform the migration; ideally these would be removed as part of Saved Object Migrations v2 (a sketch of how they compose follows the list):

  • indices.create
  • indices.getAlias
  • indices.updateAliases
  • indices.refresh
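
A minimal sketch of how those calls compose in the v1 flow, assuming a 7.x @elastic/elasticsearch client. This is not Kibana's actual migrator (which batches with search/scroll and handles failures); it only illustrates the create → transform → alias-swap sequence:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Create the next index, copy documents through JavaScript transforms,
// then atomically repoint the alias to the new index.
async function migrateKibanaIndex(
  alias: string, // e.g. '.kibana'
  nextIndex: string, // e.g. '.kibana_2'
  mappings: Record<string, unknown>,
  transform: (doc: unknown) => unknown
) {
  await client.indices.create({ index: nextIndex, body: { mappings } });

  // Resolve the index currently backing the alias, e.g. '.kibana_1'.
  const { body: aliases } = await client.indices.getAlias({ name: alias });
  const currentIndex = Object.keys(aliases)[0];

  // Copy documents through the JavaScript transforms (scrolling elided;
  // the real migrator processes documents in batches).
  const { body: results } = await client.search({
    index: currentIndex,
    size: 1000,
  });
  const body = results.hits.hits.flatMap((hit: any) => [
    { index: { _index: nextIndex, _id: hit._id } },
    transform(hit._source),
  ]);
  await client.bulk({ body });
  await client.indices.refresh({ index: nextIndex });

  // Atomic alias swap: remove and add in a single updateAliases call.
  await client.indices.updateAliases({
    body: {
      actions: [
        { remove: { index: currentIndex, alias } },
        { add: { index: nextIndex, alias } },
      ],
    },
  });
}
```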

Access to these saved object indices should always be done through the saved object client here (a usage sketch follows the list). I have also checked instances where plugins directly interact with this data, usually through tests for telemetry, but a full audit will need to be done.

  • search
  • mget
  • get
  • index
  • deleteByQuery
  • indices.putSettings - only used in tests here
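
For reference, a minimal sketch of going through the saved objects client from a plugin route handler instead of querying the index directly; SavedObjectsClient.find is Kibana core API, while the route itself is purely illustrative:

```ts
import type { IRouter } from 'src/core/server';

// Illustrative route: all reads go through the request-scoped saved
// objects client rather than hitting .kibana with the ES client.
export function registerExampleRoute(router: IRouter) {
  router.get(
    { path: '/api/example/dashboards', validate: false },
    async (context, request, response) => {
      const soClient = context.core.savedObjects.client;
      const { saved_objects: dashboards } = await soClient.find({
        type: 'dashboard',
        perPage: 20,
      });
      return response.ok({ body: { dashboards } });
    }
  );
}
```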

The reporting index is defined by xpack.reporting.index, and the reporting plugin manages this index. They would like to eventually migrate it to be managed by saved objects, but it needs separate lifecycle policies, etc.

  • indices.exists
  • indices.create
  • index
  • indices.refresh
  • get
  • update
  • search
  • scroll
  • bulk

It's common for deployments to run separate Kibana instances against a single Elasticsearch cluster by setting a different kibana.index, xpack.task_manager.index, and xpack.reporting.index. We should merge these into a single configuration key and use it in the ES endpoint _kibana/{kibana.name}/*. Elasticsearch would like to keep the available names under a corresponding configuration in elasticsearch.yml.
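
Purely to illustrate the shape this could take, a hypothetical configuration sketch; neither the kibana.name key nor the ES-side allow-list setting below exists, they are stand-ins for the proposal:

```yaml
# kibana.yml — hypothetical single key replacing kibana.index,
# xpack.task_manager.index, and xpack.reporting.index
kibana.name: "kibana-production"

# elasticsearch.yml — hypothetical allow-list of Kibana names the
# ES plugin would accept under _kibana/{kibana.name}/*
kibana.allowed_names: ["kibana-production", "kibana-staging"]
```

Requests from that Kibana instance would then go to _kibana/kibana-production/* rather than to per-index endpoints.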

Questions:

  • How do we handle the migration to the single index/system index?
  • Can we do anything about the usage collector, like moving it to the ES plugin?
  • With saved objects with independent indices, will they still hit the general endpoint _kibana/{kibana.name}/_search, or will there be a separate endpoint for it?

Additional Discussions:

Move to a single definition from which to derive the index names included in the ES API path

Initial proposal (archived February 25, 2020)

**Proposal**

A Kibana plugin would be created in Elasticsearch to provide an API for Kibana to consume. If we were strictly looking to do the minimal amount of work, we could alias all of the existing ES endpoints we rely on. However, I believe that would complicate the effort and ignore an opportunity regarding saved object migrations.

In addition to the routes required to search and fetch documents, we would also need a way for Kibana to register the desired mappings and migrations. If a migration is required, Elasticsearch would then perform the migration.

With this, migrations would no longer be written in JavaScript. During the initial development of migrations, we briefly investigated defining migrations as a reindex script. However, since one of the requirements for migrations was to move data stored in JSON strings into actual fields for searching, this was not an option at the time. With this proposal, JSON tools would be available in Painless for the migration script.
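
To make that concrete, here is a hypothetical registration payload a Kibana plugin might send to the ES-side plugin. The route, the payload shape, and the availability of a Json.load helper in this Painless context are all illustrative assumptions, not an existing API:

```ts
// Hypothetical mappings + Painless migration registration. Everything
// here is a sketch of the proposal, not a real Elasticsearch API.
const visualizationMigration = {
  type: 'visualization',
  migrationVersion: '8.0.0',
  mappings: {
    properties: {
      title: { type: 'text' },
      // Promoted out of a JSON string so it becomes searchable.
      chartType: { type: 'keyword' },
    },
  },
  script: {
    lang: 'painless',
    source: `
      // visState is stored as a JSON string; parse it (assuming a
      // Json.load helper is available) and copy one of its fields
      // up to a real mapped field.
      def visState = Json.load(ctx._source.visualization.visState);
      ctx._source.visualization.chartType = visState.type;
    `,
  },
};

// e.g. PUT _kibana/{kibana.name}/_migrations  (hypothetical endpoint)
```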

There will only be support for a single Kibana per Elasticsearch cluster. Kibana currently supports defining the name of multiple underlying indices (kibana.index, xpack.reporting.index, xpack.task_manager.index, …?), historically used for multi-tenancy. I believe that most of these users can migrate to Spaces. Others can take advantage of Cross-Cluster Search by having a small cluster for each Kibana instance, and use CCS to access the desired shared data. This also aligns with the “one Kibana per cluster” mantra from the multiple-host discussion for Console.

Questions:

  • Some teams, including Security, are discussing the need to split objects and perform complex migrations. What do we want to support and allow here? Could the plugin kick off a sort of async migration using the task manager?
  • How do we handle importing documents which need to be migrated?
  • Up until this point we have had a single index, as it’s easier to manage. With this moving to ES, would it make more sense to have an index/mapping per plugin/type?
  • What would snapshot/restore look like?
  • What changes will be necessary for the EsArchiver?
  • Several plugins access the Saved Objects index directly; what are those use cases, and how can we support them going forward?

Background on migrations:

There exists some documentation on our migration process here. To summarize: on startup, we may perform a migration of the objects.

Here is roughly what happens during one of those migrations:

  • If .kibana (or whatever the Kibana index is named) is not an alias, it will be converted to one:
    • Reindex .kibana into .kibana_1
    • Delete .kibana
    • Create an alias .kibana that points to .kibana_1
  • Create a .kibana_2 index
  • Copy all documents from .kibana_1 into .kibana_2, running them through any applicable migrations
  • Point the .kibana alias to .kibana_2

There are two things plugins define which could result in a migration taking place: the mappings defined by the plugin differ from those in the Kibana index in ES, or the plugin defined a migration which has not been run on all of its documents. Since both are defined by a plugin, a migration could be triggered by an upgrade or by a plugin being installed.

Since Kibana does not have a single cluster state across multiple instances, we first attempt to create the next index and let the migration run on the instance that was able to create it (sketched below).
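
A rough illustration of that create-to-acquire coordination, assuming a 7.x @elastic/elasticsearch client (not Kibana's actual implementation):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Index creation as a poor man's distributed lock: only the instance
// that successfully creates the next index runs the migration.
async function tryAcquireMigrationLock(nextIndex: string): Promise<boolean> {
  try {
    await client.indices.create({ index: nextIndex });
    return true; // this instance won the race and runs the migration
  } catch (err: any) {
    if (err?.body?.error?.type === 'resource_already_exists_exception') {
      return false; // another instance is migrating; wait for it to finish
    }
    throw err;
  }
}
```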

Related #11102

  • Prevent access to unsupported versions of ES, and add a warning to response headers for mismatched patch versions. This will allow us to remove the health check.
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@epixa
Contributor

epixa commented Oct 30, 2019

How will rolling upgrade work if Elasticsearch is performing migrations based on an ES plugin? In that scenario, the ES cluster gets upgraded first before the Kibana upgrade procedure starts. This ensures no downtime in Kibana while ES is undergoing a rolling upgrade procedure.

How might a future rolling upgrade procedure in Kibana work with ES-based migrations? We don't have a clear answer to this question when migrations exist in Kibana either, but it's potentially easier to address in the old model since at least the migration process is coupled to Kibana starting up rather than Elasticsearch starting up.

@epixa
Contributor

epixa commented Oct 30, 2019

How will third party (non-bundled) plugin migrations work?

Is it safe to assume that migrations would be defined against the storage mechanics (document shape) of Elasticsearch rather than the primitives of the saved object, such as attributes and references? If so, how do we validate/enforce the aspects of the document shape that we do not want plugins breaking (e.g. namespace prefixes on ids)?

@joshdover
Contributor

I could be wrong, but my impression was that the migrations themselves would still live in Kibana, written in Painless, but would be applied by the Elasticsearch Kibana plugin.

This would allow Elasticsearch to be able to manage rolling back a failed migration and avoid putting the index in a corrupt or broken state that cannot be recovered from without manual intervention (the current scenario).

@tylersmalley
Contributor Author

@joshdover that is correct. On startup, Kibana would register the mappings/migrations with the ES plugin and it would migrate as needed.

@epixa It's somewhat separate from this issue, but since you brought up rolling upgrades, it might be worth mentioning. We discussed sending the current version's mappings/migrations hash with each request; ES could respond with deprecation headers if something out of date was making the request. Some requests would proceed and include a deprecation warning, while requests with side effects would fail. We need to make improvements to the status endpoint used by load balancers, but it could also mark the node as down once it's out of date. This doesn't get us to a truly rolling upgrade, but it's close. It's probably also worth mentioning that this is possible today, but it doesn't provide a guarantee for users writing directly to ES outside the Saved Object API. (A sketch of the header exchange follows.)
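
A sketch of the exchange being described; the request header name is hypothetical, and the use of the standard Warning response header for deprecations is an illustrative assumption:

```ts
// Hypothetical mixed-version detection during a rolling upgrade: send
// the node's mappings/migrations hash, inspect any deprecation warning.
async function fetchWithVersionCheck(url: string, migrationsHash: string) {
  const res = await fetch(url, {
    headers: { 'x-kibana-migrations-hash': migrationsHash }, // hypothetical
  });

  // ES could flag out-of-date callers: reads proceed with a warning,
  // while writes (requests with side effects) would be rejected.
  const deprecation = res.headers.get('warning');
  if (deprecation !== null) {
    console.warn(`ES reports this Kibana node is out of date: ${deprecation}`);
  }
  return res;
}
```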

@tylersmalley
Contributor Author

Here is an audit of the existing migrations:

graph-workspace:
7.0.0: Migrate to saved object references #28199

space:
6.6.0 Ensure disabledFeatures defaults to empty array #46181

map:
7.2.0 Implements saved objects references #31745
7.4.0 Convert tile layers with EMS_TMS source to vector tile layers #43777
7.5.0 replaces topHitsTimeField with sortField and sortOrder #47361
7.6.0 Adds field meta options #51713 and moves applyGlobalQuery from layer to sources. #50523

canvas-workpad
7.0.0 Removes id attribute #30736

task:
7.4.0 Creates updated_at and sets to now #39829

index-pattern:
6.5.0 noop #21117
7.6.0 changes to parent/subType #47070

visualization:
6.7.2 Removes timezone #34795
7.0.0 Migrates to references #28199 and removes nested table splits #26057
7.0.1 Removes timezone #34795
7.2.0 Migrate percentile-rank aggregation (value -> values) and remove customInterval #36624
7.3.0 Replace deprecated moving_avg by moving_fn aggregation #36624 and migrates filter aggs query #37287 and replaces text input with QueryBarInput in TSVB #36784
7.3.1 Transform query_string queries into simple lucene query strings in the filters agg #43310
7.4.2 Transforms any remaining split_filters filters that are still strings #49000

dashboard:
7.0.0 Migrates to references #28199
7.3.0 Migrates older style queries into newer format #38945 and modified in #41245

search:
7.0.0 Migrates to references #28199
7.4.0 Migrate legacy sort arrays #43038

@rudolf
Contributor

rudolf commented Dec 10, 2019

There will only be support for a single Kibana per Elasticsearch cluster

+1

migrations would no longer be written using Javascript

Running migrations with Painless offloads a lot of complexity to Elasticsearch and will also help with performance (Beats CM wants to migrate 100k documents #28506). But it does have some disadvantages, in that migrations have to be written in a different language, without the typing information that comes from our TypeScript source code.

Migration failures have three sources:

  1. Exceptions in the migration "framework"; this includes migration failures during snapshots (Kibana migrations potentially fail if there's a running ES snapshot #47808), attempts to perform rolling upgrades, and probably also network-layer errors such as ECONNRESET.
  2. A type's migration script makes the wrong assumptions about the shape of the data or encounters "corrupted" objects:
    1. Migration script throws, causing the whole migration to fail (Fix migration issue with old filter queries #41245), or
    2. Migration script catches the exception and persists the un-migrated document as-is to prevent data loss. However, when a field was removed from the mappings, persisting the un-migrated document fails because of strict mappings (Dashboard migration issue from 6.1 dashboards #42519, Migration of dashboard fails due to missing panelIndex property #44639).

To fix (2) we need to make better assumptions about our data (using TypeScript and strict validation?) and have a better strategy for dealing with corrupt documents. I think Painless makes the former harder to accomplish.

I think the best way to deal with corrupt documents is to separately store documents that fail to migrate for later inspection (sketched below). A document that fails to migrate would then never cause the whole migration to fail; instead, Kibana will start up (potentially in a degraded state due to some documents being missing). Users can then inspect their "migration failures" and either delete these documents or manually edit them and "retry migration". Once migration succeeds, the document is added back to the saved object type it belongs to.

This will turn a sev-1 "Kibana is offline" into "Kibana is degraded". If, in addition, we introduce the ability to simulate migrations by running them against a cluster without persisting any changes, we can give administrators very high confidence that there will be no downtime during an upgrade.
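
A sketch of that quarantine idea, assuming a 7.x @elastic/elasticsearch client; the .kibana_migration_failures index name is made up for illustration:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Per-document migration with quarantine: a document whose transform
// throws is written to a side index instead of failing the migration.
async function migrateDoc(
  doc: { _id: string; _source: unknown },
  targetIndex: string,
  transform: (source: unknown) => unknown
) {
  try {
    await client.index({
      index: targetIndex,
      id: doc._id,
      body: transform(doc._source) as Record<string, unknown>,
    });
  } catch (err) {
    // Quarantine for later inspection; Kibana starts degraded, not down.
    await client.index({
      index: '.kibana_migration_failures', // illustrative name
      id: doc._id,
      body: { original: doc._source, error: String(err) },
    });
  }
}
```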

@rudolf
Contributor

rudolf commented Mar 15, 2021

Closing in favour of #81536, as that's a more up-to-date record of the effort around system indices in Kibana.

@rudolf rudolf closed this as completed Mar 15, 2021