Moving saved object management to an ES plugin #49764

tylersmalley · 2019-10-30T17:49:06Z

This issue is a work-in-progress and not to be considered complete. Please raise any possible issues or suggestions.

This is part of a stack-wide effort to stop identifying internal indices purely by naming convention (a leading period). Doing so will have multiple benefits, including preventing user mutations to the indices through direct or indirect (index templates) means. The general idea being proposed is that each internal index would be represented by an Elasticsearch plugin providing an interface separate from that of the common ES API. Additionally, the common ES APIs would no longer return any data contained in these indices.

Currently, Kibana manages a single alias backed by a single index for multiple types (kibana/task_manager). When Kibana starts up and determines it needs to run a migration, a new index is created and the data is re-indexed. This is not an Elasticsearch reindex, as JavaScript transformations are performed on the documents. Once that is completed, we update the alias. More information on this process is located here. Going forward, with what we're calling Saved Object Migrations v2, we're looking to avoid these reindexes and maintain only a single index.

The following indices are included in the Kibana Saved Objects:

kibana.index defaults to .kibana
xpack.task_manager.index defaults to .kibana_task_manager

Each of these is an alias whose backing indices are suffixed with _{int}, where each migration increments the integer.
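
As a rough illustration of that naming scheme, here is a minimal sketch that derives the next index name from the current one. The function name `nextIndexName` is hypothetical and is not taken from the Kibana code base.

```typescript
// Hypothetical helper (not Kibana's actual implementation): given an alias such as
// ".kibana" and the index it currently points to (".kibana_3"), return the name of
// the index the next migration would create (".kibana_4").
function nextIndexName(alias: string, currentIndex: string): string {
  const hasPrefix = currentIndex.slice(0, alias.length) === alias;
  // The remainder must be exactly "_<digits>".
  const match = /^_(\d+)$/.exec(currentIndex.slice(alias.length));
  if (!hasPrefix || match === null) {
    throw new Error(`"${currentIndex}" is not a versioned index for alias "${alias}"`);
  }
  return `${alias}_${Number(match[1]) + 1}`;
}
```

Note that `.kibana_task_manager_1` is deliberately rejected for the `.kibana` alias, since the two defaults share a prefix.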

The following ES calls are used by the index migrations (elasticsearch-js name references):

We previously used index templates to allow for the Kibana index to be re-created automatically. We can safely drop this since migrations were added in 6.5.0:

cat.templates (name=kibana_index_template*)
indices.deleteTemplate

The following calls are used to perform the migration. Ideally these would be removed as part of Saved Object Migrations v2:

indices.create
indices.getAlias
indices.updateAliases
indices.refresh

Access to these saved objects indices should always be done through the saved object client here. I have also checked instances where plugins directly interact with this data, usually through tests for telemetry, but a full audit will need to be done.

search
mget
get
index
deleteByQuery
indices.putSettings - only used in tests here

The reporting index is defined by xpack.reporting.index and the reporting plugin manages this index. They would like to eventually migrate this to be managed by saved objects, but they need separate lifecycle policies, etc.

indices.exists
indices.create
index
indices.refresh
get
update
search
scroll
bulk

It's common for deployments to have separate Kibana instances per Elasticsearch cluster by setting a different kibana.index, xpack.task_manager.index, and xpack.reporting.index. We should merge these into a single configuration key and use it in the ES endpoint _kibana/{kibana.name}/*. Elasticsearch would like to keep the available names under a corresponding configuration in elasticsearch.yml.
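
To make the single-key idea concrete, here is a minimal sketch of how a client might build such a path. Everything here is an assumption for illustration: the helper name `kibanaEndpoint`, the `allowedNames` list (standing in for whatever elasticsearch.yml would declare), and the exact path shape, which was only a proposal at this point.

```typescript
// Hypothetical sketch: build a path under the proposed _kibana/{kibana.name}/* endpoint.
// `allowedNames` stands in for the names Elasticsearch would permit via its own
// configuration; the real setting name and validation are not defined in this issue.
function kibanaEndpoint(kibanaName: string, allowedNames: string[], suffix: string): string {
  if (allowedNames.indexOf(kibanaName) === -1) {
    throw new Error(`Kibana name "${kibanaName}" is not allowed by the cluster configuration`);
  }
  // Encode the name so it cannot introduce extra path segments.
  return `_kibana/${encodeURIComponent(kibanaName)}/${suffix}`;
}
```

With this shape, the saved objects, task manager, and reporting indices would all be derived from one name rather than three separate settings.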

Questions:

How do we handle the migration to the single index/system index?
Can we do anything about the usage collector, like moving it to the ES plugin?
For saved objects with independent indices, will they still hit the general endpoint _kibana/{kibana.name}/_search, or will there be a separate endpoint for them?

Additional Discussions:

Move to single definition to derive the index names which will be included in the ES api path

Initial proposal - archived February 25, 2020

**Proposal**

A Kibana plugin would be created in Elasticsearch to provide an API for Kibana to consume. If we were strictly looking to do the minimal amount of work, we could alias all of the existing ES endpoints which we rely on. However, I believe that would complicate the efforts as well as ignore an opportunity regarding saved object migrations.

In addition to the routes required to search and fetch documents, we would also need a way for Kibana to register the desired mappings and migrations. If a migration is required, Elasticsearch would then perform the migration.

With this, migrations would no longer be written using JavaScript. During the initial development of migrations, we briefly investigated defining migrations as a reindex script. However, since one of the requirements for migration was to move data stored in JSON strings to actual fields for searching, this was not an option. With this proposal, JSON tools would be available in Painless for the migration script.
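
For readers unfamiliar with that requirement, here is a minimal sketch of the kind of transformation meant: lifting a value out of a JSON string attribute into a real field. It is written in TypeScript purely for illustration (under the proposal the equivalent logic would be a Painless script), and the attribute names `optionsJSON` and `title` are made up.

```typescript
interface RawDoc {
  id: string;
  type: string;
  attributes: { [key: string]: unknown };
}

// Hypothetical migration: copy "title" out of the JSON string stored in the
// (made-up) "optionsJSON" attribute so that it becomes a searchable field.
function liftTitleFromJson(doc: RawDoc): RawDoc {
  const raw = doc.attributes["optionsJSON"];
  if (typeof raw !== "string") {
    return doc; // nothing to migrate
  }
  const parsed: unknown = JSON.parse(raw); // throws on malformed JSON
  const title =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { [key: string]: unknown })["title"]
      : undefined;
  if (typeof title !== "string") {
    return doc;
  }
  // Return a new document; the original JSON string is left in place.
  return { ...doc, attributes: { ...doc.attributes, title } };
}
```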

There will only be support for a single Kibana per Elasticsearch cluster. Kibana currently supports defining the name of multiple underlying indices (kibana.index, xpack.reporting.index, xpack.task_manager.index, …?), historically used for multi-tenancy. I believe that most of these users can migrate to Spaces. Others can take advantage of Cross-Cluster Search by having a small cluster for each Kibana instance, and use CCS to access the desired shared data. This also aligns with the “one Kibana per cluster” mantra during the multiple host discussion for Console.

Questions:

Some teams, including Security, are discussing the need to split objects and perform complex migrations. What do we want to support and allow here? Could we have a sort of async migration here that the plugin could kick off using the task manager?
How do we handle importing documents which need to be migrated?
Up until this point we have had a single index as it’s easier to manage. With this moving to ES, would it make more sense to have an index/mapping per plugin/type?
What would snapshot/restore look like?
What changes will be necessary for the EsArchiver?
Several plugins are accessing the Saved Objects index directly, what are those use cases and how can we support them going forward?

Background on migrations:

There exists some documentation on our migration process here. To summarize: on startup, we may perform a migration of the objects.

Here is roughly what happens during one of those migrations:

1. If .kibana (or whatever the Kibana index is named) is not an alias, it will be converted to one:
   - Reindex .kibana into .kibana_1
   - Delete .kibana
   - Create an alias .kibana that points to .kibana_1
2. Create a .kibana_2 index
3. Copy all documents from .kibana_1 into .kibana_2, running them through any applicable migrations
4. Point the .kibana alias to .kibana_2
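
The steps above can be sketched against an in-memory stand-in for the cluster. This is only a model of the sequence, not the real migrator: `FakeCluster` and `runMigration` are hypothetical names, and no Elasticsearch calls are made.

```typescript
interface Doc {
  id: string;
  type: string;
  attributes: { [key: string]: unknown };
}

// In-memory stand-in for a cluster: concrete indices plus alias -> index pointers.
interface FakeCluster {
  indices: { [name: string]: Doc[] };
  aliases: { [alias: string]: string };
}

// Hypothetical model of one migration run; returns the name of the new index.
function runMigration(cluster: FakeCluster, name: string, migrate: (doc: Doc) => Doc): string {
  let source = cluster.aliases[name];
  if (source === undefined) {
    // `name` is a concrete index (or missing), so convert it to an alias.
    const existing = cluster.indices[name] || [];
    source = `${name}_1`;
    cluster.indices[source] = existing.slice(); // "reindex" into <name>_1
    delete cluster.indices[name];               // delete the concrete index
    cluster.aliases[name] = source;             // alias now points at <name>_1
  }
  // Create the next index, e.g. <name>_2.
  const target = `${name}_${Number(source.slice(name.length + 1)) + 1}`;
  if (cluster.indices[target] !== undefined) {
    throw new Error(`${target} already exists`);
  }
  // Copy every document across, running it through the migration function.
  const sourceDocs = cluster.indices[source] || [];
  cluster.indices[target] = sourceDocs.map((doc) => migrate(doc));
  // Point the alias at the new index.
  cluster.aliases[name] = target;
  return target;
}
```

The old index is left in place after the alias moves, which is what makes a manual rollback possible in this model.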

There are two things plugins define which could result in a migration taking place: the mapping defined by the plugin differs from the Kibana index in ES, or a migration was defined by the plugin which has not been run on all its documents. Since these are both defined by a plugin, it’s possible that a migration could be run by an upgrade or by a plugin being installed.

Since Kibana does not have a single cluster state across multiple instances, we first attempt to create the next index and allow the migration to run on the instance able to create the index.
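
A minimal sketch of that coordination idea, assuming index creation is atomic and fails when the index already exists. The registry object and the names `tryCreateIndex` and `startInstance` are hypothetical stand-ins; a real implementation would call Elasticsearch and handle the "already exists" error instead.

```typescript
// Names of indices that exist in a pretend cluster.
interface IndexRegistry {
  [name: string]: boolean;
}

// Stand-in for an atomic create-index call: returns false if the index already exists.
function tryCreateIndex(registry: IndexRegistry, name: string): boolean {
  if (registry[name] === true) {
    return false;
  }
  registry[name] = true;
  return true;
}

// Each Kibana instance runs this on startup. Only the instance that manages to
// create the target index runs the migration; the others wait for it to finish.
function startInstance(
  registry: IndexRegistry,
  targetIndex: string,
  migrate: () => void
): "migrated" | "waiting" {
  if (tryCreateIndex(registry, targetIndex)) {
    migrate();
    return "migrated";
  }
  return "waiting";
}
```

The create call doubles as a lock, which is why no shared cluster state between Kibana instances is needed in this scheme.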

Related #11102

Prevent access to unsupported version of ES, and add warning to response headers for mismatched patch version. This will allow us to remove the health check.


elasticmachine · 2019-10-30T17:49:08Z

Pinging @elastic/kibana-operations (Team:Operations)

elasticmachine · 2019-10-30T17:49:10Z

Pinging @elastic/kibana-platform (Team:Platform)

epixa · 2019-10-30T19:35:21Z

How will rolling upgrade work if Elasticsearch is performing migrations based on an ES plugin? In that scenario, the ES cluster gets upgraded first before the Kibana upgrade procedure starts. This ensures no downtime in Kibana while ES is undergoing a rolling upgrade procedure.

How might a future rolling upgrade procedure in Kibana work with ES-based migrations? We don't have a clear answer to this question when migrations exist in Kibana either, but it's potentially easier to address in the old model since at least the migration process is coupled to Kibana starting up rather than Elasticsearch starting up.

epixa · 2019-10-30T19:41:37Z

How will third party (non-bundled) plugin migrations work?

Is it safe to assume that migrations would be defined with the storage mechanics (document shape) of Elasticsearch rather than the primitives of the saved object, such as attributes and references? If so, how do we validate/enforce certain aspects of the document shape that we do not want plugins breaking (e.g. namespace prefixes on ids)?

joshdover · 2019-10-30T20:07:31Z

I could be wrong, but my impression was that the migrations themselves would still live in Kibana, written in Painless, but would be applied by the Elasticsearch Kibana plugin.

This would allow Elasticsearch to be able to manage rolling back a failed migration and avoid putting the index in a corrupt or broken state that cannot be recovered from without manual intervention (the current scenario).

tylersmalley · 2019-10-30T21:19:19Z

@joshdover that is correct. On startup, Kibana would register the mappings/migrations with the ES plugin and it would migrate as needed.

@epixa It's kind of separate from this issue, but since you brought up rolling migrations, it might be worth mentioning. We discussed sending a hash of the current version's mappings/migrations with each request, so that ES could respond with deprecation headers if something out of date was making the request. Some requests would proceed and include a deprecation, while requests with side effects would fail. We need to make improvements to the status endpoint used by load balancers, but it could also mark the node as down once it's out of date. This doesn't get us to a truly rolling upgrade, but it's close. Probably also worth mentioning that this is possible today, but it doesn't provide a guarantee for users writing directly to ES outside the Saved Objects API.
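
A rough sketch of how that hash check could behave. This is an assumption-heavy illustration: the names `fingerprint` and `checkRequest` are hypothetical, the hash is a simple FNV-1a-style function chosen only to keep the example self-contained, and the three outcomes are my reading of the behaviour described above.

```typescript
// Serialize with sorted keys so that key order does not change the hash.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  const obj = value as { [key: string]: unknown };
  const keys = Object.keys(obj).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
}

// Hypothetical fingerprint of a mappings/migrations definition
// (non-cryptographic, FNV-1a-style over UTF-16 code units).
function fingerprint(definition: unknown): string {
  const text = stableStringify(definition);
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, modulo 2^32.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h.toString(16);
}

type Verdict = "ok" | "ok-with-deprecation" | "rejected";

// Assumed server-side rule: up-to-date callers proceed, stale readers proceed
// with a deprecation warning, and stale requests with side effects fail.
function checkRequest(requestHash: string, currentHash: string, hasSideEffects: boolean): Verdict {
  if (requestHash === currentHash) {
    return "ok";
  }
  return hasSideEffects ? "rejected" : "ok-with-deprecation";
}
```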

tylersmalley · 2019-12-10T07:26:12Z

Here is an audit of the existing migrations:

graph-workspace:
7.0.0 Migrate to saved object references #28199

space:
6.6.0 Ensure disabledFeatures defaults to empty array #46181

map:
7.2.0 Implements saved objects references #31745
7.4.0 Convert tile layers with EMS_TMS source to vector tile layers #43777
7.5.0 Replaces topHitsTimeField with sortField and sortOrder #47361
7.6.0 Adds field meta options #51713 and moves applyGlobalQuery from layer to sources. #50523

canvas-workpad:
7.0.0 Removes id attribute #30736

task:
7.4.0 Creates updated_at and sets to now #39829

index-pattern:
6.5.0 noop #21117
7.6.0 Changes to parent/subType #47070

visualization:
6.7.2 Removes timezone #34795
7.0.0 Migrates to references #28199 and removes nested table splits #26057
7.0.1 Removes timezone #34795
7.2.0 Migrate percentile-rank aggregation (value -> values) and remove customInterval #36624
7.3.0 Replace deprecated moving_avg by moving_fn aggregation #36624 and migrates filter aggs query #37287 and replaces text input with QueryBarInput in TSVB #36784
7.3.1 Transform query_string queries into simple lucene query strings in the filters agg #43310
7.4.2 Transforms any remaining split_filters filters that are still strings #49000

dashboard:
7.0.0 Migrates to references #28199
7.3.0 Migrates older style queries into newer format #38945 and modified in #41245

search:
7.0.0 Migrates to references #28199
7.4.0 Migrate legacy sort arrays #43038

rudolf · 2019-12-10T12:00:05Z

> There will only be support for a single Kibana per Elasticsearch cluster

+1

> migrations would no longer be written using Javascript

Running migrations with Painless offloads a lot of complexity to Elasticsearch and will also help with performance (Beats CM wants to migrate 100k documents #28506). But it does have some disadvantages, in that migrations have to be written in a different language without the typing information that comes from our TypeScript source code.

Migration failures have three sources:

Exceptions in the migration "framework". This includes migration failures during snapshots (Kibana migrations potentially fail if there's a running ES snapshot #47808), attempts to perform rolling upgrades, and probably also network layer errors such as ECONNRESET.
A type's migration script makes the wrong assumptions about the shape of the data or encounters "corrupted" objects:
1. The migration script throws, causing the whole migration to fail (Fix migration issue with old filter queries #41245), or
2. The migration script catches the exception and persists the un-migrated document as-is to prevent data loss. However, when a field was removed from mappings, persisting the un-migrated document fails because of strict mappings (Dashboard migration issue from 6.1 dashboards #42519, Migration of dashboard fails due to missing panelIndex property #44639).

To fix (2) we need to make better assumptions about our data (using TypeScript and strict validation?) and have a better strategy for dealing with corrupt documents. I think Painless makes it harder to accomplish the former.

I think the best way to deal with corrupt documents is to separately store documents that failed to migrate for later inspection. A document that fails to migrate will never cause the whole migration to fail. Instead Kibana will start up (potentially in a degraded state due to some documents being missing). Users can then inspect their "migration failures" and either delete these documents or manually edit them and "retry migration". Once migration succeeds, the documents will be added back to the saved object type it belonged to.

This will turn a sev 1, "Kibana is offline" into "Kibana is degraded". If we, in addition, introduce the ability to simulate migrations by running migrations against a cluster but not persisting any changes, we can give administrators a very high confidence that there will be no downtime during an upgrade.
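
A minimal sketch of both ideas together: per-document failures are set aside instead of aborting the run, and a dry-run flag reports the outcome without persisting anything. All names here (`migrateAll`, `applyMigration`, the `quarantined` list) are hypothetical, and the store is an in-memory stand-in.

```typescript
interface SavedObjectDoc {
  id: string;
  type: string;
  attributes: { [key: string]: unknown };
}

interface Failure {
  doc: SavedObjectDoc;
  error: string;
}

interface Outcome {
  migrated: SavedObjectDoc[];
  failures: Failure[];
}

// Run the migration over every document; a throwing document is recorded
// as a failure rather than failing the whole migration.
function migrateAll(docs: SavedObjectDoc[], migrate: (doc: SavedObjectDoc) => SavedObjectDoc): Outcome {
  const outcome: Outcome = { migrated: [], failures: [] };
  for (const doc of docs) {
    try {
      outcome.migrated.push(migrate(doc));
    } catch (e) {
      outcome.failures.push({ doc, error: e instanceof Error ? e.message : String(e) });
    }
  }
  return outcome;
}

// In-memory stand-in for the saved objects store plus a "migration failures" area.
interface Store {
  live: SavedObjectDoc[];
  quarantined: Failure[];
}

// With dryRun = true nothing is persisted, which models "simulating" a migration.
function applyMigration(
  store: Store,
  migrate: (doc: SavedObjectDoc) => SavedObjectDoc,
  dryRun: boolean
): Outcome {
  const outcome = migrateAll(store.live, migrate);
  if (!dryRun) {
    store.live = outcome.migrated;
    store.quarantined = store.quarantined.concat(outcome.failures);
  }
  return outcome;
}
```

An administrator could run this with `dryRun` set before an upgrade and inspect `failures`; an empty list would suggest the real run leaves nothing quarantined.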

rudolf · 2021-03-15T21:18:20Z

Closing in favour of #81536 as that's a more up to date record of the effort around system indices in Kibana

tylersmalley added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team v8.0.0 labels Oct 30, 2019

pgomulka mentioned this issue Dec 4, 2019

Implement auto-upgrade mechanism for joda-java migration elastic/elasticsearch#45548

Closed

rudolf mentioned this issue Dec 10, 2019

Improve Saved Object Migrations to minimize operational impact of Kibana upgrades #52202

Closed

rudolf added the Feature:Saved Objects label Dec 10, 2019

pgomulka mentioned this issue Jan 22, 2020

Implement auto upgrade for date fields #55533

Closed

rudolf mentioned this issue Feb 5, 2020

Restructure SavedObject types internal representation #56378

Merged

4 tasks

tylersmalley mentioned this issue Feb 19, 2020

Unable to upgrade kibana - forbidden #40987

Closed

jinmu03 assigned tylersmalley Feb 21, 2020

jaymode mentioned this issue Feb 26, 2020

Introduce system index APIs for Kibana elastic/elasticsearch#52385

Merged

This was referenced Mar 12, 2020

[discuss] Removal of kibana.index configuration setting #60053

Closed

Specify index.refresh_interval on index creation #15202

Closed

rudolf mentioned this issue Mar 31, 2020

Sanity check internal index settings #24266

Closed

rudolf closed this as completed Mar 15, 2021
