Mandatory data tier preference #76147

henningandersen · 2021-08-05T07:37:10Z

Using data tiers allows allocating indices to dedicated tiers of nodes. Such nodes would typically have different characteristics, either physically (storage type, RAM:storage ratio) or from a usage standpoint (my hot tier is expected to respond fast).

Using data tiers is optional in that using the data role will assign all data tiers to the node. However, if a cluster is using separate data tiers it is desirable to be explicit about where a specific index belongs.

Today we allow index.routing.allocation.include._tier_preference to be unspecified for an index. This prevents Elasticsearch and its clients from relying on which tier an index/shard is located on, affecting the following:

Autoscaling does not know which data tier to scale up.
The _tier query will not know the tier of an index/shard.

Furthermore, making the preference mandatory allows us to rely on it for future developments, such as balancing of shards, UI, monitoring and more. There is no known good use case for a tier-less index; allowing it only adds complexity for ourselves and users, and such an index can be considered bad data.
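To make "tier-less index" concrete, here is a minimal sketch that lists the indices with no tier preference. It assumes the input is a dict shaped like the nested (non-flat) response of the get index settings API, keyed by index name; the helper names `tier_preference` and `tierless_indices` are made up for illustration and are not part of Elasticsearch.

```python
from typing import Any, Dict, List

# Nested path of index.routing.allocation.include._tier_preference
# inside the "settings" object of a get-settings style response.
TIER_PREFERENCE_PATH = ("index", "routing", "allocation", "include", "_tier_preference")


def tier_preference(index_settings: Dict[str, Any]) -> str:
    """Return the raw _tier_preference string, or "" when it is unset."""
    node: Any = index_settings
    for key in TIER_PREFERENCE_PATH:
        if not isinstance(node, dict) or key not in node:
            return ""
        node = node[key]
    return node if isinstance(node, str) else ""


def tierless_indices(settings_response: Dict[str, Any]) -> List[str]:
    """List (sorted) the indices whose settings carry no tier preference."""
    return sorted(
        name
        for name, body in settings_response.items()
        if not tier_preference(body.get("settings", {})).strip()
    )
```

Anything returned by `tierless_indices` is an index that autoscaling and the _tier query cannot place on a tier.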

The proposal here is to work towards having index.routing.allocation.include._tier_preference be mandatory for all indices in the following steps:

Add a cluster setting to signal that creating new indices should always result in a tier preference. When set, creating an index should add the default tier preference if no explicit preference was given in the request. This will be default off in 7.x, default on in 8.
- In fact, on is the only value allowed in 8. (edit: it's always treated as on, in that we disregard the value of the setting)
Add deprecation info and warnings in 7.x, only for clusters that have data nodes without all data roles.
- Add deprecation info for indices without a tier preference set.
- Add deprecation warning when creating an index results in no tier preference set. This should include create index, rollover and create data stream.
Make ILM migrate action mandatory in 8.0, regardless of allocate action.
On 7.x, change the migrate_to_data_tiers API to apply the default data tier preference to any index that would otherwise end up without one, and to set the cluster setting from the first work item so that new indices are assigned a tier preference.
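The create-time behaviour described in the first work item can be sketched roughly as follows. This is an illustration only, not the Elasticsearch implementation: the function name is hypothetical, settings are modelled as a flat dict of dotted keys, the `data_content` default is an assumption standing in for "the default tier preference", and a missing, null, or empty value is assumed to count as "no explicit preference".

```python
from typing import Dict, Optional

TIER_PREFERENCE = "index.routing.allocation.include._tier_preference"


def with_default_tier_preference(
    request_settings: Dict[str, Optional[str]],
    enforce: bool,
    default_tier: str = "data_content",  # assumed default, for illustration
) -> Dict[str, Optional[str]]:
    """Return a copy of the create-index settings with a tier preference.

    The default is added only when enforcement is on and the request did
    not give a preference of its own; an explicit preference always wins.
    """
    settings = dict(request_settings)  # never mutate the caller's dict
    if enforce and not settings.get(TIER_PREFERENCE):
        settings[TIER_PREFERENCE] = default_tier
    return settings
```

With `enforce=False` (the 7.x default proposed above) the request settings pass through unchanged, which is the case the deprecation warning is meant to catch.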

In a future release (possibly 9.0) we should close the loop and:

Remove the flag from cluster settings.
Enforce not setting tier preference to null (we could consider doing this in 8.0 too).
Evaluate at what point we need/want to drop the migrate_to_data_tiers API from the code (8.x? 9.x?)

elasticmachine · 2021-08-05T07:37:12Z

Pinging @elastic/es-core-features (Team:Core/Features)

gwbrown · 2021-08-05T16:20:47Z

@colings86 @dakrone This issue also seems relevant to recent conversations regarding how node shutdown should interact with tier preference - if tier preference becomes mandatory, that could change our calculus on the issue.

joegallo · 2021-10-15T16:48:46Z

Things to come back to:

@henningandersen asked for a change to test (Add setting to enforce a default TIER_PREFERENCE #79210 (comment)), don't forget that.
Need to add documentation for the new cluster.routing.allocation.enforce_default_tier_preference setting
Follow up to Default ENFORCE_DEFAULT_TIER_PREFERENCE to true #79275 (comment), we still need to make the only acceptable value for the setting be true on master/8.0+. (edit: see Always enforce a default tier preference (ENFORCE_DEFAULT_TIER_PREFERENCE must always be true) #79723) (edit: slightly different resolution, but this is handled via Always enforce default tier preference (ENFORCE_DEFAULT_TIER_PREFERENCE is ignored) #79751)
I punted on docs changes for Migrate to data tiers should always ensure a TIER_PREFERENCE is set #79100, so I'll need to take care of that.
The details in the docs here are quite a bit different now on both 7.x and 8.0, that'll need an update.
The "Migrate to data tiers routing API" docs need to mention the new enforcement setting, and how on 7.x it changes the value.
Similarly, the broader "Migrate index allocation filters to node roles" docs deserve a run-through / review / re-write in light of the changes here.
The deprecation info and warning messages from Add deprecation info and warnings for an empty TIER_PREFERENCE #79305 are non-final, we need to settle on the final wording (and short links).
Similarly, the new deprecation info in Always enforce default tier preference (ENFORCE_DEFAULT_TIER_PREFERENCE is ignored) #79751 needs to be finalized.

droberts195 · 2021-11-11T09:31:05Z

What is the expectation when a snapshot taken in a pre-7.10 cluster is restored into an 8.x cluster? The indices in that snapshot will not have a tier preference set. Does it mean that 8.x code cannot safely assume that every index will have a tier preference set? Or will snapshot restoration be changed for 8.0 and above to automatically set a tier preference on restored indices that didn't have one when snapshotted?

henningandersen · 2021-11-11T09:37:27Z

@droberts195
We still allow indices without a _tier_preference in 8.0; we did not break this. Making it mandatory is for an upcoming release, after a proper deprecation period. We are not guaranteed a _tier_preference, but will assume it is there (without breaking if missing) for things like autoscaling, since users should run the migrate api to fix deprecations before upgrading.

droberts195 · 2021-11-11T09:53:28Z

> users should run the migrate api to fix deprecations before upgrading

If it's not already documented I think it might be worth calling out in the docs that even if you fix all the deprecations before upgrading you can reintroduce indices that have those same deprecated characteristics by restoring an old snapshot.

> will assume it is there (without breaking if missing)

This needs to be well-known among developers who write code that searches Elasticsearch. It's not safe to assume _tier_preference is always set in 8.0+. Code needs to be written in such a way that it won't break if it encounters an index that was restored from an old snapshot. This knowledge doesn't just need to be in the core ES team, but also for example in teams writing UI code that wants to quickly obtain an example document from the fastest indices that match a pattern. I will make sure the ML UI team are aware.
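As a sketch of what "won't break" could look like for the UI example above, the snippet below orders indices so that the presumably fastest come first and indices without a tier preference are kept, sorted last. Several things here are assumptions for illustration: the helper names, the speed ranking in `TIER_RANK`, and treating the first entry of the comma-separated preference as the index's tier (the tier a shard actually lands on also depends on which tiers exist in the cluster).

```python
from typing import Dict, List, Optional

# Assumed speed ranking, fastest first; not defined by Elasticsearch.
TIER_RANK = {"data_hot": 0, "data_content": 1, "data_warm": 2, "data_cold": 3, "data_frozen": 4}
UNKNOWN_RANK = len(TIER_RANK)  # missing or unrecognised tiers sort last


def first_preferred_tier(raw: Optional[str]) -> Optional[str]:
    """Return the first tier named in a _tier_preference value, or None."""
    if not raw:
        return None
    for tier in raw.split(","):
        tier = tier.strip()
        if tier:
            return tier
    return None


def fastest_first(preferences: Dict[str, Optional[str]]) -> List[str]:
    """Order index names by assumed tier speed, never dropping tier-less ones."""

    def rank(name: str):
        tier = first_preferred_tier(preferences[name])
        return (TIER_RANK.get(tier, UNKNOWN_RANK), name)

    return sorted(preferences, key=rank)
```

The point of the design is that an index restored from an old snapshot, whose preference is `None`, is still returned; it is only deprioritised rather than causing an error.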

henningandersen added the >enhancement, :Data Management/ILM+SLM, and needs:triage labels Aug 5, 2021

elasticmachine added the Team:Data Management label Aug 5, 2021

dakrone assigned joegallo Aug 5, 2021

gwbrown removed the needs:triage label Aug 5, 2021

joegallo mentioned this issue Sep 28, 2021

Rename INDEX_ROUTING_PREFER to TIER_PREFERENCE #78411

Merged

This was referenced Oct 15, 2021

Default ENFORCE_DEFAULT_TIER_PREFERENCE to true #79275

Merged

[7.x] Migrate to data tiers should always ensure a TIER_PREFERENCE is set #79297

Merged

Add deprecation info and warnings for an empty TIER_PREFERENCE #79305

Merged

jimczi mentioned this issue Oct 18, 2021

_terms_enum API index_filter doesn’t work with _tier field on upgraded cluster from 6.8.19->6.8.20->7.15.1 #79200

Closed

This was referenced Oct 25, 2021

Always enforce a default tier preference (ENFORCE_DEFAULT_TIER_PREFERENCE must always be true) #79723

Closed

Always enforce default tier preference (ENFORCE_DEFAULT_TIER_PREFERENCE is ignored) #79751

Merged

joegallo mentioned this issue Nov 11, 2021

Better UX around migration to data tiers during 7.16->8.0 upgrade #80645

Closed

This was referenced Dec 6, 2021

_tier_preference docs changes for 8.0 #81389

Merged

_tier_preference docs changes for 7.16 #81401

Merged

VimCommando mentioned this issue Feb 10, 2022

[Upgrade Assistant] Warn if cluster's node attributes and data tiers may not match #83800

Closed
