Track Shard-Snapshot Index Generation at Repository Root #48371

Merged
original-brownbear merged 2 commits into elastic:7.x from 46250-7.x on Oct 23, 2019

Conversation

original-brownbear (Member)

Backport of #46250

original-brownbear and others added 2 commits October 23, 2019 06:56
* Track Shard Snapshot Generation in CS

Adds communication of new shard generations from data nodes to the master
and tracking of those generations in the cluster state (CS).
This is a preliminary step towards elastic#46250
This change adds a new field `"shards"` to `RepositoryData` that contains a mapping of `IndexId` to a `String[]`. This string array can be indexed by shard id to get the generation of that shard's folder (i.e. the `N` in the name of the currently valid `/indices/${indexId}/${shardId}/index-${N}` blob for the shard in question).
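
As a rough illustration of the lookup described above (class and method names here are hypothetical and do not match the real `RepositoryData` implementation), the mapping can be pictured as an index-id-to-generation-array lookup, indexed by shard id:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: illustrates the "shards" mapping of IndexId -> String[]
// described above, not the actual Elasticsearch classes.
final class ShardGenerationsSketch {

    // indexId -> generations, where generations[shardId] is the generation in the
    // shard's currently valid index-${generation} blob, or null if unknown.
    private final Map<String, String[]> shardGenerations = new HashMap<>();

    void putGenerations(String indexId, String[] generations) {
        shardGenerations.put(indexId, generations);
    }

    // Look up the current generation for one shard without listing the shard's folder.
    String getGeneration(String indexId, int shardId) {
        final String[] generations = shardGenerations.get(indexId);
        return generations == null ? null : generations[shardId];
    }
}
```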

This allows for creating a new snapshot in the shard without doing any LIST operations on the shard's folder. In the case of AWS S3, this saves about 1/3 of the cost of updating an empty shard (see elastic#45736) and removes one of the two remaining potential issues with eventually consistent blob stores (see elastic#38941 ... now only the root `index-${N}` is determined by listing).

Also, and equally if not more importantly, a number of possible failure modes on eventually consistent blob stores like AWS S3 are eliminated by moving all delete operations to the `master` node and by switching from incremental naming of the shard-level `index-${N}` blobs to uuid suffixes for these blobs.

This change moves the deletion of the previous shard-level `index-${uuid}` blob from the data node to the master node, allowing for a safe and consistent update of the shard's generation in the `RepositoryData`: `RepositoryData` is updated first, and the now unreferenced previous `index-${uuid}` blob is deleted afterwards.
__No deletes are executed on the data nodes at all for any operation with this change.__
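
A minimal sketch of this ordering, under assumed names (`BlobStore`, `writeBlob`, `deleteBlob` are stand-ins, not the real Elasticsearch repository API): write the new shard-level blob, reference it from the root-level `RepositoryData`, and only then delete the previous blob.

```java
import java.util.UUID;

// Hypothetical sketch of the write-then-reference-then-delete ordering described above.
final class ShardGenerationUpdateSketch {

    interface BlobStore {
        void writeBlob(String path, byte[] content);   // assumed to never overwrite existing blobs
        void deleteBlob(String path);
    }

    static String updateShardGeneration(BlobStore store, String indexId, int shardId,
                                        String previousGeneration, long rootGeneration,
                                        byte[] shardIndexContent, byte[] repositoryDataContent) {
        final String shardPath = "indices/" + indexId + "/" + shardId + "/";
        final String newGeneration = UUID.randomUUID().toString();

        // 1. Data node writes the new shard-level index-${uuid} blob.
        store.writeBlob(shardPath + "index-" + newGeneration, shardIndexContent);

        // 2. Master writes the next root-level index-N (RepositoryData) referencing newGeneration.
        store.writeBlob("index-" + (rootGeneration + 1), repositoryDataContent);

        // 3. Only now does the master delete the previous, now unreferenced shard-level blob.
        if (previousGeneration != null) {
            store.deleteBlob(shardPath + "index-" + previousGeneration);
        }
        return newGeneration;
    }
}
```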

Note also: the previously possible issue of hanging data nodes interfering with master nodes can no longer occur, even on S3 (see the next section for details).

This change switches the naming of the shard-level `index-${N}` blobs to a uuid suffix, `index-${UUID}`. The reason for this is that writing a new shard-level `index-` generation blob no longer takes effect atomically: not only does the blob have to be written, it must also be referenced by the root-level `index-N` (`RepositoryData`) to become an effective part of the snapshot repository.
This would be a problem if we kept using incrementing names as before: if a blob `index-${N+1}` is written but, due to a node/network/cluster/... crash, the root-level `RepositoryData` is never updated, then a future operation will determine the shard's generation to be `N` and try to write a new `index-${N+1}` to the already existing path. Such overwrites are problematic on S3 for consistency reasons and also create numerous issues when considering stuck data nodes.
Previously, a data node that was tasked to write `index-${N+1}` but got stuck and only attempted the write after some other node had already written `index-${N+1}` was prevented from doing so (except on S3) by disallowing overwrites of that blob, so no corruption could occur.
Were we to continue using incrementing names, we could not do this. The stuck node scenario would either allow for overwriting the `N+1` generation or force us to continue using a `LIST` operation to figure out the next `N` (which would make this change pointless).
With uuid naming and all deletes moved to `master`, this becomes a non-issue: data nodes write updated shard generations as `index-${uuid}` blobs, and `master` makes those `index-${uuid}` blobs it deems correct part of the `RepositoryData` and cleans up all `index-` blobs that are unused.
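
To make the naming difference concrete, here is an illustrative (non-Elasticsearch) contrast between the two schemes:

```java
import java.util.UUID;

// Illustrative contrast only; not actual Elasticsearch code.
final class ShardIndexBlobNaming {

    // Old scheme: the next name is derived from the last generation the writer knows about.
    // If the root-level RepositoryData was never updated after a crash, a later writer
    // recomputes the same "index-" + (N + 1) name and would have to overwrite an existing blob.
    static String nextIncrementalName(long lastKnownGeneration) {
        return "index-" + (lastKnownGeneration + 1);
    }

    // New scheme: every writer picks a fresh uuid, so a stuck or concurrent writer can never
    // collide with an existing blob path; the master keeps the uuid it referenced in
    // RepositoryData and deletes the rest.
    static String nextUuidName() {
        return "index-" + UUID.randomUUID();
    }
}
```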

Co-authored-by: Yannick Welsch <[email protected]>
Co-authored-by: Tanguy Leroux <[email protected]>
@original-brownbear added the :Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs) and backport labels on Oct 23, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 23, 2019
Muting BwC tests so elastic#48371 can be merged.
original-brownbear added a commit that referenced this pull request Oct 23, 2019
Muting BwC tests so #48371 can be merged.
@original-brownbear original-brownbear merged commit 7215201 into elastic:7.x Oct 23, 2019
@original-brownbear original-brownbear deleted the 46250-7.x branch October 23, 2019 09:58
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 23, 2019
* elastic#48371 has been merged -> reenabling tests

In a follow-up, I will remove the BwC logic that has become obsolete with the
backporting of the relevant changes to 7.6, as it is quite a bit of code that
goes away and I didn't want to delay re-enabling the BwC tests.
original-brownbear added a commit that referenced this pull request Oct 23, 2019
 #48371 has been merged -> reenabling tests
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 23, 2019
With elastic#48371 merged, there is no need to keep the BwC logic
for writing snapshots in the old style.
The only thing that needs to be retained is the `SnapshotsInProgress.Entry`
serialization BwC.