Fix Partial Snapshots Recording Spurious Errors #69150

TommyWind · 2021-02-17T19:11:22Z

If an index was deleted while a partial snapshot was running, the
behavior was not deterministic.
If the index was deleted just as one of its shard snapshots was about
to start, the deletion was recorded as a shard snapshot failure in the
snapshot result and the snapshot showed up as `PARTIAL`.
If the index was deleted after the shard had been snapshotted, the
snapshot showed `SUCCESS`.
In both cases, however, the snapshot contained exactly the same data,
because the deleted index would not become part of the final snapshot.
It was also confusing that, in the `PARTIAL` case, errors were recorded
for shards whose indices were not part of the snapshot.

This commit filters deleted indices not only from the list of indices
in a snapshot but also from the shard snapshot errors in a snapshot
entry. The snapshot now always shows up as `SUCCESS` in this situation,
because concurrent index deletes are not a failure; they are allowed in
partial snapshots.
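
The filtering described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `PartialSnapshotSketch`, `ShardFailure`, `filterFailures`, and `finalState` are hypothetical names, and the rule that the final state is `SUCCESS` exactly when no failures remain after filtering is a simplifying assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical, not Elasticsearch APIs.
final class PartialSnapshotSketch {

    /** Hypothetical stand-in for a recorded shard snapshot failure. */
    static final class ShardFailure {
        final String index;
        final int shardId;
        final String reason;

        ShardFailure(String index, int shardId, String reason) {
            this.index = index;
            this.shardId = shardId;
            this.reason = reason;
        }
    }

    /** Keeps only the failures whose index still exists when the snapshot is finalized. */
    static List<ShardFailure> filterFailures(List<ShardFailure> failures, Set<String> existingIndices) {
        List<ShardFailure> kept = new ArrayList<>();
        for (ShardFailure failure : failures) {
            if (existingIndices.contains(failure.index)) {
                kept.add(failure);
            }
        }
        return kept;
    }

    /** Assumed rule: SUCCESS when no failures remain after filtering, PARTIAL otherwise. */
    static String finalState(List<ShardFailure> failures, Set<String> existingIndices) {
        return filterFailures(failures, existingIndices).isEmpty() ? "SUCCESS" : "PARTIAL";
    }

    public static void main(String[] args) {
        // "deleted-index" was removed concurrently, so its failure is spurious.
        List<ShardFailure> failures = List.of(
            new ShardFailure("deleted-index", 0, "index not found"));
        System.out.println(finalState(failures, Set.of("live-index")));
    }
}
```

In this sketch, a failure on a shard of an index that still exists is kept, so such a snapshot would still be reported as `PARTIAL`; only failures belonging to concurrently deleted indices are dropped.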

Closes #69014

elasticmachine · 2021-02-17T19:23:44Z

Pinging @elastic/es-distributed (Team:Distributed)

original-brownbear · 2021-02-17T19:23:47Z

Jenkins test this

original-brownbear · 2021-02-17T21:10:22Z

Thanks @TommyWind!

Co-authored-by: Tamara Braun <[email protected]>

original-brownbear added the :Distributed Coordination/Snapshot/Restore, >bug, v7.12.0, v8.0.0 labels Feb 17, 2021

elasticmachine added the Team:Distributed (Obsolete) label Feb 17, 2021

original-brownbear self-requested a review February 17, 2021 19:23

original-brownbear merged commit 86de86b into elastic:master Feb 17, 2021

original-brownbear added the backport pending label Feb 17, 2021

williamrandolph added v7.12.1 v7.13.0 and removed v7.12.0 v7.12.1 labels Feb 19, 2021

original-brownbear added v7.12.0 and removed backport pending labels Feb 22, 2021

original-brownbear mentioned this pull request Feb 22, 2021

Fix Partial Snapshots Recording Spurious Errors (#69150) #69372

Merged

original-brownbear mentioned this pull request Feb 22, 2021

Fix Partial Snapshots Recording Spurious Errors (#69150) #69373

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
