Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Partial Snapshots Recording Spurious Errors #69150

Merged
merged 1 commit into from
Feb 17, 2021

Conversation

TommyWind
Copy link
Contributor

If an index is deleted while a partial snapshot is running the behavior
was not deterministic.
If an index was deleted just as one of its shard snapshots was about to
start then it would be recorded as a shard snapshot failure in the
snapshot result and the snapshot would show up as PARTIAL.
If the index delete however happened after the shard had been
snapshotted, then the snapshot would show SUCCESS.
In both cases however, the snapshot would contain the exact same data
because the deleted index would become part of the final snapshot.
Also, it was confusing that in the PARTIAL case, there would be errors
recorded for shards the indices of which would not be part of the
snapshot.

This commit makes it such that not only are indices filtered from the
list of indices in a snapshot but also from the shard snapshot errors
in a snapshot entry so that the snapshot always shows up as SUCCESS
because concurrent index deletes are not a failure but allowed in
partial snapshots.

Closes #69014

If an index is deleted while a partial snapshot is running the behavior
was not deterministic.
If an index was deleted just as one of its shard snapshots was about to
start then it would be recorded as a shard snapshot failure in the
snapshot result and the snapshot would show up as `PARTIAL`.
If the index delete however happened after the shard had been
snapshotted, then the snapshot would show `SUCCESS`.
In both cases however, the snapshot would contain the exact same data
because the deleted index would become part of the final snapshot.
Also, it was confusing that in the `PARTIAL` case, there would be errors
recorded for shards the indices of which would not be part of the
snapshot.

This commit makes it such that not only are indices filtered from the
list of indices in a snapshot but also from the shard snapshot errors
in a snapshot entry so that the snapshot always shows up as `SUCCESS`
because concurrent index deletes are not a failure but allowed in
partial snapshots.

Closes elastic#69014
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 17, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@original-brownbear
Copy link
Member

Jenkins test this

@original-brownbear
Copy link
Member

Thanks @TommyWind !

@original-brownbear original-brownbear merged commit 86de86b into elastic:master Feb 17, 2021
original-brownbear pushed a commit to original-brownbear/elasticsearch that referenced this pull request Feb 22, 2021
If an index is deleted while a partial snapshot is running the behavior
was not deterministic.
If an index was deleted just as one of its shard snapshots was about to
start then it would be recorded as a shard snapshot failure in the
snapshot result and the snapshot would show up as `PARTIAL`.
If the index delete however happened after the shard had been
snapshotted, then the snapshot would show `SUCCESS`.
In both cases however, the snapshot would contain the exact same data
because the deleted index would become part of the final snapshot.
Also, it was confusing that in the `PARTIAL` case, there would be errors
recorded for shards the indices of which would not be part of the
snapshot.

This commit makes it such that not only are indices filtered from the
list of indices in a snapshot but also from the shard snapshot errors
in a snapshot entry so that the snapshot always shows up as `SUCCESS`
because concurrent index deletes are not a failure but allowed in
partial snapshots.

Closes elastic#69014
original-brownbear pushed a commit to original-brownbear/elasticsearch that referenced this pull request Feb 22, 2021
If an index is deleted while a partial snapshot is running the behavior
was not deterministic.
If an index was deleted just as one of its shard snapshots was about to
start then it would be recorded as a shard snapshot failure in the
snapshot result and the snapshot would show up as `PARTIAL`.
If the index delete however happened after the shard had been
snapshotted, then the snapshot would show `SUCCESS`.
In both cases however, the snapshot would contain the exact same data
because the deleted index would become part of the final snapshot.
Also, it was confusing that in the `PARTIAL` case, there would be errors
recorded for shards the indices of which would not be part of the
snapshot.

This commit makes it such that not only are indices filtered from the
list of indices in a snapshot but also from the shard snapshot errors
in a snapshot entry so that the snapshot always shows up as `SUCCESS`
because concurrent index deletes are not a failure but allowed in
partial snapshots.

Closes elastic#69014
original-brownbear added a commit that referenced this pull request Feb 22, 2021
If an index is deleted while a partial snapshot is running the behavior
was not deterministic.
If an index was deleted just as one of its shard snapshots was about to
start then it would be recorded as a shard snapshot failure in the
snapshot result and the snapshot would show up as `PARTIAL`.
If the index delete however happened after the shard had been
snapshotted, then the snapshot would show `SUCCESS`.
In both cases however, the snapshot would contain the exact same data
because the deleted index would become part of the final snapshot.
Also, it was confusing that in the `PARTIAL` case, there would be errors
recorded for shards the indices of which would not be part of the
snapshot.

This commit makes it such that not only are indices filtered from the
list of indices in a snapshot but also from the shard snapshot errors
in a snapshot entry so that the snapshot always shows up as `SUCCESS`
because concurrent index deletes are not a failure but allowed in
partial snapshots.

Closes #69014

Co-authored-by: Tamara Braun <[email protected]>
original-brownbear added a commit that referenced this pull request Feb 23, 2021
If an index is deleted while a partial snapshot is running the behavior
was not deterministic.
If an index was deleted just as one of its shard snapshots was about to
start then it would be recorded as a shard snapshot failure in the
snapshot result and the snapshot would show up as `PARTIAL`.
If the index delete however happened after the shard had been
snapshotted, then the snapshot would show `SUCCESS`.
In both cases however, the snapshot would contain the exact same data
because the deleted index would become part of the final snapshot.
Also, it was confusing that in the `PARTIAL` case, there would be errors
recorded for shards the indices of which would not be part of the
snapshot.

This commit makes it such that not only are indices filtered from the
list of indices in a snapshot but also from the shard snapshot errors
in a snapshot entry so that the snapshot always shows up as `SUCCESS`
because concurrent index deletes are not a failure but allowed in
partial snapshots.

Closes #69014

Co-authored-by: Tamara Braun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.12.0 v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SystemIndicesSnapshotIT#testParallelIndexDeleteRemovesFeatureState failures
5 participants