Aborting a Snapshot Queued after a Finalizing Snapshot is Broken #75598

Closed
original-brownbear opened this issue Jul 21, 2021 · 1 comment · Fixed by #75733
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@original-brownbear
Member

There is a bug in the concurrent snapshot logic: the following situation, involving three concurrent snapshots and a snapshot delete, is broken and may lead to writing corrupted repository metadata:

  1. Start three snapshots for the same two indices
  2. Abort the middle snapshot after the first snapshot finishes on the data node (as far as writing to the repository goes) but before the index gets out of the queued state for the second snapshot
  3. The third snapshot is started once the middle snapshot completes in the FAILED state, but it has `null` for the shard generation of shards in the shared index

This is a fairly unlikely scenario to run into since the abort must be timed just right, but it's somewhat more likely if the second snapshot has a larger diff with the first snapshot (so writing the files takes longer ... though even in this scenario the cluster state update with the abort has to be applied on the data node right after finishing the last file).
=> fixing this asap, but probably not before next week
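
The three steps above can be modelled with a toy queue for a single shard. This is only a rough sketch to illustrate the sequence, not Elasticsearch code: `ShardStatus`, `start_next_queued`, and the `gen-0`/`gen-1` generation names are hypothetical, and the sketch assumes the next queued snapshot simply inherits whatever generation the previous entry reported.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShardStatus:
    """Hypothetical per-snapshot status of one shard."""
    state: str                 # "QUEUED", "INIT", "SUCCESS" or "FAILED"
    generation: Optional[str]  # shard generation in the repository, None if unknown


def start_next_queued(queue: List[ShardStatus], finished_index: int) -> Optional[ShardStatus]:
    """Start the next queued snapshot for the shard.

    Assumption for this sketch: the next entry inherits the generation
    reported by the entry that just finished, even when that is None.
    """
    finished = queue[finished_index]
    for entry in queue[finished_index + 1:]:
        if entry.state == "QUEUED":
            entry.state = "INIT"
            entry.generation = finished.generation
            return entry
    return None


# Step 1: three snapshots of the same shard, the first one running.
queue = [
    ShardStatus("INIT", "gen-0"),
    ShardStatus("QUEUED", None),
    ShardStatus("QUEUED", None),
]

# The first snapshot finishes writing to the repository.
queue[0] = ShardStatus("SUCCESS", "gen-1")

# Step 2: the middle snapshot is aborted while still queued and
# fails without reporting a generation.
queue[1] = ShardStatus("FAILED", None)

# Step 3: the third snapshot starts, but with no generation at all,
# even though "gen-1" already exists in the repository.
third = start_next_queued(queue, finished_index=1)
```

In this model the third snapshot ends up in state `INIT` with `generation=None`, which matches the symptom described in step 3.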

@original-brownbear original-brownbear added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jul 21, 2021
@original-brownbear original-brownbear self-assigned this Jul 21, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 21, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Jul 27, 2021
…after Failing Operations

The node executing a shard level operation would in many cases communicate `null` for the shard state update,
leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch.

closes elastic#75598
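
To illustrate why a `null` shard state update is harmful, here is a small hedged sketch. It is not taken from the referenced commit: `generation_to_report` and `starting_point` are hypothetical names, and carrying the last known generation forward on failure is an assumption about one way to avoid the problem, not a description of what #75733 does.

```python
from typing import Optional


def generation_to_report(
    succeeded: bool,
    new_generation: Optional[str],
    last_known_generation: Optional[str],
) -> Optional[str]:
    """Pick the generation a shard-level operation reports back.

    Assumed approach: a failed operation passes on the last generation
    it knew about instead of reporting None.
    """
    if succeeded:
        return new_generation
    return last_known_generation


def starting_point(reported_generation: Optional[str]) -> str:
    """How a follow-up operation interprets the reported generation."""
    if reported_generation is None:
        # None is read as "empty shard snapshot directory".
        return "from scratch"
    return f"incremental on {reported_generation}"


# A failed operation that reports None makes the follow-up start over.
buggy = starting_point(None)

# Carrying the last known generation forward keeps the follow-up incremental.
safer = starting_point(generation_to_report(False, None, "gen-1"))
```

The point of the sketch is only that `None` is ambiguous: it can mean "nothing written yet" or "the operation failed", and the follow-up operation cannot tell the two apart.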
original-brownbear added a commit that referenced this issue Jul 27, 2021
…after Failing Operations (#75733)

The node executing a shard level operation would in many cases communicate `null` for the shard state update,
leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch.

closes #75598
ywangd pushed a commit to ywangd/elasticsearch that referenced this issue Jul 30, 2021
…after Failing Operations (elastic#75733)

The node executing a shard level operation would in many cases communicate `null` for the shard state update,
leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch.

closes elastic#75598
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Aug 16, 2021
…after Failing Operations (elastic#75733)

The node executing a shard level operation would in many cases communicate `null` for the shard state update,
leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch.

closes elastic#75598
original-brownbear added a commit that referenced this issue Aug 16, 2021
…after Failing Operations (#75733) (#76548)

The node executing a shard level operation would in many cases communicate `null` for the shard state update,
leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch.

closes #75598
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Aug 16, 2021
…after Failing Operations (elastic#75733) (elastic#76548)

The node executing a shard level operation would in many cases communicate `null` for the shard state update,
leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch.

closes elastic#75598
original-brownbear added a commit that referenced this issue Aug 16, 2021
…after Failing Operations (#75733) (#76548) (#76556)

The node executing a shard level operation would in many cases communicate `null` for the shard state update,
leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch.

closes #75598
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 15, 2021
Adds a description of elastic#75598, and the mitigation, to the release notes
of versions 7.13.2 through 7.14.0.
DaveCTurner added a commit that referenced this issue Oct 15, 2021
Adds a description of #75598, and the mitigation, to the release notes
of versions 7.13.2 through 7.14.0.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 11, 2021
The known-issue docs give the impression that an upgrade will restore
the lost data in the repository. This isn't the case, so this commit
clarifies this in the docs.

Relates elastic#73456
Relates elastic#75598
Relates elastic#79221
DaveCTurner added a commit that referenced this issue Nov 15, 2021
The known-issue docs give the impression that an upgrade will restore
the lost data in the repository. This isn't the case, so this commit
clarifies this in the docs.

Relates #73456
Relates #75598
Relates #79221
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 29, 2023
Linebreaks in `preformatted text` are preserved in the rendered docs,
which is not what we want here.
DaveCTurner added a commit that referenced this issue Jun 29, 2023
Linebreaks in `preformatted text` are preserved in the rendered docs,
which is not what we want here.
elasticsearchmachine pushed a commit that referenced this issue Jun 29, 2023
Linebreaks in `preformatted text` are preserved in the rendered docs,
which is not what we want here.
2 participants