Aborting a Snapshot Queued after a Finalizing Snapshot is Broken #75598

original-brownbear · 2021-07-21T17:32:07Z

There is a bug in the concurrent snapshot logic where the following situation involving three concurrent snapshots and a snapshot delete is broken and may lead to writing corrupted repository metadata:

1. Start 3 snapshots for the same two indices.
2. Abort the one in the middle after the first snapshot finishes on the data node (as far as writing to the repository goes) but before the index gets out of the queued state for the second snapshot.
3. The third snapshot is moved to started once the middle snapshot completes in the FAILED state, but it has null for the shard generation for shards in the shared index.
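
The three steps above can be walked through with a deliberately simplified model of one shard's queue. This is only a sketch: `ShardSnapshot`, `expected_start_generation`, `run_scenario`, the snapshot names, and the `gen-1` value are all hypothetical, and the real Elasticsearch state machine is considerably more involved. It assumes the third entry takes its start generation from the status update of the aborted middle entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShardSnapshot:
    """Hypothetical per-shard entry of one snapshot (not an Elasticsearch class)."""
    name: str
    state: str = "QUEUED"             # QUEUED -> STARTED -> SUCCESS | FAILED
    generation: Optional[str] = None  # shard generation reported or started from


def expected_start_generation(earlier: List[ShardSnapshot]) -> Optional[str]:
    """Most recent generation actually written by any earlier entry."""
    for entry in reversed(earlier):
        if entry.generation is not None:
            return entry.generation
    return None


def run_scenario() -> List[ShardSnapshot]:
    snap1 = ShardSnapshot("snap-1", state="STARTED")
    snap2 = ShardSnapshot("snap-2")
    snap3 = ShardSnapshot("snap-3")

    # Step 1: snap-1 finishes writing on the data node and produces a generation.
    snap1.state, snap1.generation = "SUCCESS", "gen-1"

    # Step 2: snap-2 is aborted while still QUEUED. It never wrote anything,
    # so its FAILED status update carries no generation (assumption of this model).
    snap2.state, snap2.generation = "FAILED", None

    # Step 3: snap-3 starts once snap-2 completes and takes over the generation
    # from that update, which is None instead of "gen-1".
    snap3.state, snap3.generation = "STARTED", snap2.generation
    return [snap1, snap2, snap3]


if __name__ == "__main__":
    snap1, snap2, snap3 = run_scenario()
    print("snap-3 starts from:", snap3.generation)
    print("it should start from:", expected_start_generation([snap1, snap2]))
```

In this model the third snapshot starts from `None` even though `gen-1` already exists in the repository, which is the mismatch described in step 3.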

This is a fairly unlikely scenario to run into since the abort must be timed just right, but it's somewhat more likely if the second snapshot has a larger diff with the first snapshot (so writing the files takes longer ... though even in this scenario the cluster state (CS) update with the abort has to be applied on the data node right after it finishes writing the last file).
=> fixing this asap but probably not before next week

elasticmachine · 2021-07-21T17:32:09Z

Pinging @elastic/es-distributed (Team:Distributed)

…after Failing Operations (#75733) The node executing a shard level operation would in many cases communicate `null` for the shard state update, leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch. closes #75598
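
Why a `null` shard state update matters can be illustrated with a small sketch of a shard snapshot directory. Everything here is hypothetical (`write_shard_snapshot`, `failed_update_generation`, the `fixed` flag, the generation names), and it assumes, without confirmation from the source, that the fix amounts to a failed or aborted operation reporting the last known generation instead of `null`. The actual change in #75733 may differ in detail.

```python
from typing import Dict, List, Optional

# Hypothetical model: generation id -> snapshots listed in that generation's index.
ShardDir = Dict[str, List[str]]


def write_shard_snapshot(shard_dir: ShardDir, start_generation: Optional[str],
                         snapshot: str, new_generation: str) -> str:
    """Write a new index generation listing the prior snapshots plus this one."""
    # None is read as "the shard directory is empty", so nothing is carried over.
    previous = shard_dir[start_generation] if start_generation is not None else []
    shard_dir[new_generation] = previous + [snapshot]
    return new_generation


def failed_update_generation(last_known: Optional[str], fixed: bool) -> Optional[str]:
    """Generation reported by an operation that failed before writing anything."""
    # Assumed fix: pass the last known generation along instead of None.
    return last_known if fixed else None


def snapshots_visible_to_third(fixed: bool) -> List[str]:
    shard_dir: ShardDir = {}
    gen1 = write_shard_snapshot(shard_dir, None, "snap-1", "gen-1")
    # The queued operation fails (for example, it is aborted) without writing.
    reported = failed_update_generation(gen1, fixed)
    # The follow-up operation starts from whatever the failed one reported.
    gen2 = write_shard_snapshot(shard_dir, reported, "snap-3", "gen-2")
    return shard_dir[gen2]


if __name__ == "__main__":
    print("null reported:", snapshots_visible_to_third(fixed=False))
    print("generation reported:", snapshots_visible_to_third(fixed=True))
```

With `fixed=False` the follow-up operation starts from scratch and its new index no longer lists `snap-1`. With `fixed=True` both snapshots remain listed.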

The known-issue docs give the impression that an upgrade will restore the lost data in the repository. This isn't the case, so this commit clarifies this in the docs. Relates #73456 Relates #75598 Relates #79221

original-brownbear added the >bug and :Distributed Coordination/Snapshot/Restore labels Jul 21, 2021

original-brownbear self-assigned this Jul 21, 2021

elasticmachine added the Team:Distributed (Obsolete) label Jul 21, 2021

original-brownbear mentioned this issue Jul 21, 2021

Refactor SnapshotsInProgress to Use RepositoryId for Concurency Logic #75501

Merged

original-brownbear mentioned this issue Jul 27, 2021

Fix Concurrent Snapshot Repository Corruption from Operations Queued after Failing Operations #75733

Merged

original-brownbear closed this as completed in #75733 Jul 27, 2021

original-brownbear mentioned this issue Aug 16, 2021

Fix Concurrent Snapshot Repository Corruption from Operations Queued after Failing Operations (#75733) #76548

Merged

original-brownbear mentioned this issue Aug 16, 2021

Fix Concurrent Snapshot Repository Corruption from Operations Queued after Failing Operations (#75733) (#76548) #76556

Merged

DaveCTurner mentioned this issue Oct 15, 2021

Add known issue docs for #75598 #79221

Merged

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 15, 2021

Add known issue docs for elastic#75598

4ca78e9

Adds a description of elastic#75598, and the mitigation, to the release notes of versions 7.13.2 through 7.14.0.

DaveCTurner added a commit that referenced this issue Oct 15, 2021

Add known issue docs for #75598 (#79221)

afc3814

Adds a description of #75598, and the mitigation, to the release notes of versions 7.13.2 through 7.14.0.

DaveCTurner added a commit that referenced this issue Oct 15, 2021

Add known issue docs for #75598 (#79221)

fecc105

Adds a description of #75598, and the mitigation, to the release notes of versions 7.13.2 through 7.14.0.

DaveCTurner added a commit that referenced this issue Oct 15, 2021

Add known issue docs for #75598 (#79221)

850ce64

Adds a description of #75598, and the mitigation, to the release notes of versions 7.13.2 through 7.14.0.

DaveCTurner added a commit that referenced this issue Oct 15, 2021

Add known issue docs for #75598 (#79221)

5f8cb09

Adds a description of #75598, and the mitigation, to the release notes of versions 7.13.2 through 7.14.0.

DaveCTurner mentioned this issue Nov 11, 2021

Add docs about repair of repo affected by corruption bug #80662

Merged

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 29, 2023

Remove linebreak from elastic#75598 known-issue docs

f6514a5

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

DaveCTurner added a commit that referenced this issue Jun 29, 2023

Remove linebreak from #75598 known-issue docs (#97216)

e1c90fa

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 29, 2023

Remove linebreak from elastic#75598 known-issue docs (elastic#97216)

c8a91dc

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

DaveCTurner mentioned this issue Jun 29, 2023

[7.13] Remove linebreak from #75598 known-issue docs (#97216) #97217

Merged

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 29, 2023

Remove linebreak from elastic#75598 known-issue docs (elastic#97216)

c6210c2

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

DaveCTurner mentioned this issue Jun 29, 2023

[7.14] Remove linebreak from #75598 known-issue docs (#97216) #97218

Merged

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 29, 2023

Remove linebreak from elastic#75598 known-issue docs (elastic#97216)

77dba12

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

DaveCTurner mentioned this issue Jun 29, 2023

[7.15] Remove linebreak from #75598 known-issue docs (#97216) #97219

Merged

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 29, 2023

Remove linebreak from elastic#75598 known-issue docs (elastic#97216)

405fe96

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

DaveCTurner mentioned this issue Jun 29, 2023

[7.16] Remove linebreak from #75598 known-issue docs (#97216) #97220

Merged

elasticsearchmachine pushed a commit that referenced this issue Jun 29, 2023

Remove linebreak from #75598 known-issue docs (#97216) (#97217)

e838081

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

elasticsearchmachine pushed a commit that referenced this issue Jun 29, 2023

Remove linebreak from #75598 known-issue docs (#97216) (#97218)

8cb4609

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

elasticsearchmachine pushed a commit that referenced this issue Jun 29, 2023

Remove linebreak from #75598 known-issue docs (#97216) (#97220)

a6b5009

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.

elasticsearchmachine pushed a commit that referenced this issue Jun 29, 2023

Remove linebreak from #75598 known-issue docs (#97216) (#97219)

13996d3

Linebreaks in `preformatted text` are preserved in the rendered docs, which is not what we want here.
