Add docs about repair of repo affected by corruption bug

The known-issue docs give the impression that an upgrade will restore the lost data in the repository. This isn't the case, so this commit clarifies this in the docs.

Relates elastic#73456
Relates elastic#75598
Relates elastic#79221
DaveCTurner · Nov 11, 2021 · 48fedfa
1 parent ff91b80

Showing 5 changed files with 36 additions and 114 deletions.
diff --git a/docs/reference/release-notes/7.10.asciidoc b/docs/reference/release-notes/7.10.asciidoc
@@ -19,15 +19,7 @@ https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-22132[CVE-2021-22132]
 [discrete]
 === Known issues
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[bug-7.10.2]]
 [float]
@@ -105,15 +97,7 @@ Also see <<breaking-changes-7.10,Breaking changes in 7.10>>.
 ** With nested `inner_hits`, the fast vector highlighter may load snippets from the wrong document. ({es-issue}65533[#65533])
 ** When _source is disabled, we can fail load nested `inner_hits` and `top_hits`. ({es-issue}66572[#66572])
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[bug-7.10.1]]
 [float]
@@ -211,15 +195,7 @@ see {es-issue}65488[#65488].
 ** With nested `inner_hits`, the fast vector highlighter may load snippets from the wrong document. ({es-issue}65533[#65533])
 ** When _source is disabled, we can fail load nested `inner_hits` and `top_hits`. ({es-issue}66572[#66572])
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[breaking-7.10.0]]
 [float]

diff --git a/docs/reference/release-notes/7.11.asciidoc b/docs/reference/release-notes/7.11.asciidoc
@@ -7,15 +7,7 @@ Also see <<breaking-changes-7.11,Breaking changes in 7.11>>.
 [discrete]
 === Known issues
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[enhancement-7.11.2]]
 [float]
@@ -77,15 +69,7 @@ Also see <<breaking-changes-7.11,Breaking changes in 7.11>>.
 [discrete]
 === Known issues
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[enhancement-7.11.1]]
 [float]
@@ -190,15 +174,7 @@ Also see <<breaking-changes-7.11,Breaking changes in 7.11>>.
   sizes may increase much higher than required. Elasticsearch 7.13.0 contains a fix for this.
   For more details, see {es-issue}72509[#72509]
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [discrete]
 [[fips-140-2-compliance-7.11.0]]

diff --git a/docs/reference/release-notes/7.12.asciidoc b/docs/reference/release-notes/7.12.asciidoc
@@ -9,15 +9,7 @@ Also see <<breaking-changes-7.12,Breaking changes in 7.12>>.
 
 include::7.12.asciidoc[tag=frozen-tier-79371-known-issue]
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[enhancement-7.12.1]]
 [float]
@@ -163,15 +155,7 @@ For more details, see {es-issue}79371[#79371].
   sizes may increase much higher than required. Elasticsearch 7.13.0 contains a fix for this.
   For more details, see {es-issue}72509[#72509]
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[breaking-7.12.0]]
 [float]

diff --git a/docs/reference/release-notes/7.13.asciidoc b/docs/reference/release-notes/7.13.asciidoc
@@ -172,8 +172,11 @@ PUT _cluster/settings
 }
 ----
 +
-This issue is fixed in {es} versions 7.14.1 and later. For more details, see
-{es-issue}75598[#75598].
+This issue is fixed in {es} versions 7.14.1 and later. It is not possible to
+repair a repository once it is affected by this issue, so you must restore the
+repository from a backup, or clear the repository by executing `DELETE
+_snapshot/<repository name>/*`, or move to a fresh repository. For more
+details, see {es-issue}75598[#75598].
 // end::snapshot-repo-corruption-75598-known-issue[]
 
 [[bug-7.13.1]]
@@ -220,15 +223,7 @@ maximum value of `512`. This allows autoscaling to run reliably as it relies on
 assigning jobs only via memory. Having `xpack.ml.max_open_jobs` as a small
 number may cause autoscaling to behave unexpectedly.
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 * If local and remote clusters are on different patch releases, response
 serialization fails for requests to a remote cluster that use the

diff --git a/docs/reference/release-notes/7.9.asciidoc b/docs/reference/release-notes/7.9.asciidoc
@@ -16,15 +16,7 @@ with a `NOT IN` operator.
 We have fixed this issue in {es} 7.10.1 and later versions. For more details,
 see {es-issue}65488[#65488].
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[bug-7.9.3]]
 [float]
@@ -104,15 +96,7 @@ with a `NOT IN` operator.
 We have fixed this issue in {es} 7.10.1 and later versions. For more details,
 see {es-issue}65488[#65488].
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[deprecation-7.9.2]]
 [float]
@@ -223,15 +207,7 @@ with a `NOT IN` operator.
 We have fixed this issue in {es} 7.10.1 and later versions. For more details,
 see {es-issue}65488[#65488].
 
-* Snapshot and restore: If an index is deleted while the cluster is
-concurrently taking more than one snapshot then there is a risk that one of the
-snapshots may never complete and also that some shard data may be lost from the
-repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
-+
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+include::7.9.asciidoc[tag=snapshot-repo-corruption-73456-known-issue]
 
 [[feature-7.9.1]]
 [float]
@@ -390,15 +366,30 @@ with a `NOT IN` operator.
 We have fixed this issue in {es} 7.10.1 and later versions. For more details,
 see {es-issue}65488[#65488].
 
+// tag::snapshot-repo-corruption-73456-known-issue[]
 * Snapshot and restore: If an index is deleted while the cluster is
 concurrently taking more than one snapshot then there is a risk that one of the
 snapshots may never complete and also that some shard data may be lost from the
 repository, causing future restore operations to fail. To mitigate this
-problem, prevent concurrent snapshot operations by setting
-`snapshot.max_concurrent_operations: 1`.
+problem, set `snapshot.max_concurrent_operations: 1` to prevent concurrent
+snapshot operations:
++
+[source,console]
+----
+PUT _cluster/settings
+{
+  "persistent" : {
+    "snapshot.max_concurrent_operations" : 1
+  }
+}
+----
 +
-This issue is fixed in {es} versions 7.13.1 and later. For more details, see
-{es-issue}73456[#73456].
+This issue is fixed in {es} versions 7.13.1 and later. It is not possible to
+repair a repository once it is affected by this issue, so you must restore the
+repository from a backup, or clear the repository by executing
+`DELETE _snapshot/<repository name>/*`, or move to a fresh repository. For more
+details, see {es-issue}73456[#73456].
+// end::snapshot-repo-corruption-73456-known-issue[]
 
 [[breaking-7.9.0]]
 [discrete]