Emphasize that filesystem-level backups don't work #33102

DaveCTurner · 2018-08-23T17:50:03Z

It is not obvious that a filesystem-level backup may capture an inconsistent
set of files that may fail on restore, or (worse) succeed having silently
discarded some data. This change spells this out, and reorganises the first page
or so of the snapshot/restore docs to make this warning fit more nicely.
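
The failure mode described above can be illustrated with a toy model. The sketch below is not Elasticsearch's on-disk format: the `segment_N.dat` and `manifest.txt` names, and every function here, are hypothetical and chosen only to show how copying files one at a time from a directory that is still being written can yield a set of files that never existed together.

```python
import shutil
import tempfile
from pathlib import Path


def write_generation(data_dir: Path, gen: int) -> None:
    """Toy writer: store a data file, point the manifest at it, drop the old one."""
    (data_dir / f"segment_{gen}.dat").write_text(f"documents for generation {gen}")
    (data_dir / "manifest.txt").write_text(f"segment_{gen}.dat")
    old = data_dir / f"segment_{gen - 1}.dat"
    if old.exists():
        old.unlink()


def copy_with_concurrent_write(src: Path, dst: Path) -> None:
    """Copy files one by one while the writer advances between two copies."""
    dst.mkdir()
    # The backup tool copies the data files it can see (generation 1).
    for f in sorted(src.glob("segment_*.dat")):
        shutil.copy(f, dst / f.name)
    # Meanwhile the running process moves on to generation 2.
    write_generation(src, 2)
    # The backup tool then copies the manifest, which now names generation 2.
    shutil.copy(src / "manifest.txt", dst / "manifest.txt")


def is_consistent(data_dir: Path) -> bool:
    """A directory is consistent if the file named by the manifest exists."""
    referenced = (data_dir / "manifest.txt").read_text()
    return (data_dir / referenced).exists()


def demo() -> tuple:
    """Return (live directory consistent?, copied directory consistent?)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "data"
        dst = Path(tmp) / "backup"
        src.mkdir()
        write_generation(src, 1)
        copy_with_concurrent_write(src, dst)
        return is_consistent(src), is_consistent(dst)


if __name__ == "__main__":
    live_ok, backup_ok = demo()
    print(f"live directory consistent: {live_ok}")
    print(f"copied directory consistent: {backup_ok}")
```

In this toy run the live directory is consistent at every step, yet the copy holds a manifest that names a file the copy does not contain. A real data directory is far more complex, so this shows only the general shape of the problem.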

elasticmachine · 2018-08-23T17:50:04Z

Pinging @elastic/es-distributed

DaveCTurner · 2018-08-27T09:08:08Z

@elasticmachine retest this please - packaging-sample test run failed for what looks to be environmental issues.

bleskes · 2018-09-09T17:30:30Z

docs/reference/modules/snapshots.asciidoc

+WARNING: It is not possible to back up an Elasticsearch cluster simply by
+taking a copy of the data directories of all of its nodes. Elasticsearch may be
+making changes to the contents of its data directories while it is running, and
+this means that copying its data directories cannot be expected to capture a


I think we should add a caveat here saying "copying its data directories without guaranteeing a point in time snapshot".

DaveCTurner · Sep 9, 2018

I'd prefer not to do so. If we mention PIT snapshots like this then we are sorta hinting that this is a recognised way of taking snapshots, and I think it's unwise to imply this. For instance, restoring a cluster from PIT snapshots of each of its nodes breaks assumptions in Zen2 which could result in a split-brain in the restored cluster. Also there's a good chance that someone would end up accidentally restoring a node into a cluster that already contains a fresher copy of the same node, which I think can lead to data loss.

Furthermore, we don't have any tests that restoring from PIT snapshots works, and although in theory it should work ok, subject to caveats, in practice there are many variables (filesystem bugs, storage-vendor-specific bugs, multiple-data-path complexity, ...) that make me think we should steer clear of mentioning it at all here.

bleskes · Sep 19, 2018

I was thinking about people running on file systems that support PIT snapshots and using that to take a backup. I was only thinking from the point of view of the shard data. I agree that from a cluster-level perspective this is problematic. I'm fine with leaving it as is.
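
For contrast with the file-copy approaches discussed in this thread, the supported route is the `_snapshot` API. The following is a minimal sketch that only builds the HTTP requests using the Python standard library, so it can be read and run without a cluster. The base URL, the repository name `my_backup`, the snapshot name `snapshot_1`, the repository location, and the helper function names are all assumptions made for illustration. A shared-filesystem (`fs`) repository location also has to be listed under `path.repo` on the nodes.

```python
import json
from urllib.request import Request, urlopen


def _json_request(method: str, url: str, body: dict = None) -> Request:
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(url, data=data, method=method,
                   headers={"Content-Type": "application/json"})


def register_fs_repository(base_url: str, repo: str, location: str) -> Request:
    """PUT /_snapshot/<repo>: register a shared-filesystem repository."""
    return _json_request("PUT", f"{base_url}/_snapshot/{repo}",
                         {"type": "fs", "settings": {"location": location}})


def create_snapshot(base_url: str, repo: str, snapshot: str) -> Request:
    """PUT /_snapshot/<repo>/<snapshot>: take a snapshot and wait for it."""
    url = f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return _json_request("PUT", url)


def restore_snapshot(base_url: str, repo: str, snapshot: str) -> Request:
    """POST /_snapshot/<repo>/<snapshot>/_restore: restore from a snapshot."""
    return _json_request("POST", f"{base_url}/_snapshot/{repo}/{snapshot}/_restore")


def send(request: Request) -> dict:
    """Send a prepared request and decode the JSON response (needs a live cluster)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    base = "http://localhost:9200"  # assumed address of a local test node
    for req in (
        register_fs_repository(base, "my_backup", "/mount/backups/my_backup"),
        create_snapshot(base, "my_backup", "snapshot_1"),
        restore_snapshot(base, "my_backup", "snapshot_1"),
    ):
        print(req.get_method(), req.full_url)
```

Passing any of these requests to `send` would execute it against a running node. The example deliberately stops at printing them, since the details of a real repository setup depend on the deployment.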

bleskes

LGTM

* master: (46 commits)
  Fixing assertions in integration test (elastic#33833)
  [CCR] Rename idle_shard_retry_delay to poll_timout in auto follow patterns (elastic#33821)
  HLRC: Delete ML calendar (elastic#33775)
  Move DocsStats into Engine (elastic#33835)
  [Docs] Clarify accessing Date methods in painless (elastic#33560)
  add elasticsearch-shard tool (elastic#32281)
  Cut over to unwrap segment reader (elastic#33843)
  SQL: Fix issue with options for QUERY() and MATCH(). (elastic#33828)
  Emphasize that filesystem-level backups don't work (elastic#33102)
  Use the global doc id to generate a random score (elastic#33599)
  Add minimal sanity checks to custom/scripted similarities. (elastic#33564)
  Profiler: Don’t profile NEXTDOC for ConstantScoreQuery. (elastic#33196)
  [CCR] Change FollowIndexAction.Request class to be more user friendly (elastic#33810)
  SQL: day and month name functions tests locale providers enforcement (elastic#33653)
  TESTS: Set SO_LINGER = 0 for MockNioTransport (elastic#32560)
  Test: Relax jarhell gradle test (elastic#33787)
  [CCR] Fail with a descriptive error if leader index does not exist (elastic#33797)
  Add ES version 6.4.2 (elastic#33831)
  MINOR: Remove Some Dead Code in Scripting (elastic#33800)
  Ensure realtime `_get` and `_termvectors` don't run on the network thread (elastic#33814)
  ...

In elastic#33102 we added a warning against using filesystem backups. Experience has shown that the wording we added was insufficiently general and open to misinterpretation. This commit reworks it to be clearer. This commit also clarifies that snapshots are not incremental across repositories.

DaveCTurner added >docs General docs changes :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v7.0.0 v6.0.3 v6.1.5 v6.2.5 v6.5.0 v6.3.3 v6.4.1 labels Aug 23, 2018

DaveCTurner requested review from debadair, tlrx and dliappis August 23, 2018 17:50

Merge branch 'master' into 2018-08-23-snapshot-restore-is-how-to-do-b…

9408404

…ackups

bleskes reviewed Sep 9, 2018

bleskes approved these changes Sep 19, 2018

DaveCTurner merged commit c9765d5 into elastic:master Sep 19, 2018

DaveCTurner deleted the 2018-08-23-snapshot-restore-is-how-to-do-backups branch September 19, 2018 07:36

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

DaveCTurner mentioned this pull request Jan 18, 2021

Further emphasise filesystem backups don't work #67634

Merged
