Emphasize that filesystem-level backups don't work #33102

DaveCTurner
Contributor

It is not obvious that a filesystem-level backup may capture an inconsistent
set of files that may fail on restore, or (worse) succeed having silently
discarded some data. This change spells this out, and reorganises the first page
or so of the snapshot/restore docs to make this warning fit more nicely.
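
For context, the supported alternative to copying data directories is the snapshot/restore API that these docs describe. The following is a minimal sketch of the two REST calls involved, not the change made in this PR. The cluster URL, the repository name `my_backup`, the snapshot name `snapshot_1`, and the repository location are all illustrative assumptions, and a shared-filesystem (`fs`) repository location must also be listed under `path.repo` on every node. The helpers only build the requests; nothing is sent.

```python
import json
from urllib.request import Request

# Assumption: a local, unsecured test cluster. Adjust for a real deployment.
ES_URL = "http://localhost:9200"


def register_fs_repository(repo: str, location: str) -> Request:
    """Build PUT /_snapshot/<repo> to register a shared-filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return Request(
        f"{ES_URL}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_snapshot(repo: str, snapshot: str) -> Request:
    """Build PUT /_snapshot/<repo>/<snapshot> that waits for completion."""
    return Request(
        f"{ES_URL}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical names and path, used only to show the request shapes.
    requests_to_send = (
        register_fs_repository("my_backup", "/mount/backups/my_backup"),
        create_snapshot("my_backup", "snapshot_1"),
    )
    for req in requests_to_send:
        print(req.get_method(), req.full_url)
```

Passing each request to `urllib.request.urlopen` would send it to a running cluster. Unlike a copy of the data directories, the snapshot is taken by Elasticsearch itself, so it does not depend on the files staying unchanged while they are read.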

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner
Contributor Author

@elasticmachine retest this please - packaging-sample test run failed for what looks to be environmental issues.

WARNING: It is not possible to back up an Elasticsearch cluster simply by
taking a copy of the data directories of all of its nodes. Elasticsearch may be
making changes to the contents of its data directories while it is running, and
this means that copying its data directories cannot be expected to capture a
Contributor

I think we should add a caveat here saying "copying its data directories without guaranteeing a point-in-time snapshot".

Contributor Author

I'd prefer not to do so. If we mention PIT snapshots like this then we are sorta hinting that this is a recognised way of taking snapshots, and I think it's unwise to imply this. For instance, restoring a cluster from PIT snapshots of each of its nodes breaks assumptions in Zen2 which could result in a split-brain in the restored cluster. Also there's a good chance that someone would end up accidentally restoring a node into a cluster that already contains a fresher copy of the same node, which I think can lead to data loss.

Furthermore, we don't have any tests that restoring from PIT snapshots works, and although in theory it should work ok, subject to caveats, in practice there are many variables (filesystem bugs, storage-vendor-specific bugs, multiple-data-path complexity, ...) that make me think we should steer clear of mentioning it at all here.

Contributor

I was thinking about people who run on file systems that support PIT snapshots and use them to take backups. I was only thinking from the point of view of the shard data. I agree that from a cluster-level perspective this is problematic. I'm fine with leaving it as is.

Contributor

@bleskes bleskes left a comment

LGTM

@DaveCTurner DaveCTurner merged commit c9765d5 into elastic:master Sep 19, 2018
@DaveCTurner DaveCTurner deleted the 2018-08-23-snapshot-restore-is-how-to-do-backups branch September 19, 2018 07:36
DaveCTurner added a commit that referenced this pull request Sep 19, 2018
It is not obvious that a filesystem-level backup may capture an inconsistent
set of files that may fail on restore, or (worse) succeed having silently
discarded some data. This change spells this out, and reorganises the first page
or so of the snapshot/restore docs to make this warning fit more nicely.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 19, 2018
* master: (46 commits)
  Fixing assertions in integration test (elastic#33833)
  [CCR] Rename idle_shard_retry_delay to poll_timout in auto follow patterns (elastic#33821)
  HLRC: Delete ML calendar (elastic#33775)
  Move DocsStats into Engine (elastic#33835)
  [Docs] Clarify accessing Date methods in painless (elastic#33560)
  add elasticsearch-shard tool (elastic#32281)
  Cut over to unwrap segment reader (elastic#33843)
  SQL: Fix issue with options for QUERY() and MATCH(). (elastic#33828)
  Emphasize that filesystem-level backups don't work (elastic#33102)
  Use the global doc id to generate a random score (elastic#33599)
  Add minimal sanity checks to custom/scripted similarities. (elastic#33564)
  Profiler: Don’t profile NEXTDOC for ConstantScoreQuery. (elastic#33196)
  [CCR] Change FollowIndexAction.Request class to be more user friendly (elastic#33810)
  SQL: day and month name functions tests locale providers enforcement (elastic#33653)
  TESTS: Set SO_LINGER = 0 for MockNioTransport (elastic#32560)
  Test: Relax jarhell gradle test (elastic#33787)
  [CCR] Fail with a descriptive error if leader index does not exist (elastic#33797)
  Add ES version 6.4.2 (elastic#33831)
  MINOR: Remove Some Dead Code in Scripting (elastic#33800)
  Ensure realtime `_get` and `_termvectors` don't run on the network thread (elastic#33814)
  ...
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jan 18, 2021
In elastic#33102 we added a warning against using filesystem backups.
Experience has shown that the wording we added was insufficiently
general and open to misinterpretation. This commit reworks it to be
clearer.

This commit also clarifies that snapshots are not incremental across
repositories.
DaveCTurner added a commit that referenced this pull request Jan 19, 2021
In #33102 we added a warning against using filesystem backups.
Experience has shown that the wording we added was insufficiently
general and open to misinterpretation. This commit reworks it to be
clearer.

This commit also clarifies that snapshots are not incremental across
repositories.

4 participants