Emphasize that filesystem-level backups don't work #33102
It is not obvious that a filesystem-level backup may capture an inconsistent set of files that may fail on restore, or (worse) succeed having silently discarded some data. This change spells this out, and reorganises the first page or so of the snapshot/restore docs to make this warning fit more nicely.
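To illustrate the failure mode described above, here is a minimal toy sketch in Python. It is not Elasticsearch's or Lucene's real on-disk layout: the file names (`commit`, `seg_1`, `seg_2`) and the helpers `merge`, `naive_copy` and `is_consistent` are all hypothetical, and the "concurrent" write is simulated deterministically by running it partway through the copy.

```python
# Toy model only: the file names and the "commit" format are hypothetical,
# not Elasticsearch's or Lucene's real on-disk layout.

def is_consistent(directory):
    """A directory is consistent if every file named by its commit exists."""
    return all(name in directory for name in directory["commit"])


def merge(live):
    """Writer step: replace seg_1 with seg_2 and repoint the commit."""
    live["seg_2"] = "merged docs"
    live["commit"] = ["seg_2"]
    del live["seg_1"]


def naive_copy(live, concurrent_write):
    """Copy files one at a time; the writer runs after the first file."""
    listing = sorted(live)          # file list taken when the copy starts
    backup = {}
    for i, name in enumerate(listing):
        if name in live:            # files deleted mid-copy are silently skipped
            backup[name] = live[name]
        if i == 0:
            concurrent_write(live)  # the node keeps writing during the copy
    return backup


live = {"commit": ["seg_1"], "seg_1": "docs"}
backup = naive_copy(live, merge)
print(is_consistent(live), is_consistent(backup))
```

In this sketch the live directory stays consistent throughout, but the copy ends up with a commit that references a file it never captured, which is the kind of inconsistency the warning is about.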
Pinging @elastic/es-distributed
@elasticmachine retest this please - packaging-sample test run failed for what looks to be environmental issues.
WARNING: It is not possible to back up an Elasticsearch cluster simply by
taking a copy of the data directories of all of its nodes. Elasticsearch may be
making changes to the contents of its data directories while it is running, and
this means that copying its data directories cannot be expected to capture a
I think we should add a caveat here saying "copying its data directories without guaranteeing a point in time snapshot".
I'd prefer not to do so. If we mention PIT snapshots like this then we are sorta hinting that this is a recognised way of taking snapshots, and I think it's unwise to imply this. For instance, restoring a cluster from PIT snapshots of each of its nodes breaks assumptions in Zen2 which could result in a split-brain in the restored cluster. Also there's a good chance that someone would end up accidentally restoring a node into a cluster that already contains a fresher copy of the same node, which I think can lead to data loss.
Furthermore, we don't have any tests that restoring from PIT snapshots works, and although in theory it should work ok, subject to caveats, in practice there are many variables (filesystem bugs, storage-vendor-specific bugs, multiple-data-path complexity, ...) that make me think we should steer clear of mentioning it at all here.
I was thinking about people who run on file systems that support PIT snapshots and use them for backups. I was only thinking from the point of view of the shard data. I agree that from a cluster-level perspective this is problematic. I'm fine with leaving it as is.
LGTM
* master: (46 commits)
  Fixing assertions in integration test (elastic#33833)
  [CCR] Rename idle_shard_retry_delay to poll_timout in auto follow patterns (elastic#33821)
  HLRC: Delete ML calendar (elastic#33775)
  Move DocsStats into Engine (elastic#33835)
  [Docs] Clarify accessing Date methods in painless (elastic#33560)
  add elasticsearch-shard tool (elastic#32281)
  Cut over to unwrap segment reader (elastic#33843)
  SQL: Fix issue with options for QUERY() and MATCH(). (elastic#33828)
  Emphasize that filesystem-level backups don't work (elastic#33102)
  Use the global doc id to generate a random score (elastic#33599)
  Add minimal sanity checks to custom/scripted similarities. (elastic#33564)
  Profiler: Don’t profile NEXTDOC for ConstantScoreQuery. (elastic#33196)
  [CCR] Change FollowIndexAction.Request class to be more user friendly (elastic#33810)
  SQL: day and month name functions tests locale providers enforcement (elastic#33653)
  TESTS: Set SO_LINGER = 0 for MockNioTransport (elastic#32560)
  Test: Relax jarhell gradle test (elastic#33787)
  [CCR] Fail with a descriptive error if leader index does not exist (elastic#33797)
  Add ES version 6.4.2 (elastic#33831)
  MINOR: Remove Some Dead Code in Scripting (elastic#33800)
  Ensure realtime `_get` and `_termvectors` don't run on the network thread (elastic#33814)
  ...
In elastic#33102 we added a warning against using filesystem backups. Experience has shown that the wording we added was insufficiently general and open to misinterpretation. This commit reworks it to be clearer. This commit also clarifies that snapshots are not incremental across repositories.