Emphasize that filesystem-level backups don't work #33102

Merged
Changes from all commits
49 changes: 37 additions & 12 deletions docs/reference/modules/snapshots.asciidoc
@@ -1,22 +1,47 @@
[[modules-snapshots]]
== Snapshot And Restore

You can store snapshots of individual indices or an entire cluster in
a remote repository like a shared file system, S3, or HDFS. These snapshots
are great for backups because they can be restored relatively quickly. However,
snapshots can only be restored to versions of Elasticsearch that can read the
indices:
A snapshot is a backup taken from a running Elasticsearch cluster. You can take
a snapshot of individual indices or of the entire cluster and store it in a
repository on a shared filesystem, and there are plugins that support remote
repositories on S3, HDFS, Azure, Google Cloud Storage and more.
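For reference, a shared-filesystem repository is registered with the snapshot API before any snapshots can be taken. A minimal sketch (the repository name `my_backup` and the mount path are illustrative assumptions, not part of this change):

[source,console]
--------------------------------------------------
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}
--------------------------------------------------

For an `fs` repository the `location` must be accessible at the same path on every node and registered in the `path.repo` setting.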

Snapshots are taken incrementally. This means that when creating a snapshot of
an index Elasticsearch will avoid copying any data that is already stored in
the repository as part of an earlier snapshot of the same index. Therefore it
can be efficient to take snapshots of your cluster quite frequently.
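The incremental behaviour described above applies to each snapshot creation request. A sketch (repository and snapshot names are illustrative):

[source,console]
--------------------------------------------------
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
--------------------------------------------------

A later `snapshot_2` of the same indices copies only the segment files that are not already present in the repository from `snapshot_1`.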

Snapshots can be restored into a running cluster via the restore API. When
restoring an index it is possible to alter the name of the restored index as
well as some of its settings, allowing a great deal of flexibility in how the
snapshot and restore functionality can be used.
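The renaming flexibility mentioned above can be sketched with the restore API (index, snapshot, and pattern names are illustrative assumptions):

[source,console]
--------------------------------------------------
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "index_1",
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1",
  "index_settings": {
    "index.number_of_replicas": 0
  }
}
--------------------------------------------------

Here the index is restored under a new name and with an overridden replica count, so it can coexist with the original index in the same cluster.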

WARNING: It is not possible to back up an Elasticsearch cluster simply by
taking a copy of the data directories of all of its nodes. Elasticsearch may be
making changes to the contents of its data directories while it is running, and
this means that copying its data directories cannot be expected to capture a
Contributor:
I think we should add a caveat here saying "copying its data directories without guaranteeing a point-in-time snapshot".

Contributor Author:
I'd prefer not to do so. If we mention PIT snapshots like this then we are sorta hinting that this is a recognised way of taking snapshots, and I think it's unwise to imply this. For instance, restoring a cluster from PIT snapshots of each of its nodes breaks assumptions in Zen2 which could result in a split-brain in the restored cluster. Also there's a good chance that someone would end up accidentally restoring a node into a cluster that already contains a fresher copy of the same node, which I think can lead to data loss.

Furthermore, we don't have any tests that restoring from PIT snapshots works, and although in theory it should work ok, subject to caveats, in practice there are many variables (filesystem bugs, storage-vendor-specific bugs, multiple-data-path complexity, ...) that make me think we should steer clear of mentioning it at all here.

Contributor:
I was thinking about people running on filesystems that support PIT snapshots and using that to have a backup. I was only thinking from the point of view of the shard data. I agree that from a cluster-level perspective this is problematic. I'm fine with leaving it as is.

consistent picture of their contents. Attempts to restore a cluster from such a
backup may fail, reporting corruption and/or missing files, or may appear to
have succeeded despite having silently lost some of its data. The only reliable
way to back up a cluster is by using the snapshot and restore functionality.

[float]
=== Version compatibility

A snapshot contains a copy of the on-disk data structures that make up an
index. This means that snapshots can only be restored to versions of
Elasticsearch that can read the indices:

* A snapshot of an index created in 5.x can be restored to 6.x.
* A snapshot of an index created in 2.x can be restored to 5.x.
* A snapshot of an index created in 1.x can be restored to 2.x.

Conversely, snapshots of indices created in 1.x **cannot** be restored to
5.x or 6.x, and snapshots of indices created in 2.x **cannot** be restored
to 6.x.
Conversely, snapshots of indices created in 1.x **cannot** be restored to 5.x
or 6.x, and snapshots of indices created in 2.x **cannot** be restored to 6.x.

Snapshots are incremental and can contain indices created in various
versions of Elasticsearch. If any indices in a snapshot were created in an
Each snapshot can contain indices created in various versions of Elasticsearch,
and when restoring a snapshot it must be possible to restore all of the indices
into the target cluster. If any indices in a snapshot were created in an
incompatible version, you will not be able to restore the snapshot.

IMPORTANT: When backing up your data prior to an upgrade, keep in mind that you
@@ -28,8 +53,8 @@
that is incompatible with the version of the cluster you are currently running,
you can restore it on the latest compatible version and use
<<reindex-from-remote,reindex-from-remote>> to rebuild the index on the current
version. Reindexing from remote is only possible if the original index has
source enabled. Retrieving and reindexing the data can take significantly longer
than simply restoring a snapshot. If you have a large amount of data, we
source enabled. Retrieving and reindexing the data can take significantly
longer than simply restoring a snapshot. If you have a large amount of data, we
recommend testing the reindex from remote process with a subset of your data to
understand the time requirements before proceeding.
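The reindex-from-remote path described above can be sketched as follows (the host and index names are illustrative assumptions):

[source,console]
--------------------------------------------------
POST /_reindex
{
  "source": {
    "remote": {
      "host": "http://oldhost:9200"
    },
    "index": "source_index"
  },
  "dest": {
    "index": "dest_index"
  }
}
--------------------------------------------------

The remote host must be allowed by the `reindex.remote.whitelist` setting on the local cluster, and this only works if `source_index` has `_source` enabled.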
