Add docs on searchable snaps costs #77607

@DaveCTurner DaveCTurner commented Sep 13, 2021

Adds a note on why searchable snapshots is cheaper, including warnings
that it might be more expensive too.

Closes #74385

@DaveCTurner added the labels >docs, :Distributed Coordination/Snapshot/Restore, v8.0.0, v7.16.0, v7.15.1, v7.14.2 on Sep 13, 2021
@elasticmachine added the labels Team:Docs, Team:Distributed (Obsolete) on Sep 13, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@jrodewig jrodewig left a comment

LGTM. Thanks @DaveCTurner.

I left some suggestions to improve the structure, but your current copy is fine. Feel free to ignore my suggestions or cherry-pick as wanted.

Comment on lines 201 to 235
[discrete]
[[searchable-snapshots-costs]]
=== Reducing running costs with {search-snaps}

Using {search-snaps} can significantly reduce the running costs of your {es}
cluster. Searchable snapshot indices do not need to be replicated for
resilience since {es} will recover any missing shards from the snapshot
repository after a node failure. In contrast, regular indices must always be
replicated to multiple nodes to ensure resilience. This means that using
{search-snaps} reduces the number of shard copies that you need for your
infrequently-accessed data by a factor of two. Your cold data tier therefore
needs half of the disk space and half of the number of nodes that it would
need if not using {search-snaps}. The partially-mounted indices in your frozen
tier will need even fewer resources.

Furthermore, when a fully-mounted {search-snap} index is mounted or relocated,
its contents are copied from the repository rather than from another node in
your cluster. Retrieving data from the snapshot repository is usually very
cheap. In contrast, the contents of regular indices are copied from another
node in the cluster. Transferring data from a node in a different zone often
carries a significant cost.

NOTE: You can realise these cost savings in most environments, including on all
major cloud platforms, but take note that they do not apply to every
environment. For example, if retrieving data from your snapshot repository
carries a high cost then you may find {search-snaps} to be more expensive than
regular indices. Ensure that the cost structure of your operating environment
is compatible with {search-snaps} before using them.

WARNING: Most cloud providers charge significant fees for data transferred
between regions and for data transferred out of their platforms. You should
only mount snapshots into a cluster that is in the same region as the snapshot
repository. If you wish to search data across multiple regions, configure
multiple clusters and use <<modules-cross-cluster-search,{ccs}>> or
<<xpack-ccr,{ccr}>> instead of {search-snaps}.
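
To make the transfer-cost comparison in the passage above concrete, here is a minimal Python sketch. The per-GB prices, the shard size, and the function name `recovery_cost` are hypothetical placeholders chosen for illustration; they are not figures from this PR or from any provider, so substitute the rates that apply to your own environment.

```python
def recovery_cost(shard_size_gb: float, price_per_gb: float) -> float:
    """Cost of copying one shard's contents at a given per-GB transfer price."""
    return shard_size_gb * price_per_gb


# Hypothetical prices per GB, for illustration only.
REPO_READ_PRICE = 0.0       # same-region repository read, assumed free here
CROSS_ZONE_PRICE = 0.01     # node-to-node copy across zones, assumed
CROSS_REGION_PRICE = 0.09   # repository located in another region, assumed

shard_gb = 50.0  # hypothetical shard size

# Fully mounted searchable snapshot: contents come from the repository.
from_repository = recovery_cost(shard_gb, REPO_READ_PRICE)
# Regular index: contents come from another node, possibly in another zone.
from_peer_node = recovery_cost(shard_gb, CROSS_ZONE_PRICE)
# Searchable snapshot mounted in a different region from its repository.
from_remote_region = recovery_cost(shard_gb, CROSS_REGION_PRICE)

print(from_repository, from_peer_node, from_remote_region)
```

Under these assumed prices, recovering from a same-region repository is the cheapest path and a cross-region repository is the most expensive, which is the situation the WARNING describes. With a different price structure the ordering can change, which is the point of the NOTE.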

@jrodewig jrodewig Sep 13, 2021

I tried to restructure this so that:

  • Users can get a high-level takeaway from the first paragraph
  • The information about replicas and repo data retrieval is in separate sections

I think this makes it a little easier to parse, but the copy you currently have is fine.

Suggested change
[discrete]
[[searchable-snapshots-costs]]
=== Reduce costs with {search-snaps}

Before using {search-snaps}, ensure they're cost effective in your environment.
In most cases, {search-snaps} reduce the costs of running a cluster by removing
the need for replica shards. However, if it's particularly expensive to retrieve
data from a snapshot repository, {search-snaps} may be more costly than
regular indices.

[discrete]
[[replica-costs]]
==== Replica costs

For resiliency, a regular index requires multiple replica shards across multiple
nodes. If a node fails, {es} uses these replicas to recover any missing data.

A {search-snap} index doesn't require replicas. If a node containing a
{search-snap} index fails, {es} can recover missing data from the snapshot
repository.

Without replicas, {search-snap} indices require far fewer resources. A cold data
tier that contains only fully-mounted {search-snap} indices requires half the
nodes and disk space of a tier containing equivalent regular indices. The frozen
tier, which contains only partially-mounted {search-snap} indices, requires even
fewer resources.

[discrete]
[[snapshot-retrieval-costs]]
==== Snapshot retrieval costs

// tag::search-snap-costs[]
When {es} mounts or reallocates a fully-mounted {search-snap} index, it copies
the index's data from the snapshot repository. The cost of retrieving this data
is typically low but may be higher in some environments.
// end::search-snap-costs[]

Many cloud providers charge significant fees for data transfers between regions
or out of their platforms. To avoid these fees, only mount
{search-snap} indices on clusters in the same region as the snapshot repository.
To search across multiple regions, use <<modules-cross-cluster-search,{ccs}>> or
<<xpack-ccr,{ccr}>> instead.
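
The "half the nodes and disk space" claim in the replica-costs section can be checked with a small arithmetic sketch. It assumes that a regular index keeps exactly one replica per primary and that a fully mounted index keeps none; the data volume and the helper name `tier_disk_gb` are hypothetical and serve only as an illustration.

```python
def tier_disk_gb(primary_data_gb: float, replicas: int) -> float:
    """Disk needed to hold every shard copy: the primaries plus their replicas."""
    return primary_data_gb * (1 + replicas)


primary_data_gb = 10_000.0  # hypothetical amount of cold data

# Regular indices: one replica per primary, assumed, for resilience.
regular = tier_disk_gb(primary_data_gb, replicas=1)
# Fully mounted searchable snapshots: no replicas, the repository is the backup.
fully_mounted = tier_disk_gb(primary_data_gb, replicas=0)

ratio = fully_mounted / regular
print(regular, fully_mounted, ratio)
```

With more than one replica per regular index the saving would be larger than a factor of two; the sketch does not model the partially mounted frozen tier, whose local footprint depends on cache sizing.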

Comment on lines 34 to 38
NOTE: Mounting and relocating the shards of {search-snap} indices involves
copying data from the snapshot repository. This may incur different costs from
the copying between nodes in your cluster that happens with regular indices.
These costs are typically lower, but in some environments they may be higher.
See <<searchable-snapshots-costs>> for more details.

If you use my other suggestion, you can reuse some of that content here.

Either way, I'd consider moving this up the page (right after the paragraph ending ... corresponding data tier.). I'd also consider removing the other two IMPORTANT and NOTE admonitions on the page. Having a stack of admonitions made it more likely for me to glaze over each.

Suggested change
[IMPORTANT]
====
include::{es-repo-dir}/searchable-snapshots/index.asciidoc[tag=search-snap-costs]
See <<searchable-snapshots-costs>>.
====

@arteam arteam left a comment

LGTM! Thank you very much, David!

@DaveCTurner
Contributor Author

Thanks both for the reviews and suggestions. I like the idea of splitting it up into more sections. I didn't take the suggestion as-is but did something similar in 3e78904. @jrodewig would you take another look and see if any final polish is needed?

Preview at https://elasticsearch_77607.docs-preview.app.elstc.co/diff

@jrodewig jrodewig left a comment

LGTM. Left some non-blocking nits. Thanks, @DaveCTurner!

@DaveCTurner added the labels auto-backport-and-merge, auto-merge-without-approval on Sep 15, 2021
@elasticsearchmachine elasticsearchmachine merged commit f2a5706 into elastic:master Sep 15, 2021
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Sep 15, 2021
* Add docs on searchable snaps costs

Adds a note on why searchable snapshots is cheaper, including warnings
that it might be more expensive too.

* Split into sections

Co-authored-by: James Rodewig <[email protected]>

* data -> the shard contents

* More wording tweaks

* Apply suggestions from code review

Co-authored-by: James Rodewig <[email protected]>

Co-authored-by: James Rodewig <[email protected]>
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
7.x
7.15
7.14

@DaveCTurner DaveCTurner deleted the 2021-09-13-searchable-snaps-costs-docs branch September 15, 2021 07:29
elasticsearchmachine pushed a commit that referenced this pull request Sep 15, 2021
* Add docs on searchable snaps costs

Adds a note on why searchable snapshots is cheaper, including warnings
that it might be more expensive too.

* Split into sections

Co-authored-by: James Rodewig <[email protected]>

* data -> the shard contents

* More wording tweaks

* Apply suggestions from code review

Co-authored-by: James Rodewig <[email protected]>

Co-authored-by: James Rodewig <[email protected]>

Co-authored-by: James Rodewig <[email protected]>
Successfully merging this pull request may close these issues.

Clarify migration from warm to cold tier