Commit

Round up shard allocation / recovery / relocation concepts (#109943) (#…
…112292)

(cherry picked from commit 50bccf5)
shainaraskas authored Aug 28, 2024
1 parent 0c45cc7 commit f2babe7
Showing 8 changed files with 103 additions and 22 deletions.
2 changes: 1 addition & 1 deletion docs/reference/cat/recovery.asciidoc
Original file line number Diff line number Diff line change
@@ -39,7 +39,7 @@ The cat recovery API returns information about shard recoveries, both
ongoing and completed. It is a more compact view of the JSON
<<indices-recovery,index recovery>> API.

include::{es-ref-dir}/indices/recovery.asciidoc[tag=shard-recovery-desc]
include::{es-ref-dir}/modules/shard-recovery-desc.asciidoc[]


[[cat-recovery-path-params]]
3 changes: 2 additions & 1 deletion docs/reference/high-availability/cluster-design.asciidoc
@@ -246,7 +246,8 @@ accumulate into a noticeable performance penalty. An unreliable network may
have frequent network partitions. {es} will automatically recover from a
network partition as quickly as it can but your cluster may be partly
unavailable during a partition and will need to spend time and resources to
resynchronize any missing data and rebalance itself once the partition heals.
<<shard-recovery,resynchronize any missing data>> and <<shards-rebalancing-settings,rebalance>>
itself once the partition heals.
Recovering from a failure may involve copying a large amount of data between
nodes so the recovery time is often determined by the available bandwidth.

18 changes: 2 additions & 16 deletions docs/reference/indices/recovery.asciidoc
@@ -35,21 +35,7 @@ index, or alias.
Use the index recovery API to get information about ongoing and completed shard
recoveries.

// tag::shard-recovery-desc[]
Shard recovery is the process of initializing a shard copy, such as restoring a
primary shard from a snapshot or syncing a replica shard from a primary shard.
When a shard recovery completes, the recovered shard is available for search
and indexing.

Recovery automatically occurs during the following processes:

* Node startup. This type of recovery is called a local store recovery.
* Primary shard replication.
* Relocation of a shard to a different node in the same cluster.
* <<snapshots-restore-snapshot,Snapshot restore>> operation.
* <<indices-clone-index,Clone>>, <<indices-shrink-index,shrink>>, or
<<indices-split-index,split>> operation.
// end::shard-recovery-desc[]
include::{es-ref-dir}/modules/shard-recovery-desc.asciidoc[]

The index recovery API reports information about completed recoveries only for
shard copies that currently exist in the cluster. It only reports the last
@@ -360,7 +346,7 @@ The API returns the following response:
"index1" : {
"shards" : [ {
"id" : 0,
"type" : "STORE",
"type" : "EXISTING_STORE",
"stage" : "DONE",
"primary" : true,
"start_time" : "2014-02-24T12:38:06.349",
4 changes: 1 addition & 3 deletions docs/reference/modules/cluster.asciidoc
@@ -1,9 +1,7 @@
[[modules-cluster]]
=== Cluster-level shard allocation and routing settings

_Shard allocation_ is the process of allocating shards to nodes. This can
happen during initial recovery, replica allocation, rebalancing, or
when nodes are added or removed.
include::{es-ref-dir}/modules/shard-allocation-desc.asciidoc[]

One of the main roles of the master is to decide which shards to allocate to
which nodes, and when to move shards between nodes in order to rebalance the
2 changes: 2 additions & 0 deletions docs/reference/modules/shard-allocation-desc.asciidoc
@@ -0,0 +1,2 @@
Shard allocation is the process of assigning shard copies to nodes. This can
happen during initial recovery, replica allocation, rebalancing, when nodes are added to or removed from the cluster, or when cluster or index settings that impact allocation are updated.
75 changes: 75 additions & 0 deletions docs/reference/modules/shard-ops.asciidoc
@@ -0,0 +1,75 @@
[[shard-allocation-relocation-recovery]]
=== Shard allocation, relocation, and recovery

Each <<documents-indices,index>> in Elasticsearch is divided into one or more <<scalability,shards>>.
Each document in an index belongs to a single shard.

A cluster can contain multiple copies of a shard. Each shard has one distinguished shard copy called the _primary_, and zero or more non-primary copies called _replicas_. The primary shard copy serves as the main entry point for all indexing operations. The operations on the primary shard copy are then forwarded to its replicas.

Replicas maintain redundant copies of your data across the <<modules-node,nodes>> in your cluster, protecting against hardware failure and increasing capacity to serve read requests like searching or retrieving a document. If the primary shard copy fails, then a replica is promoted to primary and takes over the primary's responsibilities.

Over the course of normal operation, Elasticsearch allocates shard copies to nodes, relocates shard copies across nodes to balance the cluster or satisfy new allocation constraints, and recovers shards to initialize new copies. In this topic, you'll learn how these operations work and how you can control them.

TIP: To learn about optimizing the number and size of shards in your cluster, refer to <<size-your-shards,Size your shards>>. To learn about how read and write operations are replicated across shards and shard copies, refer to <<docs-replication,Reading and writing documents>>.

[[shard-allocation]]
==== Shard allocation

include::{es-ref-dir}/modules/shard-allocation-desc.asciidoc[]

By default, the primary and replica shard copies for an index can be allocated to any node in the cluster, and may be relocated to rebalance the cluster.

===== Adjust shard allocation settings

You can control how shard copies are allocated using the following settings:

- <<modules-cluster,Cluster-level shard allocation settings>>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to distribute shard copies across availability zones, or prevent certain nodes from being used so you can perform maintenance.

- <<index-modules-allocation,Index-level shard allocation settings>>: Use these settings to control how the shard copies for a specific index are allocated. For example, you might want to allocate an index to a node in a specific data tier, or to a node with specific attributes.
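
As a rough sketch of the two levels, the following requests set one cluster-level and one index-level allocation setting. The index name `my-index` and the node name `node-1` are hypothetical, and the values are only illustrative; refer to the linked settings pages before applying anything similar to a real cluster.

[source,console]
----
# Cluster level: only allow primary shard copies to be allocated
# (for example, during maintenance).
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# Index level: require the shard copies of a hypothetical index
# to be allocated to a hypothetical node named node-1.
PUT my-index/_settings
{
  "index.routing.allocation.require._name": "node-1"
}
----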

===== Monitor shard allocation

If a shard copy is unassigned, it means that the shard copy is not allocated to any node in the cluster. This can happen if there are not enough nodes in the cluster to allocate the shard copy, or if the shard copy can't be allocated to any node that satisfies the shard allocation filtering rules. When a shard copy is unassigned, your cluster is considered unhealthy and returns a yellow or red cluster health status.

You can use the following APIs to monitor shard allocation:

- <<cluster-allocation-explain,Cluster allocation explain>>
- <<cat-allocation,cat allocation>>
- <<cluster-health,cluster health>>

<<red-yellow-cluster-status,Learn more about troubleshooting unassigned shard copies and recovering your cluster health>>.
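
For example, the requests below sketch how each of these APIs might be called. The index name `my-index` is hypothetical, and the request body for the allocation explain API assumes you want to inspect the first replica-eligible copy of shard `0`.

[source,console]
----
# Ask why a specific shard copy is unassigned or where it is allocated.
# my-index is a hypothetical index name.
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}

# Show how many shard copies and how much disk each node holds.
GET _cat/allocation?v=true

# Check the overall status and the number of unassigned shard copies.
GET _cluster/health
----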

[[shard-recovery]]
==== Shard recovery

include::{es-ref-dir}/modules/shard-recovery-desc.asciidoc[]

===== Adjust shard recovery settings

You can control how shards are recovered, including the resources that recovery operations can use and which indices are prioritized for recovery, by adjusting the following settings:

- <<recovery,Index recovery settings>>
- <<modules-cluster,Cluster-level shard allocation settings>>
- <<index-modules-allocation,Index-level shard allocation settings>>, including <<delayed-allocation,delayed allocation>> and <<recovery-prioritization,index recovery prioritization>>

Shard recovery operations also respect general shard allocation settings.
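
The following is a minimal sketch of one setting from each group. The index name `my-index` is hypothetical and the values are illustrative assumptions, not recommendations.

[source,console]
----
# Index recovery setting: limit the recovery traffic per node.
PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}

# Index-level settings for a hypothetical index: recover this index
# before lower-priority indices, and wait 5 minutes after a node
# leaves before allocating replacement replicas.
PUT my-index/_settings
{
  "index.priority": 10,
  "index.unassigned.node_left.delayed_timeout": "5m"
}
----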

===== Monitor shard recovery

You can use the following APIs to monitor shard recovery:

- View a list of in-progress and completed recoveries using the <<cat-recovery,cat recovery API>>
- View detailed information about a specific recovery using the <<indices-recovery,index recovery API>>
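
As an illustration, these two requests show one way to call the APIs listed above. The index name `my-index` is hypothetical.

[source,console]
----
# List only the recoveries that are currently in progress.
GET _cat/recovery?v=true&active_only=true

# Show detailed recovery information for a hypothetical index.
GET my-index/_recovery
----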

[[shard-relocation]]
==== Shard relocation

Shard relocation is the process of moving shard copies from one node to another. This can happen when a node joins or leaves the cluster, or when the cluster is rebalancing.

When a shard copy is relocated, it is created as a new shard copy on the target node. When the shard copy is fully allocated and recovered, the old shard copy is deleted. If the shard copy being relocated is a primary, then the new shard copy is marked as primary before the old shard copy is deleted.

===== Adjust shard relocation settings

You can control how and when shard copies are relocated. For example, you can adjust the rebalancing settings that control when shard copies are relocated to balance the cluster, or the high watermark for disk-based shard allocation that can trigger relocation. These settings are part of the <<modules-cluster,cluster-level shard allocation settings>>.

Shard relocation operations also respect shard allocation and recovery settings.
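
As a sketch, the request below adjusts the two kinds of relocation-related settings mentioned in this section. The values are illustrative assumptions only; `90%` matches the documented default for the high watermark.

[source,console]
----
# Only rebalance replica shard copies, and set the disk usage
# threshold above which Elasticsearch tries to relocate shard
# copies away from a node.
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": "replicas",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}
----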
16 changes: 16 additions & 0 deletions docs/reference/modules/shard-recovery-desc.asciidoc
@@ -0,0 +1,16 @@
Shard recovery is the process of initializing a shard copy, such as restoring a
primary shard from a snapshot or creating a replica shard from a primary shard.
When a shard recovery completes, the recovered shard is available for search
and indexing.

Recovery automatically occurs during the following processes:

* When creating an index for the first time.
* When a node rejoins the cluster and starts up any missing primary shard copies using the data that it holds in its data path.
* Creation of new replica shard copies from the primary.
* Relocation of a shard copy to a different node in the same cluster.
* A <<snapshots-restore-snapshot,snapshot restore>> operation.
* A <<indices-clone-index,clone>>, <<indices-shrink-index,shrink>>, or
<<indices-split-index,split>> operation.

You can determine the cause of a shard recovery using the <<indices-recovery,recovery>> or <<cat-recovery,cat recovery>> APIs.
5 changes: 4 additions & 1 deletion docs/reference/setup.asciidoc
@@ -33,7 +33,6 @@ include::setup/configuration.asciidoc[]

include::setup/important-settings.asciidoc[]


include::setup/secure-settings.asciidoc[]

include::settings/audit-settings.asciidoc[]
@@ -82,6 +81,8 @@ include::modules/indices/search-settings.asciidoc[]

include::settings/security-settings.asciidoc[]

include::modules/shard-ops.asciidoc[]

include::modules/indices/request_cache.asciidoc[]

include::settings/snapshot-settings.asciidoc[]
@@ -93,7 +94,9 @@ include::modules/threadpool.asciidoc[]
include::settings/notification-settings.asciidoc[]

include::setup/advanced-configuration.asciidoc[]

include::setup/sysconfig.asciidoc[]

include::setup/bootstrap-checks.asciidoc[]

include::setup/bootstrap-checks-xes.asciidoc[]