From 50bccf560998e1b1514648ea953115a9288f828a Mon Sep 17 00:00:00 2001 From: shainaraskas <58563081+shainaraskas@users.noreply.github.com> Date: Thu, 25 Jul 2024 14:44:57 -0400 Subject: [PATCH] Round up shard allocation / recovery / relocation concepts (#109943) --- docs/reference/cat/recovery.asciidoc | 2 +- .../high-availability/cluster-design.asciidoc | 3 +- docs/reference/indices/recovery.asciidoc | 18 +---- docs/reference/modules/cluster.asciidoc | 4 +- .../modules/shard-allocation-desc.asciidoc | 2 + docs/reference/modules/shard-ops.asciidoc | 75 +++++++++++++++++++ .../modules/shard-recovery-desc.asciidoc | 16 ++++ docs/reference/setup.asciidoc | 5 +- 8 files changed, 103 insertions(+), 22 deletions(-) create mode 100644 docs/reference/modules/shard-allocation-desc.asciidoc create mode 100644 docs/reference/modules/shard-ops.asciidoc create mode 100644 docs/reference/modules/shard-recovery-desc.asciidoc diff --git a/docs/reference/cat/recovery.asciidoc b/docs/reference/cat/recovery.asciidoc index 058f4e69ae8e3..c3292fc9971ee 100644 --- a/docs/reference/cat/recovery.asciidoc +++ b/docs/reference/cat/recovery.asciidoc @@ -39,7 +39,7 @@ The cat recovery API returns information about shard recoveries, both ongoing and completed. It is a more compact view of the JSON <> API. -include::{es-ref-dir}/indices/recovery.asciidoc[tag=shard-recovery-desc] +include::{es-ref-dir}/modules/shard-recovery-desc.asciidoc[] [[cat-recovery-path-params]] diff --git a/docs/reference/high-availability/cluster-design.asciidoc b/docs/reference/high-availability/cluster-design.asciidoc index 6c17a494f36ae..105c8b236b0b1 100644 --- a/docs/reference/high-availability/cluster-design.asciidoc +++ b/docs/reference/high-availability/cluster-design.asciidoc @@ -246,7 +246,8 @@ accumulate into a noticeable performance penalty. An unreliable network may have frequent network partitions. 
{es} will automatically recover from a network partition as quickly as it can but your cluster may be partly unavailable during a partition and will need to spend time and resources to -resynchronize any missing data and rebalance itself once the partition heals. +<> and <> +itself once the partition heals. Recovering from a failure may involve copying a large amount of data between nodes so the recovery time is often determined by the available bandwidth. diff --git a/docs/reference/indices/recovery.asciidoc b/docs/reference/indices/recovery.asciidoc index b4e4bd33f819a..06b4d9d92e49f 100644 --- a/docs/reference/indices/recovery.asciidoc +++ b/docs/reference/indices/recovery.asciidoc @@ -35,21 +35,7 @@ index, or alias. Use the index recovery API to get information about ongoing and completed shard recoveries. -// tag::shard-recovery-desc[] -Shard recovery is the process of initializing a shard copy, such as restoring a -primary shard from a snapshot or syncing a replica shard from a primary shard. -When a shard recovery completes, the recovered shard is available for search -and indexing. - -Recovery automatically occurs during the following processes: - -* Node startup. This type of recovery is called a local store recovery. -* Primary shard replication. -* Relocation of a shard to a different node in the same cluster. -* <> operation. -* <>, <>, or -<> operation. -// end::shard-recovery-desc[] +include::{es-ref-dir}/modules/shard-recovery-desc.asciidoc[] The index recovery API reports information about completed recoveries only for shard copies that currently exist in the cluster. 
It only reports the last @@ -360,7 +346,7 @@ The API returns the following response: "index1" : { "shards" : [ { "id" : 0, - "type" : "STORE", + "type" : "EXISTING_STORE", "stage" : "DONE", "primary" : true, "start_time" : "2014-02-24T12:38:06.349", diff --git a/docs/reference/modules/cluster.asciidoc b/docs/reference/modules/cluster.asciidoc index 4b9ede5450683..b3eaa5b47c238 100644 --- a/docs/reference/modules/cluster.asciidoc +++ b/docs/reference/modules/cluster.asciidoc @@ -1,9 +1,7 @@ [[modules-cluster]] === Cluster-level shard allocation and routing settings -_Shard allocation_ is the process of allocating shards to nodes. This can -happen during initial recovery, replica allocation, rebalancing, or -when nodes are added or removed. +include::{es-ref-dir}/modules/shard-allocation-desc.asciidoc[] One of the main roles of the master is to decide which shards to allocate to which nodes, and when to move shards between nodes in order to rebalance the diff --git a/docs/reference/modules/shard-allocation-desc.asciidoc b/docs/reference/modules/shard-allocation-desc.asciidoc new file mode 100644 index 0000000000000..426ad0da72e1b --- /dev/null +++ b/docs/reference/modules/shard-allocation-desc.asciidoc @@ -0,0 +1,2 @@ +Shard allocation is the process of assigning shard copies to nodes. This can +happen during initial recovery, replica allocation, rebalancing, when nodes are added to or removed from the cluster, or when cluster or index settings that impact allocation are updated. \ No newline at end of file diff --git a/docs/reference/modules/shard-ops.asciidoc b/docs/reference/modules/shard-ops.asciidoc new file mode 100644 index 0000000000000..c0e5ee6a220f0 --- /dev/null +++ b/docs/reference/modules/shard-ops.asciidoc @@ -0,0 +1,75 @@ +[[shard-allocation-relocation-recovery]] +=== Shard allocation, relocation, and recovery + +Each <> in Elasticsearch is divided into one or more <>. +Each document in an index belongs to a single shard. 
+ +A cluster can contain multiple copies of a shard. Each shard has one distinguished shard copy called the _primary_, and zero or more non-primary copies called _replicas_. The primary shard copy serves as the main entry point for all indexing operations. The operations on the primary shard copy are then forwarded to its replicas. + +Replicas maintain redundant copies of your data across the <> in your cluster, protecting against hardware failure and increasing capacity to serve read requests like searching or retrieving a document. If the primary shard copy fails, then a replica is promoted to primary and takes over the primary's responsibilities. + +Over the course of normal operation, Elasticsearch allocates shard copies to nodes, relocates shard copies across nodes to balance the cluster or satisfy new allocation constraints, and recovers shards to initialize new copies. In this topic, you'll learn how these operations work and how you can control them. + +TIP: To learn about optimizing the number and size of shards in your cluster, refer to <>. To learn about how read and write operations are replicated across shards and shard copies, refer to <>. + +[[shard-allocation]] +==== Shard allocation + +include::{es-ref-dir}/modules/shard-allocation-desc.asciidoc[] + +By default, the primary and replica shard copies for an index can be allocated to any node in the cluster, and may be relocated to rebalance the cluster. + +===== Adjust shard allocation settings + +You can control how shard copies are allocated using the following settings: + +- <>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to allocate shard copies across availability zones, or prevent certain nodes from being used so you can perform maintenance. + +- <>: Use these settings to control how the shard copies for a specific index are allocated. 
For example, you might want to allocate an index to a node in a specific data tier, or to a node with specific attributes. + +===== Monitor shard allocation + +If a shard copy is unassigned, it means that the shard copy is not allocated to any node in the cluster. This can happen if there are not enough nodes in the cluster to allocate the shard copy, or if the shard copy can't be allocated to any node that satisfies the shard allocation filtering rules. When a shard copy is unassigned, your cluster is considered unhealthy and returns a yellow or red cluster health status. + +You can use the following APIs to monitor shard allocation: + +- <> +- <> +- <> + +<>. + +[[shard-recovery]] +==== Shard recovery + +include::{es-ref-dir}/modules/shard-recovery-desc.asciidoc[] + +===== Adjust shard recovery settings + +To control how shards are recovered, for example the resources that can be used by recovery operations, and which indices should be prioritized for recovery, you can adjust the following settings: + +- <> +- <> +- <>, including <> and <> + +Shard recovery operations also respect general shard allocation settings. + +===== Monitor shard recovery + +You can use the following APIs to monitor shard recovery: + + - View a list of in-progress and completed recoveries using the <> + - View detailed information about a specific recovery using the <> + +[[shard-relocation]] +==== Shard relocation + +Shard relocation is the process of moving shard copies from one node to another. This can happen when a node joins or leaves the cluster, or when the cluster is rebalancing. + +When a shard copy is relocated, it is created as a new shard copy on the target node. When the shard copy is fully allocated and recovered, the old shard copy is deleted. If the shard copy being relocated is a primary, then the new shard copy is marked as primary before the old shard copy is deleted. 
+ +===== Adjust shard relocation settings + +You can control how and when shard copies are relocated. For example, you can adjust the rebalancing settings that control when shard copies are relocated to balance the cluster, or the high watermark for disk-based shard allocation that can trigger relocation. These settings are part of the <>. + +Shard relocation operations also respect shard allocation and recovery settings. \ No newline at end of file diff --git a/docs/reference/modules/shard-recovery-desc.asciidoc b/docs/reference/modules/shard-recovery-desc.asciidoc new file mode 100644 index 0000000000000..67eaceb528962 --- /dev/null +++ b/docs/reference/modules/shard-recovery-desc.asciidoc @@ -0,0 +1,16 @@ +Shard recovery is the process of initializing a shard copy, such as restoring a +primary shard from a snapshot or creating a replica shard from a primary shard. +When a shard recovery completes, the recovered shard is available for search +and indexing. + +Recovery automatically occurs during the following processes: + +* When creating an index for the first time. +* When a node rejoins the cluster and starts up any missing primary shard copies using the data that it holds in its data path. +* Creation of new replica shard copies from the primary. +* Relocation of a shard copy to a different node in the same cluster. +* A <> operation. +* A <>, <>, or +<> operation. + +You can determine the cause of a shard recovery using the <> or <> APIs. 
\ No newline at end of file diff --git a/docs/reference/setup.asciidoc b/docs/reference/setup.asciidoc index 64626aafb2441..b346fddc5e5a1 100644 --- a/docs/reference/setup.asciidoc +++ b/docs/reference/setup.asciidoc @@ -33,7 +33,6 @@ include::setup/configuration.asciidoc[] include::setup/important-settings.asciidoc[] - include::setup/secure-settings.asciidoc[] include::settings/audit-settings.asciidoc[] @@ -82,6 +81,8 @@ include::modules/indices/search-settings.asciidoc[] include::settings/security-settings.asciidoc[] +include::modules/shard-ops.asciidoc[] + include::modules/indices/request_cache.asciidoc[] include::settings/snapshot-settings.asciidoc[] @@ -93,7 +94,9 @@ include::modules/threadpool.asciidoc[] include::settings/notification-settings.asciidoc[] include::setup/advanced-configuration.asciidoc[] + include::setup/sysconfig.asciidoc[] + include::setup/bootstrap-checks.asciidoc[] include::setup/bootstrap-checks-xes.asciidoc[]