
[DOCS] Add 'Fix common cluster issues' docs #72097

Merged 15 commits on Apr 28, 2021
2 changes: 2 additions & 0 deletions docs/reference/how-to.asciidoc
@@ -25,6 +25,8 @@ include::how-to/search-speed.asciidoc[]

include::how-to/disk-usage.asciidoc[]

include::how-to/fix-common-cluster-issues.asciidoc[]

include::how-to/size-your-shards.asciidoc[]

include::how-to/use-elasticsearch-for-time-series-data.asciidoc[]
381 changes: 381 additions & 0 deletions docs/reference/how-to/fix-common-cluster-issues.asciidoc
@@ -0,0 +1,381 @@
[[fix-common-cluster-issues]]
== Fix common cluster issues

This guide describes how to fix common problems with {es} clusters.

[discrete]
[[circuit-breaker-errors]]
=== Circuit breaker errors

{es} uses <<circuit-breaker,circuit breakers>> to prevent nodes from running out
of JVM heap memory. If {es} estimates an operation would exceed a circuit
breaker's limit, it stops the operation and returns an error.

By default, the <<parent-circuit-breaker,parent circuit breaker>> triggers at
95% JVM memory usage. To prevent errors, we recommend taking steps to reduce
memory pressure if usage consistently exceeds 85%.

[discrete]
[[diagnose-circuit-breaker-errors]]
==== Diagnose circuit breaker errors

**Error messages**

If a request triggers a circuit breaker, {es} returns an error.

[source,js]
----
"error": {
"root_cause": [
{
"type": "circuit_breaking_exception",
"reason": "[parent] Data too large, data for [<transport_request>] would be [num/numGB], which is larger than the limit of [num/numGB], usages [request=0/0b, fielddata=num/numKB, in_flight_requests=num/numGB, accounting=num/numGB]"
}
]
}
----
// NOTCONSOLE

{es} also writes circuit breaker errors to <<logging,`elasticsearch.log`>>. This
is helpful when automated processes, such as allocation, trigger a circuit
breaker.

[source,txt]
----
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [num/numGB], which is larger than the limit of [num/numGB], usages [request=0/0b, fielddata=num/numKB, in_flight_requests=num/numGB, accounting=num/numGB]
----

**Check JVM memory usage**

If you've enabled Stack Monitoring, you can view JVM memory usage in {kib}. In
the main menu, click **Stack Monitoring**. On the Stack Monitoring **Overview**
page, click **Nodes**. The **JVM Heap** column lists the current memory usage
for each node.

You can also use the <<cat-nodes,cat nodes API>> to get the current
`heap.percent` for each node.

[source,console]
----
GET /_cat/nodes?v=true&h=name,node*,heap*
----

To get the JVM memory usage for each circuit breaker, use the
<<cluster-nodes-stats,node stats API>>.

[source,console]
----
GET _nodes/stats/breaker
----

[discrete]
[[prevent-circuit-breaker-errors]]
==== Prevent circuit breaker errors

High JVM memory pressure often causes circuit breaker errors. See
<<high-jvm-memory-pressure>>.

[discrete]
[[high-jvm-memory-pressure]]
=== High JVM memory pressure

High JVM memory usage can degrade cluster performance and trigger
<<circuit-breaker-errors,circuit breaker errors>>. To prevent this, we recommend
taking steps to reduce memory pressure if a node's JVM memory usage consistently
exceeds 85%.

[discrete]
[[diagnose-high-jvm-memory-pressure]]
==== Diagnose high JVM memory pressure

**Check JVM memory pressure**

include::{es-repo-dir}/tab-widgets/code.asciidoc[]
include::{es-repo-dir}/tab-widgets/jvm-memory-pressure-widget.asciidoc[]

**Check garbage collection logs**

As memory usage increases, garbage collection becomes more frequent and takes
longer. You can track the frequency and length of garbage collection events in
<<logging,`elasticsearch.log`>>.

[source,log]
----
[timestamp_short_interval_from_last][INFO ][o.e.m.j.JvmGcMonitorService] [node_id] [gc][number] overhead, spent [21s] collecting in the last [40s]
----

[discrete]
[[reduce-jvm-memory-pressure]]
==== Reduce JVM memory pressure

**Reduce your shard count**

Every shard uses memory. In most cases, a small set of large shards uses fewer
resources than many small shards. For tips on reducing your shard count, see
<<size-your-shards>>.

**Avoid expensive searches**

Expensive searches can use large amounts of memory. To better track expensive
searches on your cluster, enable <<index-modules-slowlog,slow logs>>.
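
A minimal sketch: the request below turns on the search slow log for a single
index. The index name and the thresholds are placeholders; tune them for your
workload.

[source,console]
----
PUT my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
----
// TEST[s/^/PUT my-index\n/]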

Expensive searches may have a large <<paginate-search-results,`size` argument>>,
use aggregations with a large number of buckets, or include
<<query-dsl-allow-expensive-queries,expensive queries>>. To prevent expensive
searches, consider the following setting changes:

* Lower the `size` limit using the
<<index-max-result-window,`index.max_result_window`>> index setting.

* Decrease the maximum number of allowed aggregation buckets using the
<<search-settings-max-buckets,`search.max_buckets`>> cluster setting.

* Disable expensive queries using the
<<query-dsl-allow-expensive-queries,`search.allow_expensive_queries`>> cluster
setting.

[source,console]
----
PUT _all/_settings
{
  "index.max_result_window": 5000
}

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000,
    "search.allow_expensive_queries": false
  }
}
----
// TEST[setup:my_index]

**Prevent mapping explosions**

Defining too many fields or nesting fields too deeply can lead to
<<mapping-limit-settings,mapping explosions>> that use large amounts of memory.
To prevent mapping explosions, use the <<mapping-settings-limit,mapping limit
settings>> to limit the number of field mappings.
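
As a sketch, the following request sets an explicit cap on the number of fields
for a single index; the index name and the `1000` value are placeholders you
would adjust to fit your mappings.

[source,console]
----
PUT my-index/_settings
{
  "index.mapping.total_fields.limit": 1000
}
----
// TEST[s/^/PUT my-index\n/]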

**Upgrade node memory**

Heavy indexing and search loads can cause high JVM memory pressure. To better
handle heavy workloads, upgrade your nodes to increase their memory capacity.

[discrete]
[[red-yellow-cluster-status]]
=== Red or yellow cluster status

A red or yellow cluster status indicates one or more shards are missing or
unallocated. These unassigned shards increase your risk of data loss and can
degrade cluster performance.

[discrete]
[[diagnose-cluster-status]]
==== Diagnose your cluster status

**Check your cluster status**

[source,console]
----
GET _cluster/health?filter_path=status,*_shards
----

A healthy cluster has a green `status` and zero `unassigned_shards`. A yellow
status means only replicas are unassigned. A red status means one or
more primary shards are unassigned.

**View unassigned shards**

To view unassigned shards, use the <<cat-shards,cat shards API>>.

[source,console]
----
GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state
----

Unassigned shards have a `state` of `UNASSIGNED`. The `prirep` value is `p` for
primary shards and `r` for replicas. The `unassigned.reason` describes why the
shard remains unassigned.

To get a more in-depth explanation of each shard's allocation status, use the
<<cluster-allocation-explain,cluster allocation explanation API>>.

[source,console]
----
GET _cluster/allocation/explain?filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*
----
// TEST[catch:bad_request]

You can often use details from the response to resolve the issue. If the cluster
contains no unassigned shards, the API returns a `400` error.

[discrete]
[[fix-red-yellow-cluster-status]]
==== Fix a red or yellow cluster status

A shard can become unassigned for several reasons. The following tips outline the
most common causes and their solutions.

**Re-enable shard allocation**

You typically disable allocation during a <<restart-cluster,restart>> or other
cluster maintenance. If you forgot to re-enable allocation afterward, {es} will
be unable to assign shards. To re-enable allocation, reset the
`cluster.routing.allocation.enable` cluster setting.

[source,console]
----
PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.enable" : null
  }
}
----

**Recover lost nodes**

Shards often become unassigned when a data node leaves the cluster. This can
occur for several reasons, ranging from connectivity issues to hardware failure.
After you resolve the issue and recover the node, it will rejoin the cluster.
{es} will then automatically allocate the unassigned shards.

To avoid wasting resources on temporary issues, {es} <<delayed-allocation,delays
allocation>> by one minute by default. If you've recovered a node and don’t want
to wait for the delay period, you can call the <<cluster-reroute,cluster reroute
API>> with no arguments. This request starts the allocation process, which runs
asynchronously in the background.

[source,console]
----
POST /_cluster/reroute
----

**Fix allocation settings**

Misconfigured allocation settings can result in an unassigned primary shard.
These settings include:

* <<shard-allocation-filtering,Shard allocation>> index settings
* <<cluster-shard-allocation-filtering,Allocation filtering>> cluster settings
* <<shard-allocation-awareness,Allocation awareness>> cluster settings

To review your allocation settings, use the <<indices-get-settings,get index
settings>> and <<cluster-get-settings,get cluster settings>> APIs.

[source,console]
----
GET my-index/_settings?flat_settings=true&include_defaults=true

GET _cluster/settings?flat_settings=true&include_defaults=true
----
// TEST[s/^/PUT my-index\n/]

You can change the settings using the <<indices-update-settings,update index
settings>> and <<cluster-update-settings,update cluster settings>> APIs.
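
For example, if your review turns up a stale node-name allocation filter, you
could reset it at the index and cluster level by setting it to `null`. The
`_name` attribute below is only an illustration; reset whichever setting is
actually blocking allocation.

[source,console]
----
PUT my-index/_settings
{
  "index.routing.allocation.require._name": null
}

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.require._name": null
  }
}
----
// TEST[s/^/PUT my-index\n/]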

**Allocate or reduce replicas**

To protect against hardware failure, {es} will not assign a replica to the same
node as its primary shard. If no other data nodes are available to host the
replica, it remains unassigned. To fix this, you can:

* Add a data node to the same tier to host the replica.

* Change the `index.number_of_replicas` index setting to reduce the number of
replicas per primary shard. We recommend keeping at least one replica for each
primary shard.

[source,console]
----
PUT my-index/_settings
{
  "index.number_of_replicas": 1
}
----
// TEST[s/^/PUT my-index\n/]

**Free up or increase disk space**

{es} uses a <<disk-based-shard-allocation,low disk watermark>> to ensure data
nodes have enough disk space for incoming shards. By default, {es} does not
allocate shards to nodes using more than 85% of disk space.

To check the current disk space of your nodes, use the <<cat-allocation,cat
allocation API>>.

[source,console]
----
GET _cat/allocation?v=true&h=node,shards,disk.*
----

If your nodes are running low on disk space, you have a few options:

* Delete unneeded indices to free up space. If you use
<<index-lifecycle-management,{ilm-init}>>, you can update your lifecycle policy
to use <<ilm-searchable-snapshot,searchable snapshots>> or add a
delete phase. If you no longer need to search the data, you can use a
<<snapshot-restore,snapshot>> to store it off-cluster.

* If your node has a large disk capacity, you can increase the low disk
watermark or set it to an explicit byte value.
+
[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "30gb"
  }
}
----
// TEST[s/"30gb"/null/]

* Upgrade your node hardware to increase disk space.

**Reduce JVM memory pressure**

Shard allocation requires JVM heap memory. High JVM memory pressure can trigger
<<circuit-breaker,circuit breakers>> that stop allocation and leave shards
unassigned. See <<high-jvm-memory-pressure>>.

**Recover data for a lost primary shard**

If a node containing a primary shard is lost, {es} can typically replace it
using a replica on another node. If no replicas exist and you can't recover the
node, you'll need to re-add the missing data from a
<<snapshot-restore,snapshot>> or the original data source.

WARNING: Only use this option if node recovery is no longer possible. This
process allocates an empty primary shard. If the node later rejoins the cluster,
{es} will overwrite its primary shard with data from this newer empty shard,
resulting in data loss.

Use the <<cluster-reroute,cluster reroute API>> to manually allocate the
unassigned primary shard to another data node in the same tier. Set
`accept_data_loss` to `true`.

[source,console]
----
POST _cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "my-node",
        "accept_data_loss": "true"
      }
    }
  ]
}
----
// TEST[s/^/PUT my-index\n/]
// TEST[catch:bad_request]

If you backed up the missing index data to a snapshot, use the
<<restore-snapshot-api,restore snapshot API>> to restore it. Alternatively, you
can index the missing data from the original data source.
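
A minimal sketch of a restore request, assuming a registered snapshot
repository named `my_repository` containing a snapshot named `my_snapshot` that
includes the index:

[source,console]
----
POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "my-index"
}
----
// TEST[skip:requires a registered snapshot repository]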