From ceb54ed65586209f22fc5adef956ec65e3afb941 Mon Sep 17 00:00:00 2001
From: Tim Brooks <tim@uncontended.net>
Date: Mon, 20 Jul 2020 19:35:26 -0600
Subject: [PATCH] Add indexing pressure documentation (#59456)

This commit adds documentation about the new indexing pressure memory
limit setting and exposure of this metrics in node stats.
---
 docs/reference/cluster/nodes-stats.asciidoc   | 79 +++++++++++++++++++
 docs/reference/docs/data-replication.asciidoc | 17 +++-
 docs/reference/index-modules.asciidoc         |  6 ++
 .../index-modules/indexing-pressure.asciidoc  | 73 +++++++++++++++++
 4 files changed, 172 insertions(+), 3 deletions(-)
 create mode 100644 docs/reference/index-modules/indexing-pressure.asciidoc
diff --git a/docs/reference/cluster/nodes-stats.asciidoc b/docs/reference/cluster/nodes-stats.asciidoc
index 15c49e675ca6a..b5116fcc8995f 100644
--- a/docs/reference/cluster/nodes-stats.asciidoc
+++ b/docs/reference/cluster/nodes-stats.asciidoc
@@ -58,6 +58,9 @@ using metrics.
   `http`::
       HTTP connection information.
 
+  `indexing_pressure`::
+        Statistics about the node's indexing load and related rejections.
+
   `indices`::
       Indices stats about size, document count, indexing and deletion times,
       search times, field cache size, merges and flushes.
@@ -2085,6 +2088,82 @@ Number of failed operations for the processor.
 =======
 ======
 
+[[cluster-nodes-stats-api-response-body-indexing-pressure]]
+`indexing_pressure`::
+(object)
+Contains <<index-modules-indexing-pressure,indexing pressure>> statistics for the node.
++
+.Properties of `indexing_pressure`
+[%collapsible%open]
+======
+`total`::
+(object)
+Contains statistics for the cumulative indexing load since the node started.
++
+.Properties of `<total>`
+[%collapsible%open]
+=======
+`combined_coordinating_and_primary_bytes`::
+(integer)
+Bytes consumed by indexing requests in the coordinating or primary stage. This
+value is not the sum of coordinating_bytes and primary_bytes as a node can reuse
+the coordinating bytes if the primary stage is executed locally.
+
+`coordinating_bytes`::
+(integer)
+Bytes consumed by indexing requests in the coordinating stage.
+
+`primary_bytes`::
+(integer)
+Bytes consumed by indexing requests in the primary stage.
+
+`all_bytes`::
+(integer)
+Bytes consumed by indexing requests in the coordinating, primary, or replica stage.
+
+`coordinating_rejections`::
+(integer)
+Number of indexing requests rejected in the coordinating stage.
+
+`primary_rejections`::
+(integer)
+Number of indexing requests rejected in the primary stage.
+
+`replica_rejections`::
+(integer)
+Number of indexing requests rejected in the replica stage.
+=======
+`current`::
+(object)
+Contains statistics for current indexing load.
++
+.Properties of `<current>`
+[%collapsible%open]
+=======
+`combined_coordinating_and_primary_bytes`::
+(integer)
+Bytes consumed by indexing requests in the coordinating or primary stage. This
+value is not the sum of coordinating_bytes and primary_bytes as a node can reuse
+the coordinating bytes if the primary stage is executed locally.
+
+`coordinating_bytes`::
+(integer)
+Bytes consumed by indexing requests in the coordinating stage.
+
+`primary_bytes`::
+(integer)
+Bytes consumed by indexing requests in the primary stage.
+
+`replica_bytes`::
+(integer)
+Bytes consumed by indexing requests in the replica stage.
+
+`all_bytes`::
+(integer)
+Bytes consumed by indexing requests in the coordinating, primary, or replica stage.
+=======
+======
+
 [[cluster-nodes-stats-api-response-body-adaptive-selection]]
 `adaptive_selection`::
 (object)
diff --git a/docs/reference/docs/data-replication.asciidoc b/docs/reference/docs/data-replication.asciidoc
index 969e3dfd54ce2..b4bf8c85cad1c 100644
--- a/docs/reference/docs/data-replication.asciidoc
+++ b/docs/reference/docs/data-replication.asciidoc
@@ -20,12 +20,15 @@ responsible for replicating the operation to the other copies.
 This purpose of this section is to give a high level overview of the Elasticsearch replication model and discuss the implications
 it has for various interactions between write and read operations.
 
-[float]
+[discrete]
+[[basic-write-model]]
 ==== Basic write model
 
 Every indexing operation in Elasticsearch is first resolved to a replication group using <<index-routing,routing>>,
-typically based on the document ID. Once the replication group has been determined,
-the operation is forwarded internally to the current _primary shard_ of the group. The primary shard is responsible
+typically based on the document ID. Once the replication group has been determined, the operation is forwarded
+internally to the current _primary shard_ of the group. This stage of indexing is referred to as the _coordinating stage_.
+
+The next stage of indexing is the _primary stage_, performed on the primary shard. The primary shard is responsible
 for validating the operation and forwarding it to the other replicas. Since replicas can be offline, the primary
 is not required to replicate to all replicas. Instead, Elasticsearch maintains a list of shard copies that should
 receive the operation. This list is called the _in-sync copies_ and is maintained by the master node. As the name implies,
@@ -42,6 +45,14 @@ The primary shard follows this basic flow:
 . Once all replicas have successfully performed the operation and responded to the primary, the primary acknowledges the successful
    completion of the request to the client.
 
+Each in-sync replica copy performs the indexing operation locally so that it has a copy. This stage of indexing is the
+_replica stage_.
+
+These indexing stages (coordinating, primary, and replica) are sequential. To enable internal retries, the lifetime of each stage
+encompasses the lifetime of each subsequent stage. For example, the coordinating stage is not complete until each primary
+stage, which may be spread out across different primary shards, has completed. Each primary stage will not complete until the
+in-sync replicas have finished indexing the docs locally and responded to the replica requests.
+
 [float]
 ===== Failure handling
 
diff --git a/docs/reference/index-modules.asciidoc b/docs/reference/index-modules.asciidoc
index 3531aaa5df0af..6373706d6b876 100644
--- a/docs/reference/index-modules.asciidoc
+++ b/docs/reference/index-modules.asciidoc
@@ -281,6 +281,10 @@ Other index settings are available in index modules:
 
     Control over the retention of a history of operations in the index.
 
+<<index-modules-indexing-pressure,Indexing pressure>>::
+
+    Configure indexing back pressure limits.
+
 [float]
 [[x-pack-index-settings]]
 === [xpack]#{xpack} index settings#
@@ -311,3 +315,5 @@ include::index-modules/translog.asciidoc[]
 include::index-modules/history-retention.asciidoc[]
 
 include::index-modules/index-sorting.asciidoc[]
+
+include::index-modules/indexing-pressure.asciidoc[]
diff --git a/docs/reference/index-modules/indexing-pressure.asciidoc b/docs/reference/index-modules/indexing-pressure.asciidoc
new file mode 100644
index 0000000000000..2e7124d057a37
--- /dev/null
+++ b/docs/reference/index-modules/indexing-pressure.asciidoc
@@ -0,0 +1,73 @@
+[[index-modules-indexing-pressure]]
+== Indexing pressure
+
+Indexing documents into {es} introduces system load in the form of memory and
+CPU load. Each indexing operation includes coordinating, primary, and replica
+stages. These stages can be performed across multiple nodes in a cluster.
+
+Indexing pressure can build up through external operations, such as indexing
+requests, or internal mechanisms, such as recoveries and {ccr}. If too much
+indexing work is introduced into the system, the cluster can become saturated.
+This can adversely impact other operations, such as search, cluster
+coordination, and background processing.
+
+To prevent these issues, {es} internally monitors indexing load. When the load
+exceeds certain limits, new indexing work is rejected
+
+[discrete]
+[[indexing-stages]]
+=== Indexing stages
+
+External indexing operations go through three stages: coordinating, primary, and
+replica. See <<basic-write-model>>.
+
+[discrete]
+[[memory-limits]]
+=== Memory limits
+
+The `indexing_pressure.memory.limit` node setting restricts the number of bytes
+available for outstanding indexing requests. This setting defaults to 10% of
+the heap.
+
+At the beginning of each indexing stage, {es} accounts for the
+bytes consumed by an indexing request. This accounting is only released at the
+end of the indexing stage. This means that upstream stages will account for the
+request overheard until all downstream stages are complete. For example, the
+coordinating request will remain accounted for until primary and replica
+stages are complete. The primary request will remain accounted for until each
+in-sync replica has responded to enable replica retries if necessary.
+
+A node will start rejecting new indexing work at the coordinating or primary
+stage when the number of outstanding coordinating, primary, and replica indexing
+bytes exceeds the configured limit.
+
+A node will start rejecting new indexing work at the replica stage when the
+number of outstanding replica indexing bytes exceeds 1.5x the configured limit.
+This design means that as indexing pressure builds on nodes, they will naturally
+stop accepting coordinating and primary work in favor of outstanding replica
+work.
+
+The `indexing_pressure.memory.limit` setting's 10% default limit is generously
+sized. You should only change it after careful consideration. Only indexing
+requests contribute to this limit. This means there is additional indexing
+overhead (buffers, listeners, etc) which also require heap space. Other
+components of {es} also require memory. Setting this limit too high can deny
+operating memory to other operations and components.
+
+[discrete]
+[[indexing-pressure-monitoring]]
+=== Monitoring
+
+You can use the
+<<cluster-nodes-stats-api-response-body-indexing-pressure,node stats API>> to
+retrieve indexing pressure metrics.
+
+[discrete]
+[[indexing-pressure-settings]]
+=== Indexing pressure settings
+
+`indexing_pressure`::
+  Number of outstanding bytes that may be consumed by indexing requests. When
+  this limit is reached or exceeded, the node will reject new coordinating and
+  primary operations. When replica operations consume 1.5x this limit, the node
+  will reject new replica operations. Defaults to 10% of the heap.