Add indexing pressure documentation #59456

Merged
merged 12 commits into from
Jul 21, 2020
Changes from 6 commits
60 changes: 60 additions & 0 deletions docs/reference/cluster/nodes-stats.asciidoc
@@ -58,6 +58,10 @@ using metrics.
`http`::
HTTP connection information.

`indexing_pressure`::
Indexing pressure statistics about current and total indexing load and
indexing rejections.

`indices`::
Indices stats about size, document count, indexing and deletion times,
search times, field cache size, merges and flushes.
@@ -2099,6 +2103,62 @@ Number of failed operations for the processor.
=======
======

[[cluster-nodes-stats-api-response-body-indexing-pressure]]
`indexing_pressure`::
(object)
Contains <<index-modules-indexing-pressure,indexing pressure>> statistics for the node.
+
.Properties of `indexing_pressure`
[%collapsible%open]
======
`total`::
(object)
Contains statistics for cumulative indexing load since the node started.
+
.Properties of `total`
[%collapsible%open]
=======
`coordinating_and_primary_bytes`::
(integer)
Bytes consumed by indexing requests in the coordinating or primary stage.

`replica_bytes`::
(integer)
Bytes consumed by indexing requests in the replica stage.

`all_bytes`::
(integer)
Bytes consumed by indexing requests in the coordinating, primary, or replica stage.

Contributor

Are human readable versions of these stats available?

Contributor Author

There are not. My understanding is that human-readable output is disabled by default, so I think we can add human-readable interpretations in 7.10? @ywelsch I assume it is too late to make this change after feature freeze, since it is kind of a feature?


`coordinating_and_primary_memory_limit_rejections`::
(integer)
Rejections of indexing requests in the coordinating or primary stage.

Contributor

Suggested change
- Rejections of indexing requests in the coordinating or primary stage.
+ Number of indexing requests rejected in the coordinating or primary stage.


`replica_memory_limit_rejections`::
(integer)
Rejections of indexing requests in the replica stage.
=======
`current`::
(object)
Contains statistics for current indexing load.
+
.Properties of `current`
[%collapsible%open]
=======
`coordinating_and_primary_bytes`::
(integer)
Bytes consumed by indexing requests in the coordinating or primary stage.

`replica_bytes`::
(integer)
Bytes consumed by indexing requests in the replica stage.

`all_bytes`::
(integer)
Bytes consumed by indexing requests in the coordinating, primary, or replica stage.

Contributor

Same question re: human readable stats

=======
======
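
To make the field layout above concrete, here is a minimal sketch of how a client might read these statistics. It assumes the `indexing_pressure` object sits under each node entry in a top-level `nodes` map, as other node stats sections do. The sample values, the node ID `node-1`, and the helper `summarize_indexing_pressure` are hypothetical and only serve the illustration.

```python
import json

# Hypothetical sample response: the values and the node ID are invented.
# Only the field names come from the documentation above.
SAMPLE_RESPONSE = json.loads("""
{
  "nodes": {
    "node-1": {
      "indexing_pressure": {
        "total": {
          "coordinating_and_primary_bytes": 4000,
          "replica_bytes": 1000,
          "all_bytes": 5000,
          "coordinating_and_primary_memory_limit_rejections": 2,
          "replica_memory_limit_rejections": 0
        },
        "current": {
          "coordinating_and_primary_bytes": 300,
          "replica_bytes": 100,
          "all_bytes": 400
        }
      }
    }
  }
}
""")


def summarize_indexing_pressure(response):
    """Return {node_id: (current_all_bytes, total_rejections)} (hypothetical helper)."""
    summary = {}
    for node_id, node in response["nodes"].items():
        pressure = node["indexing_pressure"]
        total = pressure["total"]
        # Rejection counters are reported under `total` only.
        rejections = (
            total["coordinating_and_primary_memory_limit_rejections"]
            + total["replica_memory_limit_rejections"]
        )
        summary[node_id] = (pressure["current"]["all_bytes"], rejections)
    return summary


print(summarize_indexing_pressure(SAMPLE_RESPONSE))
```

A growing rejection count alongside a high `current` byte count would suggest that the node is shedding indexing load.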

[[cluster-nodes-stats-api-response-body-adaptive-selection]]
`adaptive_selection`::
(object)
17 changes: 14 additions & 3 deletions docs/reference/docs/data-replication.asciidoc
@@ -20,12 +20,15 @@ responsible for replicating the operation to the other copies.
The purpose of this section is to give a high-level overview of the Elasticsearch replication model and discuss the implications
it has for various interactions between write and read operations.

- [float]
+ [float,id="basic-write-model"]
==== Basic write model

Every indexing operation in Elasticsearch is first resolved to a replication group using <<index-routing,routing>>,
- typically based on the document ID. Once the replication group has been determined,
- the operation is forwarded internally to the current _primary shard_ of the group. The primary shard is responsible
+ typically based on the document ID. Once the replication group has been determined, the operation is forwarded
+ internally to the current _primary shard_ of the group. This stage of indexing is referred to as the coordinating
+ stage.
+
+ The next stage of indexing is the primary stage which is performed on the primary shard. The primary shard is responsible
for validating the operation and forwarding it to the other replicas. Since replicas can be offline, the primary
is not required to replicate to all replicas. Instead, Elasticsearch maintains a list of shard copies that should
receive the operation. This list is called the _in-sync copies_ and is maintained by the master node. As the name implies,
@@ -42,6 +45,14 @@ The primary shard follows this basic flow:
. Once all replicas have successfully performed the operation and responded to the primary, the primary acknowledges the successful
completion of the request to the client.

Each in-sync replica copy performs the indexing operation locally so that it has a copy. This stage of indexing is the replication
stage.

Contributor

Suggested change
- Each in-sync replica copy performs the indexing operation locally so that it has a copy. This stage of indexing is the replication
- stage.
+ Each in-sync replica copy performs the indexing operation locally so that it has a copy. This stage of indexing is the _replica
+ stage_.


These indexing stages (coordinating, primary, and replication) are sequential. However, each upstream stage is inclusive of the
downstream stages. The coordinating stage is not complete until each primary stage (which might be spread out across different
primary shards) has completed. Each primary stage will not complete until the in-sync replicas have completed replication and
responded to the replication requests.
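
The nesting described above can be sketched with context managers. This is an illustrative model only, not Elasticsearch code: the `stage` and `index_document` helpers are hypothetical, a single primary is assumed, and replicas are visited one after another for simplicity.

```python
from contextlib import contextmanager


@contextmanager
def stage(name, events):
    """Record when a (hypothetical) indexing stage starts and ends."""
    events.append(f"start {name}")
    try:
        yield
    finally:
        events.append(f"end {name}")


def index_document(replica_count):
    """Run the three stages nested inside each other and return the event log."""
    events = []
    with stage("coordinating", events):
        # The coordinating stage stays open while the primary stage runs.
        with stage("primary", events):
            # The primary stage stays open until every replica has responded.
            for i in range(replica_count):
                with stage(f"replica-{i}", events):
                    pass  # the replica performs the operation locally
    return events


for line in index_document(replica_count=2):
    print(line)
```

The event log shows each upstream stage ending only after its downstream stages have ended, which is the property the paragraph above describes.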

[float]
===== Failure handling

60 changes: 60 additions & 0 deletions docs/reference/index-modules/indexing-pressure.asciidoc
@@ -0,0 +1,60 @@
[[index-modules-indexing-pressure]]
== Indexing Pressure

Indexing documents into Elasticsearch introduces system load in the form of
memory and CPU load. Each indexing operation includes coordinating, primary, and
replica stages. These stages can be performed across multiple nodes in the
cluster. If too much indexing work is introduced into the system, the cluster
can become saturated. This can adversely impact other components of the system
such as searches, cluster coordination, background processing, etc.

Indexing pressure is primarily generated by external operations such as indexing
requests or internally by mechanisms such as recoveries and cross-cluster
replication.

Elasticsearch internally monitors indexing load. When the load exceeds
certain limits, new indexing work will be rejected.

Contributor

I re-ordered this to try to improve the flow.

Suggested change
- Indexing documents into Elasticsearch introduces system load in the form of
- memory and CPU load. Each indexing operation includes coordinating, primary, and
- replica stages. These stages can be performed across multiple nodes in the
- cluster. If too much indexing work is introduced into the system, the cluster
- can become saturated. This can adversely impact other components of the system
- such as searches, cluster coordination, background processing, etc.
- Indexing pressure is primarily generated by external operations such as indexing
- requests or internally by mechanisms such recoveries and cross-cluster
- replication.
- Elasticsearch internally monitors indexing load. When the load exceeds
- certain limits, new indexing work will be rejected.
+ Indexing documents into {es} introduces system load in the form of memory and
+ CPU load. Each indexing operation includes coordinating, primary, and replica
+ stages. These stages can be performed across multiple nodes in a cluster.
+ Indexing pressure can build up through external operations, such as indexing
+ requests, or internal mechanisms, such as recoveries and {ccr}. If too much
+ indexing work is introduced into the system, the cluster can become saturated.
+ This can adversely impact other operations, such as search, cluster
+ coordination, and background processing.
+ To prevent these issues, {es} internally monitors indexing load. When the load
+ exceeds certain limits, new indexing work is rejected.

[float]
=== Indexing Stages

External indexing operations go through three stages: coordinating, primary, and
replication. This write model is explained <<basic-write-model,here>>.

[float]
=== Memory Limits

Elasticsearch exposes a node setting `indexing_pressure.memory.limit` which
restricts the number of bytes for outstanding indexing requests. By default,
this setting is configured to be 10% of the heap.

At the beginning of each <<basic-write-model,indexing stage>>, Elasticsearch accounts for the
bytes consumed by an indexing request. This accounting is only released at the
end of the indexing stage. This means that upstream stages will account for the
request overhead until all downstream stages are complete. For example, the
coordinating request will remain accounted for until primary and replication
stages are complete. The primary request will remain accounted for until the
replication stage is complete.

A node will start rejecting new indexing work at the coordinating or primary
stage when the number of outstanding coordinating, primary, and replica indexing
bytes exceeds the configured limit.

A node will start rejecting new indexing work at the replication stage when the
number of outstanding replication indexing bytes exceeds 1.5 times the
configured limit. This design means that as indexing pressure builds on nodes,
they will naturally stop accepting coordinating and primary work in favor of
outstanding replication work.
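
The two rejection rules can be illustrated with a small toy model. This is a sketch under stated assumptions, not the Elasticsearch implementation: the class `IndexingPressureSketch` and its method names are hypothetical, and it assumes the incoming request's bytes are counted before the comparison against the limit.

```python
class IndexingPressureSketch:
    """Toy model of the byte accounting described above (hypothetical names)."""

    def __init__(self, heap_bytes, limit_percent=10):
        # Default limit: 10% of the heap, computed with integer arithmetic.
        self.limit = heap_bytes * limit_percent // 100
        # Replica work is allowed up to 1.5 times the configured limit.
        self.replica_limit = self.limit * 3 // 2
        self.coordinating_and_primary_bytes = 0
        self.replica_bytes = 0

    def mark_coordinating_or_primary(self, request_bytes):
        """Account for a coordinating or primary request; False means rejected."""
        # Assumption: the new request is included in the total before checking.
        all_bytes = (
            self.coordinating_and_primary_bytes + self.replica_bytes + request_bytes
        )
        if all_bytes > self.limit:
            return False
        self.coordinating_and_primary_bytes += request_bytes
        return True

    def mark_replica(self, request_bytes):
        """Account for a replica request; only replica bytes are compared."""
        if self.replica_bytes + request_bytes > self.replica_limit:
            return False
        self.replica_bytes += request_bytes
        return True

    def release_coordinating_or_primary(self, request_bytes):
        """Release accounting once the stage, and all downstream stages, finish."""
        self.coordinating_and_primary_bytes -= request_bytes

    def release_replica(self, request_bytes):
        self.replica_bytes -= request_bytes


node = IndexingPressureSketch(heap_bytes=1000)       # limit = 100, replica limit = 150
accepted_first = node.mark_coordinating_or_primary(60)   # 60 <= 100
accepted_replica = node.mark_replica(50)                  # 50 <= 150
accepted_second = node.mark_coordinating_or_primary(10)  # 60 + 50 + 10 > 100
accepted_more_replica = node.mark_replica(90)             # 50 + 90 <= 150
print(accepted_first, accepted_replica, accepted_second, accepted_more_replica)
```

In this model the node refuses new coordinating and primary work while it still accepts replica work, which mirrors the prioritization the text describes.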

The default limit `indexing_pressure.memory.limit` (10%) is generously sized and
should only be modified after careful consideration. Only indexing requests
contribute to this limit, meaning that additional indexing overhead
(buffers, listeners, etc.) also requires heap space. Finally, other
components of Elasticsearch also require memory. Configuring this limit to be
too high can starve other system components of operating memory.

[float]
=== Monitoring

Indexing pressure metrics are exposed by the
<<cluster-nodes-stats-api-response-body-indexing-pressure,Node Stats API>>.