Improve pending indexing metrics and back pressure #59263
Labels
:Distributed Indexing/CRUD
A catch all label for issues around indexing, updating and getting a doc by id. Not search.
>enhancement
Meta
release highlight
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Comments
Tim-Brooks
added
>enhancement
:Distributed Indexing/CRUD
A catch all label for issues around indexing, updating and getting a doc by id. Not search.
labels
Jul 8, 2020
Pinging @elastic/es-distributed (:Distributed/CRUD) |
elasticmachine
added
the
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
label
Jul 8, 2020
Tim-Brooks
changed the title
Improve Indexing Back pressure
Improve pending indexing metrics and back pressure
Jul 8, 2020
Tim-Brooks
added a commit
that referenced
this issue
Jul 14, 2020
This commit increases the default write queue size to 10000. This is to allow a greater number of pending indexing requests. This work is safe as we have added additional memory limits. Relates to #59263.
Tim-Brooks
added a commit
to Tim-Brooks/elasticsearch
that referenced
this issue
Jul 14, 2020
This commit increases the default write queue size to 10000. This is to allow a greater number of pending indexing requests. This work is safe as we have added additional memory limits. Relates to elastic#59263.
This was referenced Jul 14, 2020
Tim-Brooks
added a commit
that referenced
this issue
Jul 15, 2020
This commit increases the default write queue size to 10000. This is to allow a greater number of pending indexing requests. This work is safe as we have added additional memory limits. Relates to #59263.
Tim-Brooks
added a commit
to Tim-Brooks/elasticsearch
that referenced
this issue
Aug 13, 2020
This is related to elastic#59263.
Hi! I'm curious if the related CPU-based enhancement ever landed in 7.10?
Hi @tbrooks8, could I know that why do we set |
Currently indexing back pressure is limited to the size of the write queue. This does not effectively reflect the amount of outstanding indexing work for a node. We would like to add new mechanisms which better reflect the amount of outstanding work.
Target 7.9
Indexing metrics and back pressure
In 7.9 we are adding metrics about the number of indexing request bytes outstanding at each stage of the indexing process (coordinating, primary, and replication). These metrics will be exposed in the node stats API. Additionally, we will introduce a new setting, indexing_pressure.memory.limit, which caps the number of indexing bytes that may be outstanding. This setting will default to 10% of the heap. Once 10% of a node's heap is consumed by outstanding indexing bytes, we will start rejecting new coordinating and primary requests. Since a failed replication operation can fail a replica, replication gets a higher limit of 1.5X the configured limit, and only replication bytes count toward that higher limit. So if replication bytes increase to high levels, the node will stop accepting new coordinating and primary operations until the replication workload has dropped.
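The accounting rules described above can be illustrated with a minimal sketch. It is not the Elasticsearch implementation: the class name, method names, and exception type are hypothetical, and the heap size is passed in as a plain integer. It only models the two checks from the description: coordinating and primary requests are rejected once all outstanding bytes exceed the limit (10% of heap), while replication requests are rejected only once replication bytes alone exceed 1.5X that limit.

```python
class IndexingRejected(Exception):
    """Hypothetical exception raised when a request exceeds a limit."""


class IndexingPressureSketch:
    """Hypothetical model of the byte accounting described in this issue."""

    def __init__(self, heap_bytes):
        # Default limit from the issue: 10% of the heap.
        self.limit = heap_bytes // 10
        # Replication gets 1.5X the limit.
        self.replica_limit = self.limit * 3 // 2
        self.coordinating_and_primary = 0
        self.replica = 0

    def start_coordinating_or_primary(self, nbytes):
        # All outstanding bytes, including replication, count here.
        total = self.coordinating_and_primary + self.replica + nbytes
        if total > self.limit:
            raise IndexingRejected("coordinating/primary limit exceeded")
        self.coordinating_and_primary += nbytes

    def finish_coordinating_or_primary(self, nbytes):
        self.coordinating_and_primary -= nbytes

    def start_replica(self, nbytes):
        # Only replication bytes count toward the higher limit.
        if self.replica + nbytes > self.replica_limit:
            raise IndexingRejected("replication limit exceeded")
        self.replica += nbytes

    def finish_replica(self, nbytes):
        self.replica -= nbytes
```

With a 1000-byte heap in this model, the limit is 100 bytes and the replication limit is 150 bytes. Once replication bytes alone reach 100, every new coordinating or primary request is rejected until replication work completes, which matches the behavior the issue describes.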
WriteMemoryLimits
#58885)
7.9 Node stats API with human readable enabled
Replication Retries
In order to mitigate the potential of transient disruptions failing a replica, we will enable replication retries at the primary level. When an operation fails because of a connection error, circuit breaking, a rejection, etc., the primary will retry until the new timeout setting (indices.replication.retry_timeout) is exhausted.
Target 7.10
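As an illustration of the replication retries described under Replication Retries above (part of the 7.9 work), here is a minimal sketch of retrying a failed replication operation until a timeout is exhausted. It is not the Elasticsearch implementation: the function name, the exception type, and the exponential backoff between attempts are assumptions made for the example, and the timeout is passed as a plain number of seconds rather than read from indices.replication.retry_timeout.

```python
import time


class RetryableReplicationError(Exception):
    """Hypothetical stand-in for connection errors, circuit breaking, or rejections."""


def replicate_with_retry(send, retry_timeout, initial_delay=0.05,
                         clock=time.monotonic, sleep=time.sleep):
    """Call send() until it succeeds or retry_timeout seconds have elapsed.

    Returns (result, attempts). Re-raises the last retryable error once the
    timeout is exhausted. The backoff policy is an assumption of this sketch.
    """
    deadline = clock() + retry_timeout
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            return send(), attempts
        except RetryableReplicationError:
            remaining = deadline - clock()
            if remaining <= 0:
                # Timeout exhausted: give up and surface the failure.
                raise
            # Never sleep past the deadline.
            sleep(min(delay, remaining))
            delay *= 2
```

The clock and sleep parameters are injectable so the retry loop can be exercised without real waiting. Non-retryable exceptions propagate immediately, since only transient failures are meant to be retried.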