Avoid overshooting watermarks during relocation #46128

DaveCTurner · 2019-08-29T12:37:08Z

Today the DiskThresholdDecider attempts to account for already-relocating
shards when deciding how to allocate or relocate a shard. Its goal is to stop
relocating shards onto a node before that node exceeds the low watermark, and
to stop relocating shards away from a node as soon as the node drops below the
high watermark.

The decider handles multiple data paths by only accounting for relocating
shards that affect the appropriate data path. However, this mechanism does not
correctly account for new relocating shards, which are unwittingly ignored.
This means that we may evict far too many shards from a node above the high
watermark, and may relocate far too many shards onto a node, causing it to blow
right past the low watermark and potentially other watermarks too.
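
To make the overshoot concrete, here is a minimal sketch of the per-path
accounting. It is not the decider's actual code: the class, the
`IncomingShard` type, both method names and the numbers are hypothetical, and
counting unknown-path shards against every path is just one conservative
option assumed for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from Elasticsearch.
final class RelocationAccountingSketch {

    /** A shard relocating onto the node; dataPath is null while the path is unknown. */
    static final class IncomingShard {
        final String dataPath;
        final long sizeBytes;

        IncomingShard(String dataPath, long sizeBytes) {
            this.dataPath = dataPath;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Free space expected on a path once the counted relocations have completed. */
    static long projectedFreeBytes(String path, long freeBytes,
                                   List<IncomingShard> incoming, boolean countUnknownPaths) {
        long reserved = 0;
        for (IncomingShard shard : incoming) {
            boolean onThisPath = path.equals(shard.dataPath);
            boolean unknownPath = shard.dataPath == null;
            // Assumption: an unknown path is conservatively charged to this path.
            if (onThisPath || (unknownPath && countUnknownPaths)) {
                reserved += shard.sizeBytes;
            }
        }
        return freeBytes - reserved;
    }

    /** True if adding the new shard still leaves the required minimum free space. */
    static boolean fitsBelowLowWatermark(long projectedFree, long newShardBytes, long minFreeBytes) {
        return projectedFree - newShardBytes >= minFreeBytes;
    }

    public static void main(String[] args) {
        List<IncomingShard> incoming = Arrays.asList(
            new IncomingShard("/data/a", 20),  // path already reported
            new IncomingShard(null, 30));      // new relocation, path not yet known
        long ignoring = projectedFreeBytes("/data/a", 100, incoming, false);
        long counting = projectedFreeBytes("/data/a", 100, incoming, true);
        System.out.println(fitsBelowLowWatermark(ignoring, 25, 40));
        System.out.println(fitsBelowLowWatermark(counting, 25, 40));
    }
}
```

With the new relocation ignored, the path appears to have room for another
shard; once it is counted, the same allocation would take the path past the
threshold.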

There are in fact two distinct issues that this PR fixes. New incoming shards
have an unknown data path until the ClusterInfoService refreshes its
statistics. New outgoing shards have a known data path, but we fail to account
for the change of the corresponding ShardRouting from STARTED to
RELOCATING, meaning that we fail to find the correct data path and treat the
path as unknown here too.
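
The outgoing-shard case can be sketched as a lookup that misses once the
routing state changes. This is a toy model only: the string keys, the `State`
enum and both lookup methods are hypothetical, and the fallback to the
`STARTED` entry is an assumed way of recovering the path, not necessarily what
this PR does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a toy stand-in for a path cache keyed by routing entry.
final class DataPathLookupSketch {

    enum State { STARTED, RELOCATING }

    /** Assumption: cached paths are keyed by shard id together with routing state. */
    static String key(String shardId, State state) {
        return shardId + "/" + state;
    }

    /** Looks up the path for the routing exactly as given; misses after a state change. */
    static String naiveLookup(Map<String, String> paths, String shardId, State state) {
        return paths.get(key(shardId, state));
    }

    /** Retries with the STARTED key when a RELOCATING shard has no entry of its own. */
    static String lookupWithFallback(Map<String, String> paths, String shardId, State state) {
        String path = paths.get(key(shardId, state));
        if (path == null && state == State.RELOCATING) {
            path = paths.get(key(shardId, State.STARTED));
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> paths = new HashMap<>();
        // The statistics were gathered while the shard was still STARTED.
        paths.put(key("index-0", State.STARTED), "/data/a");
        System.out.println(naiveLookup(paths, "index-0", State.RELOCATING));
        System.out.println(lookupWithFallback(paths, "index-0", State.RELOCATING));
    }
}
```

In this model the naive lookup returns `null` for the relocating shard even
though its path was recorded moments earlier, which is the "known path treated
as unknown" situation described above.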

This PR also reworks the MockDiskUsagesIT test to avoid using fake data paths
for all shards. With the changes here, the data paths are handled in tests as
they are in production, except that their sizes are fake.

Fixes #45177
Backport of #46079

elasticmachine · 2019-08-29T12:37:10Z

Pinging @elastic/es-distributed

DaveCTurner · 2019-08-29T12:39:59Z

Just™ a backport, no need for a review.

DaveCTurner added the >bug, :Distributed Coordination/Allocation, backport, and v6.8.3 labels Aug 29, 2019

tyvm

025ec79

DaveCTurner merged commit 0c48c0e into elastic:6.8 Aug 29, 2019

DaveCTurner deleted the 2019-08-29-disk-based-allocator-overshoot-6.8 branch August 29, 2019 15:29
