
[BUG] DiskThresholdDecider should consider shard size of all relocating shards on the target node while making relocation decisions #5386

Open
RS146BIJAY opened this issue Nov 28, 2022 · 5 comments
Labels: bug (Something isn't working), discuss (Issues intended to help drive brainstorming and decision making), distributed framework, enhancement (Enhancement or improvement to existing feature or request)

Comments

@RS146BIJAY
Contributor

Describe the bug
OpenSearch relocates shards away from a node once that node breaches the high disk watermark. While selecting the target node for a relocating shard, DiskThresholdDecider checks only that allocating that single shard will not push the target node above the high watermark. Since relocations can happen concurrently, DiskThresholdDecider may select the same node as the target for multiple relocations. Even though each individual relocation check can pass, relocating all of the shards (across the concurrent relocations) can drive the free disk space on the target node down to 0.

Expected behavior
While selecting the target node, DiskThresholdDecider should consider the shard size of all shards being migrated to that node, not just the currently relocating shard.
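For illustration, here is a minimal sketch of the aggregate check being asked for, written in plain Java with hypothetical types and made-up numbers rather than the actual DiskThresholdDecider code: before admitting a new relocation onto a candidate node, the decider would budget for every shard already being moved onto that node, not just the shard currently under decision.

```java
import java.util.List;

public class AggregateRelocationCheck {

    // Hypothetical view of a candidate target node: its current free space and the
    // expected sizes of shards already relocating/initializing onto it.
    record TargetNode(long freeBytes, List<Long> expectedSizesOfIncomingBytes) {}

    // Returns true only if the node would stay above the high-watermark free-space
    // floor after receiving the new shard *and* every shard already being moved to it.
    static boolean canRelocate(TargetNode node, long newShardExpectedBytes, long highWatermarkFreeBytes) {
        long alreadyIncoming = node.expectedSizesOfIncomingBytes().stream()
                .mapToLong(Long::longValue)
                .sum();
        long projectedFree = node.freeBytes() - alreadyIncoming - newShardExpectedBytes;
        return projectedFree >= highWatermarkFreeBytes;
    }

    public static void main(String[] args) {
        // Node with 100 GB free that is already the target of two 30 GB relocations.
        TargetNode node = new TargetNode(100L << 30, List.of(30L << 30, 30L << 30));
        long highWatermarkFreeBytes = 20L << 30; // keep at least 20 GB free

        // A per-shard check that ignores the in-flight relocations would accept this
        // 35 GB shard (100 - 35 >= 20); the aggregate check rejects it, because only
        // 100 - 30 - 30 - 35 = 5 GB would remain.
        System.out.println(canRelocate(node, 35L << 30, highWatermarkFreeBytes)); // false
    }
}
```

In this example a per-shard check would accept the 35 GB shard because the node still reports 100 GB free, while the aggregate check rejects it because two 30 GB relocations are already in flight.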

@RS146BIJAY added the bug and untriaged labels Nov 28, 2022
@adnapibar
Contributor

@RS146BIJAY Thanks for reporting the issue. Can you please provide details on how to reproduce this, or at least a test for this scenario?

@andrross added the enhancement label and removed the untriaged label Nov 29, 2022
@jayeshathila
Contributor

As per my understanding of the issue, we are not failing fast before starting the allocation when the disk space threshold is breached.

If no one has started on the issue, I can pick this one.

@RS146BIJAY
Contributor Author

@jayeshathila We are already working on a fix for this.

@RS146BIJAY changed the title from "DiskThresholdDecider should consider shard size of all relocating shards on the target node while making relocation decisions" to "[BUG] DiskThresholdDecider should consider shard size of all relocating shards on the target node while making relocation decisions" Jan 5, 2023
@shwetathareja
Member

> Since relocations can happen concurrently, DiskThresholdDecider may select the same node as the target for multiple relocations. Even though each individual relocation check can pass, relocating all of the shards (across the concurrent relocations) can drive the free disk space on the target node down to 0.

@RS146BIJAY DiskThresholdDecider iterates over all the relocating shards that are in the INITIALIZING state on the node, so it does handle multiple recoveries within a single reroute operation:

// Shards in the INITIALIZING state on this node, which includes the targets of in-flight relocations
final List<ShardRouting> initializingShards = node.shardsWithState(ShardRoutingState.INITIALIZING);

Can you expand on what you mean by concurrent relocation? Any change in shard allocation/movement is executed by the active leader, which processes changes sequentially, so there will never be concurrent relocations triggered from multiple threads.

Now there could be issues with:

  1. the estimation logic for expected shard size during recovery
  2. shards which are taking active writes; their size can change during recovery itself (see the sketch below).
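To illustrate point 2 (again a rough sketch with made-up numbers, not OpenSearch code): the watermark check budgets a relocating shard at the size estimated when the decision is made, but a shard taking active writes can grow past that estimate before recovery completes, consuming headroom the check believed it had preserved.

```java
// Rough sketch (made-up numbers, not OpenSearch code) of the estimation gap in point 2.
public class ShardSizeEstimationGap {

    public static void main(String[] args) {
        long freeBytes = 60L << 30;              // 60 GB free on the target node
        long highWatermarkFreeBytes = 20L << 30; // keep at least 20 GB free
        long expectedShardBytes = 35L << 30;     // shard size estimate at decision time

        // The watermark check passes against the estimate: 60 - 35 = 25 GB >= 20 GB.
        boolean allowed = freeBytes - expectedShardBytes >= highWatermarkFreeBytes;
        System.out.println("allowed against estimate: " + allowed); // true

        // While recovery runs, active indexing grows the shard by another 10 GB, so
        // the node ends up with only 15 GB free, below the floor the check assumed.
        long actualShardBytes = expectedShardBytes + (10L << 30);
        long freeAfterRecovery = freeBytes - actualShardBytes;
        System.out.println("free after recovery (GB): " + (freeAfterRecovery >> 30)); // 15
    }
}
```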

@RS146BIJAY
Contributor Author

@shwetathareja Yeah, we cross-checked this and validated that parallel relocation may not be possible. We have kept this issue on hold for now. We will revisit it once we have more data points around what may have caused the issue on the affected domain.

@shwetathareja added the discuss label Jun 14, 2023