
[BUG] DiskThresholdDecider should consider shard size of all relocating shards on the target node while making relocation decisions #5386

Open
RS146BIJAY opened this issue Nov 28, 2022 · 5 comments
Labels: bug (Something isn't working), discuss (Issues intended to help drive brainstorming and decision making), distributed framework, enhancement (Enhancement or improvement to existing feature or request)

Comments

@RS146BIJAY
Contributor

Describe the bug
OpenSearch relocates shards away from a node once that node breaches the high disk watermark. While selecting the target node for a relocating shard, DiskThresholdDecider checks only that allocating that single shard will not push the target node above the high watermark. Since relocations can happen concurrently, DiskThresholdDecider may select the same node as the target for multiple relocations. Even though each individual relocation check can pass, relocating all of the shards (across the concurrent relocations) can drive the free disk space on the target node down to 0.

Expected behavior
While selecting the target node, DiskThresholdDecider should consider the shard size of all shards being migrated to that node, not just the currently relocating shard.
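For illustration, here is a minimal sketch of the aggregate check being asked for, written in plain Java with hypothetical types and made-up numbers rather than the actual DiskThresholdDecider code: before admitting a new relocation onto a candidate node, the decider would budget for every shard already being moved onto that node, not just the shard currently under decision.

```java
import java.util.List;

public class AggregateRelocationCheck {

    // Hypothetical view of a candidate target node: its current free space and the
    // expected sizes of shards already relocating/initializing onto it.
    record TargetNode(long freeBytes, List<Long> expectedSizesOfIncomingBytes) {}

    // Returns true only if the node would stay above the high-watermark free-space
    // floor after receiving the new shard *and* every shard already being moved to it.
    static boolean canRelocate(TargetNode node, long newShardExpectedBytes, long highWatermarkFreeBytes) {
        long alreadyIncoming = node.expectedSizesOfIncomingBytes().stream()
                .mapToLong(Long::longValue)
                .sum();
        long projectedFree = node.freeBytes() - alreadyIncoming - newShardExpectedBytes;
        return projectedFree >= highWatermarkFreeBytes;
    }

    public static void main(String[] args) {
        // Node with 100 GB free that is already the target of two 30 GB relocations.
        TargetNode node = new TargetNode(100L << 30, List.of(30L << 30, 30L << 30));
        long highWatermarkFreeBytes = 20L << 30; // keep at least 20 GB free

        // A per-shard check that ignores the in-flight relocations would accept this
        // 35 GB shard (100 - 35 >= 20); the aggregate check rejects it, because only
        // 100 - 30 - 30 - 35 = 5 GB would remain.
        System.out.println(canRelocate(node, 35L << 30, highWatermarkFreeBytes)); // false
    }
}
```

In this example a per-shard check would accept the 35 GB shard because the node still reports 100 GB free, while the aggregate check rejects it because two 30 GB relocations are already in flight.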

@RS146BIJAY added the bug and untriaged labels Nov 28, 2022
@adnapibar
Contributor

@RS146BIJAY Thanks for reporting the issue. Can you please provide details on how to reproduce this, or at least a test for this scenario?

@andrross added the enhancement label and removed the untriaged label Nov 29, 2022
@jayeshathila
Contributor

As per my understanding of the issue, we are not failing fast before starting the allocation when the disk space threshold is breached.

If no one has started on the issue, I can pick this one.

@RS146BIJAY
Contributor Author

@jayeshathila We are already working on a fix for this.

@RS146BIJAY changed the title from "DiskThresholdDecider should consider shard size of all relocating shards on the target node while making relocation decisions" to "[BUG] DiskThresholdDecider should consider shard size of all relocating shards on the target node while making relocation decisions" Jan 5, 2023
@shwetathareja
Member

> Since relocations can happen concurrently, DiskThresholdDecider may select the same node as the target for multiple relocations. Even though each individual relocation check can pass, relocating all of the shards (across the concurrent relocations) can drive the free disk space on the target node down to 0.

@RS146BIJAY DiskThresholdDecider iterates over all the relocating shards that are in the INITIALIZING state on the node, so it does handle multiple recoveries within a single reroute operation:

// Shards in the INITIALIZING state on this node, which includes the targets of in-flight relocations
final List<ShardRouting> initializingShards = node.shardsWithState(ShardRoutingState.INITIALIZING);

Can you expand on what you mean by concurrent relocation? Any change in shard allocation/movement is executed by the active leader, which processes changes sequentially, so there will never be concurrent relocations triggered from multiple threads.

Now there could be issues with:

  1. the estimation logic for expected shard size during recovery
  2. shards which are taking active writes; their size can change during recovery itself (see the sketch below).
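To illustrate point 2 (again a rough sketch with made-up numbers, not OpenSearch code): the watermark check budgets a relocating shard at the size estimated when the decision is made, but a shard taking active writes can grow past that estimate before recovery completes, consuming headroom the check believed it had preserved.

```java
// Rough sketch (made-up numbers, not OpenSearch code) of the estimation gap in point 2.
public class ShardSizeEstimationGap {

    public static void main(String[] args) {
        long freeBytes = 60L << 30;              // 60 GB free on the target node
        long highWatermarkFreeBytes = 20L << 30; // keep at least 20 GB free
        long expectedShardBytes = 35L << 30;     // shard size estimate at decision time

        // The watermark check passes against the estimate: 60 - 35 = 25 GB >= 20 GB.
        boolean allowed = freeBytes - expectedShardBytes >= highWatermarkFreeBytes;
        System.out.println("allowed against estimate: " + allowed); // true

        // While recovery runs, active indexing grows the shard by another 10 GB, so
        // the node ends up with only 15 GB free, below the floor the check assumed.
        long actualShardBytes = expectedShardBytes + (10L << 30);
        long freeAfterRecovery = freeBytes - actualShardBytes;
        System.out.println("free after recovery (GB): " + (freeAfterRecovery >> 30)); // 15
    }
}
```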

@RS146BIJAY
Contributor Author

@shwetathareja Yeah, we cross-checked this and validated that parallel relocation may not be possible. We have kept this issue on hold for now. We will revisit it once we have more data points around what may have caused the issue on the affected domain.

@shwetathareja added the discuss label Jun 14, 2023