[BUG] Cluster fails to rebalance itself when skew_factor in NodeLoadAwareAllocationDecider is set to 0 #3497
Labels: bug, pending backport
Describe the bug
After a partial outage in a cluster with a 3 availability zone setup and `skew_factor` set to `0`, we expect all shards to be assigned again once the cluster recovers, assuming the cluster had indices with `n` primaries and `1` replica. However, we see transiently that shards remain unassigned even after the cluster has recovered and all nodes are up.
To Reproduce
Steps to reproduce the behavior:
1. Set `cluster.routing.allocation.awareness.force.zone.values` and assign `3` zones to it
2. Set `cluster.routing.allocation.load_awareness.skew_factor` and assign `0.0` to it
3. Create `test-index-1` with `30 primaries` and `1 replica`
4. Take down `3 nodes` in a particular zone
5. Create `test-index-2` with `30 primaries` and `1 replica`
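For anyone reproducing this, the request bodies for the steps above could look roughly like the sketch below. It only builds and prints the JSON; it does not talk to a cluster. The zone names are hypothetical (the issue only says that 3 zones are assigned), and it assumes that both settings are applied as `persistent` settings through `PUT /_cluster/settings` and that the `zone` awareness attribute is already configured on the nodes.

```python
import json

# Hypothetical zone names; the issue only states that 3 zones are assigned.
ZONES = ["zone-a", "zone-b", "zone-c"]


def cluster_settings_body(zones, skew_factor):
    """Build a body for PUT /_cluster/settings with the two settings from the steps."""
    return {
        # Assumption: applied as persistent settings; transient would also work for a test.
        "persistent": {
            "cluster.routing.allocation.awareness.force.zone.values": ",".join(zones),
            "cluster.routing.allocation.load_awareness.skew_factor": skew_factor,
        }
    }


def index_body(primaries, replicas):
    """Build a body for PUT /<index-name> with the given shard counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        }
    }


settings = cluster_settings_body(ZONES, 0.0)
test_index = index_body(primaries=30, replicas=1)

print("PUT /_cluster/settings")
print(json.dumps(settings, indent=2))
print("PUT /test-index-1 (and later PUT /test-index-2)")
print(json.dumps(test_index, indent=2))
```

The same index body is used for `test-index-1` before the zone outage and for `test-index-2` while the nodes are down.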
Expected behavior
After the partial zonal failure recovers, i.e. all the configured nodes are up and running, all the shards should be assigned and the cluster should be green.
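One way to state this expectation as a check is sketched below. It assumes the input is the parsed JSON of a `GET /_cluster/health` response and reads only its `status`, `active_shards`, and `unassigned_shards` fields. The sample dictionaries are illustrative values, not output captured from the affected cluster, and the helper names are made up for this example.

```python
def expected_shard_copies(indices, primaries, replicas):
    """Total shard copies: each primary plus its replicas, across all indices."""
    return indices * primaries * (1 + replicas)


def is_fully_recovered(health, expected_active):
    """True when the cluster is green and every expected shard copy is assigned."""
    return (
        health.get("status") == "green"
        and health.get("unassigned_shards", 0) == 0
        and health.get("active_shards") == expected_active
    )


# Two indices with 30 primaries and 1 replica each, as in the reproduction steps.
total = expected_shard_copies(indices=2, primaries=30, replicas=1)

# Illustrative health responses only (hypothetical numbers).
healthy = {"status": "green", "active_shards": total, "unassigned_shards": 0}
stuck = {"status": "yellow", "active_shards": total - 5, "unassigned_shards": 5}

print(total)
print(is_fully_recovered(healthy, total))
print(is_fully_recovered(stuck, total))
```

With the two indices from the reproduction steps, the expected state is 120 active shard copies and none unassigned; the bug shows up as the second case, where some copies stay unassigned although all nodes are back.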