kv: ranges don't get upreplicated, despite other nodes being around #47620
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
A-kv-distribution
Relating to rebalancing and leasing.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Saw the following take place out in the wild.
The cluster had lost a node, and the operator had added a different one to it. We then upreplicated all ranges except for about 4 of them. 3 of those were still under-replicated when checking in about two weeks later. What I suspect is happening here is that we're not upreplicating quiesced ranges (the fourth range may have seen some activity in the interim, causing it to upreplicate).
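To make the suspicion concrete, here's a toy model of the failure mode. This is not CockroachDB code: the `Range` type, the two scan functions, and the replication factor of 3 are all assumptions made up for illustration. It only shows how a scan that looks at active ranges alone would never pick up an idle, under-replicated range.

```python
from dataclasses import dataclass

# Assumed replication factor; the issue does not state the actual zone config.
TARGET_REPLICAS = 3


@dataclass
class Range:
    """Hypothetical stand-in for a range; not a real CockroachDB type."""
    range_id: int
    replicas: int   # current number of replicas
    quiesced: bool  # True if the range is idle


def needs_upreplication(r: Range) -> bool:
    return r.replicas < TARGET_REPLICAS


def scan_skipping_quiesced(ranges):
    """Suspected behavior: only active ranges are considered."""
    return [r.range_id for r in ranges
            if not r.quiesced and needs_upreplication(r)]


def scan_all(ranges):
    """Expected behavior: quiescence does not exempt a range."""
    return [r.range_id for r in ranges if needs_upreplication(r)]


ranges = [
    Range(range_id=1, replicas=3, quiesced=True),   # healthy, idle
    Range(range_id=2, replicas=2, quiesced=True),   # under-replicated, idle
    Range(range_id=3, replicas=2, quiesced=False),  # under-replicated, active
]

print(scan_skipping_quiesced(ranges))  # [3]
print(scan_all(ranges))                # [2, 3]
```

In this model, range 2 stays under-replicated until something wakes it up, which would match the fourth range upreplicating after seeing some activity.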
Somewhat surprisingly, and perhaps orthogonal to this issue, running the under-replicated ranges through our enqueue range queues didn't actually do anything. Are we simply ignoring quiesced ranges there? I don't think we should be.
We should sanity check what our behavior is here; we end up in a pretty fragile state when running the scenario above.
(The operator was running v19.2.3)