kvcoord: fail-fast when all replicas of a range are unavailable #74503
Labels
A-kv-client: Relating to the KV client and the KV interface.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv: KV Team
With #33007, when a range loses quorum, SQL clients will generally experience fail-fast behavior: access to the unavailable range immediately results in an error rather than hanging indefinitely (as is the case in 21.2 and before). However, when a range has lost all replicas (or all replicas are unreachable), I believe that DistSender will keep retrying forever.
While we do try to be resilient to network blips, there is probably value in a heuristic where if a request has been attempted twice for each possible replica, it's time to give up.
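To make the proposed heuristic concrete, here is a minimal sketch of a send loop that gives up once every known replica has been tried twice. It is not DistSender's actual code: `replica`, `sendFn`, `sendWithBudget`, `maxAttemptsPerReplica`, and the fields of `RangeUnavailableError` are all hypothetical names chosen for illustration, and the round-robin ordering is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// replica is a hypothetical stand-in for a replica descriptor.
type replica string

// sendFn is a hypothetical transport hook: it attempts the request
// against a single replica and returns an error if that attempt failed.
type sendFn func(r replica) error

// maxAttemptsPerReplica encodes the heuristic from the issue:
// two attempts per replica before giving up.
const maxAttemptsPerReplica = 2

// errNoReplicas is returned (wrapped) when there is nothing to try.
var errNoReplicas = errors.New("no replicas known")

// RangeUnavailableError is an illustrative error type; its fields are
// assumptions, not the shape of any real CockroachDB error.
type RangeUnavailableError struct {
	Attempts int   // total attempts made across all replicas
	LastErr  error // last per-replica failure observed
}

func (e *RangeUnavailableError) Error() string {
	return fmt.Sprintf("range unavailable after %d attempts: %v", e.Attempts, e.LastErr)
}

func (e *RangeUnavailableError) Unwrap() error { return e.LastErr }

// sendWithBudget tries the replicas round-robin. Cycling through all
// replicas before retrying any of them leaves some time for a brief
// network blip to clear before a replica is tried a second time.
func sendWithBudget(replicas []replica, send sendFn) error {
	if len(replicas) == 0 {
		return &RangeUnavailableError{LastErr: errNoReplicas}
	}
	attempts := 0
	var lastErr error
	for round := 0; round < maxAttemptsPerReplica; round++ {
		for _, r := range replicas {
			attempts++
			err := send(r)
			if err == nil {
				return nil // request succeeded on this replica
			}
			lastErr = err
		}
	}
	// Every replica failed maxAttemptsPerReplica times: fail fast
	// instead of retrying forever.
	return &RangeUnavailableError{Attempts: attempts, LastErr: lastErr}
}
```

With three replicas that are all unreachable, this sketch returns a `*RangeUnavailableError` after six attempts. A real implementation would presumably also add backoff between rounds and distinguish retriable from terminal per-replica errors, which is omitted here.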
We will want to return a `RangeUnavailableError` in this case (similar to #74500) and have similar SQL UX (#74502).

Jira issue: CRDB-12121