kvserver: optimize voting replica placement in databases with region failure #59650
Labels
A-kv-replication-constraints
C-enhancement
The zone config extensions introduced thus far achieve our broad goal of configuring a database to tolerate region or zone failure. However, we haven't yet introduced mechanisms to optimally decide the placement of the non-leaseholder voting replicas. Under region failure tolerance, all writes incur cross-region replication latencies, since we cannot place a quorum of voting replicas in any single region. Thus, it matters where we place the non-leaseholder voting replicas, as that placement directly affects write latencies on the database.
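To make the quorum constraint concrete, here is a minimal Go sketch of the arithmetic (the helper names are ours, for illustration only): to survive the loss of an entire region, the voters *outside* any one region must themselves form a quorum, so no region may hold more than `num_voters - quorum` voters. Since that cap is always strictly smaller than a quorum, every quorum spans at least two regions, and every write waits on at least one cross-region round trip.

```go
package main

import "fmt"

// quorum returns the Raft quorum size for n voting replicas.
func quorum(n int) int { return n/2 + 1 }

// maxVotersPerRegion returns the most voters a single region may hold
// while still tolerating the loss of an entire region: the voters
// outside any one region must themselves form a quorum.
func maxVotersPerRegion(numVoters int) int {
	return numVoters - quorum(numVoters)
}

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("num_voters=%d: quorum=%d, max voters per region=%d\n",
			n, quorum(n), maxVotersPerRegion(n))
	}
	// Output:
	// num_voters=3: quorum=2, max voters per region=1
	// num_voters=5: quorum=3, max voters per region=2
	// num_voters=7: quorum=4, max voters per region=3
}
```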
The allocator currently lacks a way to place replicas based on their latencies to the leaseholder. We could consider adding a new zone config attribute, tentatively called `survivability`, and introducing a new latency-based heuristic to the allocator. The `survivability` attribute will initially only be allowed to take one value, `"region"`, but could be extended to work with all types of locality tiers in the future as needed. With the `survivability` attribute set, the new latency-based heuristic is intended to have the effect of packing all the voting replicas into 3 regions: the primary region (which houses the leaseholder) and the 2 next-closest regions, as sketched below.
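As a rough illustration of the shape such a heuristic could take (this is our sketch, not existing allocator code; the latency figures and region names are hypothetical), candidate regions could be ranked by measured round-trip latency from the leaseholder's region, with voters then packed into the closest ones:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// closestRegions returns all regions other than the primary, sorted by
// round-trip latency from the primary region (closest first).
func closestRegions(primary string, rtt map[string]time.Duration) []string {
	regions := make([]string, 0, len(rtt))
	for r := range rtt {
		if r != primary {
			regions = append(regions, r)
		}
	}
	sort.Slice(regions, func(i, j int) bool {
		return rtt[regions[i]] < rtt[regions[j]]
	})
	return regions
}

func main() {
	// Hypothetical inter-region latencies measured from primary region A.
	rtt := map[string]time.Duration{
		"B": 20 * time.Millisecond,
		"C": 35 * time.Millisecond,
		"D": 80 * time.Millisecond,
		"E": 110 * time.Millisecond,
	}
	fmt.Println(closestRegions("A", rtt)) // [B C D E]
}
```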
The desire to "pack" voters into the regions closest to the leaseholder also, notably, makes the total number of replicas a dynamic value based on the physical state of the cluster. To see this, consider a 7-region cluster (call the regions A, B, C, ..., G) with `num_voters = 5`. Say the primary region of the table is region A, and the next two closest regions are B and C (in that order). Ideally, we'd want a 2-2-1 placement configuration (in regions A, B, and C) for the 5 voting replicas. If the voters were placed in this manner, we'd want each of the 4 other regions to have a non-voting replica, so the total number of replicas would be 9 (5 voting and 4 non-voting). However, this requires that regions A and B have at least 2 nodes each. What if those regions do not have enough nodes to make that possible? For instance, if each of the 7 regions in the cluster had only 1 node, then the only possible configuration for the voting replicas would be 1-1-1-1-1: 1 voter in the primary region A and 1 in each of the 4 regions closest to it. Under this configuration, we would only need a total of 7 replicas (5 voting and 2 non-voting) to meet our goal of providing low-latency follower reads from all regions.
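A minimal Go sketch of this packing logic (again ours, not allocator code) reproduces both scenarios: voters are packed greedily into the closest regions, capped by each region's node count and by the survivability cap derived earlier, and every region left without a voter receives a non-voting replica:

```go
package main

import "fmt"

// placeVoters packs numVoters voters into regions sorted by latency from
// the primary (index 0 is the primary region). nodes[i] is the node count
// of region i. No region may hold more voters than it has nodes, nor more
// than the survivability cap (the voters outside any one region must
// still form a quorum).
func placeVoters(numVoters int, nodes []int) []int {
	maxPerRegion := numVoters - (numVoters/2 + 1)
	placement := make([]int, len(nodes))
	remaining := numVoters
	for i := range nodes {
		k := remaining
		if nodes[i] < k {
			k = nodes[i]
		}
		if maxPerRegion < k {
			k = maxPerRegion
		}
		placement[i] = k
		remaining -= k
		if remaining == 0 {
			break
		}
	}
	return placement
}

// totalReplicas adds one non-voting replica for every region without a
// voter, so that every region can serve low-latency follower reads.
func totalReplicas(numVoters int, placement []int) int {
	total := numVoters
	for _, v := range placement {
		if v == 0 {
			total++
		}
	}
	return total
}

func main() {
	// 7 regions (A..G) sorted by latency from primary region A.
	big := []int{3, 3, 3, 3, 3, 3, 3}   // plenty of nodes per region
	small := []int{1, 1, 1, 1, 1, 1, 1} // a single node per region

	p := placeVoters(5, big)
	fmt.Println(p, totalReplicas(5, p)) // [2 2 1 0 0 0 0] 9

	p = placeVoters(5, small)
	fmt.Println(p, totalReplicas(5, p)) // [1 1 1 1 1 0 0] 7
}
```

Note how the same `num_voters = 5` yields 9 total replicas under one cluster topology and 7 under another, which is what motivates the `auto` value for `num_replicas` discussed next.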
Thus, in addition to the above extensions, we plan on investigating the viability of letting the `num_replicas` field in the zone configs be set to some value like `auto`, and then letting the allocator dynamically figure out the number of replicas needed to fulfill the specified constraints (specifically, the requirement of having at least one replica per region for the sake of low-latency follower reads from everywhere).

Jira issue: CRDB-3261