
kvserver: optimize voting replica placement in databases with region failure #59650

Open
aayushshah15 opened this issue Feb 1, 2021 · 0 comments
Labels: A-kv-replication-constraints, C-enhancement


aayushshah15 commented Feb 1, 2021

The zone config extensions introduced thus far achieve our broad goal of configuring a database to tolerate region or zone failure, but we haven't yet introduced mechanisms to decide the optimal placement of the non-leaseholder voting replicas. Under region failure tolerance, all writes incur cross-region replication latency, since we cannot place a quorum of voting replicas in any single region. Thus, where we place the non-leaseholder voting replicas matters, as it directly affects write latencies on the database.
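To make the write-latency cost concrete, here is a minimal sketch of the quorum arithmetic, assuming standard Raft majority quorums. The function names here are illustrative, not actual allocator code:

```go
package main

import "fmt"

// quorum returns the number of acknowledgements a write needs under
// standard Raft majority quorums.
func quorum(numVoters int) int { return numVoters/2 + 1 }

// maxVotersPerRegion is the most voters any single region may hold while
// still tolerating the loss of that whole region: one fewer than a quorum.
func maxVotersPerRegion(numVoters int) int { return quorum(numVoters) - 1 }

func main() {
	n := 5
	q := quorum(n)
	m := maxVotersPerRegion(n)
	// With 5 voters, a quorum is 3 but at most 2 voters may share a region,
	// so every write waits on at least one cross-region acknowledgement.
	fmt.Printf("voters=%d quorum=%d maxPerRegion=%d crossRegionAcks>=%d\n",
		n, q, m, q-m)
}
```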

The allocator currently lacks a way to place replicas based on their latency to the leaseholder. We could address this by adding a new zone config attribute, tentatively called survivability, and introducing a new latency-based heuristic to the allocator. The survivability attribute would initially accept only one value, "region", but could be extended to work with all types of locality tiers in the future as needed. With survivability set, the new latency-based heuristic is intended to pack all the voting replicas into 3 regions: the primary region (which holds the leaseholder) and the 2 next-closest regions, as in the sketch below.
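The following is a hypothetical sketch of that packing heuristic, not the actual allocator code; the function name, signature, and latency inputs are all assumptions. Given round-trip latencies from the primary region, it orders the other regions by proximity and fills voters greedily, never giving any single region a full quorum:

```go
package main

import (
	"fmt"
	"sort"
)

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// pickVoterRegions packs numVoters voters into the primary region and the
// next-closest regions by latency, capping each region at one fewer voter
// than a quorum so that no single region failure loses a quorum.
func pickVoterRegions(primary string, latencyMS map[string]float64, numVoters int) map[string]int {
	maxPerRegion := numVoters / 2 // one fewer than a Raft quorum

	type regionLatency struct {
		region string
		ms     float64
	}
	ordered := []regionLatency{{primary, 0}}
	for r, ms := range latencyMS {
		if r != primary {
			ordered = append(ordered, regionLatency{r, ms})
		}
	}
	sort.Slice(ordered, func(i, j int) bool { return ordered[i].ms < ordered[j].ms })

	placement := map[string]int{}
	remaining := numVoters
	for _, rl := range ordered {
		n := minInt(remaining, maxPerRegion)
		if n == 0 {
			break
		}
		placement[rl.region] = n
		remaining -= n
	}
	return placement
}

func main() {
	// Hypothetical leaseholder-to-region latencies for a 7-region cluster.
	latency := map[string]float64{"B": 12, "C": 28, "D": 55, "E": 70, "F": 90, "G": 120}
	fmt.Println(pickVoterRegions("A", latency, 5)) // map[A:2 B:2 C:1]
}
```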

The desire to "pack" voters into the regions closest to the leaseholder also, notably, makes the total number of replicas a dynamic value that depends on the physical state of the cluster. To see this, consider a 7-region cluster (call the regions A, B, C, ..., G) with num_voters = 5. Say the primary region of the table is region A, and the next two closest regions are B and C (in that order). Ideally, we'd want a 2-2-1 placement of the 5 voting replicas across regions A, B, and C. With the voters placed this way, we'd want each of the 4 other regions to hold a non-voting replica, for a total of 9 replicas (5 voting and 4 non-voting).

However, this requires regions A and B to have at least 2 nodes each. What if those regions don't have enough nodes to make that possible? If each of the 7 regions had only 1 node, the only possible configuration for the voting replicas would be 1-1-1-1-1 (1 voter in region A and 1 in each of the 4 regions closest to it). Under that configuration, we would only need 7 replicas in total (5 voting and 2 non-voting) to meet our goal of low-latency follower reads from all regions.

Thus, in addition to the above extensions, we plan to investigate the viability of letting the num_replicas field in the zone configs be set to some value like auto, and then letting the allocator dynamically figure out the number of replicas needed to fulfill the specified constraints (specifically, the requirement of at least one replica per region for the sake of low-latency follower reads from everywhere). A sketch of how that computation might look follows.
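This sketch of how num_replicas = auto might be resolved is an assumption, not a committed design; the function name and signature are hypothetical. It packs voters as close to the primary region as per-region node counts allow, then adds one non-voting replica in every region left without a voter, reproducing both scenarios from the example above:

```go
package main

import "fmt"

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// planReplicas takes per-region node counts, ordered by proximity to the
// primary region (index 0), and returns the voter count per region plus the
// total replica count. No region gets a quorum of voters, no region gets
// more voters than it has nodes, and every voterless region gets one
// non-voting replica for low-latency follower reads.
func planReplicas(regionNodes []int, numVoters int) (voters []int, total int) {
	maxPerRegion := numVoters / 2 // one fewer than a Raft quorum
	remaining := numVoters
	for _, nodes := range regionNodes {
		n := minInt(minInt(remaining, maxPerRegion), nodes)
		voters = append(voters, n)
		remaining -= n
	}
	nonVoterRegions := 0
	for _, v := range voters {
		if v == 0 {
			nonVoterRegions++
		}
	}
	return voters, numVoters + nonVoterRegions
}

func main() {
	// 7 regions (A..G) with 2 nodes each in A and B: 2-2-1 packing, 9 total.
	v, t := planReplicas([]int{2, 2, 1, 1, 1, 1, 1}, 5)
	fmt.Println(v, t) // [2 2 1 0 0 0 0] 9

	// 7 regions with 1 node each: voters must spread 1-1-1-1-1, 7 total.
	v, t = planReplicas([]int{1, 1, 1, 1, 1, 1, 1}, 5)
	fmt.Println(v, t) // [1 1 1 1 1 0 0] 7
}
```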

Jira issue: CRDB-3261

aayushshah15 added the C-enhancement label Feb 1, 2021
aayushshah15 self-assigned this Feb 1, 2021