allocator: avoid balancing around the mean #83493

irfansharif · 2022-06-28T03:04:30Z

Is your feature request related to a problem? Please describe.

CRDB's allocation strategy is broadly "for an equivalent set of stores, keep '# of batch requests' within X% of the mean". It's worth exploring whether that's a sensible goal for the system to have, especially if we consider allocation across different resource dimensions (#83490). What does it mean to keep CPU use nearly identical, especially for heterogeneous hardware or regions? I understand that this approach is motivated by wanting as much headroom as possible in order to absorb a burst of activity before throttling (and/or until allocation kicks in again, if integrated with resource throttling). But we ought to evaluate how effective it is at its stated goal, measured perhaps by how large a burst of activity it lets real clusters absorb before throttling, with the cost being the number of snapshot bytes transferred to keep things in balance. We could compare this to an idealized “lazy allocator” that only moves leases/replicas around after experiencing throttling of some form.

Realistically, we're always going to have some form of "keep things balanced" to create reasonable amounts of headroom. In terms of priorities, though, it's a secondary concern to making good use of resources by avoiding throttling (#83490), and could perhaps benefit from a separate implementation entirely given its different goals.

Additional context

This is a (partially) speculative issue, one that we should engage with if/when we reconsider the signals we use to allocate.

Jira issue: CRDB-17100

github-actions · 2024-02-12T11:04:21Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

irfansharif added the C-enhancement (solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception) and A-kv-distribution (relating to rebalancing and leasing) labels Jun 28, 2022

blathers-crl bot added the T-kv KV Team label Jun 28, 2022

github-actions bot added the no-issue-activity label Feb 12, 2024

github-actions bot added the X-stale label Feb 26, 2024

github-actions bot closed this as not planned Feb 26, 2024

exalate-issue-sync bot closed this as completed Feb 26, 2024

github-project-automation bot added this to KV Aug 28, 2024

github-project-automation bot moved this to Closed in KV Aug 28, 2024
