-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: refactor rebalanceCandidates selection to do less comparing across localities #20751
Comments
On the other hand, sometimes we do want to compare across localities. if there are four localities, and A and B have 2 nodes each and C and D have 4, we'd want things to adjust so that each node is equally full even across localities (in the absence of other constraints). |
@a-robinson , could I have a try? |
Yeah, that's where something like the sets of "equivalent" localities that I mentioned would have to come into play. For example, if a range had replicas in localities A, B, and C, and didn't in D, we might use equivalence classes {A, D}, {B, D}, {C, D} when deciding whether a rebalance would be beneficial.
Of course! If you want to talk more about the problem at all, just let me know. |
@a-robinson if we use And what do you mean Thanks! |
This is the core work required for cockroachdb#19985 (but without exposing it to users yet), and fixes cockroachdb#20751 along the way. Release note: None
This is the core work required for cockroachdb#19985 (but without exposing it to users yet), and fixes cockroachdb#20751 along the way. Release note: None
This is the core work required for cockroachdb#19985 (but without exposing it to users yet), and fixes cockroachdb#20751 along the way. Release note: None
This is the core work required for cockroachdb#19985 (but without exposing it to users yet), and fixes cockroachdb#20751 along the way. Release note: None
This creates a new allowable format for zone config constraints, which lets the user apply different sets of constraints to different numbers of replicas within a zone. See the included tests for what the new format looks like. Fixes cockroachdb#19985 It also improves the handling of localities that are more or less full than one another for some reason outside the system's control (e.g. if one has more nodes than the others or if some constraints have required more ranges to be in one). In such cases, this does a better job of balancing ranges amongst the nodes that are more or less full because we compare them amongst each other now rather than comparing them to the StoreList averages of all the stores in the cluster. Fixes cockroachdb#20751 This also deprecates positive (non-required, non-prohibited) constraints. New positive constraints cannot be set, and existing positive constraints will be ignored. Release note (cli change): Replication zone constraints can now be specified on a per-replica basis, meaning you can configure some replicas in a zone's ranges to follow one set of constraints and other replicas to follow other constraints. Release note (backwards-incompatible change): Positive constraints in replication zone configs no longer work. They have never been documented or officially supported, but will no longer be allowed at all. Any existing positive constraints will be ignored.
This creates a new allowable format for zone config constraints, which lets the user apply different sets of constraints to different numbers of replicas within a zone. See the included tests for what the new format looks like. Fixes cockroachdb#19985 It also improves the handling of localities that are more or less full than one another for some reason outside the system's control (e.g. if one has more nodes than the others or if some constraints have required more ranges to be in one). In such cases, this does a better job of balancing ranges amongst the nodes that are more or less full because we compare them amongst each other now rather than comparing them to the StoreList averages of all the stores in the cluster. Fixes cockroachdb#20751 This also deprecates positive (non-required, non-prohibited) constraints. New positive constraints cannot be set, and existing positive constraints will be ignored. Release note (cli change): Replication zone constraints can now be specified on a per-replica basis, meaning you can configure some replicas in a zone's ranges to follow one set of constraints and other replicas to follow other constraints. Release note (backwards-incompatible change): Positive constraints in replication zone configs no longer work. They have never been documented or officially supported, but will no longer be allowed at all. Any existing positive constraints will be ignored.
When deciding whether or not to rebalance from one store to another, our comparisons of which stores are too full or too empty don't do a great job of considering localities and diversity. We construct a single list of all non-dead stores in the cluster, then compute various averages based on those stores to use in picking rebalance candidates.
However, this can be very unfair if the number of nodes in each locality is very different. For example, if localities A and B have 2 nodes each and locality C has 4 nodes, we'd expect the nodes in locality C to be half as full as the nodes in A and B. It's inaccurate to directly compare them, and can lead to situations where we're too eager to rebalance between nodes in C and not very willing to rebalance between nodes within A or B.
#20241 is a great reproduction of this. I've been fixing the replica thrashing in that by extending #18364 and will send out that change soon, but it still leaves the issue of us being less willing to rebalance between nodes in the smaller localities. Thinking about the problem has made clear that the current structure of the code isn't really aligned with the problem when localities are in play. Rather than one large
StoreList
containing all the stores, I think we want something like separateStoreList
s per locality (or set of "equivalent" localities) so that we don't label all stores in A and B as overfull and all stores in C as underfull.@cuongdo - this could make for a good introduction to the allocator code for someone - it should take on the order of a week or so. It'd be really nice to get into 2.0 given the focus on global deployments, but isn't absolutely necessary.
The text was updated successfully, but these errors were encountered: