storage: stats-based rebalancing can't handle large numbers of splits and scatters #17671
Comments
Stats-based rebalancing interacts badly with restore at the moment, so disable it for now. Tracked in cockroachdb#17671.
It'd be nice if the screenshots also included the "Keys Written per Second per Store" graph, but without that data to go on I expect that what's going on here is a mismatch of expectations between scatter and stats-based rebalancing. We balance on three dimensions -- replica count, fraction of disk used, and writes per second. We require a rebalance to improve two of those dimensions before we decide to do it, so if a cluster is well balanced in the latter two dimensions, then an imbalanced range count won't motivate a rebalance. Doing a lot of splits only raises the replica count without affecting the fraction of disk used or writes per second, so it makes sense that the allocator doesn't see the need to rebalance afterwards. Stats-based rebalancing has its issues (such as #17691), but this is more a case of the general-purpose stats-based scoring logic not working well for the specific use case of wanting to spread out a particular subset of ranges.
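To make the two-of-three rule concrete, here is a minimal Go sketch. The type, the function names, and the 5% threshold are all assumptions made up for illustration; this is not the allocator's actual code or scoring formula, only the shape of the decision described above.

```go
package main

import "fmt"

// storeStats is a hypothetical per-store summary of the three dimensions
// discussed above. It is not CockroachDB's actual type.
type storeStats struct {
	replicaCount    float64
	diskFraction    float64
	writesPerSecond float64
}

// improves reports whether the gap between source and target on a single
// dimension exceeds threshold, expressed as a fraction of the cluster mean.
// The threshold semantics are an assumption for this sketch.
func improves(src, dst, mean, threshold float64) bool {
	return src-dst > threshold*mean
}

// dimensionsImproved counts how many of the three dimensions a move from
// src to dst would improve.
func dimensionsImproved(src, dst, mean storeStats, threshold float64) int {
	n := 0
	if improves(src.replicaCount, dst.replicaCount, mean.replicaCount, threshold) {
		n++
	}
	if improves(src.diskFraction, dst.diskFraction, mean.diskFraction, threshold) {
		n++
	}
	if improves(src.writesPerSecond, dst.writesPerSecond, mean.writesPerSecond, threshold) {
		n++
	}
	return n
}

// shouldRebalance applies the "at least two dimensions" rule.
func shouldRebalance(src, dst, mean storeStats, threshold float64) bool {
	return dimensionsImproved(src, dst, mean, threshold) >= 2
}

func main() {
	// After many splits: replica counts diverge, but disk usage and write
	// rate are unchanged, so only one dimension would improve.
	mean := storeStats{replicaCount: 1500, diskFraction: 0.40, writesPerSecond: 500}
	src := storeStats{replicaCount: 2000, diskFraction: 0.40, writesPerSecond: 500}
	dst := storeStats{replicaCount: 1000, diskFraction: 0.40, writesPerSecond: 500}
	fmt.Println("rebalance after splits:", shouldRebalance(src, dst, mean, 0.05))
}
```

With these made-up numbers only the replica count differs between the two stores, so the move is rejected, which matches the behavior seen after a burst of splits.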
@a-robinson This is easily reproducible on
Weird, GitHub never emailed me about your response here. Yes, I agree that it would be best if SCATTER used different thresholds/settings than normal rebalancing. Its goals are notably different from general rebalancing - it wants the ranges from the provided key space as spread out as possible in preparation for some incoming load on them. What I need to think about more, though, is whether there's something simple to do for 1.1, because I don't think an entirely different rebalancing strategy is in the cards.
Right now the thresholds for stats-based rebalancing (and whether it is enabled) are set via cluster settings. Could we instead pass in an
What if scatter simply plumbed "don't use stats" into the allocator?
That would be better than the current behavior, but there will still be times when that won't do anything. For example, if node x has the lease for the range being split and scattered, but it has many fewer ranges on it than other nodes in the cluster (due to its ranges having more data and higher write throughput), then even switching to a non-stats-based mode might not move anything. Given that stats-based rebalancing will be disabled by default in 1.1 (#17968), I think we can push off a real fix to 1.2.
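A rough Go sketch of that node x case, assuming a hypothetical replica-count-only check (the function name, the 5% threshold, and the range counts are invented for illustration and are not the allocator's real logic):

```go
package main

import "fmt"

// countOnlyShouldMove is a hypothetical count-only rule: move a replica off
// the source only if it holds noticeably more replicas than the target.
// The threshold is a fraction of the two stores' mean replica count.
func countOnlyShouldMove(srcReplicas, dstReplicas int, threshold float64) bool {
	mean := float64(srcReplicas+dstReplicas) / 2
	return float64(srcReplicas-dstReplicas) > threshold*mean
}

func main() {
	// Node x holds fewer (but larger, write-heavier) ranges than its peers.
	nodeX, other := 400, 1000

	// Splitting a range on node x adds a few replicas there, yet node x
	// still has far fewer replicas than the other node.
	nodeX += 20

	fmt.Println("move off node x:", countOnlyShouldMove(nodeX, other, 0.05))
}
```

Even with stats ignored, node x still looks underfull by count, so a count-only rule has no reason to move the freshly split ranges away from it.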
@a-robinson Sorry, I don't quite understand what scatter means here. I can only understand it as you said
The intent of the person calling
So what we should do here is to decrease the thresholds for
I think there are quite a few possible solutions here, and I'm not settled on any particular one. Disabling stats-based rebalancing when scattering would be one such solution :)
@a-robinson I just tested it manually.
I started four nodes with the command
and then I ran the split command to split my test table into 20000 ranges. Then I ran
I found that the replicas started to spread across the cluster. Then I found in the system table
So it seems the PR in #18426 works. How does the setting work in CockroachDB?
@a6802739 that worked because you actually changed the cluster-wide setting for
Discovered in #17644. Here's a RESTORE with stats-based rebalancing enabled:
And without: