storage: Avoid replica thrashing when localities are different sizes #20752
Release note: None
fee64bc to ff35505
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3.
pkg/storage/allocator.go, line 540 at r3 (raw file):
This could use a comment. When are
pkg/storage/allocator.go, line 582 at r4 (raw file):
This name isn't terribly descriptive, but I can't think of a better name. A comment describing what it does would be helpful.
Reviewed 4 of 4 files at r4.
pkg/storage/allocator.go, line 582 at r4 (raw file):
Previously, petermattis (Peter Mattis) wrote…
pkg/storage/allocator.go, line 591 at r4 (raw file):
Shouldn't equal store IDs be caught earlier in the process? This is the "can't have two replicas per store (or even per node)" rule, not "we'd immediately remove it".
LGTM!
LGTM
pkg/storage/allocator_scorer.go
// balanceScore ties and it's a workable stop-gap on the way to something
// like #20751.
avgRangeCount := float64(c.rangeCount+o.rangeCount) / 2.0
overfullThreshold := math.Max(overfullRangeThreshold(options, avgRangeCount), avgRangeCount+1.5)
Could you please explain the definition of overfullThreshold here? Thank you very much.
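For readers following the question above, here is a minimal sketch of what the two quoted lines compute. It is an illustration under stated assumptions, not the code from this PR: the body of `overfullRangeThreshold` is not shown in the diff, so the version below (mean scaled by a fractional threshold) and the 5% value are assumptions, and `pairOverfullThreshold` is a hypothetical wrapper name. The only parts taken from the diff are the two-store average and the `avgRangeCount+1.5` floor.

```go
package main

import (
	"fmt"
	"math"
)

// assumedRebalanceThreshold is a hypothetical fractional threshold. The real
// value comes from the scorer options, which this PR excerpt does not show.
const assumedRebalanceThreshold = 0.05

// overfullRangeThreshold is a stand-in for the helper of the same name.
// Assumption: it returns the mean scaled up by a fractional threshold.
func overfullRangeThreshold(threshold, mean float64) float64 {
	return mean * (1 + threshold)
}

// pairOverfullThreshold mirrors the two lines quoted from the diff: average
// the range counts of the two stores being compared, then take the larger of
// the fractional threshold and a fixed floor of 1.5 ranges above the average.
func pairOverfullThreshold(candidateRanges, otherRanges int, threshold float64) float64 {
	avgRangeCount := float64(candidateRanges+otherRanges) / 2.0
	return math.Max(overfullRangeThreshold(threshold, avgRangeCount), avgRangeCount+1.5)
}

func main() {
	// Small stores: the average is 11 and 5% of that is well under 1.5,
	// so the fixed floor decides the threshold.
	fmt.Println(pairOverfullThreshold(10, 12, assumedRebalanceThreshold))
	// Large stores: the average is 1000, so the fractional threshold
	// exceeds the floor and decides the threshold instead.
	fmt.Println(pairOverfullThreshold(900, 1100, assumedRebalanceThreshold))
}
```

Under these assumptions, the floor keeps two stores with small, nearly equal range counts from each looking overfull relative to the other, which is one way a pairwise comparison could otherwise cause replicas to move back and forth.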
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3.
Skipping the simulation when raftStatus.Progress is nil can make for undesirable thrashing of replicas, as seen when testing cockroachdb#20241. It's better to run the simulation without properly filtering replicas than to not run it at all.
Release note: None
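The fallback described in this commit message can be sketched roughly as follows. This is an illustration only, not the allocator's actual code: `replicaID`, `progress`, `raftStatus`, and `simulationCandidates` are hypothetical stand-ins for the real types, and the "caught up to the commit index" filter is an assumed example of what "properly filtering replicas" might mean.

```go
package main

import "fmt"

// replicaID and progress are hypothetical stand-ins, not the real Raft types.
type replicaID int

type progress struct {
	match uint64 // highest log index known to be replicated on this replica
}

// raftStatus models only the property discussed here: the progress map is
// nil when it is unavailable (for example, when this replica is not the leader).
type raftStatus struct {
	commit   uint64
	progress map[replicaID]progress
}

// simulationCandidates picks the replicas to feed into the rebalance
// simulation. When progress is unavailable, it returns every replica
// unfiltered instead of skipping the simulation.
func simulationCandidates(replicas []replicaID, st *raftStatus) []replicaID {
	if st == nil || st.progress == nil {
		// No progress information: run the simulation on all replicas.
		return replicas
	}
	var upToDate []replicaID
	for _, r := range replicas {
		// Assumed filter: keep replicas that have caught up to the commit index.
		if p, ok := st.progress[r]; ok && p.match >= st.commit {
			upToDate = append(upToDate, r)
		}
	}
	return upToDate
}

func main() {
	replicas := []replicaID{1, 2, 3}

	// Progress unavailable: all three replicas are simulated.
	fmt.Println(simulationCandidates(replicas, &raftStatus{commit: 10}))

	// Progress available: replica 2 is behind and gets filtered out.
	leader := &raftStatus{
		commit: 10,
		progress: map[replicaID]progress{
			1: {match: 10},
			2: {match: 4},
			3: {match: 12},
		},
	}
	fmt.Println(simulationCandidates(replicas, leader))
}
```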
Fixes cockroachdb#20241
Release note (bug fix): avoid rebalance thrashing when localities have very different numbers of nodes
ff35505 to 33e1553
TFTRs!
Review status: 0 of 4 files reviewed at latest revision, 4 unresolved discussions.
pkg/storage/allocator.go, line 540 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I saw Added a comment, either way.
pkg/storage/allocator.go, line 582 at r4 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Changed to
pkg/storage/allocator.go, line 591 at r4 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
My bad variable naming confused you. The variable previously named
pkg/storage/allocator_scorer.go, line 167 at r4 (raw file):
Previously, a6802739 (songhao) wrote…
Added a comment.
Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.
pkg/storage/allocator.go, line 540 at r3 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
Yes, that sounds like leader-not-leaseholder. Progress is only present on the leader.
Fixes #20241
Release note (bug fix): avoid rebalance thrashing when localities have
very different numbers of nodes
This could use some careful review. A more comprehensive refactoring as described in #20751 would be nice, but this fixes the immediate problem as seen in #20241.