-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: stop miscomputing equivalence classes during replica rebalancing #73597
kvserver: stop miscomputing equivalence classes during replica rebalancing #73597
Conversation
8826737
to
dcde60c
Compare
First commit from #73614 |
8db1993
to
1544b9a
Compare
@nvanbenschoten I've removed the commit that strips node ids out like we discussed. This PR should now be ready for a look. |
1544b9a
to
8050265
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 3 of 3 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/allocator_scorer.go, line 915 at r6 (raw file):
diversityScore: diversityScore, } if !cand.less(existing) && cand.store.StoreID != existing.store.StoreID {
Should we wait until we get all the way here in this loop before continuing? Would it make more sense to move this up to the top of the loop?
pkg/kv/kvserver/allocator_scorer.go, line 967 at r6 (raw file):
// replicas in terms of diversity and disk fullness, check whether the // existing replicas' stats are divergent enough to justify a rebalance. if len(sl.stores) > 0 && options.shouldRebalanceBasedOnThresholds(ctx, existing.store, sl) {
When do we expect to hit the len(sl.stores) > 0
case? Is that when we don't have an equivalence class for an existing store?
8050265
to
5bccf47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/allocator_scorer.go, line 915 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we wait until we get all the way here in this loop before continuing? Would it make more sense to move this up to the top of the loop?
moved up.
pkg/kv/kvserver/allocator_scorer.go, line 967 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
When do we expect to hit the
len(sl.stores) > 0
case? Is that when we don't have an equivalence class for an existing store?
we'd expect len(candidateSL)
> 0 if there are any replacement candidates for the existing store that are at least as good in terms of diversity. Otherwise, we expect that slice to be empty. I moved the length check inside the shouldRebalanceBasedOnThresholds
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
The existing nomenclature was confusing and was making it harder for cockroachdb#72296 to explain its approach. Release note: None
Previously, while computing the stores that belong inside the same equivalence class, we were incorrectly including an existing store in the list of candidate stores. For instance, an equivalence class that should look like the following: ``` eqClass: existing: [n1, n2] candidates: [n3, n4] ``` was instead being computed as the following: ``` eqClass: existing: [n1, n2] candidates: [n1, n3, n4] ``` This was throwing things off in the logic implemented by cockroachdb#72296 Release note: None
5bccf47
to
c5f374a
Compare
TFTR bors r+ |
Build succeeded: |
// Only process replacement candidates, not existing stores. | ||
if store.StoreID == existing.store.StoreID { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the after-merge comments, why not filter full existingStores
in here~? (I'm just learning cockroachdb code, maybe not right - - ~ thanks
For example: existing
is n1 ,existingStores
is [n1, n2, n3], equivalenceClasses.candidates
still have chance to be [n2, n3]
if we only filter existing
here?
although line 1021 https://github.com/cockroachdb/cockroach/pull/73597/files#diff-272ccd4799645a1875c052c6cdafe1192be7d7be30035475226f679d79465662R1021 can filter those equivalenceClass
But may this cause the maybe valuable "n1 -> [n4]" rebalanceOption doesn't be considered?(because n1->[n2,n3] is more "better" in this loop but be filter in later loop) 🤔
kvserver: fix bug with computing equivalence class
Previously, while computing the stores that belong inside the same equivalence
class, we were incorrectly including an existing store in the list of candidate
stores.
For instance, an equivalence class that should look like the following:
was instead being computed as the following:
This was throwing things off in the logic implemented by #72296
Release note: None