Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: stop miscomputing equivalence classes during replica rebalancing #73597

Merged

Conversation

aayushshah15
Copy link
Contributor

@aayushshah15 aayushshah15 commented Dec 8, 2021

kvserver: fix bug with computing equivalence class

Previously, while computing the stores that belong inside the same equivalence
class, we were incorrectly including an existing store in the list of candidate
stores.

For instance, an equivalence class that should look like the following:

eqClass:
	existing: [n1, n2]
	candidates: [n3, n4]

was instead being computed as the following:

eqClass:
	existing: [n1, n2]
	candidates: [n1, n3, n4]

This was throwing things off in the logic implemented by #72296

Release note: None

@aayushshah15 aayushshah15 requested a review from a team as a code owner December 8, 2021 08:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15
Copy link
Contributor Author

First commit from #73614

@aayushshah15 aayushshah15 force-pushed the 20211208_fixEquivalenceClassComputation branch 3 times, most recently from 8db1993 to 1544b9a Compare December 10, 2021 03:48
@aayushshah15
Copy link
Contributor Author

@nvanbenschoten I've removed the commit that strips node ids out like we discussed. This PR should now be ready for a look.

@aayushshah15 aayushshah15 force-pushed the 20211208_fixEquivalenceClassComputation branch from 1544b9a to 8050265 Compare December 13, 2021 20:45
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, 3 of 3 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/allocator_scorer.go, line 915 at r6 (raw file):

				diversityScore: diversityScore,
			}
			if !cand.less(existing) && cand.store.StoreID != existing.store.StoreID {

Should we wait until we get all the way here in this loop before continuing? Would it make more sense to move this up to the top of the loop?


pkg/kv/kvserver/allocator_scorer.go, line 967 at r6 (raw file):

			// replicas in terms of diversity and disk fullness, check whether the
			// existing replicas' stats are divergent enough to justify a rebalance.
			if len(sl.stores) > 0 && options.shouldRebalanceBasedOnThresholds(ctx, existing.store, sl) {

When do we expect to hit the len(sl.stores) > 0 case? Is that when we don't have an equivalence class for an existing store?

@aayushshah15 aayushshah15 force-pushed the 20211208_fixEquivalenceClassComputation branch from 8050265 to 5bccf47 Compare December 16, 2021 21:57
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/allocator_scorer.go, line 915 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we wait until we get all the way here in this loop before continuing? Would it make more sense to move this up to the top of the loop?

moved up.


pkg/kv/kvserver/allocator_scorer.go, line 967 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

When do we expect to hit the len(sl.stores) > 0 case? Is that when we don't have an equivalence class for an existing store?

we'd expect len(candidateSL) > 0 if there are any replacement candidates for the existing store that are at least as good in terms of diversity. Otherwise, we expect that slice to be empty. I moved the length check inside the shouldRebalanceBasedOnThresholds methods.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

The existing nomenclature was confusing and was making it harder for cockroachdb#72296
to explain its approach.

Release note: None
Previously, while computing the stores that belong inside the same equivalence
class, we were incorrectly including an existing store in the list of candidate
stores.

For instance, an equivalence class that should look like the following:
```
eqClass:
	existing: [n1, n2]
	candidates: [n3, n4]
```
was instead being computed as the following:
```
eqClass:
	existing: [n1, n2]
	candidates: [n1, n3, n4]
```

This was throwing things off in the logic implemented by cockroachdb#72296

Release note: None
@aayushshah15 aayushshah15 force-pushed the 20211208_fixEquivalenceClassComputation branch from 5bccf47 to c5f374a Compare December 17, 2021 22:21
@aayushshah15
Copy link
Contributor Author

TFTR

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 17, 2021

Build succeeded:

@craig craig bot merged commit eae333b into cockroachdb:master Dec 17, 2021
// Only process replacement candidates, not existing stores.
if store.StoreID == existing.store.StoreID {
continue
}
Copy link

@moonsphere moonsphere Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the after-merge comments, why not filter full existingStores in here~? (I'm just learning cockroachdb code, maybe not right - - ~ thanks

For example: existing is n1 ,existingStores is [n1, n2, n3], equivalenceClasses.candidates still have chance to be [n2, n3] if we only filter existing here?

although line 1021 https://github.com/cockroachdb/cockroach/pull/73597/files#diff-272ccd4799645a1875c052c6cdafe1192be7d7be30035475226f679d79465662R1021 can filter those equivalenceClass

But may this cause the maybe valuable "n1 -> [n4]" rebalanceOption doesn't be considered?(because n1->[n2,n3] is more "better" in this loop but be filter in later loop) 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants