allocator: select a good enough store for decom/recovery #86267
Conversation
Force-pushed from 2ccbb7f to 3c30e03.
Curious if you have run any benchmarks for the change - I think we should validate it in roachprod, but otherwise LGTM.
Reviewed 3 of 3 files at r1, 6 of 7 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @AlexTalks, @kvoli, and @lidorcarmel)
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer_test.go
line 230 at r2 (raw file):
    allocRand := makeAllocatorRand(rand.NewSource(0))
    for ii, tc := range testCases {
        t.Logf("ii=%d", ii)

What is this t.Logf for?
Force-pushed from 3c30e03 to 16d4269.
yep, see the issue #86265
thanks!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @AlexTalks, and @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer_test.go
line 230 at r2 (raw file):
Previously, kvoli (Austen) wrote…
What is this t.Logf for?
oops, leftovers.
removed.
Nice results!
Had some comments about the setting name but this looks overall really good to me!
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @AlexTalks, @kvoli, and @lidorcarmel)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 98 at r3 (raw file):
    var enableRecoverToGoodEnoughStores = settings.RegisterBoolSetting(
        settings.SystemOnly,
        "kv.allocator.recover_to_good_enough_stores.enabled",
nit: does this setting (and name) make sense? perhaps we should have a setting that is something like:
kv.allocator.recovery_store_selector
where the options are BEST, GOOD, ANY
or something? I worry about the "good enough" terminology a bit, but if this makes sense to all I won't block on this of course.
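For illustration, a minimal self-contained sketch of what the enum-shaped selector could look like; the storeSelector type, the constants, and describe are invented for this sketch and are not the PR's actual code:

```go
package main

import "fmt"

type storeSelector string

const (
	selectorBest storeSelector = "best" // prior behavior: only the top-scored stores
	selectorGood storeSelector = "good" // proposed default: any "good enough" store
	selectorAny  storeSelector = "any"  // any valid store
)

// describe maps a selector value to the recovery-target strategy it implies.
func describe(sel storeSelector) string {
	switch sel {
	case selectorBest:
		return "recover to one of the best candidates"
	case selectorGood:
		return "recover to a random good enough candidate"
	case selectorAny:
		return "recover to any valid candidate"
	}
	return "unknown selector"
}

func main() {
	fmt.Println(describe(selectorGood))
}
```

In a real cluster setting, the allowed values and the default would of course be registered through the settings package rather than a hand-rolled type.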
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 897 at r3 (raw file):
    // selectWorst randomly chooses one of the worst candidate stores from a sorted
    // (by score reversed) candidate list using the provided random generator.
    func (cl candidateList) selectWorst(randGen allocatorRand) *candidate {
FYI - just curious, what is this used for?
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks, @kvoli, and @lidorcarmel)
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 714 at r3 (raw file):
    nil, nil, Dead, // Dead and Decommissioning should behave the same here

nit: can we randomly assign Dead or Decommissioning here to assert that they do indeed behave the same (just to prevent future regressions here).
In preparation for adding a new selection function for a good enough candidate, rename the existing "good" to "best".

Release note: None
Until now, when decommissioning a node, or when recovering from a dead node, the allocator tries to pick one of the best possible stores as the target for the recovery.

Because of that, we sometimes see multiple stores recover replicas to the same store, for example, when decommissioning a node and at the same time adding a new node.

This PR changes the way we select a destination store by choosing a random store out of all the stores that are "good enough" for the replica. The risk diversity is still enforced, but we may recover a replica to a store that is considered "over full", for example.

Note that during upreplication the allocator will still try to use one of the "best" stores as targets.

Fixes: cockroachdb#86265

Release note: None

Release justification: a relatively small change, and it can be reverted by setting kv.allocator.recovery_store_selector=best.
Force-pushed from 16d4269 to 3ce0fd5.
np at all, thanks both! PTAL.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 98 at r3 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: does this setting (and name) make sense? perhaps we should have a setting that is something like:
kv.allocator.recovery_store_selector
where the options are BEST, GOOD, ANY
or something? I worry about the "good enough" terminology a bit, but if this makes sense to all I won't block on this of course.
Done.
I'm not worried about naming here because this setting should not be used (unless we really broke something), and I'm not sure we will have other strategies for selecting a destination store, so I thought a bool was good enough... but either way works (or, both names are good enough 😄).
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 897 at r3 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
FYI - just curious, what is this used for?
when removing a replica we want to remove the worst.
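For context, here is a tiny standalone sketch of that idea, with a stand-in candidate struct rather than the allocator's real candidateList: from a list sorted worst-first, pick randomly among the stores tied for the worst score, so removals are not always aimed at the same store.

```go
package main

import (
	"fmt"
	"math/rand"
)

type candidate struct {
	storeID int
	score   float64
}

// selectWorstSketch assumes cands is sorted worst-first ("by score reversed",
// as in the doc comment above) and picks randomly among the stores tied for
// the worst score.
func selectWorstSketch(cands []candidate, rng *rand.Rand) *candidate {
	if len(cands) == 0 {
		return nil
	}
	n := 1
	for n < len(cands) && cands[n].score == cands[0].score {
		n++
	}
	return &cands[rng.Intn(n)]
}

func main() {
	rng := rand.New(rand.NewSource(1))
	cands := []candidate{{storeID: 7, score: -2}, {storeID: 3, score: -2}, {storeID: 5, score: 0}}
	fmt.Printf("remove a replica from store %d\n", selectWorstSketch(cands, rng).storeID)
}
```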
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go
line 714 at r3 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
nit: can we randomly assign Dead or Decommissioning here to assert that they do indeed behave the same (just to prevent future regressions here).
Done (alternate the 2 instead of random).
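Roughly the pattern, as a standalone sketch; the livenessStatus type and the test cases are placeholders, not the actual allocator_test.go code:

```go
package main

import "fmt"

type livenessStatus int

const (
	dead livenessStatus = iota
	decommissioning
)

func (s livenessStatus) String() string {
	if s == dead {
		return "Dead"
	}
	return "Decommissioning"
}

func main() {
	testCases := []string{"case-0", "case-1", "case-2", "case-3"}
	for i, tc := range testCases {
		// Alternate deterministically so half the cases exercise each status;
		// both are expected to produce the same allocator behavior.
		status := dead
		if i%2 == 1 {
			status = decommissioning
		}
		fmt.Printf("%s: status=%s\n", tc, status)
	}
}
```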
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
Reviewable status: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 98 at r3 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
Done.
I'm not worried about naming here because this setting should not be used (unless we really broke something), and I'm not sure we will have other strategies for selecting a destination store, so I thought a bool was good enough... but either way works (or, both names are good enough 😄).
LGTM
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 897 at r3 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
when removing a replica we want to remove the worst.
OK, got it!
Thanks all for the review!
bors r+
Reviewable status: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
Build failed (retrying...)
Build failed (retrying...)
Build succeeded
Until now, when decommissioning a node, or when recovering from a dead
node, the allocator tries to pick one of the best possible stores as
the target for the recovery.
Because of that, we sometimes see multiple stores recover replicas
to the same store, for example, when decommissioning a node and
at the same time adding a new node.
This PR changes the way we select a destination store by choosing
a random store out of all the stores that are "good enough" for
the replica. The risk diversity is still enforced, but we may
recover a replica to a store that is considered "over full", for
example.
Note that during upreplication the allocator will still try to use
one of the "best" stores as targets.
Fixes: #86265
Release note: None
Release justification: a relatively small change, and it can be
reverted by setting kv.allocator.recovery_store_selector=best.
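To make the selection change above concrete, here is a small self-contained sketch under assumed names (store, pickRecoveryTarget, and the score fields are invented for illustration, not the PR's code): diversity stays a hard constraint, while the balance score no longer narrows the choice to only the best stores.

```go
package main

import (
	"fmt"
	"math/rand"
)

type store struct {
	id             int
	diversityScore float64 // higher is better; still a hard constraint
	balanceScore   float64 // negative stands in for "over full"
}

// pickRecoveryTarget keeps only stores with the maximum diversity score,
// then chooses uniformly among those, even if some are over full.
func pickRecoveryTarget(stores []store, rng *rand.Rand) *store {
	if len(stores) == 0 {
		return nil
	}
	maxDiv := stores[0].diversityScore
	for _, s := range stores[1:] {
		if s.diversityScore > maxDiv {
			maxDiv = s.diversityScore
		}
	}
	var goodEnough []store
	for _, s := range stores {
		if s.diversityScore == maxDiv {
			goodEnough = append(goodEnough, s)
		}
	}
	return &goodEnough[rng.Intn(len(goodEnough))]
}

func main() {
	rng := rand.New(rand.NewSource(0))
	stores := []store{
		{id: 1, diversityScore: 1, balanceScore: 2},
		{id: 2, diversityScore: 1, balanceScore: -1}, // over full, but still eligible
		{id: 3, diversityScore: 0, balanceScore: 3},  // worse diversity, excluded
	}
	fmt.Printf("recover to store %d\n", pickRecoveryTarget(stores, rng).id)
}
```

Randomizing within the good-enough set is what spreads recovered replicas across several stores instead of piling them onto the single best one.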