kvserver: fix constraint analysis on replica replacement #94810
Conversation
Force-pushed from 2dff994 to d24c606 (Compare)
Force-pushed from 4f8e335 to 80d6a10 (Compare)
A few comments/questions. The linter is making noise in a few areas.
Reviewed 4 of 4 files at r1, 8 of 8 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks)
-- commits
line 22 at r2:
nit: beingn
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1326 at r2 (raw file):
conf roachpb.SpanConfig, existingVoters, existingNonVoters []roachpb.ReplicaDescriptor, replacing *roachpb.ReplicaDescriptor,
I'm not a huge fan of passing in a reference that could be mutated. I understand we discussed before that a list wasn't great either since it implied you could be replacing more than 1 replica at a time. Is there an alternative?
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 1907 at r2 (raw file):
// If so, then only replacing it with a satisfying store is valid to ensure
// that the constraint stays fully satisfied.
if necessary && satisfiedByExistingStore && !satisfiedByCandidateStore {
Nice stuff here - this is clear to understand.
pkg/kv/kvserver/allocator_impl_test.go
line 112 at r1 (raw file):
Locality: roachpb.Locality{
	Tiers: []roachpb.Tier{
		{
It looks like these are added in the 1st commit then edited in the 2nd commit to remove some flags. Is this necessary?
Force-pushed from 80d6a10 to b856e5d (Compare)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1326 at r2 (raw file):
Previously, kvoli (Austen) wrote…
I'm not a huge fan of passing in a reference that could be mutated. I understand we discussed before that a list wasn't great either since it implied you could be replacing more than 1 replica at a time. Is there an alternative?
I can pass in a concrete `roachpb.ReplicaDescriptor` and use `Validate` (or some new `IsValid` function) to check if it has a value, of course. I prefer using `nil` to indicate that we don't have a replica being replaced, but I understand the concern here. Let me know if you definitely think this should be changed.
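For illustration, here is a minimal, self-contained sketch of the two API shapes being weighed in this thread; the `ReplicaDescriptor` type and the `IsValid` helper below are stand-ins, not the actual `roachpb` definitions.

```go
package main

import "fmt"

// ReplicaDescriptor is a local stand-in for roachpb.ReplicaDescriptor,
// reduced to the fields relevant here.
type ReplicaDescriptor struct {
	NodeID, StoreID, ReplicaID int32
}

// IsValid is a hypothetical helper for the by-value alternative: the zero
// value means "no replica is being replaced".
func (r ReplicaDescriptor) IsValid() bool { return r.ReplicaID != 0 }

// Option A (what this PR does): a nil pointer signals "no replacement".
// The callee must treat the pointee as read-only.
func allocateTargetPtr(replacing *ReplicaDescriptor) {
	if replacing != nil {
		fmt.Printf("replacing replica on store %d\n", replacing.StoreID)
	}
}

// Option B (the alternative discussed): pass by value and rely on a
// validity check instead of nil.
func allocateTargetVal(replacing ReplicaDescriptor) {
	if replacing.IsValid() {
		fmt.Printf("replacing replica on store %d\n", replacing.StoreID)
	}
}

func main() {
	allocateTargetPtr(nil) // no replacement
	allocateTargetPtr(&ReplicaDescriptor{NodeID: 5, StoreID: 5, ReplicaID: 5})
	allocateTargetVal(ReplicaDescriptor{}) // zero value: no replacement
	allocateTargetVal(ReplicaDescriptor{NodeID: 5, StoreID: 5, ReplicaID: 5})
}
```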
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
line 1907 at r2 (raw file):
Previously, kvoli (Austen) wrote…
Nice stuff here - this is clear to understand.
Thanks - I was worried we might need additional cases here but hopefully this makes sense. This is also the only case where we can return `valid = false, necessary = true`, but that is primarily done so we can log it in the ranking process.
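As a small, self-contained model of the check quoted above (the types and helper names are illustrative, not the actual allocator_scorer.go code):

```go
package main

import "fmt"

type store struct {
	id    int
	attrs map[string]string // e.g. {"region": "us-east1"}
}

type constraint struct {
	key, value string
}

func satisfies(s store, c constraint) bool {
	return s.attrs[c.key] == c.value
}

// replacementCheck mirrors the quoted condition: if a constraint is necessary
// and is satisfied by the store being replaced, a candidate that does not
// also satisfy it is rejected. This is the one case that returns valid=false
// together with necessary=true, mainly so the ranking process can log why
// the candidate was rejected.
func replacementCheck(existing, candidate store, c constraint, necessary bool) (valid, nec bool) {
	satisfiedByExistingStore := satisfies(existing, c)
	satisfiedByCandidateStore := satisfies(candidate, c)
	if necessary && satisfiedByExistingStore && !satisfiedByCandidateStore {
		return false, true
	}
	return true, necessary
}

func main() {
	c := constraint{key: "region", value: "us-east1"}
	existing := store{id: 1, attrs: map[string]string{"region": "us-east1"}}
	badCandidate := store{id: 2, attrs: map[string]string{"region": "us-west1"}}
	goodCandidate := store{id: 3, attrs: map[string]string{"region": "us-east1"}}

	fmt.Println(replacementCheck(existing, badCandidate, c, true))  // false true
	fmt.Println(replacementCheck(existing, goodCandidate, c, true)) // true true
}
```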
pkg/kv/kvserver/allocator_impl_test.go
line 112 at r1 (raw file):
Previously, kvoli (Austen) wrote…
It looks like these are added in the 1st commit then edited in the 2nd commit to remove some flags. Is this necessary?
Not necessary, should be fixed now.
Reviewed 8 of 8 files at r3, 8 of 8 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1326 at r2 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
I can pass in a concrete `roachpb.ReplicaDescriptor` and use `Validate` (or some new `IsValid` function) to check if it has a value, of course. I prefer using `nil` to indicate that we don't have a replica being replaced, but I understand the concern here. Let me know if you definitely think this should be changed.
It doesn't seem worth changing over it.
pkg/kv/kvserver/allocator_impl_test.go
line 112 at r1 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Not necessary, should be fixed now.
Did this update get pushed? I can still see them being removed in c2.
b856e5d#diff-a380886513365b4c5d0612c9bb1f2ee62a11571d189141fcb56d4c41182f4dbfL112
pkg/kv/kvserver/store_test.go
line 3595 at r4 (raw file):
{NodeID: 5, StoreID: 5, ReplicaID: 5},
},
zoneConfig: zonepb.DefaultSystemZoneConfigRef(),
s/zoneConfig/spanConfig/
here and a few other spots.
Force-pushed from b856e5d to 32eec11 (Compare)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1326 at r2 (raw file):
Previously, kvoli (Austen) wrote…
It doesn't seem worth changing over it.
Done.
pkg/kv/kvserver/allocator_impl_test.go
line 112 at r1 (raw file):
Previously, kvoli (Austen) wrote…
Did this update get pushed? I can still see them being removed in c2.
b856e5d#diff-a380886513365b4c5d0612c9bb1f2ee62a11571d189141fcb56d4c41182f4dbfL112
Fixed!
pkg/kv/kvserver/store_test.go
line 3595 at r4 (raw file):
Previously, kvoli (Austen) wrote…
s/zoneConfig/spanConfig/
here and a few other spots.
Bad rebase, sorry!
This change adds tests using `store.AllocatorCheckRange(..)` to investigate the behavior of attempting to decommission a node when doing so would cause constraints to be broken. This test is needed because it was uncovered recently that a partially constrained range (i.e. a range configured with `num_replicas = 3, constraint = '{<some constraint>: 1}'`) may not have the constraint fully respected if the only node that satisfies said constraint is being decommissioned.

Part of cockroachdb#94809
Depends on cockroachdb#94024.

Release note: None
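A toy model of the scenario these tests exercise (assumed names, not the real test or allocator code): a range with `num_replicas = 3` and one replica constrained to a locality, where the only store satisfying that constraint belongs to the node being decommissioned, so no valid replacement target exists.

```go
package main

import "fmt"

type store struct {
	id     int
	region string
}

// hasConstraintSatisfyingReplacement reports whether any store other than the
// one being removed could host the constrained replica.
func hasConstraintSatisfyingReplacement(stores []store, removingStoreID int, requiredRegion string) bool {
	for _, s := range stores {
		if s.id == removingStoreID {
			continue
		}
		if s.region == requiredRegion {
			return true
		}
	}
	return false
}

func main() {
	stores := []store{
		{id: 1, region: "us-east1"}, // the only store satisfying the constraint
		{id: 2, region: "us-west1"},
		{id: 3, region: "us-west1"},
		{id: 4, region: "eu-west1"},
	}
	// Decommissioning store 1 would leave the '<some constraint>: 1' clause
	// unsatisfiable, which is exactly the situation the tests probe.
	fmt.Println(hasConstraintSatisfyingReplacement(stores, 1, "us-east1")) // false
}
```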
This changes the allocator to be fully aware of the replica that is being replaced when allocating new target replicas due to a dead or decommissioning node. This ensures that if constraints were respected before the dead or decommissioning node left the cluster, they will still be respected afterwards, particularly in the case at issue, which is when partial constraints are set on a range (e.g. `num_replicas = 3, <some_constraint>: 1`). This is achieved by rejecting candidate stores to allocate the replacement on when they do not satisfy a constraint that was satisfied by the existing store. In doing so, some node decommissions that would previously have been able to execute will no longer be allowed, since they would violate the configured constraints.

Fixes cockroachdb#94809.

Release note (bug fix): Decommissions that would violate constraints set on a subset of replicas for a range (e.g. `num_replicas = 3, <constraint>: 1`) will no longer be able to execute, respecting constraints during and after the decommission.
Force-pushed from 32eec11 to f7db7e3 (Compare)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1216 at r8 (raw file):
// should be considered of otherwise equal validity, with candidate ranking
// choosing the best of the available options.
var decommissioningReplica *roachpb.ReplicaDescriptor
FYI @kvoli I added this after some thought and after reviewing a failing multi-region test. I think it makes sense, but if you don't like this logic being in the allocator (and would rather have it in the replicate queue), let me know. Given that our goal is to consolidate more logic into the allocator, it made sense to me for it to live here.
(The issue, briefly: if we have a multi-region cluster configured to survive region failure, and all nodes in a region die, we want to ensure that replacement of their replicas proceeds even though the replacement effectively breaks constraints.)
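A loose sketch of the fallback described above (assumed names, not the actual allocator code): when none of the available candidates can satisfy the constraint that the replaced replica's store satisfied, for example because every node in the constrained region is gone, the candidates are treated as otherwise equally valid and ranking picks the best of what remains rather than blocking the replacement outright.

```go
package main

import "fmt"

type candidate struct {
	storeID             int
	satisfiesConstraint bool
	score               float64
}

func pickReplacement(cands []candidate, existingSatisfiedConstraint bool) (int, bool) {
	anySatisfying := false
	for _, c := range cands {
		if c.satisfiesConstraint {
			anySatisfying = true
		}
	}
	best, found := candidate{}, false
	for _, c := range cands {
		// Only insist on the constraint if some candidate can actually meet it.
		if existingSatisfiedConstraint && anySatisfying && !c.satisfiesConstraint {
			continue
		}
		if !found || c.score > best.score {
			best, found = c, true
		}
	}
	return best.storeID, found
}

func main() {
	// No candidate satisfies the constraint (the region is gone), but the
	// replacement still proceeds using the highest-scoring candidate.
	cands := []candidate{{storeID: 7, score: 0.4}, {storeID: 8, score: 0.6}}
	fmt.Println(pickReplacement(cands, true)) // 8 true
}
```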
bors r+
Build succeeded.
This changes the allocator to be fully aware of the replica that is being replaced when allocating new target replicas due to a dead or decommissioning node. This ensures that if constraints were respected before the dead or decommissioning node left the cluster, they will still be respected afterwards, particularly in the case at issue, which is when partial constraints are set on a range (e.g. `num_replicas = 3, <some_constraint>: 1`). This is achieved by rejecting candidate stores to allocate the replacement on when they do not satisfy a constraint that was satisfied by the existing store. In doing so, some node decommissions that would previously have been able to execute will no longer be allowed, since they would violate the configured constraints.

Fixes #94809.
Depends on #94024.

Release note (bug fix): Decommissions that would violate constraints set on a subset of replicas for a range (e.g. `num_replicas = 3, <constraint>: 1`) will no longer be able to execute, respecting constraints during and after the decommission.