
kvserver: fix constraint analysis on replica replacement #94810

Merged (2 commits) on Feb 15, 2023

Conversation

@AlexTalks (Contributor) commented Jan 6, 2023

This changes the allocator to be fully aware of the replica that is
being replaced when allocating new target replicas due to a dead or
decommissioning node. This ensures that if constraints were respected
before the dead or decommissioning node left the cluster, they will
still be respected afterwards, particularly in the case at issue:
when partial constraints are set on a range (e.g. `num_replicas = 3,
<some_constraint>: 1`). This is achieved by rejecting candidate
stores for the replacement when they do not satisfy a constraint that
was satisfied by the existing store. As a result, some node
decommissions that could previously execute will now be blocked,
because allowing them would violate the configured constraints.
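
For illustration, here is a minimal, self-contained Go sketch of the check described above. The `store`, `constraint`, and `replacementIsValid` names are hypothetical stand-ins, not the actual allocator code, which works on `roachpb.StoreDescriptor`, span config constraints, and the candidate-scoring machinery.

```go
// NOTE: hypothetical, simplified sketch; not the actual allocator code.
package main

import "fmt"

// store is a simplified stand-in for a candidate or existing store.
type store struct {
	id    int
	attrs map[string]string // e.g. {"region": "us-east1"}
}

// constraint is a simplified required constraint, e.g. region=us-east1.
type constraint struct {
	key, value string
}

func (c constraint) satisfiedBy(s store) bool {
	return s.attrs[c.key] == c.value
}

// replacementIsValid rejects a candidate store if it fails to satisfy a
// constraint that the store being replaced currently satisfies, so that a
// replacement never silently drops a constraint that was previously met.
func replacementIsValid(existing, candidate store, constraints []constraint) bool {
	for _, c := range constraints {
		if c.satisfiedBy(existing) && !c.satisfiedBy(candidate) {
			return false
		}
	}
	return true
}

func main() {
	existing := store{id: 1, attrs: map[string]string{"region": "us-east1"}}
	good := store{id: 2, attrs: map[string]string{"region": "us-east1"}}
	bad := store{id: 3, attrs: map[string]string{"region": "us-west1"}}
	constraints := []constraint{{key: "region", value: "us-east1"}}

	fmt.Println(replacementIsValid(existing, good, constraints)) // true
	fmt.Println(replacementIsValid(existing, bad, constraints))  // false
}
```

In the actual change the check applies only when the constraint is necessary for the range, as seen in the scorer condition discussed later in the review.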

Fixes #94809.

Depends on #94024.

Release note (bug fix): Decommissions that would violate constraints set
on a subset of replicas for a range (e.g. `num_replicas = 3,
<constraint>: 1`) will no longer be able to execute, ensuring that
constraints are respected during and after the decommission.

@cockroach-teamcity (Member) commented:
This change is Reviewable

@AlexTalks AlexTalks force-pushed the allocator_fix_unconstrained branch 2 times, most recently from 2dff994 to d24c606 Compare January 10, 2023 03:25
@AlexTalks AlexTalks force-pushed the allocator_fix_unconstrained branch 2 times, most recently from 4f8e335 to 80d6a10 Compare February 6, 2023 20:38
@AlexTalks AlexTalks marked this pull request as ready for review February 6, 2023 20:39
@AlexTalks AlexTalks requested a review from a team as a code owner February 6, 2023 20:39
@AlexTalks AlexTalks requested a review from kvoli February 6, 2023 20:40
@kvoli (Collaborator) left a comment

Few comments/questions. The linter is making noise in a few areas.

Reviewed 4 of 4 files at r1, 8 of 8 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks)


-- commits line 22 at r2:
nit: beingn


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1326 at r2 (raw file):

	conf roachpb.SpanConfig,
	existingVoters, existingNonVoters []roachpb.ReplicaDescriptor,
	replacing *roachpb.ReplicaDescriptor,

I'm not a huge fan of passing in a reference that could be mutated. I understand we discussed before that a list wasn't great either since it implied you could be replacing more than 1 replica at a time. Is there an alternative?


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 1907 at r2 (raw file):

		// If so, then only replacing it with a satisfying store is valid to ensure
		// that the constraint stays fully satisfied.
		if necessary && satisfiedByExistingStore && !satisfiedByCandidateStore {

Nice stuff here - this is clear to understand.


pkg/kv/kvserver/allocator_impl_test.go line 112 at r1 (raw file):

			Locality: roachpb.Locality{
				Tiers: []roachpb.Tier{
					{

It looks like these are added in the 1st commit then edited in the 2nd commit to remove some flags. Is this necessary?

@AlexTalks AlexTalks force-pushed the allocator_fix_unconstrained branch from 80d6a10 to b856e5d Compare February 8, 2023 23:49
@AlexTalks (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1326 at r2 (raw file):

Previously, kvoli (Austen) wrote…

I'm not a huge fan of passing in a reference that could be mutated. I understand we discussed before that a list wasn't great either since it implied you could be replacing more than 1 replica at a time. Is there an alternative?

I can pass in a concrete roachpb.ReplicaDescriptor and use Validate (or some new IsValid function) to check if it has a value of course. I prefer using nil to indicate that we don't have a replica being replaced, but I understand the concern here. Let me know if you definitely think this should be changed.
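
For illustration, the two options being weighed look roughly like the following sketch. The signatures are simplified and hypothetical; `replicaDescriptor` stands in for `roachpb.ReplicaDescriptor`, and `IsValid` is the hypothetical helper mentioned above, not an existing API.

```go
package allocatorsketch

// Option 1 (what the PR currently does): nil means "no replica is being
// replaced".
func allocateReplacement(replacing *replicaDescriptor) {
	if replacing == nil {
		// Plain allocation; nothing is being replaced.
	}
}

// replicaDescriptor is a stand-in for roachpb.ReplicaDescriptor.
type replicaDescriptor struct {
	NodeID, StoreID, ReplicaID int
}

// Option 2: pass by value and treat the zero value as "none", detected via a
// hypothetical IsValid helper.
func (r replicaDescriptor) IsValid() bool { return r.ReplicaID != 0 }

func allocateReplacementByValue(replacing replicaDescriptor) {
	if !replacing.IsValid() {
		// Plain allocation; nothing is being replaced.
	}
}
```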


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 1907 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Nice stuff here - this is clear to understand.

Thanks - I was worried we might need additional cases here but hopefully this makes sense. This is also the only case where we can return valid = false, necessary = true, but that is primarily done so we can log it in the ranking process.
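
As a condensed sketch of the outcome being described (variable names follow the quoted snippet; everything else is illustrative rather than the actual scorer code, and all other valid/necessary combinations are intentionally left to the surrounding scoring logic):

```go
package allocatorsketch

// replacementConstraintOutcome models only the special case discussed in
// this thread: a candidate that would drop a constraint the outgoing store
// satisfies is marked invalid while keeping necessary = true, so the
// rejection can be surfaced when logging candidate rankings.
func replacementConstraintOutcome(
	necessary, satisfiedByExistingStore, satisfiedByCandidateStore bool,
) (valid, nec, isSpecialCase bool) {
	if necessary && satisfiedByExistingStore && !satisfiedByCandidateStore {
		return false, true, true
	}
	// Not this special case; the real scorer determines valid/necessary from
	// the rest of its constraint analysis, which is elided here.
	return false, false, false
}
```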


pkg/kv/kvserver/allocator_impl_test.go line 112 at r1 (raw file):

Previously, kvoli (Austen) wrote…

It looks like these are added in the 1st commit then edited in the 2nd commit to remove some flags. Is this necessary?

Not necessary, should be fixed now.

@kvoli (Collaborator) left a comment

Reviewed 8 of 8 files at r3, 8 of 8 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1326 at r2 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

I can pass in a concrete roachpb.ReplicaDescriptor and use Validate (or some new IsValid function) to check if it has a value of course. I prefer using nil to indicate that we don't have a replica being replaced, but I understand the concern here. Let me know if you definitely think this should be changed.

It doesn't seem worth changing over it.


pkg/kv/kvserver/allocator_impl_test.go line 112 at r1 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Not necessary, should be fixed now.

Did this update push? I can still see them being removed in c2.

b856e5d#diff-a380886513365b4c5d0612c9bb1f2ee62a11571d189141fcb56d4c41182f4dbfL112


pkg/kv/kvserver/store_test.go line 3595 at r4 (raw file):

				{NodeID: 5, StoreID: 5, ReplicaID: 5},
			},
			zoneConfig: zonepb.DefaultSystemZoneConfigRef(),

s/zoneConfig/spanConfig/ here and in a few other spots.

@AlexTalks AlexTalks force-pushed the allocator_fix_unconstrained branch from b856e5d to 32eec11 Compare February 9, 2023 19:37
@AlexTalks (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1326 at r2 (raw file):

Previously, kvoli (Austen) wrote…

It doesn't seem worth changing over it.

Done.


pkg/kv/kvserver/allocator_impl_test.go line 112 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Did this update push? I can still see them being removed in c2.

b856e5d#diff-a380886513365b4c5d0612c9bb1f2ee62a11571d189141fcb56d4c41182f4dbfL112

Fixed!


pkg/kv/kvserver/store_test.go line 3595 at r4 (raw file):

Previously, kvoli (Austen) wrote…

s/zoneConfig/spanConfig/ here and few other spots.

Bad rebase, sorry!

This change adds tests using `store.AllocatorCheckRange(..)` to
investigate the behavior of attempting to decommission a node when
doing so would cause constraints to be broken. This test is needed
because it was uncovered recently that a partially constrained range
(i.e. a range configured with `num_replicas = 3, constraint =
'{<some constraint>: 1}'`) may not have the constraint fully respected
if the only node that satisfies said constraint is being decommissioned.

Part of cockroachdb#94809

Depends on cockroachdb#94024.

Release note: None
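
The scenario this test commit exercises can be sketched as follows. This is a self-contained illustration with made-up types and a made-up `canDecommission` helper, not the actual `store.AllocatorCheckRange(..)` API: three stores in three localities, a range that must keep one replica in `dc=dc1`, and the only `dc=dc1` store being decommissioned.

```go
// NOTE: hypothetical, simplified sketch of the test scenario.
package main

import "fmt"

// testStore is an illustrative stand-in for a store in the test cluster.
type testStore struct {
	id              int
	locality        string
	decommissioning bool
}

// canDecommission reports whether decommissioning the target store leaves at
// least one live store that satisfies the locality constraint.
func canDecommission(target testStore, stores []testStore, requiredLocality string) bool {
	for _, s := range stores {
		if s.id == target.id || s.decommissioning {
			continue
		}
		if s.locality == requiredLocality {
			return true
		}
	}
	return false
}

func main() {
	stores := []testStore{
		{id: 1, locality: "dc=dc1"},
		{id: 2, locality: "dc=dc2"},
		{id: 3, locality: "dc=dc3"},
	}
	// Range config: num_replicas = 3, one replica constrained to dc=dc1.
	// Decommissioning store 1 (the only dc=dc1 store) should be blocked.
	fmt.Println(canDecommission(stores[0], stores, "dc=dc1")) // false
	fmt.Println(canDecommission(stores[1], stores, "dc=dc1")) // true
}
```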

This changes the allocator to be fully aware of the replica that is
being replaced when allocating new target replicas due to a dead or
decommissioning node. This ensures that if constraints were respected
before the dead or decommissioning node left the cluster, they will
still be respected afterwards, particularly in the case at issue:
when partial constraints are set on a range (e.g. `num_replicas = 3,
<some_constraint>: 1`). This is achieved by rejecting candidate
stores for the replacement when they do not satisfy a constraint that
was satisfied by the existing store. As a result, some node
decommissions that could previously execute will now be blocked,
because allowing them would violate the configured constraints.

Fixes cockroachdb#94809.

Release note (bug fix): Decommissions that would violate constraints set
on a subset of replicas for a range (e.g. `num_replicas = 3,
<constraint>: 1`) will no longer be able to execute, ensuring that
constraints are respected during and after the decommission.
@AlexTalks AlexTalks force-pushed the allocator_fix_unconstrained branch from 32eec11 to f7db7e3 Compare February 15, 2023 00:28
@AlexTalks (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1216 at r8 (raw file):

	// should be considered of otherwise equal validity, with candidate ranking
	// chosing the best of the available options.
	var decommissioningReplica *roachpb.ReplicaDescriptor

FYI @kvoli I added this after some thought and reviewing a failing multiregion test. I think it makes sense, but if you don't like this logic being in the allocator (and would rather have it in the replicate queue), let me know. Given that I think our goal is to consolidate more logic into the allocator, it made sense to me for it to live here.

(The issue is, briefly, that if we have a multi-region cluster configured to survive region failure, and all nodes in a region die, we want to ensure that replacement of their replicas proceeds despite their replacement effectively breaking constraints.)
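
A hedged sketch of that exception, with illustrative names rather than the actual allocator code: if no live store can still satisfy a constraint (for example because the outgoing replica's entire region is down), the replacement is not required to satisfy it either, so up-replication can proceed.

```go
package allocatorsketch

// requireConstraintOnReplacement sketches the exception discussed above: a
// constraint that only dead or decommissioning stores can satisfy cannot be
// preserved, and insisting on it would leave the range under-replicated.
func requireConstraintOnReplacement(
	constraintSatisfiedByOutgoingStore bool,
	liveStoresSatisfyingConstraint int,
) bool {
	if !constraintSatisfiedByOutgoingStore {
		// The outgoing store did not satisfy the constraint, so there is
		// nothing to preserve on replacement.
		return false
	}
	// Require the constraint only if some live store can actually satisfy it.
	return liveStoresSatisfyingConstraint > 0
}
```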

@AlexTalks (Contributor, Author) commented:
bors r+

@craig (bot) commented Feb 15, 2023

Build succeeded.

Successfully merging this pull request may close these issues.

kvserver: allocating replacement replicas needs to consider fully satisfying constraints