
kvserver: pro-actively enqueue less preferred leases into lease queue #120441

Merged
merged 1 commit into from
Mar 25, 2024

Conversation

kvoli
Collaborator

@kvoli kvoli commented Mar 13, 2024

Leases are checked against lease preferences after application; when a
lease violates the applied preferences, it is enqueued into the lease
queue. A lease may also satisfy some preference but not the first one,
in which case it is considered less preferred.

Previously, less preferred leases needed to wait for the replica scanner
to enqueue them into the lease queue before the lease would be
considered for transfer to the first preference.

This change enqueues less preferred leases after application, similar to
leases that violate the applied preferences.

Resolves: #116081
Release note: None
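The distinction the commit message draws can be sketched as follows. This is a simplified, hypothetical model — `prefStatus`, `checkPreferences`, and `maybeEnqueue` are illustrative names, not the actual kvserver API — showing that, after this PR, both violating and less-preferred leases are enqueued eagerly:

```go
package main

import "fmt"

// prefStatus classifies a lease against the ordered list of lease
// preferences (illustrative; not the real kvserver type).
type prefStatus int

const (
	prefOK            prefStatus = iota // satisfies the first preference
	prefLessPreferred                   // satisfies some preference, but not the first
	prefViolating                       // satisfies no preference
)

// checkPreferences classifies a leaseholder's locality against an ordered
// preference list, here modeled as plain strings.
func checkPreferences(leaseholder string, prefs []string) prefStatus {
	for i, p := range prefs {
		if p == leaseholder {
			if i == 0 {
				return prefOK
			}
			return prefLessPreferred
		}
	}
	return prefViolating
}

// maybeEnqueue mirrors the behavior this PR adds: both violating and
// less-preferred leases are enqueued into the lease queue right after
// lease application, instead of waiting for the replica scanner.
func maybeEnqueue(status prefStatus) bool {
	return status == prefViolating || status == prefLessPreferred
}

func main() {
	prefs := []string{"us-east", "us-west"}
	fmt.Println(maybeEnqueue(checkPreferences("us-east", prefs))) // false: first preference
	fmt.Println(maybeEnqueue(checkPreferences("us-west", prefs))) // true: less preferred
	fmt.Println(maybeEnqueue(checkPreferences("eu-west", prefs))) // true: violating
}
```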

@cockroach-teamcity
Member

This change is Reviewable

@kvoli kvoli self-assigned this Mar 13, 2024
@kvoli kvoli force-pushed the 240313.less-preferred-leases branch 7 times, most recently from 02bea05 to 54dd22d Compare March 15, 2024 21:05
@kvoli kvoli changed the title kvserver: [dnm] pro-actively enqueue less preferred leases into lease queue kvserver: pro-actively enqueue less preferred leases into lease queue Mar 15, 2024
@kvoli kvoli marked this pull request as ready for review March 15, 2024 21:24
@kvoli kvoli requested review from a team as code owners March 15, 2024 21:24
@kvoli kvoli requested review from a team, herkolategan, DarrylWong and andrewbaptist and removed request for a team, herkolategan and DarrylWong March 15, 2024 21:24
Collaborator

@andrewbaptist andrewbaptist left a comment


:lgtm:

Only one question, about the excludeReplsInNeedOfSnap change.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


pkg/cmd/roachtest/tests/lease_preferences.go line 115 at r1 (raw file):

				checkNodes:            []int{1, 3, 4, 5},
				eventFn:               makeStopNodesEventFn(2 /* targets */),
				waitForLessPreferred:  true,

nit: I think you should just remove this parameter now. I don't think the only remaining test that uses it matters anymore. Also can we decrease the postEventWaitDuration on all these tests now that we expect them to move eagerly? Otherwise the 10m is long enough for the regular queue to process them so it doesn't feel like it is testing the eager enqueueing.


pkg/cmd/roachtest/tests/lease_preferences.go line 121 at r1 (raw file):

	})
	r.Add(registry.TestSpec{
		// NB: This test takes down 2(/2) nodes in the most preferred locality. Th

nit: Th -> The


pkg/kv/kvserver/replica_proposal.go line 570 at r1 (raw file):

				"acquired lease is less preferred, enqueuing for transfer [lease=%v preferences=%v]",
				newLease, r.mu.conf.LeasePreferences)
			r.store.leaseQueue.AddAsync(ctx, r, allocatorimpl.TransferLeaseForPreferences.Priority()-1)

nit: Can you create two different preference enum values rather than just subtracting 1 (even if you just use the value 299 in the code). I think it will be more clear and also potentially we will separate these more in the future.


pkg/kv/kvserver/allocator/plan/lease.go line 170 at r1 (raw file):

			repl,
			existingVoters,
			false, /* excludeReplsInNeedOfSnap */

I'm confused by this change. Can you clarify why we want to send this to replicas needing a snapshot? I think that is almost certainly going to fail later on.


pkg/kv/kvserver/lease_queue_test.go line 306 at r1 (raw file):

	const numNodes = 3
	const preferredNode = 1
	zcfg := zonepb.DefaultZoneConfig()

nit: :( - I was trying to get rid of using the DefaultZoneConfigOverride, but I will just add this to my list when I clean these up. I don't think there is a good alternative today other than setting this through SQL, so this is fine for now. I'll prioritize this cleanup soon.


pkg/kv/kvserver/lease_queue_test.go line 341 at r1 (raw file):

			Knobs: knobs,
			Locality: roachpb.Locality{
				Tiers: []roachpb.Tier{{Key: "rack", Value: fmt.Sprintf("%d", i+1)}},

nit: You didn't need to do all this. We create default localities, with each node having a locality of region:text, dc:dc. Then you won't need any ServerArgsPerNode.

@kvoli kvoli force-pushed the 240313.less-preferred-leases branch from 54dd22d to d1ebc12 Compare March 22, 2024 15:29
Collaborator Author

@kvoli kvoli left a comment


TYFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)


pkg/cmd/roachtest/tests/lease_preferences.go line 115 at r1 (raw file):
The parameter is still required for the remaining test that uses it -- the entire first region is being removed, so the best option is the less preferred lease.

Also can we decrease the postEventWaitDuration on all these tests now that we expect them to move eagerly? Otherwise the 10m is long enough for the regular queue to process them so it doesn't feel like it is testing the eager enqueueing.

Good idea, lowered it to 5 minutes.


pkg/cmd/roachtest/tests/lease_preferences.go line 121 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: Th -> The

Updated.


pkg/kv/kvserver/replica_proposal.go line 570 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: Can you create two different preference enum values rather than just subtracting 1 (even if you just use the value 299 in the code). I think it will be more clear and also potentially we will separate these more in the future.

The issue is that we don't differentiate between less preferred and violating when recommending a lease transfer in the allocator: if there is an improvement in the preference rank we can satisfy, we recommend TransferLeaseForPreferences. Adding a new enum would require also enumerating that difference in the allocator. I'll leave it as a TODO.
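The priority scheme under discussion can be sketched like this. The constant values here are assumptions chosen for illustration (the PR only shows `TransferLeaseForPreferences.Priority() - 1`), not the real allocatorimpl values:

```go
package main

import "fmt"

// Illustrative model of the enqueue priorities discussed above. The actual
// allocatorimpl priority values differ; these are assumed for the sketch.
const (
	// Assumed priority for leases that outright violate the preferences.
	transferLeaseForPreferencesPriority = 300
	// Less-preferred leases are enqueued just below violating ones, so
	// outright violations are always processed first. This mirrors the
	// Priority()-1 subtraction in the PR, rather than a separate enum value.
	lessPreferredPriority = transferLeaseForPreferencesPriority - 1
)

func main() {
	fmt.Println(lessPreferredPriority)                                       // 299
	fmt.Println(lessPreferredPriority < transferLeaseForPreferencesPriority) // true
}
```

The trade-off raised in the review is that a distinct enum value would be self-documenting, but it would also force the allocator to distinguish the two cases when recommending transfers, which it currently does not.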


pkg/kv/kvserver/allocator/plan/lease.go line 170 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

I'm confused by this change. Can you clarify why we want to send this to replicas needing a snapshot? I think that is almost certainly going to fail later on.

We don't. It's the case described in the comment above:

	// If we don't find a suitable target, but we own a lease violating the
	// lease preferences, and there is a more suitable target, return an error
	// to place the replica in purgatory and retry sooner. This typically
	// happens when we've just acquired a violating lease and we eagerly
	// enqueue the replica before we've received Raft leadership, which
	// prevents us from finding appropriate lease targets since we can't
	// determine if any are behind.

This will place the replica into purgatory so the lease transfer may be retried. Not excluding replicas in need of a snapshot is what lets us determine this; there's no change in logic w.r.t. this property. The change is to include all replicas for which a better leaseholder target exists according to the preferences (less preferred), as opposed to just violating ones.


pkg/kv/kvserver/lease_queue_test.go line 306 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: :( - I was trying to get rid of using the DefaultZoneConfigOverride, but I will just add this to my list when I clean these up. I don't think there is a good alternative today other than setting this through SQL, so this is fine for now. I'll prioritize this cleanup soon.

Sounds good.


pkg/kv/kvserver/lease_queue_test.go line 341 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: You didn't need to do all this. We create default localities with each node have a locality of region:text, dc:dc. Then you won't need any ServerArgsPerNode.

Ah nice, updated to use these instead.

@kvoli kvoli force-pushed the 240313.less-preferred-leases branch 5 times, most recently from e1fc5b9 to 9c42da7 Compare March 22, 2024 20:31
@kvoli kvoli force-pushed the 240313.less-preferred-leases branch from 9c42da7 to ba13d19 Compare March 22, 2024 21:13
@kvoli
Collaborator Author

kvoli commented Mar 25, 2024

bors r=andrewbaptist

@craig
Contributor

craig bot commented Mar 25, 2024

@craig craig bot merged commit d0c3317 into cockroachdb:master Mar 25, 2024
21 of 22 checks passed
Successfully merging this pull request may close these issues.

kvserver: eagerly enqueue less preferred leases for transfer