
kvserver: enqueue replicas into lease queue when IO overloaded #119776

Merged — 1 commit, Mar 22, 2024

Conversation

kvoli
Copy link
Collaborator

@kvoli kvoli commented Feb 29, 2024

Only the last commit should be reviewed.

Previously, when the allocator lease IO overload threshold enforcement
was set to shed, leases could take up to 10 minutes to drain from an
IO overloaded store. This was because shedding was paced by the replica
scanner, which by default takes up to 10 minutes to complete a full pass.

This commit introduces (1) proactive gossip of the store capacity when
the IO overload score both increases over the last gossiped value and
exceeds an absolute threshold (0.2), and (2) enqueueing of leaseholder
replicas into the lease queue upon receiving a gossip update, when the
update is for the local store and the store is IO overloaded.

To prevent wasted work, the mass enqueue on overload occurs at most
once every kv.allocator.min_io_overload_lease_shed_interval, which
defaults to 30s.

The combination of (1) proactive gossip on an IO overload increase and
(2) enqueueing into the lease queue when the shed conditions are met
results in leases shedding from an IO overloaded node in under a second.

Part of: #118866
Release note: None
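
As a rough, self-contained sketch of the mechanism described above — the rate-limited mass enqueue triggered by a gossip update — the following Go toy may help. All names here (`store`, `onGossipUpdate`, `shedInterval`) are illustrative stand-ins, not the actual kvserver code:

```go
package main

import (
	"fmt"
	"time"
)

// store is a toy stand-in for kvserver.Store, holding just enough state to
// demonstrate the rate-limited "mass enqueue on IO overload" pattern.
type store struct {
	lastShed     time.Time
	shedInterval time.Duration // analogous to kv.allocator.min_io_overload_lease_shed_interval
	leases       []int         // stand-in for leaseholder replicas
	queue        []int         // stand-in for the lease queue
}

// onGossipUpdate mirrors the PR's behavior: when a gossip update reports the
// local store as IO overloaded, enqueue all leaseholders into the lease
// queue, at most once per shedInterval.
func (s *store) onGossipUpdate(now time.Time, localStore, ioOverloaded bool) {
	if !localStore || !ioOverloaded {
		return
	}
	if now.Sub(s.lastShed) < s.shedInterval {
		return // still within the shed interval; skip the mass enqueue
	}
	s.lastShed = now
	s.queue = append(s.queue, s.leases...)
	fmt.Printf("enqueued %d leaseholder replicas into the lease queue\n", len(s.leases))
}

func main() {
	s := &store{shedInterval: 30 * time.Second, leases: []int{1, 2, 3}}
	start := time.Now()
	s.onGossipUpdate(start, true, true)                     // enqueues all 3 leases
	s.onGossipUpdate(start.Add(10*time.Second), true, true) // rate-limited: no-op
	s.onGossipUpdate(start.Add(40*time.Second), true, true) // interval elapsed: enqueues again
}
```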

blathers-crl bot commented Feb 29, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member) commented

This change is Reviewable

@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch 3 times, most recently from 41190c1 to 7a25702 Compare March 4, 2024 20:43
@kvoli kvoli self-assigned this Mar 4, 2024
@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch from 7a25702 to 0c766db Compare March 4, 2024 21:24
@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch 3 times, most recently from 3276462 to 9df2170 Compare March 5, 2024 16:09
@kvoli kvoli marked this pull request as ready for review March 6, 2024 15:58
@kvoli kvoli requested a review from a team as a code owner March 6, 2024 15:58
@kvoli kvoli requested review from a team March 6, 2024 15:58
@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch 2 times, most recently from ffed777 to 7758c59 Compare March 11, 2024 19:19
@kvoli kvoli requested a review from a team as a code owner March 11, 2024 19:19
@kvoli kvoli requested review from srosenberg and renatolabs and removed request for a team March 11, 2024 19:19
@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch 2 times, most recently from 441f018 to 03d28d9 Compare March 11, 2024 21:48
@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch from 03d28d9 to 45e8776 Compare March 12, 2024 21:08
@kvoli kvoli requested a review from a team as a code owner March 12, 2024 21:08
@kvoli kvoli requested a review from a team March 12, 2024 21:08
@kvoli kvoli requested review from a team as code owners March 12, 2024 21:08
@kvoli kvoli requested review from a team as code owners March 12, 2024 21:08
@kvoli kvoli requested review from abarganier and dt and removed request for a team, srosenberg, renatolabs, abarganier and dt March 12, 2024 21:08
@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch from 45e8776 to 16679ea Compare March 12, 2024 21:10
@kvoli kvoli requested a review from andrewbaptist March 13, 2024 19:55
@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch 4 times, most recently from d30ca28 to b904814 Compare March 18, 2024 14:18
@andrewbaptist andrewbaptist (Collaborator) left a comment
:lgtm:

Let's chat briefly about the changes in store_gossip, but otherwise this looks OK.

Reviewed 4 of 7 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/store.go line 2603 at r1 (raw file):

	s.ioThreshold.Unlock()

	ioThresholdMax := *ioThreshold

nit: use proto.Clone to make a deep clone here instead of just dereferencing. This works fine while the struct is shallow, but if it gains reference fields in the future this code would silently break. proto.Clone would also catch any uncloneable fields.
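
For context on why a plain dereference can bite later, here is a minimal, self-contained Go illustration of the shallow-copy hazard (plain structs only; the actual fix uses proto.Clone on the proto message, and the `threshold` type below is purely hypothetical):

```go
package main

import "fmt"

// threshold stands in for a proto message that is flat today but could
// later grow a reference field such as a slice.
type threshold struct {
	score   float64
	history []float64 // a slice field is shared, not copied, by a shallow copy
}

func main() {
	orig := &threshold{score: 0.5, history: []float64{0.1, 0.3}}

	shallow := *orig // copies the slice header, not the backing array
	shallow.history[0] = 0.9
	fmt.Println(orig.history[0]) // 0.9: mutating the "copy" changed the original

	// A deep clone (what proto.Clone provides for proto messages) avoids
	// the aliasing by copying the backing array as well.
	deep := threshold{score: orig.score, history: append([]float64(nil), orig.history...)}
	deep.history[0] = 0.1
	fmt.Println(orig.history[0]) // still 0.9: the original is unaffected
}
```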


pkg/kv/kvserver/store.go line 2620 at r1 (raw file):

func (s *Store) existingLeaseCheckIOOverload(ctx context.Context) bool {
	storeList, _, _ := s.cfg.StorePool.GetStoreList(storepool.StoreFilterNone)
	storeDescriptor, ok := s.cfg.StorePool.GetStoreDescriptor(s.StoreID())

nit: It is a little ugly that you are pulling this out of the StorePool rather than using the StoreDescriptor that is passed into the CapacityChangeFn. The only part of the local StoreDescriptor used in ExistingLeaseCheck is the StoreCapacity, and you already have that. This might work, but it seems unfortunate not to pass it in directly and change ExistingLeaseCheck to take only a StoreCapacity.


pkg/kv/kvserver/store.go line 2676 at r1 (raw file):

		if err := s.stopper.RunTask(ctx, "io-overload: shed leases", func(ctx context.Context) {
			newStoreReplicaVisitor(s).Visit(func(repl *Replica) bool {
				s.leaseQueue.maybeAdd(ctx, repl, repl.Clock().NowAsClockTimestamp())

It would be good if there were some sort of stat tracking leases shed this way, or the log message could be updated to list the number of leases about to be shed.


pkg/kv/kvserver/store.go line 2681 at r1 (raw file):

		}); err != nil {
			log.KvDistribution.Infof(ctx,
				"unable to shed leases due to IO overload: %v", err)

nit: The only time this error ever occurs is during shutdown. If that is the case, you may want to make it clear that it didn't run because of shutdown, and that there is no need to reset the timer.


pkg/kv/kvserver/store_gossip.go line 65 at r1 (raw file):

	// gossipMinMaxIOOverloadScore is the minimum IO overload score
	// required to trigger gossip.
	gossipMinMaxIOOverloadScore = 0.2

I think I'm missing why we need to do this and why we are using this hardcoded value rather than the LeaseIOOverloadThreshold. Is this threshold and eager gossiping required at all? I see the advantage of pushing rather than waiting, but this seems like a strange way to do it. Doesn't this have a problem where, if the score ramps up slowly over a few requests, we won't trigger? Maybe we can chat briefly.

@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch from b904814 to 35f7d5c Compare March 22, 2024 14:47
@kvoli kvoli (Collaborator, Author) left a comment
TYFTR

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist and @kvoli)


pkg/kv/kvserver/store.go line 2603 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: use proto.Clone to make a deep clone here instead of just dereferencing. This works fine while the struct is shallow, but if it gains reference fields in the future this code would silently break. proto.Clone would also catch any uncloneable fields.

Updated.


pkg/kv/kvserver/store.go line 2620 at r1 (raw file):

It is a little ugly that you are pulling this out of the StorePool rather than using the StoreDescriptor that is passed into the CapacityChangeFn.

This function is also used in CanTransferLease, without which lease transfers would be throttled at min_lease_transfer_interval. It's unfortunate that it's pulled out of the store pool, but the store list needs to be pulled regardless for the mean check.

The only part of the local StoreDescriptor used in ExistingLeaseCheck is the StoreCapacity, and you already have that.

The StoreID is also used in ExistingLeaseCheck and the other lease checks for logging.


pkg/kv/kvserver/store.go line 2676 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

It would be good if there were some sort of stat tracking leases shed this way, or the log message could be updated to list the number of leases about to be shed.

Good idea. I updated the log message to mention the number of leases being shed.


pkg/kv/kvserver/store.go line 2681 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: The only time this error ever occurs is during shutdown. If that is the case, you may want to make it clear that it didn't run because of shutdown, and that there is no need to reset the timer.

I've gotten rid of this and updated the comment.


pkg/kv/kvserver/store_gossip.go line 65 at r1 (raw file):

Is this threshold and eager gossiping required at all?

It's required for two reasons:

  1. If we didn't gossip immediately and just shed the leases, they wouldn't actually shed (because the allocator uses the storepool value), and even if they did shed, they would quickly come back to the store because other stores are unaware of the overload.
  2. The gossip mechanism triggers the lease shedding on the recipient side, which is useful because of the aforementioned requirement for other stores to know ahead of time via gossip.

Doesn't this have a problem where, if the score ramps up slowly over a few requests, we won't trigger?

It doesn't have this problem: the condition to eagerly gossip is an IO threshold max score that both exceeds the absolute threshold AND is higher than the previously gossiped score. There's no relative (%) delta involved, so a slow ramp still triggers once the threshold is crossed.

[...] why we are using this hardcoded value rather than the LeaseIOOverloadThreshold

To avoid including cluster settings in the struct. I've updated this to use LeaseIOOverloadThreshold.
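
To make the trigger condition concrete, here is a tiny runnable Go sketch of the check described above (the function name and parameters are illustrative, not the actual code):

```go
package main

import "fmt"

// shouldEagerlyGossip is a toy version of the eager-gossip trigger: gossip
// the store capacity when the IO overload max score both exceeds the
// absolute threshold (LeaseIOOverloadThreshold in the final code) and is
// higher than the previously gossiped score. There is no relative (%)
// delta, so a slow ramp still triggers once it crosses the threshold.
func shouldEagerlyGossip(score, lastGossiped, threshold float64) bool {
	return score > threshold && score > lastGossiped
}

func main() {
	fmt.Println(shouldEagerlyGossip(0.15, 0.10, 0.2)) // false: below the threshold
	fmt.Println(shouldEagerlyGossip(0.25, 0.10, 0.2)) // true: over threshold and increased
	fmt.Println(shouldEagerlyGossip(0.25, 0.30, 0.2)) // false: not an increase over last gossip
}
```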

Previously, when the allocator lease IO overload threshold enforcement
was set to `shed`, leases could take up to 10 minutes to drain from an
IO overloaded store. This was because shedding was paced by the replica
scanner, which by default takes up to 10 minutes to complete a full pass.

This commit introduces (1) proactive gossip of the store capacity when
the IO overload score both increases over the last gossiped value and
exceeds an absolute threshold (LeaseIOOverloadThreshold), and (2)
enqueueing of leaseholder replicas into the lease queue upon receiving a
gossip update, when the update is for the local store and the store is
IO overloaded.

To prevent wasted work, the mass enqueue on overload occurs at most
once every `kv.allocator.min_io_overload_lease_shed_interval`, which
defaults to 30s.

The combination of (1) proactive gossip on an IO overload increase and
(2) enqueueing into the lease queue when the shed conditions are met
results in leases shedding from an IO overloaded node in under a second.

Part of: cockroachdb#118866
Release note: None
@kvoli kvoli force-pushed the 240216.mass-enqueue-leases branch from 35f7d5c to 888bc90 Compare March 22, 2024 15:32
@kvoli kvoli (Collaborator, Author) commented Mar 22, 2024

bors r=andrewbaptist

@craig craig bot (Contributor) commented Mar 22, 2024

@craig craig bot merged commit 8d3b129 into cockroachdb:master Mar 22, 2024
20 of 22 checks passed