kvserver: enqueue replicas into lease queue when IO overloaded #119776
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Force-pushed from 41190c1 to 7a25702.
Force-pushed from 7a25702 to 0c766db.
Force-pushed from 3276462 to 9df2170.
Force-pushed from ffed777 to 7758c59.
Force-pushed from 441f018 to 03d28d9.
Force-pushed from 03d28d9 to 45e8776.
Force-pushed from 45e8776 to 16679ea.
Force-pushed from d30ca28 to b904814.
Let's chat briefly about the changes in store_gossip, but otherwise this looks OK.
Reviewed 4 of 7 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/store.go
line 2603 at r1 (raw file):
s.ioThreshold.Unlock()
ioThresholdMax := *ioThreshold
nit: use proto.Clone to make a deep clone here instead of just dereferencing. This works fine while the struct is shallow, but if it gains reference fields in the future this code would break. proto.Clone would also catch any uncloneable fields.
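For illustration, a minimal Go sketch (using a plain stand-in struct, not the PR's actual proto type, which would be cloned with proto.Clone) of why a shallow dereference copy breaks once the struct grows a reference-typed field:

```go
package main

import "fmt"

// Hypothetical stand-in for the gossiped threshold message.
type ioThreshold struct {
	Score float64
	// If a reference-typed field like this is added later, a plain
	// dereference copy shares the underlying slice with the original.
	Samples []float64
}

// deepCopy copies the value and any reference-typed fields it owns.
func deepCopy(t *ioThreshold) ioThreshold {
	out := *t
	out.Samples = append([]float64(nil), t.Samples...)
	return out
}

func main() {
	orig := &ioThreshold{Score: 0.4, Samples: []float64{0.1, 0.4}}

	shallow := *orig       // shares orig.Samples
	deep := deepCopy(orig) // owns its own Samples

	orig.Samples[0] = 0.9
	fmt.Println(shallow.Samples[0]) // 0.9: mutated through the shared slice
	fmt.Println(deep.Samples[0])    // 0.1: unaffected
}
```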
pkg/kv/kvserver/store.go
line 2620 at r1 (raw file):
func (s *Store) existingLeaseCheckIOOverload(ctx context.Context) bool {
	storeList, _, _ := s.cfg.StorePool.GetStoreList(storepool.StoreFilterNone)
	storeDescriptor, ok := s.cfg.StorePool.GetStoreDescriptor(s.StoreID())
nit: It is a little ugly that you are pulling this out of the StorePool rather than using the StoreDescriptor that is passed into the CapacityChangeFn. The only part of the local StoreDescriptor that ExistingLeaseCheck uses is the StoreCapacity, and you already have that. This might work, but it seems unfortunate not to pass it in directly and change ExistingLeaseCheck to take only a StoreCapacity.
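A rough sketch of the suggested shape, with hypothetical stand-in types rather than the real kvserver/roachpb ones: the check takes just what it uses (the capacity, plus the store ID for logging), so the caller can pass in what the capacity-change callback already handed it:

```go
package main

import "fmt"

// Minimal stand-ins for the real types; names are illustrative only.
type StoreCapacity struct{ IOThresholdMax float64 }
type StoreDescriptor struct {
	StoreID  int32
	Capacity StoreCapacity
}

// existingLeaseCheckIOOverload sketches the suggested signature: it accepts
// only the capacity and store ID instead of re-fetching the descriptor from
// the store pool.
func existingLeaseCheckIOOverload(storeID int32, capacity StoreCapacity, leaseShedThreshold float64) bool {
	overloaded := capacity.IOThresholdMax >= leaseShedThreshold
	if overloaded {
		fmt.Printf("s%d: io overload %.2f >= %.2f, lease check fails\n",
			storeID, capacity.IOThresholdMax, leaseShedThreshold)
	}
	return !overloaded
}

func main() {
	// e.g. inside a capacity-change callback that already receives the descriptor:
	desc := StoreDescriptor{StoreID: 1, Capacity: StoreCapacity{IOThresholdMax: 0.6}}
	fmt.Println(existingLeaseCheckIOOverload(desc.StoreID, desc.Capacity, 0.4))
}
```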
pkg/kv/kvserver/store.go
line 2676 at r1 (raw file):
if err := s.stopper.RunTask(ctx, "io-overload: shed leases", func(ctx context.Context) {
	newStoreReplicaVisitor(s).Visit(func(repl *Replica) bool {
		s.leaseQueue.maybeAdd(ctx, repl, repl.Clock().NowAsClockTimestamp())
It would be good if there were some sort of stat tracking leases that were shed this way, or perhaps the log message could be updated to list the number of leases that are going to be shed.
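One way this could look, as a minimal sketch with stand-in types (not the PR's actual visitor or queue API): count the leaseholder replicas as they are enqueued and log the total:

```go
package main

import "fmt"

// Illustrative stand-ins for the real kvserver types in this PR.
type replica struct{ holdsLease bool }

type leaseQueue struct{}

func (q *leaseQueue) maybeAdd(r *replica) { /* enqueue for lease shedding */ }

// shedLeases enqueues every lease-holding replica and returns how many were
// enqueued, so the caller can log a count alongside (or instead of) a metric.
func shedLeases(replicas []*replica, q *leaseQueue) int {
	shed := 0
	for _, r := range replicas {
		if !r.holdsLease {
			continue
		}
		q.maybeAdd(r)
		shed++
	}
	return shed
}

func main() {
	replicas := []*replica{{holdsLease: true}, {holdsLease: false}, {holdsLease: true}}
	n := shedLeases(replicas, &leaseQueue{})
	fmt.Printf("io-overload: enqueued %d leaseholder replicas for lease shedding\n", n)
}
```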
pkg/kv/kvserver/store.go
line 2681 at r1 (raw file):
}); err != nil {
	log.KvDistribution.Infof(ctx, "unable to shed leases due to IO overload: %v", err)
nit: The only time this error can occur is during shutdown. If that is true, you might want to make it clear that the task didn't run because of shutdown, in which case you don't need to reset the timer.
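As a small illustration of the suggested handling, with a stand-in error rather than the real stopper's, the shutdown case can be logged as a skip that needs no timer reset:

```go
package main

import (
	"errors"
	"fmt"
)

// errShuttingDown stands in for the only error the task runner returns on
// this code path (the server is quiescing).
var errShuttingDown = errors.New("server shutting down")

// runShedTask runs shed unless the server is shutting down.
func runShedTask(shuttingDown bool, shed func()) error {
	if shuttingDown {
		return errShuttingDown
	}
	shed()
	return nil
}

func main() {
	if err := runShedTask(true, func() {}); errors.Is(err, errShuttingDown) {
		// No timer reset needed: the task didn't run because we are shutting down.
		fmt.Println("skipping lease shed: server shutting down")
	}
}
```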
pkg/kv/kvserver/store_gossip.go
line 65 at r1 (raw file):
// gossipMinMaxIOOverloadScore is the minimum IO overload score that is
// required to trigger gossip.
gossipMinMaxIOOverloadScore = 0.2
I think I'm missing why we need to do this and why we are using this hardcoded value rather than LeaseIOOverloadThreshold. Is this threshold and eager gossiping required at all? I see the advantage of pushing rather than waiting, but this seems like a strange way to do it. Doesn't this have a problem that if the score is ramping up slowly over a few requests, we won't trigger? Maybe we can chat briefly.
Force-pushed from b904814 to 35f7d5c.
TYFTR
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist and @kvoli)
pkg/kv/kvserver/store.go
line 2603 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
nit: use proto.Clone to make a deep clone here instead of just dereferencing. This works fine while the struct is shallow, but if it gains reference fields in the future this code would break. proto.Clone would also catch any uncloneable fields.
Updated.
pkg/kv/kvserver/store.go
line 2620 at r1 (raw file):
It is a little ugly that you are pulling this out of the StorePool rather than using the StoreDescriptor that is passed into the CapacityChangeFn.
This function is also used in CanTransferLease, without which the lease transfers would get throttled at min_lease_transfer_interval. It is unfortunate that it's being pulled out of the store pool, but it is necessary to pull out the store list regardless for the mean check.
The only part of the local StoreDescriptor that is used in ExistingLeaseCheck is the StoreCapacity, and you have this already.
The StoreID is also used in ExistingLeaseCheck and the other lease checks, for logging.
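For context, a rough sketch of why the mean check needs the full store list; the exact comparison in the allocator differs, and all names here are illustrative only:

```go
package main

import "fmt"

// meanIOOverload averages the IO overload scores across the store list.
func meanIOOverload(scores []float64) float64 {
	var sum float64
	for _, s := range scores {
		sum += s
	}
	return sum / float64(len(scores))
}

// shouldShedLeases requires both an absolute violation and being worse than
// the cluster mean, so a uniformly overloaded cluster does not shuffle
// leases around. The cluster-wide scores are why the store list is needed.
func shouldShedLeases(storeScore, shedThreshold float64, clusterScores []float64) bool {
	return storeScore >= shedThreshold && storeScore > meanIOOverload(clusterScores)
}

func main() {
	cluster := []float64{0.1, 0.2, 0.9}
	fmt.Println(shouldShedLeases(0.9, 0.4, cluster)) // true: above threshold and mean
	fmt.Println(shouldShedLeases(0.3, 0.4, cluster)) // false: below threshold
}
```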
pkg/kv/kvserver/store.go
line 2676 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
It would be good if there were some sort of stat tracking leases that were shed this way, or perhaps the log message could be updated to list the number of leases that are going to be shed.
Good idea; I updated the log message to mention the number of leases being shed.
pkg/kv/kvserver/store.go
line 2681 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
nit: The only time this error can occur is during shutdown. If that is true, you might want to make it clear that the task didn't run because of shutdown, in which case you don't need to reset the timer.
I've gotten rid of this and updated the comment.
pkg/kv/kvserver/store_gossip.go
line 65 at r1 (raw file):
Is this threshold and eager gossiping required at all?
It's required for two reasons:
- If we didn't gossip immediately and shed the leases, they wouldn't actually shed (because the allocator uses the storepool value), and even if they did shed, they would quickly come back to the store because the other stores are unaware.
- The gossip mechanism triggers the lease shedding on the recipient side, which is useful because of the aforementioned requirement for other stores to know ahead of time via gossip.
Doesn't this have a problem that if the score is ramping up slowly over a few requests, we won't trigger?
It doesn't have this problem: the condition to eagerly gossip is an IO threshold max score above the minimum AND a score higher than was previously gossiped. There's no relative (%) check here.
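A minimal sketch of that trigger condition, with hypothetical names rather than the PR's actual identifiers: gossip eagerly only when the score clears the minimum and exceeds the last gossiped value, so a slow ramp still triggers once it crosses both bars:

```go
package main

import "fmt"

// Minimum IO overload score that triggers eager gossip; the PR later
// switches this hardcoded value to the lease IO overload threshold.
const gossipMinMaxIOOverloadScore = 0.2

type storeGossipState struct {
	lastGossipedIOOverload float64
}

// shouldGossipEagerly returns true when the current score is above the
// minimum AND higher than the score we last gossiped.
func (g *storeGossipState) shouldGossipEagerly(curIOOverload float64) bool {
	return curIOOverload >= gossipMinMaxIOOverloadScore &&
		curIOOverload > g.lastGossipedIOOverload
}

func main() {
	g := &storeGossipState{lastGossipedIOOverload: 0.1}
	for _, score := range []float64{0.15, 0.25, 0.22, 0.5} {
		if g.shouldGossipEagerly(score) {
			fmt.Printf("gossip eagerly at score %.2f\n", score)
			g.lastGossipedIOOverload = score
		}
	}
}
```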
[...] why we are using this hardcoded value rather than LeaseIOOverloadThreshold
To avoid including cluster settings in the struct. I've updated this to use LeaseIOOverloadThreshold.
Previously, when the allocator lease IO overload threshold enforcement was set to `shed`, leases would take up to 10 minutes to drain from an IO overloaded store. This was because the shed pacing mechanism was the replica scanner, which by default takes up to 10 minutes.
This commit introduces (1) proactive gossip of the store capacity when it both increases over the last value and is greater than an absolute value (LeaseIOOverloadThreshold), and (2) enqueues leaseholder replicas into the lease queue when receiving a gossip update, if the gossip update is for the local store and the store is IO overloaded.
In order to prevent wasted resources, the mass enqueue on overload only occurs at most once every `kv.allocator.min_io_overload_lease_shed_interval`, which defaults to 30s.
The combination of (1) proactive gossip on IO overload increase, and (2) enqueuing into the lease queue on shed conditions being met, results in leases shedding from an IO overloaded node in under a second.
Part of: cockroachdb#118866
Release note: None
Force-pushed from 35f7d5c to 888bc90.
bors r=andrewbaptist
Only the last commit should be reviewed.
Previously, when the allocator lease IO overload threshold enforcement was set to `shed`, leases would take up to 10 minutes to drain from an IO overloaded store. This was because the shed pacing mechanism was the replica scanner, which by default takes up to 10 minutes.
This commit introduces (1) proactive gossip of the store capacity when it both increases over the last value and is greater than an absolute value (0.2), and (2) enqueues leaseholder replicas into the lease queue when receiving a gossip update, if the gossip update is for the local store and the store is IO overloaded.
In order to prevent wasted resources, the mass enqueue on overload only occurs at most once every `kv.allocator.min_io_overload_lease_shed_interval`, which defaults to 30s.
The combination of (1) proactive gossip on IO overload increase, and (2) enqueuing into the lease queue on shed conditions being met, results in leases shedding from an IO overloaded node in under a second.
Part of: #118866
Release note: None
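For illustration, a minimal sketch of the pacing described above, with stand-in types (the real code is driven by the `kv.allocator.min_io_overload_lease_shed_interval` cluster setting, default 30s): the mass enqueue into the lease queue runs at most once per interval:

```go
package main

import (
	"fmt"
	"time"
)

// leaseShedPacer gates how often the mass lease-shed enqueue may run.
type leaseShedPacer struct {
	minInterval time.Duration
	lastShed    time.Time
}

// maybeShed returns true (and records the time) only if the last mass
// enqueue was at least minInterval ago.
func (p *leaseShedPacer) maybeShed(now time.Time) bool {
	if now.Sub(p.lastShed) < p.minInterval {
		return false
	}
	p.lastShed = now
	return true
}

func main() {
	p := &leaseShedPacer{minInterval: 30 * time.Second}
	start := time.Now()
	fmt.Println(p.maybeShed(start))                       // true: first shed
	fmt.Println(p.maybeShed(start.Add(5 * time.Second)))  // false: within the interval
	fmt.Println(p.maybeShed(start.Add(40 * time.Second))) // true: interval elapsed
}
```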