-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: consider io overload for lease transfers #96508
Labels
A-kv-distribution
Relating to rebalancing and leasing.
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
GA-blocker
T-kv
KV Team
Milestone
Comments
kvoli
added
the
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
label
Feb 3, 2023
2 tasks
kvoli
changed the title
kvserver: consider io overload for all lease transfers
kvserver: consider io overload for lease transfers
Feb 10, 2023
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Feb 23, 2023
Previously, the default IO overload threshold for when stores are considered overloaded was 0.8. It is most often the case that IO overload will flap between (0.4,1.x). It is desirable to not rebalance or allocate to these stores if there are non-overloaded options. This commit lowers the threshold to 0.5. Part of: cockroachdb#96508 Release note: None
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Feb 24, 2023
Previously, the allocator would transfer leases without considering candidate's IO overload. When leases would transfer to the IO overloaded stores, service latency tended to degrade. This commit adds health checks prior to lease transfers. The health checks are similar to the IO overload checks for allocating replicas in cockroachdb#97142. The checks work by comparing a candidate store against `kv.allocator.io_overload_threshold` and the mean of other candidates. If the candidate store is equal to or greater than both these values, it is considered IO overloaded. The current leaseholder has to meet a higher bar to be considered IO overloaded. It must have an IO overload score greater or equal to `kv.allocator.io_overload_threshold` + `kv.allocator.io_overload_threshold_enforcement_leases`. The level of enforcement for IO overload is controlled by `kv.allocator.io_overload_threshold_enforcement_leases` controls the action taken when a candidate store for a lease transfer is IO overloaded. - `block_none`: don't exclude stores. - `block_none_log`: don't exclude stores, log an event. - `block`: exclude stores from being considered as leaseholder targets for a range if they exceed The current leaseholder store will NOT be excluded as a candidate for its current range leases. - `shed`: same behavior as block, however the current leaseholder store WILL BE excluded as a candidate for its current range leases i.e. The lease will always transfer to a healthy and valid store if one exists. The default is `block` and a buffer value of `0.4`. Resolves: cockroachdb#96508 Release note: None
Hi @kvoli, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
kvoli
added
the
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
label
Mar 6, 2023
craig bot
pushed a commit
that referenced
this issue
Mar 7, 2023
97587: allocator: check IO overload on lease transfer r=andrewbaptist a=kvoli Previously, the allocator would return lease transfer targets without considering the IO overload of stores involved. When leases would transfer to the IO overloaded stores, service latency tended to degrade. This commit adds IO overload checks prior to lease transfers. The IO overload checks are similar to the IO overload checks for allocating replicas in #97142. The checks work by comparing a candidate store against `kv.allocator.lease_io_overload_threshold` and the mean of other candidates. If the candidate store is equal to or greater than both these values, it is considered IO overloaded. The default value is 0.5. The current leaseholder has to meet a higher bar to be considered IO overloaded. It must have an IO overload score greater or equal to `kv.allocator.lease_shed_io_overload_threshold`. The default value is 0.9. The level of enforcement for IO overload is controlled by `kv.allocator.lease_io_overload_threshold_enforcement` controls the action taken when a candidate store for a lease transfer is IO overloaded. - `ignore`: ignore IO overload scores entirely during lease transfers (effectively disabling this mechanism); - `block_transfer_to`: lease transfers only consider stores that aren't IO overloaded (existing leases on IO overloaded stores are left as is); - `shed`: actively shed leases from IO overloaded stores to less IO overloaded stores (this is a super-set of block_transfer_to). The default is `block_transfer_to`. This commit also updates the existing replica IO overload checks to be prefixed with `Replica`, to avoid confusion between lease and replica IO overload checks. Resolves: #96508 Release note (ops change): Range leases will no longer be transferred to stores which are IO overloaded. 98041: backupccl: fix off by one index in fileSSTSink file extension r=rhu713 a=rhu713 Currently, the logic that extends the last flushed file fileSSTSink does not trigger if there is only one flushed file. This failure to extend the first flushed file can result in file entries in the backup manifest with duplicate start keys. For example, if the first export response written to the sink contains partial entries of a single key `a`, then the span of the first file will be `a-a`, and the span of the subsequent file will always be `a-<end_key>`. The presence of these duplicate start keys breaks the encoding of the external manifest files list SST as the file path + start key combination in the manifest are assumed to be unique. Fixes #97953 Release note: None 98072: backupccl: replace restore2TB and restoretpccInc tests r=lidorcarmel a=msbutler This patch removes the restore2TB* roachtests which ran a 2TB bank restore to benchmark restore performance across a few hardware configurations. This patch also replaces the `restoreTPCCInc/nodes=10` test which tested our ability to handle a backup with a long chain. This patch also adds: 1. `restore/tpce/400GB/aws/nodes=4/cpus=16` to measure how per-node throughput scales when the per node vcpu count doubles relative to default. 2. `restore/tpce/400GB/aws/nodes=8/cpus=8` to measure how per-node throughput scales when the number of nodes doubles relative to default. 3. `restore/tpce/400GB/aws/backupsIncluded=48/nodes=4/cpus=8` to measure restore reliability and performance on 48 length long backup chain relative to default. A future patch will update the fixtures used in the restore node shutdown scripts, and add more perf based tests. Fixes #92699 Release note: None Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Rui Hu <[email protected]> Co-authored-by: Michael Butler <[email protected]>
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Mar 13, 2023
This commit updates the "do nothing" lease IO overload enforcement (`kv.allocator.lease_io_overload_threshold_enforcement`) setting to be correctly spelled "ignore" rather than "ingore". Part of: cockroachdb#96508 Release note (ops change): The `kv.allocator.lease_io_overload_threshold_enforcement` setting value which disables enforcement is updated to be spelled correctly as "ignore" rather than "ingore".
craig bot
pushed a commit
that referenced
this issue
Mar 14, 2023
95789: pkg/util/log: don't falsify tenant ID tag in logs if none in ctx r=andreimatei a=abarganier Previously, I made the decision to always tag a log entry with a tenant ID, even if no tenant ID was found in the context associated with the log entry. In this case, the system tenant ID was used in the tag, instead of omitting a tenant ID tag altogether. I received some feedback that this is confusing. For example, imagine testing a feature, expecting log entries to come from a secondary tenant, and the context being used in that feature is not annotated with a tenant ID. With the previous behavior, the log entry would default to being tagged with the system tenant ID instead of having empty tags (or at least, no tenant ID tag). In this scenario, how do I tell the actual state of the log entry? Did the log entry indeed come from a goroutine belonging to the system tenant? Or was the context just missing the tenant ID annotation, but otherwise came from the correct tenant? This ambiguity is not helpful. By falsifying a tenant ID tag we confuse the log reader about the actual state of the system. Furthermore, our eventual goal should be that almost no context objects in the system exist without a tenant ID (except for perhaps at startup before tenant initialization). Tagging with the system tenant ID in the case of a missing tenant ID annotation in the context makes it difficult to track down offending context objects. This patch removes this default behavior from the logging package. Now, if no tenant ID is found in the context, we do not tag the entry with a tenant ID. Note however that on the *decode* side, we will maintain this default tenant ID tagging behavior. If a log entry does not have a tenant ID tag, then we must assume that only the system tenant has privilege to view said log entry, since the owner is ambiguous. Release note: none Epic CRDB-14486 98175: cdc: show all changefeed jobs in `SHOW CHANGEFEED JOBS` r=HonoreDB a=jayshrivastava ### cdc: show all changefeed jobs in SHOW CHANGEFEED JOBS Release note (general change): Previously, the output of `SHOW CHANGEFEED JOBS` was limited to show unfinished jobs and finished jobs from the last 14 days. This change makes the command show all changefeed jobs, regardless of if they finished and when they finished. Note that jobs still obey the cluster setting `jobs.retention_time`. Completed jobs older than that time are deleted. Fixes: #97883 ### jobs: add virtual index for job_type in crdb_internal.jobs This change adds a virtual index on the `job_type` column of `crdb_internal.jobs`. This change should make queries on that table which filter on job type (such as `SHOW CHANGEFEED JOBS`) more efficient. Release note: None Epic: None 98515: kvserver: deflake test store capacity after split r=andrewbaptist a=kvoli This commit defales `TestStoreCapacityAfterSplit`. Previously it was possible for the replica load stats which underpins Capacity to be reset. The reset caused the recording duration to fall short of min stats duration, which led to a 0 value being reported for writes in store capacity. This commit bumps the manual clock twice and removes redundant leaseholder checks within a retry loop. The combination of these two changes makes the test much less likely to flake. The test is now unskipped. ``` dev test pkg/kv/kvserver -f TestStoreCapacityAfterSplit -v --stress ... 4410 runs so far, 0 failures, over 6m10s ``` Resolves: #92677 Release note: None 98521: ui: don't continue polling endpoints that return 403 errors r=dhartunian a=abarganier It was brought to our attention that endpoints such as `v1/settings` would continue to be polled by DB Console even if they returned 403 errors. If an endpoint returns 403 errors, we should not continue to poll it since the required access is not present for the current user. This patch updates the polling mechanism to short-circuit the `refresh` process if a 403 error is encountered throughout the lifecycle of the poller. Release note: none Fixes: #98356 98536: kvserver: deflake learner joint cfg relocate range r=andrewbaptist a=kvoli Previously, in `TestLearnerOrJointConfigAdminRelocateRange` it was possible for there to be an in-flight snapshot towards a learner, prior to sending `AdminRelocateRange` command. When this occurred, the test would fail as `AdminRelocateRange` returns an error when finding any in-flight snapshots to learners. This situation occurred infrequently, causing the test to flake. This commit updates the `TestLearnerOrJointConfigAdminRelocateRange` test to first assert that there are the expected number of learners, then assert that there are no in-flight snapshots towards learners before beginning the main testing logic. The test is now unskipped. ``` dev test pkg/kv/kvserver \ -f TestLearnerOrJointConfigAdminRelocateRange \ -v --stress ... 5652 runs so far, 0 failures, over 12m30s ``` Resolves: #95500 Release note: None 98542: storage: remove MVCCIterator.Key method r=jbowens a=jbowens The MVCCIterator interface previously exposed two methods for accessing the current iterator postion as a MVCC key—UnsafeKey and Key. Key() was equivalent to UnsafeKey().Clone(). This commit removes the Key() variant, pushing the onus of key copying onto the caller. This reduces the interface surface area, avoids accidental key copying (some of which is addressed within this commit), and does not impose any unreasonable burden on callers. Epic: None Informs #82589. Release note: None 98543: allocator: fix lease io enforcement setting typo r=andrewbaptist a=kvoli This commit updates the "do nothing" lease IO overload enforcement (`kv.allocator.lease_io_overload_threshold_enforcement`) setting to be correctly spelled "ignore" rather than "ingore". Part of: #96508 Release note (ops change): The `kv.allocator.lease_io_overload_threshold_enforcement` setting value which disables enforcement is updated to be spelled correctly as "ignore" rather than "ingore". 98600: server: change conn close error to warning r=knz,abarganier a=dhartunian Resolves: #98523 Epic: None Release note: None Co-authored-by: Alex Barganier <[email protected]> Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: David Hartunian <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-distribution
Relating to rebalancing and leasing.
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
GA-blocker
T-kv
KV Team
In #78608 we began checking the io overload (via a similar but not identical stat which should be replaced #85084) for load based (store rebalancer) replica rebalancing.
This however didn't extend to lease transfers:
FollowTheWorkload
,LeaseCountConvergence
andLoadConvergence
.cockroach/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
Line 2107 in 3e65660
cockroach/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
Line 2147 in 3e65660
This issue is to extend the io overload checks to apply to lease transfer targets in the allocator.
Jira issue: CRDB-24157
The text was updated successfully, but these errors were encountered: