-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: assert on cpu instead of lease count in rebalance/by-load #102823
Labels
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
kvoli
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv-distribution
Relating to rebalancing and leasing.
labels
May 5, 2023
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
May 5, 2023
The `rebalance/by-load/replicas` roachtests periodically flake due to a known limitation, where a cluster with a small number of ranges may not be properly balanced due to heterogeneous localities (including multi-store) cockroachdb#88829. This commit updates the total number of ranges from 1 per-store, to 5 per-store for `rebalance/by-load/*` roachtests. The roachtest asserted on the lease count, as a proxy for load, assuming that the load evenly hits each lease for the created ranges. However, the principal indicator of load in a read heavy workload is CPU. This commit updates the test assertion to require that every store's CPU is within 10% of the cluster mean. The test assertion previously required that the max-min lease count delta was 0, when no outside splits occurred; or 1 when the number of ranges was greater than the number of stores. The logging format is updated for easier debugging: ``` cpu outside bounds mean=23.9 tolerance=10.0% (±2.4) bounds=[21.5, 26.3] below = [] within = [s1: 22 (-6.7%), s2: 22 (-6.3%)] above = [s3: 26 (+13.0%)] ... cpu within bounds mean=25.7 tolerance=10.0% (±2.6) bounds=[23.2, 28.3] stores=[s1: 24 (-3.0%), s2: 24 (-2.9%), s3: 27 (+5.9%)] ``` resolves: cockroachdb#102801 resolves: cockroachdb#102823 Release note: None
craig bot
pushed a commit
that referenced
this issue
May 9, 2023
102824: roachtest: assert on CPU in rebalance/by-load r=aliher1911 a=kvoli The `rebalance/by-load/replicas` roachtests periodically flake due to a known limitation, where a cluster with a small number of ranges may not be properly balanced due to heterogeneous localities (including multi-store) #88829. This PR updates the total number of ranges from 1 per-store, to 5 per-store for `rebalance/by-load/*` roachtests. The roachtest asserted on the lease count, as a proxy for load, assuming that the load evenly hits each lease for the created ranges. However, the principal indicator of load in a read heavy workload is CPU. This PR updates the test assertion to require that every store's CPU is within 10% of the cluster mean. The test assertion previously required that the max-min lease count delta was 0, when no outside splits occurred; or 1 when the number of ranges was greater than the number of stores. The logging format is updated for easier debugging: ``` cpu outside bounds mean=23.9 tolerance=10.0% (±2.4) bounds=[21.5, 26.3] below = [] within = [s1: 22 (-6.7%), s2: 22 (-6.3%)] above = [s3: 26 (+13.0%)] ... cpu within bounds mean=25.7 tolerance=10.0% (±2.6) bounds=[23.2, 28.3] stores=[s1: 24 (-3.0%), s2: 24 (-2.9%), s3: 27 (+5.9%)] ``` resolves: #102801 resolves: #102823 Release note: None Co-authored-by: Austen McClernon <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
May 9, 2023
The `rebalance/by-load/replicas` roachtests periodically flake due to a known limitation, where a cluster with a small number of ranges may not be properly balanced due to heterogeneous localities (including multi-store) #88829. This commit updates the total number of ranges from 1 per-store, to 5 per-store for `rebalance/by-load/*` roachtests. The roachtest asserted on the lease count, as a proxy for load, assuming that the load evenly hits each lease for the created ranges. However, the principal indicator of load in a read heavy workload is CPU. This commit updates the test assertion to require that every store's CPU is within 10% of the cluster mean. The test assertion previously required that the max-min lease count delta was 0, when no outside splits occurred; or 1 when the number of ranges was greater than the number of stores. The logging format is updated for easier debugging: ``` cpu outside bounds mean=23.9 tolerance=10.0% (±2.4) bounds=[21.5, 26.3] below = [] within = [s1: 22 (-6.7%), s2: 22 (-6.3%)] above = [s3: 26 (+13.0%)] ... cpu within bounds mean=25.7 tolerance=10.0% (±2.6) bounds=[23.2, 28.3] stores=[s1: 24 (-3.0%), s2: 24 (-2.9%), s3: 27 (+5.9%)] ``` resolves: #102801 resolves: #102823 Release note: None
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Jun 27, 2023
The `rebalance/by-load/replicas` roachtests periodically flake due to a known limitation, where a cluster with a small number of ranges may not be properly balanced due to heterogeneous localities (including multi-store) cockroachdb#88829. This commit updates the total number of ranges from 1 per-store, to 5 per-store for `rebalance/by-load/*` roachtests. The roachtest asserted on the lease count, as a proxy for load, assuming that the load evenly hits each lease for the created ranges. However, the principal indicator of load in a read heavy workload is CPU. This commit updates the test assertion to require that every store's CPU is within 10% of the cluster mean. The test assertion previously required that the max-min lease count delta was 0, when no outside splits occurred; or 1 when the number of ranges was greater than the number of stores. The logging format is updated for easier debugging: ``` cpu outside bounds mean=23.9 tolerance=10.0% (±2.4) bounds=[21.5, 26.3] below = [] within = [s1: 22 (-6.7%), s2: 22 (-6.3%)] above = [s3: 26 (+13.0%)] ... cpu within bounds mean=25.7 tolerance=10.0% (±2.6) bounds=[23.2, 28.3] stores=[s1: 24 (-3.0%), s2: 24 (-2.9%), s3: 27 (+5.9%)] ``` resolves: cockroachdb#102801 resolves: cockroachdb#102823 Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Is your feature request related to a problem? Please describe.
The
rebalance/by-load
roachtests assert that the lease count (for the kv table) is balanced between stores as a proxy for load. Given there are very few ranges, this is an okay however sometimes flaky proxy for load.Describe the solution you'd like
Update the
rebalance/by-load
roachtests to assert on CPU, as opposed to lease count.Related: #102801
Jira issue: CRDB-27670
The text was updated successfully, but these errors were encountered: