kvserver: evaluate cpu time as a rebalancing signal in place of qps #90590
Comments
Summary
Using CPU in place of QPS for allocation rebalancing showed significant improvements in CPU balance with skewed workloads and no improvement (within the margin of error) for uniform workloads. Likewise, for disk write bandwidth there were moderate improvements for skewed workloads and no improvement for uniform workloads.
Results
TPCE
TPCC
Allocbench
Evaluation Questions
1. Does using CPU perform better than QPS in balancing CPU usage within a cluster?
Using CPU does perform better at balancing CPU usage within a cluster when the …
When there is significant load, not attributable to replicas, both perform …
See the below profile of a hot node during TPCE bulk load phase, note the …
2. Does using CPU perform better than QPS in balancing disk resources (primarily write bandwidth) within a cluster?
Using CPU does perform marginally better than QPS at balancing disk write …
When there is a mix of read and write operations (kv/r=50/ops=skew) again both …
3. To what extent does using CPU prevent admission control overload?
In cases where there is CPU resource saturation due to foreground operations …
In cases where there is write resource saturation, cpu balancing does not …
4. To what extent does using CPU disperse admission control overload (same as above).
This is much the same as above. In the CPU case, when the CPU contributing to overload is attributed then it is recognized and acted upon. In the inverted LSM case (io overload), read operations tended to consume additional CPU which is again attributed and acted upon, dispersing IO overload.
This commit switches the default load based rebalancing objective from `qps` to `cpu`. A performance comparison can be found on cockroachdb#90590.
resolves: cockroachdb#90582
Release note (ops change): CPU balancing is enabled as the default load based rebalancing objective. This can be reverted by setting `kv.allocator.load_based_rebalancing.objective` to `qps`.
97424: kvserver: enable cpu balancing by default r=nvanbenschoten a=kvoli
Co-authored-by: Austen McClernon
Is your feature request related to a problem? Please describe.
QPS is currently used as the primary measure of load on a replica and, when summed over leaseholder replicas, on a store.
However, QPS does not accurately account for resource usage when requests are not uniform in cost. In such cases, 1,000 QPS on a store could be enough to saturate CPU, while 10k QPS of a cheaper request mix on the same hardware would not.
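The mismatch between QPS and CPU usage can be illustrated with a small arithmetic sketch. Everything here is hypothetical: the `storeLoad` type, the per-request CPU costs, and the 4-vCPU capacity are assumptions chosen to mirror the 1,000 vs. 10k example, not measurements or types from the codebase.

```go
package main

import "fmt"

// storeLoad describes a hypothetical store. All fields are illustrative
// and do not correspond to real kvserver types.
type storeLoad struct {
	name          string
	qps           float64 // requests per second served by the store
	cpuPerRequest float64 // assumed average CPU-milliseconds per request
	vcpus         float64 // assumed number of vCPUs on the node
}

// cpuUtilization returns the fraction of the store's CPU capacity that the
// request load consumes (1.0 means fully saturated).
func cpuUtilization(s storeLoad) float64 {
	cpuMillisPerSec := s.qps * s.cpuPerRequest
	capacityMillisPerSec := s.vcpus * 1000
	return cpuMillisPerSec / capacityMillisPerSec
}

func main() {
	stores := []storeLoad{
		// Few, expensive requests: low QPS but saturated CPU.
		{name: "s1", qps: 1000, cpuPerRequest: 4.0, vcpus: 4},
		// Many, cheap requests: 10x the QPS but CPU headroom remains.
		{name: "s2", qps: 10000, cpuPerRequest: 0.25, vcpus: 4},
	}
	for _, s := range stores {
		fmt.Printf("%s: qps=%.0f cpu_utilization=%.3f\n",
			s.name, s.qps, cpuUtilization(s))
	}
}
```

Under these assumed costs, a QPS-based signal would rank `s2` as the hotter store, while a CPU-based signal would rank `s1` as hotter.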
This issue is to prototype and evaluate replacing QPS with CPU within the allocation system (`store_rebalancer`). The two approaches should be evaluated against a standardized benchmark, such as the one in #86661.
The questions that should be answered upon completion of this issue:
… `admission.io.overload` and queuing latency).
Jira issue: CRDB-20851
Epic CRDB-20845