Decommissioning is the standard method for removing a bad node from the cluster and adding a new one in its place. We have seen people run into a lot of slowness when decommissioning large nodes (e.g., nodes storing terabytes of data). It is understood that decommission time is typically a function of node size, node count, and snapshot rates, but we don't have any benchmarking or recommendations for how long one should expect a decommission to take.
The ask here is threefold:
create a benchmark (i.e. something in roachperf) to measure the performance of decommissioning
use the benchmark to identify potential bottlenecks in the decommission process
create a framework/function to calculate the expected duration of a decommission given a set of inputs, e.g. in a resource-unconstrained cluster with 10 nodes, a 2 TB node size, and 256 MB/s snapshot rates, it should take X minutes to decommission a node (see the estimation sketch after this list)
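As a starting point, here is a minimal sketch in Go of what such an estimation function could look like. It assumes the simplest possible model: the decommissioning node's outbound snapshot rate is the bottleneck, so the duration is just data volume divided by snapshot rate. The names `DecommissionInputs` and `EstimateDecommissionTime` are hypothetical, not existing CockroachDB code, and a real framework would also need to account for range count, receiver-side snapshot rates, and concurrent rebalancing.

```go
package main

import (
	"fmt"
	"time"
)

// DecommissionInputs captures the parameters the issue calls out as the main
// drivers of decommission time. All names here are illustrative placeholders.
type DecommissionInputs struct {
	NodeCount       int     // nodes remaining to receive replicas (unused in this naive model)
	BytesPerNode    float64 // logical bytes stored on the decommissioning node
	SnapshotRateBps float64 // configured snapshot rate limit, in bytes/sec
}

// EstimateDecommissionTime is a back-of-the-envelope model: assuming the
// decommissioning node's outbound snapshot rate is the bottleneck (i.e. the
// rest of the cluster is resource-unconstrained), the expected time is the
// data volume divided by the snapshot rate.
func EstimateDecommissionTime(in DecommissionInputs) time.Duration {
	secs := in.BytesPerNode / in.SnapshotRateBps
	return time.Duration(secs * float64(time.Second))
}

func main() {
	// The example from the issue: 10 nodes, 2 TB per node, 256 MB/s snapshots.
	in := DecommissionInputs{
		NodeCount:       10,
		BytesPerNode:    2 << 40,   // 2 TiB
		SnapshotRateBps: 256 << 20, // 256 MiB/s
	}
	fmt.Printf("estimated decommission time: %s\n", EstimateDecommissionTime(in))
}
```

Under this naive model the example works out to roughly 8,200 seconds (about 2.3 hours); the benchmark would tell us how far reality deviates from that lower bound.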
When we build this out, we should start with at least three variants of the test, exercising different node counts and store counts. Here's a strawman proposal (sketched as test specs after the list):
a 4 node cluster (8 vCPUs per node, 1 TB per node)
a 32 node cluster (8 vCPUs per node, 1 TB per node)
a 32 node cluster with 8 stores per node (32 vCPUs per node, 4 TB per node)
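To make the variant matrix concrete, here is a hedged sketch of how these configurations might be expressed as test specs in Go. The `decommissionBenchSpec` type and its fields are illustrative, not the actual roachtest API.

```go
package main

import "fmt"

// decommissionBenchSpec describes one proposed benchmark variant. The type and
// field names are placeholders for whatever the real test registration uses.
type decommissionBenchSpec struct {
	nodes         int
	cpusPerNode   int
	storesPerNode int
	tbPerNode     int
}

// The three strawman variants proposed above.
var proposedVariants = []decommissionBenchSpec{
	{nodes: 4, cpusPerNode: 8, storesPerNode: 1, tbPerNode: 1},
	{nodes: 32, cpusPerNode: 8, storesPerNode: 1, tbPerNode: 1},
	{nodes: 32, cpusPerNode: 32, storesPerNode: 8, tbPerNode: 4},
}

func main() {
	for _, v := range proposedVariants {
		fmt.Printf("nodes=%d cpus/node=%d stores/node=%d TB/node=%d\n",
			v.nodes, v.cpusPerNode, v.storesPerNode, v.tbPerNode)
	}
}
```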
Jira issue: CRDB-13606
Epic: CRDB-14621