kvserver: benchmark decomission #77458

Closed
lunevalex opened this issue Mar 7, 2022 · 2 comments
Labels
C-investigation Further steps needed to qualify. C-label will change. T-kv KV Team

@lunevalex
Collaborator

lunevalex commented Mar 7, 2022

Decommissioning is the standard method for removing a bad node from the cluster so that a new one can be added in its place. We have seen people run into a lot of slowness when trying to decommission large nodes (i.e. nodes holding terabytes of data). It is understood that decommission time is typically a function of node size, number of nodes, and snapshot rate, but we don't have any benchmarks or recommendations for how long one should expect decommissioning a node to take.

The ask here is threefold:

  • create a benchmark (e.g. something in roachperf) to measure the performance of decommission
  • use the benchmark to identify potential bottlenecks in the decommission process
  • create a framework/function to calculate the speed of decommissioning given a set of inputs, e.g. in a resource-unconstrained cluster with 10 nodes, 2 TB per node, and a 256 MB/s snapshot rate, it should take X minutes to decommission a node.

Jira issue: CRDB-13606

Epic CRDB-14621

@lunevalex lunevalex added the C-investigation Further steps needed to qualify. C-label will change. label Mar 7, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 7, 2022
@nvanbenschoten
Member

When we build this out, we should start with at least three variants of the test, exercising different node counts and store counts. Here's a strawman proposal:

  • a 4 node cluster (8 vCPUs per node, 1 TB per node)
  • a 32 node cluster (8 vCPUs per node, 1 TB per node)
  • a 32 node cluster with 8 stores per node (32 vCPUs per node, 4 TB per node)
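
As a rough illustration, the strawman could be written down as a table of variants that a benchmark iterates over. This is only a sketch: the `benchVariant` type, its field names, and the naming scheme are hypothetical, and the single store per node for the first two variants is an assumption, since the proposal only states a store count for the third.

```go
package main

import "fmt"

// benchVariant is a hypothetical description of one benchmark configuration.
// It does not correspond to any existing roachtest or roachperf type.
type benchVariant struct {
	nodes         int
	vcpusPerNode  int
	storesPerNode int // assumed to be 1 where the proposal does not say
	tbPerNode     int
}

// strawmanVariants mirrors the three configurations proposed above.
var strawmanVariants = []benchVariant{
	{nodes: 4, vcpusPerNode: 8, storesPerNode: 1, tbPerNode: 1},
	{nodes: 32, vcpusPerNode: 8, storesPerNode: 1, tbPerNode: 1},
	{nodes: 32, vcpusPerNode: 32, storesPerNode: 8, tbPerNode: 4},
}

// name builds an illustrative identifier for a variant.
func (v benchVariant) name() string {
	return fmt.Sprintf("nodes=%d/cpu=%d/stores=%d/tb=%d",
		v.nodes, v.vcpusPerNode, v.storesPerNode, v.tbPerNode)
}

func main() {
	for _, v := range strawmanVariants {
		fmt.Println(v.name())
	}
}
```

Keeping the variants in one table makes it straightforward to add further node counts or store counts later without touching the benchmark logic.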

@AlexTalks
Contributor

Closing this as the decommissionBench benchmarks have been introduced.
