Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: benchmark overhead of replication ac v2 on simple workloads #128033

Open
Tracked by #123509
kvoli opened this issue Jul 31, 2024 · 1 comment
Open
Tracked by #123509

kv: benchmark overhead of replication ac v2 on simple workloads #128033

kvoli opened this issue Jul 31, 2024 · 1 comment
Assignees
Labels
A-replication-admission-control-v2 Related to introduction of replication AC v2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@kvoli
Copy link
Collaborator

kvoli commented Jul 31, 2024

Once replication AC V2 is introduced, we should benchmark the overhead of running in V2 vs V1 for a simple, operation intensive workload like kv95/50/0.

We want to avoid introducing any related performance regressions, which aren't explicitly considered a worthwhile tradeoff.

Jira issue: CRDB-40745

Epic CRDB-37515

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-replication-admission-control-v2 Related to introduction of replication AC v2 labels Jul 31, 2024
@blathers-crl blathers-crl bot added the T-kv KV Team label Jul 31, 2024
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024
@kvoli
Copy link
Collaborator Author

kvoli commented Oct 7, 2024

Extending the scope of this issue to include all of:

  • compare pre vs post RACv2 (1) throughput, (2) latency, (3) CPU profile, (4) mutex profile, (5) heap profile against a cluster 3 x (8vCPU,16vCPU,32vCPU) nodes:
    • KV0 uniform : only did 8vCPU
    • KV50 uniform
    • K95 uniform
    • KV0 zipfian

cc @sumeerbhola

Note, not all machine types are necessary because that explodes the combinations, but best effort, perhaps at least 1-2 for each.

sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 7, 2024
craig bot pushed a commit that referenced this issue Oct 8, 2024
132137: replica_rac2: prevent unnecessary heap allocation of admitted vector r=pav-kv a=sumeerbhola

Informs #128033

Epic: CRDB-37515

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 9, 2024
- Scratch for []entryFCState for the new entries being appended.
- Scratch for []tokenWaitingHandleInfo in WaitForEval.
- Accumulate the (send or eval) tokens to deduct and then make
  one call to the shared tokenCounter. This avoids repeated calls
  to PhysicalTime() and repeated acquisitons of a possibly contended
  mutex.

Informs cockroachdb#128033

Epic: CRDB-37515

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 9, 2024
…version

We were exercising v1 codepaths that were not doing anything useful but
consuming 1+% of CPU.

Informs cockroachdb#128033

Epic: CRDB-37515

Release note: None
craig bot pushed a commit that referenced this issue Oct 10, 2024
132267: kvserver: disable v1 if the node starts with a recent enough cluster … r=kvoli a=sumeerbhola

…version

We were exercising v1 codepaths that were not doing anything useful but consuming 1+% of CPU.

Informs #128033

Epic: CRDB-37515

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 10, 2024
- Scratch for []entryFCState for the new entries being appended.
- Scratch for []tokenWaitingHandleInfo in WaitForEval.
- Accumulate the (send or eval) tokens to deduct and then make
  one call to the shared tokenCounter. This avoids repeated calls
  to PhysicalTime() and repeated acquisitons of a possibly contended
  mutex.

Informs cockroachdb#128033

Epic: CRDB-37515

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 10, 2024
This is a micro-optimization, plus avoids exposing Raft details that the
three callers are not interested in. BasicProgress only contains
{State, Match, Next} and there is no sorting of peer ids before the
visitor is called.

This method consumes 0.6% in kv0 running with RACv2, of which more than
half is calls from RACv2 code.

Informs cockroachdb#128033

Epic: CRDB-37515

Release note: None
craig bot pushed a commit that referenced this issue Oct 10, 2024
132129: roachtest: add slow disk perturbation test r=kvoli a=andrewbaptist

This change adds a new set of perturbation tests perturbation/*/slowDisk which tests slow disks. We have see support cases where slow disks can cause cluster level availability outages.

Epic: none

Release note: None

132254: rac2: small allocation optimizations in rangeController r=kvoli a=sumeerbhola

- Scratch for []entryFCState for the new entries being appended.
- Scratch for []tokenWaitingHandleInfo in WaitForEval.
- Accumulate the (send or eval) tokens to deduct and then make one call to the shared tokenCounter. This avoids repeated calls to PhysicalTime() and repeated acquisitons of a possibly contended mutex.

Informs #128033

Epic: CRDB-37515

Release note: None

132313: raft,kvserver: add RawNode.WithBasicProgress and BasicProgress struct r=pav-kv,kvoli a=sumeerbhola

This is a micro-optimization, plus avoids exposing Raft details that the three callers are not interested in. BasicProgress only contains {State, Match, Next} and there is no sorting of peer ids before the visitor is called.

This method consumes 0.6% in kv0 running with RACv2, of which more than half is calls from RACv2 code.

Informs #128033

Epic: CRDB-37515

Release note: None

132315: raft: don't panic in Inflights.Add r=kvoli a=pav-kv

In lazy replication mode, the inflights struct should not enforce the in-flight limit. Instead, this policy is implemented at the higher level (RACv2). The tracker nevertheless correctly tracks all the in-flight state, so that it can correctly switch in/out of the lazy replication.

Part of #128779

Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
@kvoli kvoli added GA-blocker branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Oct 14, 2024
craig bot pushed a commit that referenced this issue Oct 17, 2024
132140: replica_rac2: optimize logTracker.admitted r=sumeerbhola a=pav-kv

Improve cache friendliness by checking the `dirty` and scheduling bits before resetting them. Cache the latest admitted vector to avoid computing it on every call. Instead, it is recomputed only if `dirty` bit is true.

Informs #128033

Co-authored-by: Pavel Kalinnikov <[email protected]>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Nov 13, 2024
Fixed a bug that was using incorrect number of iterations. And made the
benchmark more stable, and useful for smaller number of iterations, by
using OverrideTokenDeduction to deduct 1MiB of tokens for tiny writes.

Informs cockroachdb#128033

Epic: CRDB-37515

Release note: None
craig bot pushed a commit that referenced this issue Nov 14, 2024
135121: kvserver: fixes to BenchmarkFlowControlV2Basic r=kvoli a=sumeerbhola

Fixed a bug that was using incorrect number of iterations. And made the benchmark more stable, and useful for smaller number of iterations, by using OverrideTokenDeduction to deduct 1MiB of tokens for tiny writes.

Informs #128033

Epic: CRDB-37515

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-replication-admission-control-v2 Related to introduction of replication AC v2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
No open projects
Status: Incoming
Development

No branches or pull requests

3 participants