kvserver: replicate enqueue on span config update is expensive #108724

Closed
kvoli opened this issue Aug 14, 2023 · 2 comments · Fixed by #108725
Labels
A-kv-distribution Relating to rebalancing and leasing. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team

Comments

@kvoli
Collaborator

kvoli commented Aug 14, 2023

Describe the problem

In #100349, we began enqueuing replicas into the replicate queue upon receiving span config updates. The problem is that, in clusters with a large number of replicas per node, the overhead of enqueuing replicas is significant, and it recurs regularly: every 10 minutes, when the PTS (protected timestamp) record changes.
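To show why the overhead tracks the number of replicas, here is a minimal sketch of the eager fan-out. The `replica` type, `shouldQueueFn`, and `onSpanConfigUpdate` are hypothetical stand-ins, not CockroachDB's kvserver API; `shouldQueue` models the replicate queue check that the issue describes as expensive because it calls ShouldPlanChange.

```go
package main

import "fmt"

// replica is a hypothetical stand-in for a store's replica; it is not
// CockroachDB's kvserver.Replica type.
type replica struct {
	rangeID int
}

// shouldQueueFn models the replicate queue's per-replica check, which the
// issue identifies as relatively expensive (it calls ShouldPlanChange).
type shouldQueueFn func(r replica) bool

// onSpanConfigUpdate sketches the eager behavior introduced in #100349:
// every replica covered by an updated span config is offered to the
// replicate queue, so the check runs once per replica per update.
func onSpanConfigUpdate(replicas []replica, shouldQueue shouldQueueFn) (checked, enqueued int) {
	for _, r := range replicas {
		checked++
		if shouldQueue(r) {
			enqueued++
		}
	}
	return checked, enqueued
}

func main() {
	// Assume 100k replicas on one store are covered by the update, and none
	// needs a change because only the PTS record moved.
	replicas := make([]replica, 100_000)
	for i := range replicas {
		replicas[i] = replica{rangeID: i + 1}
	}
	checked, enqueued := onSpanConfigUpdate(replicas, func(replica) bool { return false })
	fmt.Printf("checked=%d enqueued=%d\n", checked, enqueued) // checked=100000 enqueued=0
}
```

Under that assumption, the expensive check runs 100,000 times per update even though nothing is enqueued, and the PTS update repeats every 10 minutes.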

Expected behavior

Replicas are enqueued into the replicate queue when there is a span config change that would cause a replication or lease change. The overhead of this enqueuing is far less noticeable than it is today, even on nodes with 100k+ leaseholders.
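The expected behavior ties enqueuing to config changes that could alter replication or leases. A minimal sketch of such a predicate follows, assuming a hypothetical, simplified `spanConfig` type (it is not CockroachDB's `roachpb.SpanConfig`). It only illustrates the stated expectation; the fix that was merged for this issue instead gates eager enqueuing behind a cluster setting.

```go
package main

import "fmt"

// spanConfig is a hypothetical, simplified span config. It is NOT
// CockroachDB's roachpb.SpanConfig; only a few illustrative fields are kept.
type spanConfig struct {
	NumReplicas      int
	Constraints      []string
	LeasePreferences []string
	// PTSRecords stands in for protected timestamp state, which changes
	// regularly but does not affect replica placement.
	PTSRecords []string
}

// equalStrings reports whether two string slices hold the same elements
// in the same order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// affectsReplication reports whether moving from prev to next could require
// a replication or lease change. Changes limited to PTSRecords return false.
func affectsReplication(prev, next spanConfig) bool {
	return prev.NumReplicas != next.NumReplicas ||
		!equalStrings(prev.Constraints, next.Constraints) ||
		!equalStrings(prev.LeasePreferences, next.LeasePreferences)
}

func main() {
	prev := spanConfig{NumReplicas: 3, PTSRecords: []string{"a"}}

	ptsOnly := prev
	ptsOnly.PTSRecords = []string{"b"}

	moreReplicas := prev
	moreReplicas.NumReplicas = 5

	fmt.Println(affectsReplication(prev, ptsOnly), affectsReplication(prev, moreReplicas)) // false true
}
```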

Additional data / screenshots

The PTS record on span configs is updated every 10 minutes, which causes a CPU spike because ShouldPlanChange is called when replicas are enqueued into the replicate queue.


Environment:
Affects master, release-23.1 and release-23.1.9-rc

Jira issue: CRDB-30613

@kvoli added the C-bug, A-kv-distribution, release-blocker, branch-release-23.1 and branch-release-23.1.9-rc labels Aug 14, 2023
@kvoli self-assigned this Aug 14, 2023
@blathers-crl bot added the T-kv label Aug 14, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue Aug 14, 2023
Replicas were enqueued into the replicate queue, upon the store
receiving a span config update which could affect the replica. The
replicate queue `shouldQueue` is relatively more expensive than other
queues.

Introduce the cluster setting
`kv.eager_replicate_enqueue_on_span_config_update.enabled`, which when
set to true, enables queuing up replicas on span config updates; when
set to false, disables queuing replicas on span config updates.

By default, this setting is set to false.

Resolves: cockroachdb#108724
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Aug 14, 2023
Replicas were enqueued into the replicate queue, upon the store
receiving a span config update which could affect the replica. The
replicate queue `shouldQueue` is relatively more expensive than other
queues.

Introduce the cluster setting
`kv.enqueue_to_replicate_queue_on_span_config_update.enabled`, which when
set to true, enables queuing up replicas on span config updates; when
set to false, disables queuing replicas on span config updates.

By default, this setting is set to false.

Resolves: cockroachdb#108724
Release note (ops change): Introduce the
`kv.enqueue_to_replicate_queue_on_span_config_update.enabled` cluster
setting. When set to `true`, stores in the cluster will enqueue replicas
for replication changes, upon receiving config updates which could
affect the replica. This setting is off by default.
kvoli added a commit to kvoli/cockroach that referenced this issue Aug 14, 2023
Replicas were enqueued into the replicate queue, upon the store
receiving a span config update which could affect the replica. The
replicate queue `shouldQueue` is relatively more expensive than other
queues.

Introduce the cluster setting
`kv.enqueue_in_replicate_queue_on_span_config_update.enabled`, which when
set to true, enables queuing up replicas on span config updates; when
set to false, disables queuing replicas on span config updates.

By default, this setting is set to false.

Resolves: cockroachdb#108724

Release note (ops change): Introduce the
kv.enqueue_in_replicate_queue_on_span_config_update.enabled cluster
setting. When set to true, stores in the cluster will enqueue replicas
for replication changes, upon receiving config updates which could
affect the replica. This setting is off by default.
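The gating described in this commit can be sketched as a single check before the fan-out. This is a minimal sketch, assuming a hypothetical `eagerEnqueueEnabled` flag in place of the real cluster setting (CockroachDB's settings API is not used) and a hypothetical `store` type with an `enqueue` hook. The `atomic.Bool` zero value is false, matching the stated default. When the flag is off, the sketch assumes affected replicas are picked up later by the replicate queue's regular processing; the source only says that enabling the setting makes config-triggered changes begin sooner.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// eagerEnqueueEnabled is a hypothetical stand-in for the
// kv.enqueue_in_replicate_queue_on_span_config_update.enabled cluster
// setting. Its zero value (false) mirrors the setting's default.
var eagerEnqueueEnabled atomic.Bool

// replica is a hypothetical stand-in for a store replica.
type replica struct{ rangeID int }

// store holds a hypothetical enqueue hook in place of the real replicate queue.
type store struct {
	enqueue func(replica)
}

// onSpanConfigUpdate offers affected replicas to the replicate queue only
// when the flag is enabled. Otherwise it does nothing, and (by assumption)
// the replicas are handled later by the queue's regular processing.
func (s *store) onSpanConfigUpdate(affected []replica) int {
	if !eagerEnqueueEnabled.Load() {
		return 0
	}
	for _, r := range affected {
		s.enqueue(r)
	}
	return len(affected)
}

func main() {
	var queued []int
	s := &store{enqueue: func(r replica) { queued = append(queued, r.rangeID) }}
	affected := []replica{{rangeID: 1}, {rangeID: 2}}

	fmt.Println(s.onSpanConfigUpdate(affected)) // 0 (off by default)
	eagerEnqueueEnabled.Store(true)
	fmt.Println(s.onSpanConfigUpdate(affected)) // 2
	fmt.Println(queued)                         // [1 2]
}
```

Checking the flag once per update, rather than per replica, keeps the disabled path essentially free regardless of how many replicas the update covers.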
craig bot pushed a commit that referenced this issue Aug 15, 2023
108712: go.mod: bump Pebble to 77e81e806c8b r=RahulAggarwal1016 a=RahulAggarwal1016

77e81e80 pebble: Update Tokenbucket package and use WaitCtx
18e6ad42 pebble: Export keyspan.Fragmenter
f9f63ef2 crossversion: add more comments

Epic: none
Release note: none

108725: kvserver: disable eager replicate enqueue on span cfg r=andrewbaptist,erikgrinaker,arulajmani a=kvoli

Replicas were enqueued into the replicate queue, upon the store
receiving a span config update which could affect the replica. The
replicate queue `shouldQueue` is relatively more expensive than other
queues.

Introduce the cluster setting
`kv.enqueue_in_replicate_queue_on_span_config_update.enabled`, which when
set to true, enables queuing up replicas on span config updates; when
set to false, disables queuing replicas on span config updates.

By default, this setting is set to false.

Resolves: #108724

Release note (ops change): Introduce the
kv.enqueue_in_replicate_queue_on_span_config_update.enabled cluster
setting. When set to true, stores in the cluster will enqueue replicas
for replication changes, upon receiving config updates which could
affect the replica. This setting is off by default. Enabling this
setting speeds up how quickly config-triggered replication changes
begin, but adds additional CPU overhead. The overhead scales with the
number of leaseholders.

Co-authored-by: Rahul Aggarwal <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
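Since enabling the setting trades extra CPU (scaling with the number of leaseholders) for faster config-triggered replication changes, operators opt in explicitly. A minimal sketch of doing that from Go with `database/sql` follows, assuming CockroachDB's `SET CLUSTER SETTING` statement and a connection already opened with a PostgreSQL-wire driver of your choice (none is imported here). The helper names `setEagerEnqueueStmt` and `setEagerEnqueue` are hypothetical.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// settingName is the cluster setting introduced by #108725.
const settingName = "kv.enqueue_in_replicate_queue_on_span_config_update.enabled"

// setEagerEnqueueStmt builds the SQL statement that toggles the setting.
func setEagerEnqueueStmt(enabled bool) string {
	return fmt.Sprintf("SET CLUSTER SETTING %s = %t", settingName, enabled)
}

// setEagerEnqueue applies the setting through an already-open connection.
// Opening db requires a PostgreSQL-wire driver (for example pgx or lib/pq),
// which this sketch does not import.
func setEagerEnqueue(ctx context.Context, db *sql.DB, enabled bool) error {
	_, err := db.ExecContext(ctx, setEagerEnqueueStmt(enabled))
	return err
}

func main() {
	// prints: SET CLUSTER SETTING kv.enqueue_in_replicate_queue_on_span_config_update.enabled = true
	fmt.Println(setEagerEnqueueStmt(true))
}
```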
craig bot closed this as completed in 76512f8 Aug 15, 2023
@kvoli reopened this Aug 15, 2023
@kvoli
Collaborator Author

kvoli commented Aug 15, 2023

Will be closed on the backport to 23.1.9-rc.

kvoli added a commit to kvoli/cockroach that referenced this issue Aug 16, 2023
Replicas were enqueued into the replicate queue, upon the store
receiving a span config update which could affect the replica. The
replicate queue `shouldQueue` is relatively more expensive than other
queues.

Introduce the cluster setting
`kv.enqueue_in_replicate_queue_on_span_config_update.enabled`, which when
set to true, enables queuing up replicas on span config updates; when
set to false, disables queuing replicas on span config updates.

By default, this setting is set to false.

Resolves: cockroachdb#108724

Release note (ops change): Introduce the
kv.enqueue_in_replicate_queue_on_span_config_update.enabled cluster
setting. When set to true, stores in the cluster will enqueue replicas
for replication changes, upon receiving config updates which could
affect the replica. This setting is off by default. Enabling this
setting speeds up how quickly config-triggered replication changes
begin, but adds additional CPU overhead. The overhead scales with the
number of leaseholders.
@kvoli
Collaborator Author

kvoli commented Aug 16, 2023

Closed by #108816

@kvoli closed this as completed Aug 16, 2023