kvflowcontrol: use 'apply_to_elastic' mode by default #110036

irfansharif · 2023-09-05T16:19:17Z

We've been exercising the flow control machinery in the 'apply_to_all' mode for the last few months, to shake out latent bugs and surface performance regressions. Since flow control shapes quorum writes to the rate of IO admission by the slowest live replica, it's not necessarily the scheme we want by default for latency-sensitive foreground writes. It does however make sense for elastic writes, which by definition are not latency sensitive, and for which write shaping is acceptable.

With 'apply_to_elastic', regular writes are still subject to admission control, but only on the leaseholder where the write originates. On nodes observing follower writes from regular work, we'll deduct IO tokens without waiting. Crucially, with 'apply_to_elastic', this deduction-without-waiting will not happen for elastic writes, which tend to be bulkier and are, by definition, lower priority than regular writes.
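The difference between the two modes can be sketched as a small predicate. This is only an illustration of the behavior described above, not the kvflowcontrol implementation: the type names, constants, and the `waitsForFlowTokens` function are hypothetical and chosen for this example.

```go
package main

import "fmt"

// Mode and WorkClass are hypothetical names for this sketch; they are not
// taken from the kvflowcontrol package.
type Mode int

const (
	ApplyToElastic Mode = iota // only elastic writes are shaped by flow control
	ApplyToAll                 // regular and elastic writes are both shaped
)

type WorkClass int

const (
	Regular WorkClass = iota // latency-sensitive foreground work
	Elastic                  // bulkier, lower-priority work
)

// waitsForFlowTokens reports whether a replicated write of class c is shaped
// by flow control (waits for tokens) under mode m. When it returns false, the
// assumption in this sketch is that tokens are still deducted, just without
// waiting.
func waitsForFlowTokens(m Mode, c WorkClass) bool {
	switch m {
	case ApplyToAll:
		return true
	case ApplyToElastic:
		return c == Elastic
	default:
		return false
	}
}

func main() {
	fmt.Println("apply_to_elastic, regular:", waitsForFlowTokens(ApplyToElastic, Regular))
	fmt.Println("apply_to_elastic, elastic:", waitsForFlowTokens(ApplyToElastic, Elastic))
	fmt.Println("apply_to_all, regular:", waitsForFlowTokens(ApplyToAll, Regular))
	fmt.Println("apply_to_all, elastic:", waitsForFlowTokens(ApplyToAll, Elastic))
}
```

Under these assumptions, the only case that changes with the new default is regular work, which no longer waits.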

It might still make sense for users to opt into 'apply_to_all', either in this universal cluster setting form, or perhaps when exposed more selectively through zone configs, so it's done on a schema-by-schema basis. Consider setups with heterogeneous regions where nodes in particular regions only contain follower replicas and also have lower IO admission rates (by virtue of there being fewer nodes in aggregate, or hardware provisioned with lower throughput). Without any write shaping, if the leader-only regions drive writes at higher throughput, the follower region will fall permanently behind or will see uncontrolled LSM growth, making it inappropriate for failovers. In such cases, users may in fact want to shape their quorum writes based on the IO admission rates of the slowest follower replica.
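To make the heterogeneous-region scenario concrete, here is a minimal sketch of what "shaping to the slowest replica" means numerically. The function name, the rates, and the units are assumptions made up for illustration; the actual mechanism works with flow tokens rather than an explicit rate computation.

```go
package main

import "fmt"

// shapedWriteRate returns the rate at which quorum writes proceed when they
// are shaped to the slowest live replica. The input is one hypothetical IO
// admission rate per live replica, all in the same (arbitrary) unit.
func shapedWriteRate(admissionRates []float64) float64 {
	if len(admissionRates) == 0 {
		return 0
	}
	slowest := admissionRates[0]
	for _, r := range admissionRates[1:] {
		if r < slowest {
			slowest = r
		}
	}
	return slowest
}

func main() {
	// Two well-provisioned regions and one follower-only region with a
	// lower IO admission rate (illustrative numbers only).
	rates := []float64{100, 80, 40}
	fmt.Println("shaped write rate:", shapedWriteRate(rates))
}
```

With shaping, writes in this example proceed at the follower region's rate, so that region does not accumulate an ever-growing backlog; without shaping, the faster regions would outpace it.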

Part of #110036.

Release note: None

blathers-crl · 2023-09-05T16:19:21Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

cockroach-teamcity · 2023-09-05T16:19:26Z

This change is

sumeerbhola

Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi)

irfansharif · 2023-09-05T17:09:33Z

bors r+

craig · 2023-09-05T18:06:52Z

Build succeeded:

Bazel Essential CI (Cockroach)

irfansharif requested a review from sumeerbhola September 5, 2023 16:19

irfansharif requested a review from a team as a code owner September 5, 2023 16:19

irfansharif requested a review from aadityasondhi September 5, 2023 16:19

sumeerbhola approved these changes Sep 5, 2023

irfansharif force-pushed the 230905.elastic-only branch from bcd1535 to fb5bd27 Compare September 5, 2023 17:08

irfansharif requested a review from a team as a code owner September 5, 2023 17:08

irfansharif mentioned this pull request Sep 5, 2023

kvflowcontrol,admission: productionize replication admission control #98703

Closed

15 tasks

craig bot merged commit afd0c37 into cockroachdb:master Sep 5, 2023

irfansharif deleted the 230905.elastic-only branch September 5, 2023 18:38