-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvflowcontrol: use 'apply_to_elastic' mode by default #110036
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi)
We've been exercising the flow control machinery in the 'apply_to_all' mode for the last few months, to shake out latent bugs and surface performance regressions. Since flow control shapes quorum writes to the rate of IO admission by the slowest live replica, it's not necessarily the scheme we want by default for latency-sensitive foreground writes. It does however make sense for elastic writes, which by definition is not latency sensitive, and write shaping is ok. With 'apply_to_elastic', regular writes are still subject to admission control. But it happens only on the leaseholder where the write originates. On nodes observing follower writes from regular work, we'll deduct IO tokens without waiting. Crucially, with 'apply_to_elastic', this deduction-without-waiting will not happen for elastic writes, which tend to be bulkier and by definition, lower priority than regular writes. It might still make sense for users to opt into 'apply_to_all'. Either in this universal cluster setting form, or perhaps when exposed more selectively through zone configs, so it's done on a schema-by-schema basis. Consider setups with heterogenous regions where nodes in particular regions only contain follower replicas and also have lower IO admission rates (by virtue of there being fewer nodes in aggregate, or with hardware provisioned with lesser throughput). Without any write shaping, if the leader-only regions drive at higher throughput, the follower region will fall permanently behind, or will have uncontrolled LSM growth, making it inappropriate for fail overs. In such cases, users may want to in fact shape their quorum writes based on the IO admission rates of the slowest follower replica. Release note: None
bcd1535
to
fb5bd27
Compare
bors r+ |
Build succeeded: |
We've been exercising the flow control machinery in the 'apply_to_all' mode for the last few months, to shake out latent bugs and surface performance regressions. Since flow control shapes quorum writes to the rate of IO admission by the slowest live replica, it's not necessarily the scheme we want by default for latency-sensitive foreground writes. It does however make sense for elastic writes, which by definition is not latency sensitive, and write shaping is ok.
With 'apply_to_elastic', regular writes are still subject to admission control. But it happens only on the leaseholder where the write originates. On nodes observing follower writes from regular work, we'll deduct IO tokens without waiting. Crucially, with 'apply_to_elastic', this deduction-without-waiting will not happen for elastic writes, which tend to be bulkier and by definition, lower priority than regular writes.
It might still make sense for users to opt into 'apply_to_all'. Either in this universal cluster setting form, or perhaps when exposed more selectively through zone configs, so it's done on a schema-by-schema basis. Consider setups with heterogenous regions where nodes in particular regions only contain follower replicas and also have lower IO admission rates (by virtue of there being fewer nodes in aggregate, or with hardware provisioned with lesser throughput). Without any write shaping, if the leader-only regions drive at higher throughput, the follower region will fall permanently behind, or will have uncontrolled LSM growth, making it inappropriate for fail overs. In such cases, users may want to in fact shape their quorum writes based on the IO admission rates of the slowest follower replica.
Part of #110036.
Release note: None