-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base,kvserver,server: configuration of provisioned bandwidth for a store #86063
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lacks any tests, since I want to get an opinion first on the approach.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)
8745de3
to
2a3e63e
Compare
2a3e63e
to
09ad169
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look over the code too closely (will do once ready with tests), but at a high-level this LGTM. We should assume that most deployments are single store and use homogenous setups, so gear the release notes/help text to emphasize the use of the cluster setting and only use the per-store flag if there's non-homogeneity. Users with multi-store setups but homogeneity in provisioned bandwidth can still get away with using just the cluster setting. Other things I'd spell out in the help text is what happens if this limit is too high, or too low (you've already mentioned that this is optional, but worth spelling out the benefits of opting in). Since heterogeneity in provisioned bandwidth is rare (there are likely other problems in such deployments and I don't recall any incidents where I've observed such heterogeneity), and this is all optional, I'd also be completely ok ripping out the per-store flags and just have the cluster setting. It's simpler and less error prone. We can see how useful the per-store flag is over the next release once this is out in the wild.
type ProvisionedRateSpec struct { | ||
// DiskName is the name of the disk observed by the code in disk_counters.go | ||
// when retrieving stats for this store. | ||
DiskName string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this DiskName field, I think. Looking at the top-level StoreSpec, we already have a Path to uniquely identify the disk device we're using. Let's continue using that; it's a required field unless using type=mem
, which this doesn't apply to anyway. I'm looking at this test for ex:
cockroach/pkg/base/store_spec_test.go
Line 69 in 2407d3f
{"/mnt/hda1", "", StoreSpec{Path: "/mnt/hda1"}}, |
This also makes the flag syntax easier (we can drop the name
argument).
pkg/base/store_spec.go
Outdated
@@ -273,6 +348,10 @@ var fractionRegex = regexp.MustCompile(`^([-]?([0-9]+\.[0-9]*|[0-9]*\.[0-9]+|[0- | |||
// - 20% -> 20% of the available space | |||
// - 0.2 -> 20% of the available space | |||
// - attrs=xxx:yyy:zzz A colon separated list of optional attributes. | |||
// - provisioned-rate=name=<disk-name>[:bandwidth=<bandwidth-bytes>] The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignoring the name
thing as discussed above, the syntax this gives us is: cockroach start ... --store=<path>,provisioned-rate=:bandwidth=12500000
. Could this instead be cockroach start ... --store=<path>,bandwidth=119MB/s
. Similar to the allowable formats for size
, but with the (optional) /s
suffix. If later we need to learn about provisioned IOPs, we can introduce an iops=42/s
thing.
pkg/kv/kvserver/store.go
Outdated
|
||
// ProvisionedBandwidthForAdmissionControl set a value of the provisioned | ||
// bandwidth for each store in the cluster. | ||
var ProvisionedBandwidthForAdmissionControl = settings.RegisterIntSetting( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use RegisterByteSizeSetting instead? Or better yet, add an RegisterByteRateSetting (shouldn't be difficult) to get to SET CLUSTER SETTING kv.store.admission.provisioned_bandwidth = '125 MiB/s;
. There are other cluster settings that should've had this but default to the ByteSize variant (server.consistency_check.max_rate
for ex.), so I'll defer to you if you want to improve the status quo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @sumeerbhola)
pkg/base/store_spec.go
line 168 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
We don't need this DiskName field, I think. Looking at the top-level StoreSpec, we already have a Path to uniquely identify the disk device we're using. Let's continue using that; it's a required field unless using
type=mem
, which this doesn't apply to anyway. I'm looking at this test for ex:cockroach/pkg/base/store_spec_test.go
Line 69 in 2407d3f
{"/mnt/hda1", "", StoreSpec{Path: "/mnt/hda1"}}, This also makes the flag syntax easier (we can drop the
name
argument).
I don't see a way to avoid the name mapping configuration. The stats that we get from disk_counters.go have names like nvme1n1, nvme2n1 (on EBS -- I have not looked at what we see when using PD), so there needs to be a way to map that name to a store. There can also be disks that have nothing to do with any of the stores, and those stats need to be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @sumeerbhola)
pkg/base/store_spec.go
line 168 at r1 (raw file):
Previously, sumeerbhola wrote…
I don't see a way to avoid the name mapping configuration. The stats that we get from disk_counters.go have names like nvme1n1, nvme2n1 (on EBS -- I have not looked at what we see when using PD), so there needs to be a way to map that name to a store. There can also be disks that have nothing to do with any of the stores, and those stats need to be ignored.
Oh, I missed that. Sorry. For a given filename path, the way to retrieve the corresponding device name (the same as seen by disk_counters.go), you can df -P <path>
. I don't know off hand what system calls that makes underneath, or whether there's an importable Go library to retrieve just that, but it should be a thing or we can write such a library.
Sources: https://stackoverflow.com/questions/13403866/how-to-get-a-device-partition-name-of-a-file. That refers to a st_dev
, which I think is https://go.dev/src/syscall/ztypes_linux_amd64.go#L102.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @sumeerbhola)
pkg/base/store_spec.go
line 168 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Oh, I missed that. Sorry. For a given filename path, the way to retrieve the corresponding device name (the same as seen by disk_counters.go), you can
df -P <path>
. I don't know off hand what system calls that makes underneath, or whether there's an importable Go library to retrieve just that, but it should be a thing or we can write such a library.Sources: https://stackoverflow.com/questions/13403866/how-to-get-a-device-partition-name-of-a-file. That refers to a
st_dev
, which I think is https://go.dev/src/syscall/ztypes_linux_amd64.go#L102.
I'm also ok just not doing this work in favor of the cluster setting. Asking users to specify the right name is a bit more work on their end, and more steps to get wrong, if it's something we can retrieve programatically given we have a file path to consult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @sumeerbhola)
pkg/base/store_spec.go
line 168 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I'm also ok just not doing this work in favor of the cluster setting. Asking users to specify the right name is a bit more work on their end, and more steps to get wrong, if it's something we can retrieve programatically given we have a file path to consult.
I don't see a good platform independent library for this. There is some discussion on https://groups.google.com/g/golang-nuts/c/mu8XMmRXMOk which is linux specific. There is also https://github.com/gyuho/linux-inspect, also linux specific (and I don't know whether we want such a dependency anyway).
The name in df does match what disk_counters.go is producing, though there are many things in the latter not in the former e.g. on EBS I see nvme0n1
and nvme0n1p1
, and on PD I see sda
, sda1
, sda14
, which don't appear in df (sdb is the device for /mnt/data1 on PD).
I can take a dependency on that library and make the name optional. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @sumeerbhola)
pkg/base/store_spec.go
line 168 at r1 (raw file):
Previously, sumeerbhola wrote…
I don't see a good platform independent library for this. There is some discussion on https://groups.google.com/g/golang-nuts/c/mu8XMmRXMOk which is linux specific. There is also https://github.com/gyuho/linux-inspect, also linux specific (and I don't know whether we want such a dependency anyway).
The name in df does match what disk_counters.go is producing, though there are many things in the latter not in the former e.g. on EBS I seenvme0n1
andnvme0n1p1
, and on PD I seesda
,sda1
,sda14
, which don't appear in df (sdb is the device for /mnt/data1 on PD).I can take a dependency on that library and make the name optional. Thoughts?
I would prefer just copying some code over instead of taking on a dependency (that library hasn't seen commits in five years so we might as well understand what we're copying over). Realistically we'd be maintaining this dependency, so might as well copy it into tree. We'd also have to make this dependency linux specific so provide fallback behaviour for unsupported platforms. Do you want to defer this work for now by only using the cluster setting? (I ask because I don't feel this is adding much given the headache.)
Yes, I would prefer not taking this on now. |
I'm saying something simpler, instead of introducing the diskname->storeID mapping at all, simply use the summed disk counters instead: cockroach/pkg/server/status/runtime.go Lines 626 to 633 in 7c38417
And perhaps disable this functionality for multi-store nodes until we do the legwork for diskname -> store ID accounting. Does that sound reasonable? This is all entirely opinion on my part. |
Multi-store nodes were one of the important production cases where disk bandwidth was scarce, so I definitely do not want to do that. How about we stick with this current configuration schema, and add a TODO that we want to eventually eliminate/reduce the need to provide the name. All the stuff in StoreSpec added here is optional, so this should not introduce configuration churn. |
SGTM. |
09ad169
to
87b59d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Added tests
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @irfansharif)
pkg/base/store_spec.go
line 168 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I would prefer just copying some code over instead of taking on a dependency (that library hasn't seen commits in five years so we might as well understand what we're copying over). Realistically we'd be maintaining this dependency, so might as well copy it into tree. We'd also have to make this dependency linux specific so provide fallback behaviour for unsupported platforms. Do you want to defer this work for now by only using the cluster setting? (I ask because I don't feel this is adding much given the headache.)
Based on our discussion, I have a added a long TODO comment here about eliminating the DiskName field and making ProvisionedRateSpec even more optional.
pkg/base/store_spec.go
line 351 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Ignoring the
name
thing as discussed above, the syntax this gives us is:cockroach start ... --store=<path>,provisioned-rate=:bandwidth=12500000
. Could this instead becockroach start ... --store=<path>,bandwidth=119MB/s
. Similar to the allowable formats forsize
, but with the (optional)/s
suffix. If later we need to learn about provisioned IOPs, we can introduce aniops=42/s
thing.
I've kept this nested inside provisioned-rate for now, since we do need the disk name and this is all for purposes of rate control by the system. IOPS fits in that naming scheme, which is why it isn't called provisioned-bandwidth.
pkg/kv/kvserver/store.go
line 3964 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Use RegisterByteSizeSetting instead? Or better yet, add an RegisterByteRateSetting (shouldn't be difficult) to get to
SET CLUSTER SETTING kv.store.admission.provisioned_bandwidth = '125 MiB/s;
. There are other cluster settings that should've had this but default to the ByteSize variant (server.consistency_check.max_rate
for ex.), so I'll defer to you if you want to improve the status quo.
Switched to RegisterByteSizeSetting
904e80d
to
c895c48
Compare
A previous PR cockroachdb#85722 added support for disk bandwidth as a bottlneck resource, in the admission control package. To utilize this, admission control needs to be provided the provisioned bandwidth and the observed read and write bytes. This PR adds configuration support for this via the StoreSpec (that uses the --store flag). The StoreSpec now has an optional ProvisionedRateSpec that contains the name of the disk corresponding to the store, and an optional provisioned bandwidth, that are specified as provisioned-rate=name=<disk-name>[:bandwidth=<bandwidth-bytes>]. The disk-name is used to map the DiskStats, retrieved via the existing code in status.GetDiskCounters to the correct Store. These DiskStats contain the read and write bytes. The optional bandwidth is used to override the provisioned bandwidth set via the new cluster setting kv.store.admission.provisioned_bandwidth. Fixes cockroachdb#82898 Release note (ops change): Disk bandwidth constraint can now be used to control admission of elastic writes. This requires configuration for each store, via the --store flag, that now contains an optional provisioned-rate field. The provisioned-rate field, if specified, needs to provide a disk-name for the store and optionally a disk bandwidth. If the disk bandwidth is not provided the cluster setting kv.store.admission.provisioned_bandwidth will be used. The cluster setting defaults to 0 (which means that the disk bandwidth constraint is disabled). If the effective disk bandwidth, i.e., using the possibly overridden cluster setting is 0, there is no disk bandwidth constraint. Additionally, the admission control cluster setting admission.disk_bandwidth_tokens.elastic.enabled (defaults to true) can be used to turn off enforcement even when all the other configuration has been setup. Turning off enforcement will still output all the relevant information about disk bandwidth usage, so can be used to observe part of the mechanism in action. To summarize, to enable this for a cluster with homogenous disk, provide a disk-name in the provisioned-rate field in the store-spec, and set the kv.store.admission.provisioned_bandwidth cluster setting to the bandwidth limit. To only get information about disk bandwidth usage by elastic traffic (currently via logs, not metrics), do the above but also set admission.disk_bandwidth_tokens.elastic.enabled to false. Release justification: Low risk, high benefit change that allows an operator to enable new functionality (disabled by default).
c895c48
to
833a031
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 11 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)
TFTR! |
bors r=irfansharif |
This PR was included in a batch that was canceled, it will be automatically retried |
Build failed (retrying...): |
Build succeeded: |
A previous PR #85722 added support for disk bandwidth as a bottlneck
resource, in the admission control package. To utilize this, admission
control needs to be provided the provisioned bandwidth and the observed
read and write bytes. This PR adds configuration support for this via
the StoreSpec (that uses the --store flag). The StoreSpec now has an
optional ProvisionedRateSpec that contains the name of the disk
corresponding to the store, and an optional provisioned bandwidth,
that are specified as
provisioned-rate=name=[:bandwidth=].
The disk-name is used to map the DiskStats, retrieved via the existing
code in status.GetDiskCounters to the correct Store. These DiskStats
contain the read and write bytes. The optional bandwidth is used to
override the provisioned bandwidth set via the new cluster setting
kv.store.admission.provisioned_bandwidth.
Fixes #82898
Release note (ops change): Disk bandwidth constraint can now be
used to control admission of elastic writes. This requires configuration
for each store, via the --store flag, that now contains an optional
provisioned-rate field. The provisioned-rate field, if specified,
needs to provide a disk-name for the store and optionally a disk
bandwidth. If the disk bandwidth is not provided the cluster setting
kv.store.admission.provisioned_bandwidth will be used. The cluster
setting defaults to 0 (which means that the disk bandwidth constraint
is disabled). If the effective disk bandwidth, i.e., using the possibly
overridden cluster setting is 0, there is no disk bandwidth constraint.
Additionally, the admission control cluster setting
admission.disk_bandwidth_tokens.elastic.enabled (defaults to true)
can be used to turn off enforcement even when all the other
configuration has been setup. Turning off enforcement will still
output all the relevant information about disk bandwidth usage, so
can be used to observe part of the mechanism in action.
To summarize, to enable this for a cluster with homogenous disk,
provide a disk-name in the provisioned-rate field in the store-spec,
and set the kv.store.admission.provisioned_bandwidth cluster setting
to the bandwidth limit. To only get information about disk bandwidth
usage by elastic traffic (currently via logs, not metrics), do the
above but also set admission.disk_bandwidth_tokens.elastic.enabled
to false.
Release justification: Low risk, high benefit change that allows
an operator to enable new functionality (disabled by default).