Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: use WithExplicitX for snapshot rate limits #75728

Open
tbg opened this issue Jan 31, 2022 · 2 comments
Open

kvserver: use WithExplicitX for snapshot rate limits #75728

tbg opened this issue Jan 31, 2022 · 2 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-kv KV Team

Comments

@tbg
Copy link
Member

tbg commented Jan 31, 2022

Is your feature request related to a problem? Please describe.

A customer (internal) accidentally set the snapshot rate limits to 8, which is 8 bytes per second. They intended to set it to 8mb/s.

Describe the solution you'd like

Use the equivalent of

// RegisterPublicDurationSettingWithExplicitUnit defines a new
// public setting with type duration which requires an explicit unit when being
// set.
func RegisterPublicDurationSettingWithExplicitUnit(
class Class, key, desc string, defaultValue time.Duration, validateFn func(time.Duration) error,
) *DurationSettingWithExplicitUnit {

for all of our cluster settings for which it makes sense.

Note that this isn't a backwards-compatible change, but we should make it regardless.

Jira issue: CRDB-12806

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs labels Jan 31, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Jan 31, 2022
@rupertharwood-crl rupertharwood-crl added the O-postmortem Originated from a Postmortem action item. label Feb 3, 2022
tbg added a commit to tbg/cockroach that referenced this issue Feb 7, 2022
We've seen recent problems (cockroachdb#75728) related to mis-configured cluster
settings. Preemptively tighten the range of allowable values for the
`kv.replica_circuit_breaker.slow_replication_threshold` setting.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 7, 2022
Related to cockroachdb#75728.

This commit enforces a minimum Raft snapshot rate limit of 1mb/s. We've seen
recent problems (cockroachdb#75728) related to mis-configured cluster settings, so it's
best to protect ourselves from this kind of misconfiguration.

```
[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8 B

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Kib';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8.0 KiB

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Mib';
SET CLUSTER SETTING
```
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 8, 2022
Related to cockroachdb#75728.

This commit enforces a minimum Raft snapshot rate limit of 1mb/s. We've seen
recent problems (cockroachdb#75728) related to mis-configured cluster settings, so it's
best to protect ourselves from this kind of misconfiguration.

```
[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8 B

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Kib';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8.0 KiB

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Mib';
SET CLUSTER SETTING
```
craig bot pushed a commit that referenced this issue Feb 8, 2022
75889: server: use constant for filtering internal stmt stats r=xinhaoz a=xinhaoz

Previously, the internal app name prefix was hard-coded
into the query used to fetch combined stmt/txn stats.
This commit replaces that usage with the pre-existing
internal app prefix defined in constants.

Release note: None

76209: kv: enforce minimum Raft snapshot rate limit r=nvanbenschoten a=nvanbenschoten

Related to #75728.

This commit enforces a minimum Raft snapshot rate limit of 1mb/s. We've seen
recent problems (#75728) related to mis-configured cluster settings, so it's
best to protect ourselves from this kind of misconfiguration.

```
[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8 B

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Kib';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8.0 KiB

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Mib';
SET CLUSTER SETTING
```

76236: tlp: respect test shutdown deadline r=jordanlewis a=jordanlewis

Closes #74675

This commit fixes the TLP test shutdown routine to actually begin
shutting down 5 minutes before the test timeout. Before, the code
that was supposed to check the timeout was not reached reliably.

Release note: None

Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Jordan Lewis <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 8, 2022
Related to cockroachdb#75728.

This commit enforces a minimum Raft snapshot rate limit of 1mb/s. We've seen
recent problems (cockroachdb#75728) related to mis-configured cluster settings, so it's
best to protect ourselves from this kind of misconfiguration.

```
[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8 B

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Kib';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8.0 KiB

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Mib';
SET CLUSTER SETTING
```
craig bot pushed a commit that referenced this issue Feb 11, 2022
76144: kvserver: enforce min duration for slow_replication_threshold r=erikgrinaker a=tbg

We've seen recent problems (#75728) related to mis-configured cluster
settings. Preemptively tighten the range of allowable values for the
`kv.replica_circuit_breaker.slow_replication_threshold` setting.

Touches #33007.

Release note: None


Co-authored-by: Tobias Grieger <[email protected]>
RajivTS pushed a commit to RajivTS/cockroach that referenced this issue Mar 6, 2022
Related to cockroachdb#75728.

This commit enforces a minimum Raft snapshot rate limit of 1mb/s. We've seen
recent problems (cockroachdb#75728) related to mis-configured cluster settings, so it's
best to protect ourselves from this kind of misconfiguration.

```
[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8 B

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Kib';
ERROR: invalid value for kv.snapshot_rebalance.max_rate: snapshot rate cannot be set to a value below 1.0 MiB: 8.0 KiB

[email protected]:26257/movr> set cluster setting kv.snapshot_rebalance.max_rate = '8Mib';
SET CLUSTER SETTING
```
RajivTS pushed a commit to RajivTS/cockroach that referenced this issue Mar 6, 2022
We've seen recent problems (cockroachdb#75728) related to mis-configured cluster
settings. Preemptively tighten the range of allowable values for the
`kv.replica_circuit_breaker.slow_replication_threshold` setting.

Release note: None
@tbg
Copy link
Member Author

tbg commented Mar 21, 2022

This is somewhat awkward: for each setting type that we want this for, we need to implement a new type and special case it here:

case *settings.DurationSettingWithExplicitUnit:
requiredType = types.Interval
// Ensure that the expression contains a unit (i.e can't be a float)
_, err := p.analyzeExpr(
ctx, expr, nil, dummyHelper, types.Float, false, "SET CLUSTER SETTING "+name,
)

I was hoping we could "just" have an .WithExplicitUnit() chainer but it's not as easy.

Since we already set min values for these rate limits, I am not convinced we should be doing more this time around.

It does seem desirable to have better settings infra though, where units can be requested for any setting without doing plumbing in the SQL layer.

cc @mwang1026 not sure where to route this ask.

@mwang1026
Copy link

I think given Nathan's min setting change we're OK to close this for now. re: cluster setting infra let's take that offline

@mwang1026 mwang1026 added the E-quick-win Likely to be a quick win for someone experienced. label May 17, 2022
@nvanbenschoten nvanbenschoten removed O-postmortem Originated from a Postmortem action item. E-quick-win Likely to be a quick win for someone experienced. labels Mar 13, 2023
@nvanbenschoten nvanbenschoten added the P-3 Issues/test failures with no fix SLA label Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-kv KV Team
Projects
None yet
Development

No branches or pull requests

4 participants