kvserver: disable {split,replicate,mvccGC} queues until... #98422
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
This only happened for a short window after nodes with existing replicas were restarted
It seems as though it could possibly occur if the range feed couldn't be established as well? Is that unlikely to occur in practice?
Overall :lgtm:. Some minor nits. Looks like the patch needs crlfmt, updated setting names, and an implementation of *mockSpanConfigSubscriber.LastUpdated to pass CI.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @irfansharif)
-- commits
line 16 at r1:
Could you explain this further? If there are no split keys being computed how could there be a burst of splitting post node restart?
Code quote:
- For the split queue, we wouldn't actually compute any split keys if
unsubscribed, so the missing check was somewhat benign. But we would
be obtusely applying the default range sizes [128MiB,512MiB], so for
clusters configured with larger range sizes, this could lead to a
burst of splitting post node-restart.
-- commits
line 27 at r1:
nit: Generally it affected constraints, not just voter constraints.
cockroach/pkg/kv/kvserver/store_rebalancer.go
Lines 1097 to 1109 in 1a58583
```go
for i := 0; i < len(finalNonVoterTargets); i++ {
	add, remove, _, shouldRebalance := sr.allocator.RebalanceTarget(
		ctx,
		sr.storePool,
		rbCtx.conf,
		rbCtx.candidateReplica.RaftStatus(),
		finalVoterTargets,
		finalNonVoterTargets,
		rbCtx.candidateReplica.RangeUsageInfo(),
		storepool.StoreFilterSuspect,
		allocatorimpl.NonVoterTarget,
		options,
	)
```
pkg/kv/kvserver/store_rebalancer.go
line 158 at r1 (raw file):
```go
	storePool = rq.store.cfg.StorePool
}
```
nit: empty line added.
pkg/kv/kvserver/kvserverbase/base.go
line 43 at r1 (raw file):
```go
var ReplicateQueueEnabled = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kv.replicate.queue_enabled",
```
nit: change these to be kv.xxxx.queue.enabled
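For reference, a minimal sketch of what the registration could look like under the naming the commit message ends up listing (kv.replicate_queue.enabled); the description string and default value here are assumptions, not the actual patch:

```go
// Sketch only: mirrors the settings.RegisterBoolSetting call quoted above,
// renamed per the commit message. Description and default are assumptions.
var ReplicateQueueEnabled = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kv.replicate_queue.enabled",
	"whether the replicate queue is enabled",
	true, // assumed default: queue enabled
)
```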
Force-pushed from 3e341fd to aa64776.
...subscribed to span configs. Do the same for the store rebalancer. We applied this treatment for the merge queue back in cockroachdb#78122 since the fallback behavior, if not subscribed, is to use the statically defined span config for every operation.

- For the replicate queue this meant obtusely applying a replication factor of 3, regardless of configuration. This was possible typically post node restart before subscription was initially established. We saw this in cockroachdb#98385. It was possible then for us to ignore configured voter/non-voter/lease constraints.
- For the split queue, we wouldn't actually compute any split keys if unsubscribed, so the missing check was somewhat benign. But we would be obtusely applying the default range sizes [128MiB,512MiB], so for clusters configured with larger range sizes, this could lead to a burst of splitting post node-restart.
- For the MVCC GC queue, it would mean applying the statically configured default GC TTL and ignoring any set protected timestamps. The latter is best-effort protection, but this could result in internal operations relying on protection (like backups, changefeeds) failing informatively. For clusters configured with GC TTL greater than the default, post node-restart it could lead to a burst of MVCC GC activity and AOST queries failing to find expected data.
- For the store rebalancer, ignoring span configs could result in violating lease preferences and voter/non-voter constraints.

Fixes cockroachdb#98421.
Fixes cockroachdb#98385.

While here, we also introduce the following non-public cluster settings to selectively enable/disable KV queues:
- kv.mvcc_gc_queue.enabled
- kv.split_queue.enabled
- kv.replicate_queue.enabled

Release note (bug fix): It was previously possible for CockroachDB to not respect non-default zone configs. This could only happen for a short window after nodes with existing replicas were restarted (measured in seconds), and self-rectified (also within seconds). This manifested in a few ways:
- If num_replicas was set to something other than 3, we would still add or remove replicas to get to 3x replication.
- If num_voters was set explicitly to get a mix of voting and non-voting replicas, it would be ignored. CockroachDB could possibly remove non-voting replicas.
- If range_min_bytes or range_max_bytes were changed from 128 MiB and 512 MiB respectively, we would instead try to size ranges to be within [128 MiB, 512 MiB]. This could appear as an excess amount of range splits or merges, as visible in the Replication Dashboard under "Range Operations".
- If gc.ttlseconds was set to something other than 90000 seconds, we would still GC data only older than 90000s/25h. If the GC TTL was set to something larger than 25h, AOST queries going further back may now start failing. For GC TTLs less than the 25h default, clusters would observe increased disk usage due to more retained garbage.
- If constraints, lease_preferences or voter_constraints were set, they would be ignored. Range data and leases would possibly be moved outside where prescribed.

This issue lasted a few seconds post node-restart, and any zone config violations were rectified shortly after.
Force-pushed from aa64776 to 234bcd0.
I'm only mentioning the node restart point since it's a release note -- rangefeeds are an internal mechanism. But yes, it can also happen if rangefeeds reliably fail to get established, say when there's loss of quorum. But these are centralized system database ranges, and if you've lost quorum there, things are pretty bad. I don't think it's worth mentioning, and I think it's rare given our routine 5x replication.
Repurposed queueConfig.needsSystemConfig to capture this needs-subscription logic; will wait for CI to go green.
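A rough sketch of the gating idea described here; the field and helper names are assumptions, not the actual change:

```go
// Illustrative sketch only: a per-queue flag that makes the queue skip
// processing until the store has an initial span config snapshot.
type queueConfig struct {
	// needsSpanConfigs gates processing on the span config subscription
	// having received its first update.
	needsSpanConfigs bool
}

// canProcess reports whether a queue with the given config may process
// replicas, given whether the store is subscribed to span configs.
func canProcess(cfg queueConfig, subscribed bool) bool {
	return !cfg.needsSpanConfigs || subscribed
}
```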
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @kvoli)
Previously, kvoli (Austen) wrote…
Could you explain this further? If there are no split keys being computed how could there be a burst of splitting post node restart?
The burst of splitting would be due to the larger configured range sizes, not due to computed split keys (which, BTW, are only used to figure out schema boundaries, or, under coalesced ranges, the points where adjacent schemas have different span configs). So if the restarted store has 4 GiB ranges, post-restart we might try to slice them back down to 512 MiB ranges.
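As a toy illustration (not the actual split queue code) of why the fallback matters: the size-based split trigger roughly boils down to comparing a range's size against the span config's RangeMaxBytes, so substituting the static default makes configured-large ranges look oversized.

```go
package main

import "fmt"

// needsSplitBySize is a toy stand-in for the size-based split check: a range
// is a split candidate roughly when its size exceeds RangeMaxBytes.
func needsSplitBySize(rangeSizeBytes, rangeMaxBytes int64) bool {
	return rangeSizeBytes > rangeMaxBytes
}

func main() {
	const rangeSize = int64(4 << 30)     // a 4 GiB range on the restarted store
	const configuredMax = int64(4 << 30) // operator-configured range_max_bytes
	const fallbackMax = int64(512 << 20) // static default applied when unsubscribed

	fmt.Println(needsSplitBySize(rangeSize, configuredMax)) // false: no split needed
	fmt.Println(needsSplitBySize(rangeSize, fallbackMax))   // true: burst of splits
}
```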
bors r+
Build failed (retrying...)
Build failed (retrying...)
Nice find! I see this is already in the bors queue, so no need to stop the train for me, but please do address comments in a followup. Specifically, the one around expanding our testing.
Reviewed 19 of 19 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @kvoli)
pkg/kv/kvserver/mvcc_gc_queue.go
line 686 at r2 (raw file):
```go
) (processed bool, err error) {
	if !mgcq.enabled() {
		log.VEventf(ctx, 2, "skipping mvcc gc: queue has been disabled")
```
nit: Consider logging louder, but doing so with log.Every(10 seconds).
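A minimal sketch of what that could look like, assuming the util/log package's Every helper; the variable name, message, and log level are illustrative only:

```go
package kvserver

import (
	"context"
	"time"

	"github.com/cockroachdb/cockroach/pkg/util/log"
)

// queueDisabledLogEvery rate-limits the "queue disabled" message so a
// disabled queue doesn't spam the logs. Sketch only; names are assumptions.
var queueDisabledLogEvery = log.Every(10 * time.Second)

// logQueueDisabled logs at a more visible level than VEventf, at most once
// per rate-limit interval.
func logQueueDisabled(ctx context.Context, queueName string) {
	if queueDisabledLogEvery.ShouldLog() {
		log.Warningf(ctx, "skipping %s: queue has been disabled", queueName)
	}
}
```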
pkg/kv/kvserver/store.go
line 2066 at r2 (raw file):
```go
}

// GetConfReader exposes access to a configuration reader.
```
Please add a sentence about returning errSpanConfigUnavailable up here as well.
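Perhaps something along these lines (a suggested wording only, using the error name as spelled above):

```go
// GetConfReader exposes access to a configuration reader. It returns
// errSpanConfigUnavailable if the store is not yet subscribed to the span
// configs state.
```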
pkg/kv/kvserver/store.go
line 2090 at r2 (raw file):
```go
// surface explicitly that we don't have any span configs instead of
// falling back to the statically configured one.
// - enabling range merges would be extremely dangerous -- we could
```
Instead of having these words here, they're better placed around where you're setting needsSpanConfigs for the respective queues.
pkg/spanconfig/spanconfigkvsubscriber/kvsubscriber_client_test.go
line 32 at r2 (raw file):
```go
// TestBlockedKVSubscriberDisablesMerges ensures that the merge queue is
// disabled until the KVSubscriber has some snapshot.
func TestBlockedKVSubscriberDisablesMerges(t *testing.T) {
```
Let's expand this test to cover the other queues we've identified in this commit as well.
Build succeeded:
Encountered an error creating backports. Some common things that can go wrong:

You might need to create your backport manually using the backport tool.

error creating merge commit from 234bcd0 to blathers/backport-release-22.2-98422: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []; you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
please do address comments in a followup. Specifically, the one around expanding our testing.
See #98807.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
Regression test for cockroachdb#98422 which disables {split,replicate,mvccGC} queues until subscribed to span configs. We're only slightly modifying the existing merge-queue specific TestBlockedKVSubscriberDisablesMerges. Release note: None
…99123

98712: storage: expose separate WriteBatch interface r=sumeerbhola a=jbowens

Adapt NewUnindexedBatch to no longer take a writeOnly parameter. Instead, a new NewWriteBatch method is exposed that returns a WriteBatch type that does not provide Reader facilities. Future work may remove UnindexedBatch altogether, updating callers to explicitly maintain separate Readers and WriteBatches.

Epic: None
Release note: None

98807: spanconfig: add TestBlockedKVSubscriberDisablesQueues r=irfansharif a=irfansharif

Regression test for #98422 which disables {split,replicate,mvccGC} queues until subscribed to span configs. We're only slightly modifying the existing merge-queue specific TestBlockedKVSubscriberDisablesMerges.

Release note: None

98823: sql: address a couple of old TODOs r=yuzefovich a=yuzefovich

This commit addresses a couple of old TODOs in the `connExecutor.dispatchToExecutionEngine` method.

First, it simply removes the now-stale TODO about needing to `Step()` the txn before executing the query. The TODO mentioned things about the old way FK checks were performed, but we now always run checks in a deferred fashion, and perform the `Step`s correctly there for cascades and checks, thus the TODO can be removed. (The TODO itself explicitly mentions that it can be removed once we do checks and cascades in the deferred fashion.)

Second, it addresses a TODO about how some plan information is saved. Up until 961e66f the cleanup of the `planNode` tree and `flow` infra was intertwined, so in 6ae4881, in order to "sample" the plan with correct info (some of which is only available _after_ the execution is done), we delegated that sampling to `planTop.close`. However, with the `planNode` tree and `flow` cleanups refactored and de-coupled, we no longer have to close the plan early on. As a result, we now close the plan in a defer right after we create it (now it's the only place we do so), and this commit makes it so that we record the plan info explicitly right after returning from the execution engine, thus addressing the second TODO.

Epic: None
Release note: None

98896: tracing: fix WithEventListeners option with verbose parent r=yuzefovich a=yuzefovich

Previously, when using the `WithEventListeners` span option and not explicitly specifying the `WithRecording` option, we could incorrectly use the structured recording. In particular, this was the case when the parent has verbose recording, and this is now fixed. At the moment, this span option is used only by the backup and restore processors, so the impact would be that their verbose recording wouldn't be included into the session recording (perhaps we don't actually expose that anyway).

Epic: None
Release note: None

99079: sql: fix hint for unknown udfs in views and funcs r=rharding6373 a=rharding6373

Fixes a hint for some instances of an "unknown function" error that was not properly applied previously.

Informs: #99002
Epic: None
Release note: None

99095: ui: fix inflight request replacement, cleanup database details api r=THardy98 a=THardy98

Changes to allow in-flight request replacement in PR #98331 caused breaking changes to `KeyedCachedDataReducer`s. Despite the added `allowReplacementRequests` flag being `false`, in-flight requests would be replaced (specifically, when fetching for initial state). This would prevent all requests but the last one from being stored in the reducer state and leave the remaining requests in a permanent `inFlight` status. Consequently, our pages would break, as the permanent `inFlight` status on these requests would prevent us from ever fetching data, putting these pages in a permanent loading state.

This PR adds checks to ensure `allowReplacementRequests` is enabled when we receive a response, before deciding whether to omit it from the reducer state.

It's important to note that this is still not an ideal solution for `KeyedCachedDataReducer`s, should we ever want to replace inflight requests for one. The checks to replace an in-flight request still occur at the reducer-level, whereas in a `KeyedCachedDataReducer` we'd like them to occur at the state-level (i.e. instead of replacement checks for all requests across the reducer, we should check for requests that impact a specific key's state). This way we scope replacements to each key in the reducer, allowing us to fetch data for each key independently (which is the intended functionality). Enabling this for a `KeyedCachedDataReducer` without making this change, I believe, will cause the breaking behaviour above.

This PR also includes some cleanup to the database details API, namely:
- camel casing response fields
- encapsulating the database span stats response into its own object (instead of across multiple objects); this helps scope any errors deriving from the query to the single stats object

BEFORE: https://www.loom.com/share/57c9f278376a4551977398316d6ed7b6
AFTER: https://www.loom.com/share/b2d4e074afca4b42a4f711171de3fd72

Release note (bug fix): Fixed replacement of in-flight requests for `KeyedCachedDataReducer`s to prevent permanent loading on requests stuck on an `inFlight` status.

99097: upgrades: fix backfilling of special roles in user ID migrations r=rafiss a=andyyang890

upgrades: fix system.privileges user ID migration

This patch adds a missing case for the public role in the backfilling portion of the system.privileges user ID migration.

Release note: None

---

upgrades: fix system.database_role_settings user ID migration

This patch adds a missing case for the empty role in the backfilling portion of the system.database_role_settings user ID migration.

Release note: None

---

Part of #87079

99113: server: use no-op cost controller for shared-process tenants r=knz a=stevendanna

In the long run, all rate limiting of tenants should be controlled by a tenant capability. However, at the moment, we do not have the infrastructure to read tenant capabilities from the tenant process. Since the main user of shared-process tenants is the new, experimental unified architecture, where we do not anticipate needing the cost controller for most users, this PR injects a no-op cost controller for shared-process tenants.

Epic: CRDB-23559
Release note: None

99120: tracing: allow collection of stack history for Structured spans as well r=dt a=adityamaru

Previously, we mandated that the span must be Verbose to be able to collect and emit a Structured event containing stack history. This change permits Structured spans to also collect stack history.

Release note: None
Epic: none

99123: sql: skip flaky TestSQLStatsDataDriven r=ericharmeling a=ericharmeling

This commit skips TestSQLStatsDataDriven. This test is flaky.

Part of #89861.
Epic: none
Release note: None

Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Eric Harmeling <[email protected]>
...subscribed to span configs. Do the same for the store rebalancer. We applied this treatment for the merge queue back in #78122 since the fallback behavior, if not subscribed, is to use the statically defined span config for every operation.
Fixes #98421.
Fixes #98385.
Release note (bug fix): It was previously possible for CockroachDB to not respect non-default zone configs. This only happened for a short window after nodes with existing replicas were restarted, and self-rectified within seconds. This manifested in a few ways: