-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rac2: admitted vector handling #129508
Labels
A-replication-admission-control-v2
Related to introduction of replication AC v2
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
sumeerbhola
added
the
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
label
Aug 22, 2024
sumeerbhola
added
the
A-replication-admission-control-v2
Related to introduction of replication AC v2
label
Aug 23, 2024
26 tasks
craig bot
pushed a commit
that referenced
this issue
Sep 3, 2024
129751: rac2: introduce log storage tracker r=sumeerbhola a=pav-kv This PR introduces the data structure that observes all raft log storage writes, syncs and logical admissions, and tracks the stable/admitted log state correspondingly. Log writes are ordered by leader term, then by index. Log writes have no gaps, except when a snapshot clears the log. All writes/syncs must be registered. Admissions can be registered selectively, for entries that are subject to RACv2. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Sep 3, 2024
The AdmittedTracker provides the latest AdmittedVector, and will be implemented by processorImpl. Informs cockroachdb#129508 Epic: CRDB-37515 Release note: None
This was referenced Sep 4, 2024
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Sep 4, 2024
The AdmittedTracker provides the latest AdmittedVector, and will be implemented by processorImpl. Informs cockroachdb#129508 Epic: CRDB-37515 Release note: None
craig bot
pushed a commit
that referenced
this issue
Sep 4, 2024
130076: replica_rac2: use LogMark to convey semantics r=sumeerbhola a=pav-kv Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Sep 4, 2024
The AdmittedTracker provides the latest AdmittedVector, and will be implemented by processorImpl. Informs cockroachdb#129508 Epic: CRDB-37515 Release note: None
craig bot
pushed a commit
that referenced
this issue
Sep 4, 2024
130074: replica_rac2: add admitted state protocol r=sumeerbhola a=pav-kv This PR introduces `AdmittedState` and `PiggybackedAdmittedState` protos, and converts replica- and leader-side code (such as `AdmittedPiggybacker` and raft transport) to use them. The protos are not populated at the moment, and the leader-side code is a bunch of TODOs. More conversion code will follow, to replace the `AdmittedResponseForRange` flows. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Sep 4, 2024
129758: parser,mt: remove CREATE TENANT ... LIKE r=dt a=dt The problem -- needing to apply a large number of configuration knobs -- solved by the 'template tenant' functionality has since been solved instead by having named service modes for each distinct common set of configurations, for example 'shared' which implies all capabilities in addition to in-process service. The template functionality was never pushed on PCR UA users, and is not used in cloud, and thus has no users and is accordingly removed here. Release note: none. Epic: none. 130054: rac2: add AdmittedTracker interface for use by RangeController r=pav-kv,kvoli a=sumeerbhola The AdmittedTracker provides the latest AdmittedVector, and will be implemented by processorImpl. Informs #129508 Epic: CRDB-37515 Release note: None 130096: release: update_versions: continue, even if one repo attempt fails r=celiala a=celiala Follow-up to #130063: after merging 130063, update_versions failed to complete[1], due to encountering another / different error. This commit adjusts update_versions so that it continues & attempts to create PRs for all the remaining repos, despite encountering errors along the way. [1] Latest "Create version update PRs" job: https://teamcity.cockroachdb.com/buildConfiguration/Internal_Cockroach_Release_Publish_CreateVersionUpdatePRs/16752534?buildTab=log&focusLine=3552&logView=flowAware&expandAll=true Release note: None Epic: None Release justification: release-infra only change. 130111: execinfrapb: move MockDistSQLServer to flowinfra r=dt a=dt pb packages are intended to be leaf packages with few or no dependencies. The mock server in the test utils was pulling in dependencies on packages that encode business logic, including logic that wants to depend on pb packages such as pkg/rpc's auth logic that depends on request types. Release note: none. Epic: none. Co-authored-by: David Taylor <[email protected]> Co-authored-by: sumeerbhola <[email protected]> Co-authored-by: Celia La <[email protected]>
This was referenced Sep 5, 2024
craig bot
pushed a commit
that referenced
this issue
Sep 6, 2024
130200: rac2: indicate admitted updates from LogTracker r=sumeerbhola a=pav-kv This PR makes `LogTracker` return a bit indicating when the `Admitted` vector has changed. The admitted vector is considered "changed" when its leader term goes up (and the admitted indices either stay intact or regress), or the term stays intact but the admitted index at any priority goes up. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Sep 9, 2024
129727: kvserver,rac2: initialize RaftNode early r=sumeerbhola a=pav-kv This PR moves the `RaftNode` interface initialization to the point when `raft.RawNode` is created. This will be needed for correct initialization of `LogTracker` from the initial raft state. The `RaftNode` interface is moved to `replica_rac2` integration package, to reduce cluttering of `kvserver`. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Sep 10, 2024
130171: replica_rac2: integrate LogTracker r=sumeerbhola a=pav-kv This PR integrates the `LogTracker` into RACv2 `Processor`. The design sketch is below. - `LogTracker` is created when `raft.RawNode` is initialized. It reads the initial stable state from `RawNode.LogMark()`, and initializes the stable and admitted indices to match this mark. - `LogTracker` observes all log storage appends in `handleRaftReady`, and all log and snapshot syncs in `OnLogSync` and `OnSnapSync` handlers. This guarantees that the stable mark in `LogTracker` is always accurate. - `LogTracker` observes all entries subject to admission control, and their corresponding admissions. This allows updating the admitted vector accurately. The admitted vector is sent to the leader from two places: - In `sendRaftMessage`, any successful `MsgAppResp` message is intercepted, and the corresponding `RaftMessageRequest` is annotated with the admitted vector if it is in the coordinate system of the receiver, i.e. has the same leader term. This flow supports the fast path when logical admission happens without delays: by the time the `MsgAppResp` is sent, the entries are already admitted, and the admitted vector has advanced. - In `handleRaftReady`, the admitted vector is sent to the `Piggybacker`, which then attaches them to any `RaftMessageRequestBatch` going to the same node as the receiver replica. This serves cases when admission is lagging the log syncs, and there might be no `MsgAppResp` to attach the vector to. Such admissions are batched into one `Ready` cycle for efficiency reasons. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
This was referenced Sep 10, 2024
craig bot
pushed a commit
that referenced
this issue
Sep 12, 2024
130427: replica_rac2: integrate admitted tracking on leader r=sumeerbhola,kvoli a=pav-kv This PR plumbs the admitted vectors into the RACv2 `Processor`. Admitted vectors are applied to `replicaSendStream` and cause releasing the tokens held by the leader. The admitted vectors are plumbed to `replicaSendStream` via 3 paths: - The leader's own admitted vector is applied from `HandleRaftReadyRaftMuLocked`, calling into `RangeController.AdmitRaftMuLocked` directly. - The followers' admitted vectors in most cases are received via annotated `RaftMessageRequest.AdmittedState`, which is dispatched from `stepRaftGroupRaftMuLocked` into the `Processor` via `Processor.AdmitRaftMuLocked` method. - The followers' piggybacked admitted vectors from `RaftMessageRequestBatch` are queued on the `Processor` via `EnqueuePiggybackedAdmittedAtLeader` method, and later applied from `HandleRaftReadyRaftMuLocked`. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
This was referenced Sep 12, 2024
craig bot
pushed a commit
that referenced
this issue
Sep 12, 2024
130459: replica_rac2: remove processorImpl mutex r=sumeerbhola a=pav-kv This PR removes the `processorImpl` mutex. Almost all `processorImpl` fields are used under `raftMu` (only by methods containing `RaftMuLocked` infix). Exceptions: - `logTracker`. It has an internal mutex for interaction with log syncs/admissions out of band. - `leader.rc` has `rcReferenceUpdateMu` for high-contention interaction with out-of-band `AdmitForEval`. - `leader.pendingAdmittedMu` has a narrow mutex for quick interaction with appends from `RaftTransport`. - `enabledWhenLeader` is an atomic and doesn't need a mutex. Related to #129508 130590: util: fix race in cidr startup r=jaylim-crl a=andrewbaptist Previously if the `server.cidr_mapping_url` was set and a node restarted, there was a race condition where `SetOnChange` for the setting could be called before the `Start` was called. This could result in it blocking while attempting to submit to the channel. Fixes: #130589 Release note: None Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: Andrew Baptist <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Sep 12, 2024
130522: changefeedccl: add roachtesting for network metrics r=rharding6373 a=asg0451 Add a roachtest helper that checks that the new cdc network metrics are being emitted. For now it simply checks that they are > 0. Part of: #130097 Release note: None 130585: replica_rac2: rm Ready scheduling on piggybacked r=sumeerbhola a=pav-kv Since the piggybacked admitted vectors are applied immediately from within the method that clears the queued updates, we no longer need to schedule a Ready cycle. Previously the token release would happen subsequently in Ready handler. Related to #129508 Co-authored-by: Miles Frankel <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Sep 13, 2024
…130695 130178: roachtest: update multitenant-multiregion to use roachprod API r=herkolategan,fqazi a=renatolabs This updates the `multitenant-multiregion` test to use the "official" multitenant roachprod API instead of the deprecated functions in `multitenant_utils.go`. Fixes: #124029 Release note: None 130461: replica_rac2: clean up admitted vector sending r=sumeerbhola a=pav-kv This PR makes `processorImpl` track the raft node `term` unconditionally. It then factors out the admitted vector sending into a well-documented function, and uses the new `term` field to filter out no-op admitted vectors that don't match the leader's term. Part of #129508 130667: opt: remove volatile histogram benchmarks r=mgartner a=mgartner Benchmarks for `(*Histogram).ValuesCount`, `(*Histogram).MaxFrequency`, and `(*Histogram).ApplySelectivity` have been removed. These methods are compiled into very few instructions and are very fast. This makes them susceptible to "phantom regressions" where unlucky interactions with the CPU cache or other resources can degrade performance between commits even when the methods compile to the same set of instructions. See https://cockroachlabs.slack.com/archives/C4X2J0RH6/p1725978728168389 for more details. Epic: None Release note: None 130670: schemachanger: block zone config changes if schema_locked is set r=rafiss a=annrpom Fixes: #130668 Release note (bug fix): Fixed a bug where zone configuration changes issued by the declarative schema changer were not blocked if a table had `schema_locked` set. For more information about the declarative schema changer, see: https://www.cockroachlabs.com/docs/stable/online-schema-changes#declarative-schema-changer/. 130684: rac2: return tracked deductions on close/removal r=sumeerbhola a=kvoli When a replica was removed from a range, or the range controller closed, any tracked deductions did not return tokens to the respective stream token counts. Update `CloseRaftMuLocked` and `SetReplicasRaftMuLocked` to close any replica send streams for replicas which are no longer part of the replica set, closing all when the range controller closes. Also immediately close the send stream when a replica transitions to StateSnapshot. Resolves: #130683 Release note: None 130685: cluster-ui: copy some base components from cloud r=xinhaoz a=xinhaoz This commit ports over some basic components from cloud. They are placed in a new directory, `sharedFromCloud` and will be deleted once the new package is ready. This is simply a temporary solution. Components ported over, with their dependencies: - Search - PageHeader - PageCount (for displaying table pagination) Epic: CRDB-37558 Release note: None 130692: tablemetadatacache: use store ids from response when updating cache r=xinhaoz a=xinhaoz Now that #129060 is done we can use the StoreIDs from the span stats response when updating systme.table_metadata. Epic: CRDB-37558 Release note: None 130694: go.mod: bump Pebble to d5fd9927fa4d r=sumeerbhola a=jbowens Changes: * [`d5fd9927`](cockroachdb/pebble@d5fd9927) sstable: introduce RawColumnWriter * [`0665a3e1`](cockroachdb/pebble@0665a3e1) .github: use upload-artifact@v4 * [`e0afd551`](cockroachdb/pebble@e0afd551) base: consolidate internal key parsing functions * [`117aa7d9`](cockroachdb/pebble@117aa7d9) colblk: allow finishing data, index blocks without last row * [`2cc212bb`](cockroachdb/pebble@2cc212bb) binfmt: add relative offsets * [`7de517e6`](cockroachdb/pebble@7de517e6) colblk: allow UintBuilder.Get into sparse, default rows * [`065a044c`](cockroachdb/pebble@065a044c) colblk: fix up zero-row structures * [`934a2235`](cockroachdb/pebble@934a2235) colblk: expose block reader Describe methods * [`376d455b`](cockroachdb/pebble@376d455b) sstable: adapt EstimateDiskUsage to be agnostic to index block impl * [`da5055ad`](cockroachdb/pebble@da5055ad) sstable: remove RawWriter.IsStrictObsolete * [`fc6eab06`](cockroachdb/pebble@fc6eab06) manual: add cgo memory accounting * [`c8c3806e`](cockroachdb/pebble@c8c3806e) sstable/block: fix compression roundtrip test * [`b3a81bd6`](cockroachdb/pebble@b3a81bd6) sstable: refactor sstable layout facilities, add decodeLayout * [`729b6f72`](cockroachdb/pebble@729b6f72) treesteps: add depth limits * [`90953236`](cockroachdb/pebble@90953236) sstable: remove unnecessary loop while identifying filter block * [`1c5e99fc`](cockroachdb/pebble@1c5e99fc) update crlib and use crstrings.Lines in a few tests * [`0c4e2c6c`](cockroachdb/pebble@0c4e2c6c) treesteps: infrastructure for visualization of operations on trees * [`d3ef91b8`](cockroachdb/pebble@d3ef91b8) sstable: return interface type from NewRawWriter * [`a0f6b9ae`](cockroachdb/pebble@a0f6b9ae) cache: use a specific type for the cache ID * [`19e3a699`](cockroachdb/pebble@19e3a699) cache: remove entry.refcnt Release note: none. Epic: none. 130695: schemachanger: refactor deprecated call in mustRetrieveIndexNameElem r=fqazi a=annrpom This patch refactors `mustRetrieveIndexNameElem` to be more aligned with how we get elems in the DSC today; we also move this in the `helpers.go` file as zone config partitions code uses it to support `ALTER PARTITION ... OF TABLE`. Informs: #129889 Release note: None Co-authored-by: Renato Costa <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Annie Pompa <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]> Co-authored-by: Jackson Owens <[email protected]>
This was referenced Sep 16, 2024
craig bot
pushed a commit
that referenced
this issue
Sep 16, 2024
130688: replica_rac2: move raft write registering r=sumeerbhola a=pav-kv This commit moves registering all raft log/snapshot writes for RACv2 into `HandleRaftReadyRaftMuLocked`, so that `maybeSendAdmittedRaftMuLocked` call always happens after registering the latest write. This way, the admitted vector in `logTracker` carries the latest leader term we have observed from raft, and `maybeSendAdmittedRaftMuLocked` won't skip it when the term changes. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Sep 17, 2024
130854: logstore: fix bug in getting LogMark from message r=kvoli a=pav-kv The `Entries` field is not used in `MsgStorageAppendResp`. Instead, a non-zero `Index` field indicates that there has been at least one entry in the corresponding `MsgStorageAppend` message. Related to #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
angles-n-daemons
pushed a commit
to angles-n-daemons/cockroach
that referenced
this issue
Sep 17, 2024
130688: replica_rac2: move raft write registering r=sumeerbhola a=pav-kv This commit moves registering all raft log/snapshot writes for RACv2 into `HandleRaftReadyRaftMuLocked`, so that `maybeSendAdmittedRaftMuLocked` call always happens after registering the latest write. This way, the admitted vector in `logTracker` carries the latest leader term we have observed from raft, and `maybeSendAdmittedRaftMuLocked` won't skip it when the term changes. Part of cockroachdb#129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Sep 19, 2024
130382: replica_rac2: rm unused waitingForAdmissionState r=sumeerbhola a=pav-kv Also copy the comment explaining why we chose to track individual log indices rather then compressing them into a single "waiting" interval. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Sep 23, 2024
130619: kvserver: enable admitted vector protocol r=sumeerbhola,kvoli a=pav-kv This commit enables `AdmittedState` annotations in `RaftMessageRequest` and `RaftMessageRequestBatch` protocol. The leader starts receiving admitted vector updates from all replicas. Part of #129508 131172: cloud/amazon: test kms against mock server r=dt a=dt Release note: none. Epic: none. Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: David Taylor <[email protected]>
Only need to merge #130444 after some fixups. Will do today/tomorrow. |
craig bot
pushed a commit
that referenced
this issue
Sep 25, 2024
130444: rac2: add MsgApp pings to tickle admitted vectors r=sumeerbhola a=pav-kv This commit adds another condition for sending `MsgApp` pings to peers, on top of existing `MsgApp` pings that raft leader sends if its knowledge about the follower's state (stable log index and commit index) hasn't converged to its own. The additional condition is that the leader holds some RACv2 eval/flow tokens, meaning that the follower's storage hasn't admitted all the pending writes. Our knowledge may be lagging due to various reasons, e.g. the messages are dropped. We want to learn about admissions in a timely manner, so we send `MsgApp` pings that trigger `MsgAppResp` flow, which delivers the follower's current state to the leader. Eventually, the leader learns the follower's state and quiesces the pings. We integrate with `MsgApp` pings for a couple of reasons: - simplicity - we don't want to send excessive amount of pings. If raft sends pings already (because its knowledge is lagging), RACv2 pings will not be sent in addition to that. Part of #129508 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Oct 14, 2024
132338: pgwire: reduce allocations when writing CHAR(N) datums r=mgartner a=mgartner `ResolveBlankPaddedCharBytes` has been replaced with a more efficient method of padding `CHAR(N)` datums with trailing spaces. Epic: None Release note: None 132501: opt: remove outdated TODO r=mgartner a=mgartner A TODO that was addressed in #129007 has been removed. Release note: None 132563: kvserver: skip empty RACv1 dispatches r=kvoli a=pav-kv This commit skips creating a `RaftMessageRequest` in the fallback RAC admission dispatch code if there are no pending RACv1 dispatches. Previously, it would send an empty `RaftMessageRequest` which would cause an error in the receiver's `handleRaftRequest`, [message drops](https://github.com/cockroachdb/cockroach/blob/9e12f67ff4ad860651c40dcef489a1556d1d11b7/pkg/kv/kvserver/raft_transport.go#L451-L456), and the receiver's message queue to restart after the following warning: ``` unable to accept Raft message from (n0,s0):?: no handler registered for (n0,s0):? ``` Related to #129508 Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-replication-admission-control-v2
Related to introduction of replication AC v2
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
This needs to be done outside Raft, since it (a) allows for more efficient attachment of the admitted vector when sending a MsgAppResp, (b) probably more maintainable, since it will sit inside the RACv2 code, etc.
Prototype is in #129447
Key changes:
(see full history of changes below)
Jira issue: CRDB-41569
Epic: CRDB-37515
The text was updated successfully, but these errors were encountered: