-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: sep-raft-log: make log application stand-alone #75729
Comments
I'm not sure this photo of my whiteboard is helpful for anyone but me, but I spent a little bit of time looking into what's needed here. I think there are a few "themes":
|
I'm starting to type code for cockroachdb#75729 and it's clear immediately that we shouldn't have these protos in the `kvserver` package as this will make dependency cycles hard to avoid. We've long introduced the `kvserverpb` package, but simply didn't pull all the protos into it. This commit moves `raft.proto` to `kvserverpb`. There's still a bit of protobuf left in `api.proto`, but that can be handled separately. Release note: None
I'm starting to type code for cockroachdb#75729 and it's clear immediately that we shouldn't have these protos in the `kvserver` package as this will make dependency cycles hard to avoid. We've long introduced the `kvserverpb` package, but simply didn't pull all the protos into it. This commit moves `raft.proto` to `kvserverpb`. There's still a bit of protobuf left in `api.proto`, but that can be handled separately. Release note: None
76113: kvserver: move raft.proto to kvserverpb r=erikgrinaker a=tbg kvserver: move raft.proto to kvserverpb I'm starting to type code for #75729 and it's clear immediately that we shouldn't have these protos in the `kvserver` package as this will make dependency cycles hard to avoid. We've long introduced the `kvserverpb` package, but simply didn't pull all the protos into it. This commit moves `raft.proto` to `kvserverpb`. There's still a bit of protobuf left in `api.proto`, but that can be handled separately (#76114) Release note: None Co-authored-by: Tobias Grieger <[email protected]>
This is similar to cockroachdb#76113 in that it prevents import cycles in cockroachdb#75729. We don't want the code moved here to necessarily stay in `kvserverbase`, but that is a good place for now. Release note: None
76120: kvserver: move raft version encoding to `kvserverbase` r=erikgrinaker a=tbg This is similar to #76113 in that it prevents import cycles in #75729. We don't want the code moved here to necessarily stay in `kvserverbase`, but that is a good place for now. Release note: None Co-authored-by: Tobias Grieger <[email protected]>
This commit introduces a new package `raftlog`. Aspirationally, this will at some point become the de facto way of encapsulating the raft log encoding and may play a role in programmatically constructing and inspecting raft logs (e.g. for testing). For now, we introduce two concepts: - `raftlog.Entry`, which wraps a `raftpb.Entry` and all of the information derived from it, such as the command ID, the `kvserverpb.RaftCommand`, the configuration change (if any), etc. - `raftlog.Iterator`, a way to iterate over the raft log in terms of `raftlog.Entry` (as opposed to `raftpb.Entry` which requires lots of manual processing). Both are then applied across the codebase, concretely: - `loqrecovery` is simplified via `raftlog.Iterator` to pull commit triggers out of the raft log. - debug pretty-printing is simpler thanks to use of `raftlog.Entry`. - `decodedRaftEntry` is now structurally a `raftpb.Entry`, and again lots manual custom unmarshaling code evaporates. It's currently difficult to create "interesting" raft log entries if trying to stay away from manual population of large datastructures (which is prone to rotting), so there's zero unit testing of `raftlog.Iterator`. However, the code is not new, instead it was deduplicated from a few places, and is now tested through all of them; so I don't feel to bad about it. I still think it is a priority to be able to "comfortably" create at least "simple" raft logs, meaning we need to be able to string together `batcheval` and entry creation at least in a rudimentary fashion. I intend to look into this next and add comprehensive unit tests for `raftlog.{Entry,Iterator}`. Touches cockroachdb#75729. Release note: None
The right side of the picture ("Log encoding") is librarified, at least on the read path, in #76126. We still "manually" create these proposals when we run config changes: cockroach/pkg/kv/kvserver/replica_proposal_buf.go Lines 529 to 559 in 7051448
|
This commit introduces a new package `raftlog`. Aspirationally, this will at some point become the de facto way of encapsulating the raft log encoding and may play a role in programmatically constructing and inspecting raft logs (e.g. for testing). For now, we introduce two concepts: - `raftlog.Entry`, which wraps a `raftpb.Entry` and all of the information derived from it, such as the command ID, the `kvserverpb.RaftCommand`, the configuration change (if any), etc. - `raftlog.Iterator`, a way to iterate over the raft log in terms of `raftlog.Entry` (as opposed to `raftpb.Entry` which requires lots of manual processing). Both are then applied across the codebase, concretely: - `loqrecovery` is simplified via `raftlog.Iterator` to pull commit triggers out of the raft log. - debug pretty-printing is simpler thanks to use of `raftlog.Entry`. - `decodedRaftEntry` is now structurally a `raftpb.Entry`, and again lots manual custom unmarshaling code evaporates. It's currently difficult to create "interesting" raft log entries if trying to stay away from manual population of large datastructures (which is prone to rotting), so there's zero unit testing of `raftlog.Iterator`. However, the code is not new, instead it was deduplicated from a few places, and is now tested through all of them; so I don't feel to bad about it. I still think it is a priority to be able to "comfortably" create at least "simple" raft logs, meaning we need to be able to string together `batcheval` and entry creation at least in a rudimentary fashion. I intend to look into this next and add comprehensive unit tests for `raftlog.{Entry,Iterator}`. Touches cockroachdb#75729. Release note: None
This commit introduces `TestCreateManualProposal` which attempts to carry out an "offline" (i.e. `Replica`-less) version of evaluation & entry creation (i.e. the middle and right-hand side [here]). We start with a `*roachpb.PutRequest` and end up with a payload fit for a `(raftpb.Entry).Data`. This is helpful since it provides an easily digestible "life of a proposal" read-along, but more importantly represents a step towards establishing clearer interfaces for the various stages of a request - evaluation, replication, and application - and in particular separation of these stages from the surrounding `Replica`. The test helps pinpoint pieces of code that will need to be worked on in order to reach one of the goals of cockroachdb#75729, being able to create a meaningful raft log programmatically in a unit test without also instantiating a `Replica` and, later, also applying raft logs in a similar way. As a first step, this commit extracts a method `raftCmdToPayload` from `(*Replica).propose` which translates a `kvserverpb.RaftCommand` (which is roughly the result of evaluating a `BatchRequest`) into a payload that can be handed to raft for replication. This extracted method is far from clean - it takes a few crufty parameters, lives in `kvserver` (where it is hard to import) and also uses the logging singleton which is undesirable for a library method. However, it's enough to keep the ball rolling, and can be improved on later. [here]: cockroachdb#75729 (comment) Touches cockroachdb#75729. Release note: None
We don't need this for |
This isn't tracking exactly the work needed for |
The concrete example where you might want it both for LOQ and for ReplicasStorage is that for testing you'll want to populate custom raft logs. Having to spin up servers and manufacture them in a live system is now how this should work. Yet, that's what we do today: cockroach/pkg/kv/kvserver/loqrecovery/collect_raft_log_test.go Lines 34 to 90 in c4d4df9
|
This commit introduces a new package `raftlog`. Aspirationally, this will at some point become the de facto way of encapsulating the raft log encoding and may play a role in programmatically constructing and inspecting raft logs (e.g. for testing). For now, we introduce two concepts: - `raftlog.Entry`, which wraps a `raftpb.Entry` and all of the information derived from it, such as the command ID, the `kvserverpb.RaftCommand`, the configuration change (if any), etc. - `raftlog.Iterator`, a way to iterate over the raft log in terms of `raftlog.Entry` (as opposed to `raftpb.Entry` which requires lots of manual processing). Both are then applied across the codebase, concretely: - `loqrecovery` is simplified via `raftlog.Iterator` to pull commit triggers out of the raft log. - debug pretty-printing is simpler thanks to use of `raftlog.Entry`. - `decodedRaftEntry` is now structurally a `raftpb.Entry`, and again lots manual custom unmarshaling code evaporates. It's currently difficult to create "interesting" raft log entries if trying to stay away from manual population of large datastructures (which is prone to rotting), so there's zero unit testing of `raftlog.Iterator`. However, the code is not new, instead it was deduplicated from a few places, and is now tested through all of them; so I don't feel to bad about it. I still think it is a priority to be able to "comfortably" create at least "simple" raft logs, meaning we need to be able to string together `batcheval` and entry creation at least in a rudimentary fashion. I intend to look into this next and add comprehensive unit tests for `raftlog.{Entry,Iterator}`. Touches cockroachdb#75729. Release note: None
This commit introduces a new package `raftlog`. Aspirationally, this will at some point become the de facto way of encapsulating the raft log encoding and may play a role in programmatically constructing and inspecting raft logs (e.g. for testing). For now, we introduce two concepts: - `raftlog.Entry`, which wraps a `raftpb.Entry` and all of the information derived from it, such as the command ID, the `kvserverpb.RaftCommand`, the configuration change (if any), etc. - `raftlog.Iterator`, a way to iterate over the raft log in terms of `raftlog.Entry` (as opposed to `raftpb.Entry` which requires lots of manual processing). Both are then applied across the codebase, concretely: - `loqrecovery` is simplified via `raftlog.Iterator` to pull commit triggers out of the raft log. - debug pretty-printing is simpler thanks to use of `raftlog.Entry`. - `decodedRaftEntry` is now structurally a `raftpb.Entry`, and again lots manual custom unmarshaling code evaporates. It's currently difficult to create "interesting" raft log entries if trying to stay away from manual population of large datastructures (which is prone to rotting), so there's zero unit testing of `raftlog.Iterator`. However, the code is not new, instead it was deduplicated from a few places, and is now tested through all of them; so I don't feel to bad about it. I still think it is a priority to be able to "comfortably" create at least "simple" raft logs, meaning we need to be able to string together `batcheval` and entry creation at least in a rudimentary fashion. I intend to look into this next and add comprehensive unit tests for `raftlog.{Entry,Iterator}`. Touches cockroachdb#75729. Release note: None
`(*replicaAppBatch).Stage` roughly 1. checks whether to apply the `Command` as a no-op 2. runs a bunch of triggers (touching `*Replica` a lot) 3. stages the command into a `storage.Batch` 4. runs a bunch of triggers (touching `*Replica` a lot) 5. returns the cmd as a `CheckedCommand` Standalone log application needs to do 1, 3, 5 but it only needs to do parts of 2 and 4. We're trying to morph `replicaAppBatch` into something that can apply log entries both in regular and standalone mode, where all of the interactions with the surrounding *Replica (if any) are pushed into an interface handed to `replicaAppBatch`. The implementation of the interface is either a standalone-specific handler, or a *Replica-specific one (that relies heavily on the standalone handler to avoid logic duplication). This will take many commits to achieve but the goal is something like this: ```go // appBatch implements apply.Batch and Handler (for standalone // application). type appBatch struct { // All the fields that we need during standalone application. state ReplicaState batch storage.Batch } // Handler is used by replicaAppBatch to execute side effects of entry // application. The implementation depends on regular vs standalone mode. type Handler interface { // methods are called from Stage PreHandler // addSSTPreApply, maybeAcquireSplitLock, etc curre PostHandler // runPreApplyTriggersAfterStagingWriteBatch, itemized } // replicaAppBatch implements apply.Batch. It also implements Handler, // by first calling appBatch's Handler implementation and augmenting it // with additional actions that don't apply in a standalone context. type replicaAppBatch struct { appBatch r *Replica // More fields only required in *Replica context, like // `followerStoreWriteBytes` which is used for admission control } ``` That exact plan is unlikely to entirely survive contact with reality but something along those lines ought to work out. Other than paving a particularly bumpy part of the road to stand-alone log application, I expect this to also make the code more accessible, as it is currently somewhat ad-hoc. This first commit moves the lease applied index checks and the closed timestamp assertions[^1] to this new scheme. Follow-up commits will do the same for the other references to `*Replica` in `Stage` until `*Replica` is only referenced from `*replicaAppBatchHandler`. At that point, we can move `*Replica` from `replicaAppBatch` to `replicaAppBatchHandler` and can declare victory. This PR tries to minimize code movement. I left TODOs on everything that I think should move in a future commit. Touches cockroachdb#75729. [^1]: these assertions are currently only run in "regular" mode but sould also be ported to standalone mode in the future. For now it's convenient to have them out of the way; better to have a standalone mode with a few assertions missing earlier. Release note: None
For cockroachdb#75729 we will want to work on replicaAppBatch and decompose it into a new type appBatch around which replicaAppBatch can be layered. As a first baby step, move appbatch-related code to its own file to physically separate it from the `apply.StateMachine`. This is 100% mechanical (using Goland's refactoring tools). Release note: None
93237: kvserver: move replicaAppBatch to replica_app_batch.go r=pavelkalinnikov a=tbg For #75729 we will want to work on replicaAppBatch and decompose it into a new type appBatch around which replicaAppBatch can be layered. As a first baby step, move appbatch-related code to its own file to physically separate it from the `apply.StateMachine`. This is 100% mechanical (using Goland's refactoring tools). Epic: CRDB-220 Release note: None Co-authored-by: Tobias Grieger <[email protected]>
93266: kvserver: refactor replicaAppBatch for standalone log application r=pavelkalinnikov a=tbg This long (but individually small) sequence of commits moves `(*replicaAppBatch).Stage` close to the structure that was prototyped in #93265, where it has the following steps: - command checks (standalone) - testing interceptors (replica) - pre-add triggers (standalone) - pre-add triggers (replica) - add (to pebble batch, standalone) - post-add triggers (standalone) - post-add triggers (replica) In standalone application (e.g. for #93244) we'd use an `apply.Batch` that is an `appBatch`, instead of a `replicaAppBatch`, i.e. skip all of the (replica) steps above. This PR doesn't get us all the way there - we still need to tease apart the `post-add triggers (replica)` step, which currently contains code that should be in `post-add triggers (standalone)`; this is best tackled in a separate PR since it's going to be quite a bit of work. Touches #75729. Epic: CRDB-220 Release note: None Co-authored-by: Tobias Grieger <[email protected]>
In #97347 we're seeing a tricky interaction between reproposals that we can't unit test without a significant rework. Having more stand-alone components for log application would help here. |
104401: kvserver: add TestCreateManyUnappliedProbes r=pavelkalinnikov a=tbg This is the test used for #102953. This PR also makes modest progress on #75729, not by making log application stand-alone, but by making it somewhat less convoluted to manufacture log entries programmatically. It would be desirable to be able to test the replication layer in CRDB in the same way that Raft allows via the InteractionEnv[^1]. [^1]: https://github.com/etcd-io/raft/blob/6bf4f7fe3122b064e0a0d76314298dca6f379fc7/interaction_test.go Epic: none Release note: none 105083: kvcoord: Release catchup reservation before re-acquire attempt r=miretskiy a=miretskiy Release catchup scan reservation prior to attemt to re-acquire it. Failure to do so could result in a stuck mux rangefeed when enough ranges encounter an error, such as range split, prior to receiving the first checkpoint event, that would cause additional attempts to acquire catchup scan quota. Fixes #105058 Release note (bug fix): Fix a bug in mux rangefeed implementation that may cause mux rangefeed to become stuck if enough ranges encounter certain error concurrently. Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Is your feature request related to a problem? Please describe.
Today, there is no "library-style" way of crafting a CockroachDB log of commands.
We can in principle craft individual commands by evaluating requests (the
batcheval
package), but below that you need a*Replica
which stringstogether the raft log, and the apply loop.
We have felt for a long time that abstractly we could benefit from better
modularization here, but now we are starting to see this come up in projects we
are working on, for example
cockroach/pkg/storage/replicas_storage.go
Lines 133 to 142 in 4a73be0
and similarly Loss Of Quorum Recovery tools could benefit from tools to interpret the raft log at the right level of abstraction.
Describe the solution you'd like
Continue the work begun by #39254
and further extract the logic pertaining to entry application so that it can be
used to apply log entries outside of the context of a fully instantiated
Replica
. I think most of the work will be to wean the apply code off themany direct accesses to
replica.mu
and the associated fields.As an acceptance criterion, we should be able to write a unit test that sets up
a test cluster, appends some commands to the log, stops the cluster and
subsequently re-applies the log entries to an empty state, arriving at an
identical state machine.
Ideally, we also get a testing win by allowing tests to be set up with bespoke
raft logs that exercise hand-crafted sequences of commands; this is very difficult
today and essentially doesn't happen for that very reason.
Note: inclusion of splits/merges will add complexity. We may not need this in
the first pass but should think about how this could work in the future.
Epic: CRDB-220
Jira issue: CRDB-12807
The text was updated successfully, but these errors were encountered: