-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raftlog: introduce & use in loqrecovery, kvserver #76126
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see this cleanup!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by comments. Thanks for doing this!
Reviewed 8 of 12 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/kv/kvserver/raftlog/entry.go, line 50 at r1 (raw file):
} // Load populates the Entry from the Ent field. Will reset the Meta field.
wondering why this comment mentions the Meta
field. If it populates Entry
doesn't that imply that everything will be reset?
I guess it is because of this weird behavior where sometimes we have to unmarshal MVCCMetadata
to get a raftpb.Entry
and sometimes we are handed a raftpb.Entry
directly. Is the latter case the one where we have it in the cache?
pkg/kv/kvserver/raftlog/entry.go, line 98 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
nit: consider typing this out as
ConfChange
, ditto with the fields. It's too terse when encountered in the code.
+1
pkg/kv/kvserver/raftlog/iterator.go, line 52 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Seek shouldn't take a bound. Put the iterator construction and bounds in
NewIterator
, and haveSeek
take a single index.
And rename it SeekGE just to be consistent with other code.
pkg/kv/kvserver/raftlog/iterator.go, line 80 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Seems iffy to do this as a side-effect of
Valid()
. This iterator pattern that we generally use is pretty awkward in terms of error handling and such (particularly decoding errors), but what we usually end up doing is to call into this decoding inSeek
andNext
, and set an error inIterator.err
which is surfaced fromValid()
.Making
UnsafeEntry()
return a decoding error might be the saner choice here -- I think that's what we want to do with e.g.MVCCIterator
down the road too.
MVCCIterator will probably move to a model where Next
(and seeks) will return (valid bool, err error)
, since the interface call overhead of calling Valid
every time is not insignificant (there is a TODO in engine.go). We may as well adopt a similar approach here, which means we would decode in Next and SeekGE.
1bbb2af
to
ec57fc4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed the comments, ready for another round. I still need to do a little perf work here (left a comment there as well).
Dismissed @erikgrinaker and @sumeerbhola from 5 discussions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @sumeerbhola)
pkg/kv/kvserver/replica_application_cmd.go, line 45 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Remove the comment --
ent
no longer exists.
Done.
pkg/kv/kvserver/replica_application_cmd.go, line 187 at r2 (raw file):
var err error d.Entry, err = raftlog.MakeEntry(*e)
I need to look at this one again, I think as is I'm allocating more than before because MakeEntry
will internally put the return value on the heap.
pkg/kv/kvserver/raftlog/iterator.go, line 52 at r1 (raw file):
Previously, sumeerbhola wrote…
And rename it SeekGE just to be consistent with other code.
Done.
pkg/kv/kvserver/raftlog/iterator.go, line 80 at r1 (raw file):
Previously, sumeerbhola wrote…
MVCCIterator will probably move to a model where
Next
(and seeks) will return(valid bool, err error)
, since the interface call overhead of callingValid
every time is not insignificant (there is a TODO in engine.go). We may as well adopt a similar approach here, which means we would decode in Next and SeekGE.
I've implemented Sumeer's suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent a few hours figuring out why once I pushed the new code all packages were timing out. I haven't figured it out yet, but ultimately it seems to be caused by a rebase on master I carried out. I seem to be able to pass all tests when rebased on 4ea464d (the initial commit this PR started out on top of) but on 18e4193 it's broken.
Since I need to figure this out anyway, no need to review again yet. I will likely end up making some changes anyway.
Reviewed 2 of 10 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @sumeerbhola)
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
679cc04
to
0237730
Compare
7302e83
to
2960aed
Compare
Worked on this a bit today (thanks, Flex Friday!) and this should actually work now. Let's hold off on reviews until the stability period stuff is a bit less hectic, though. |
This is more legible; these are effectively random bytes. Extracted from cockroachdb#76126. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, ta! Left a few nits, and questions about the semantics.
I think this PR could have been easier to review if split so that one commit introduced the Entry
type, and another one for the Iterator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR could have been easier to review if split so that one commit introduced the Entry type, and another one for the Iterator.
Thanks for the feedback, I'll be a bit more aggressive about splitting next time. I find that it does slow me down quite a bit so it needs to be worth it.
I'd like to add some more unit testing to this so this needs some more work, but please PTAL at the fixup commits. Additional testing will come in follow-up commits (I don't want to merge before we have this), so by establishing a checkpoint here you don't have to revisit it later.
Reviewed 7 of 7 files at r9.
Dismissed @erikgrinaker from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @pavelkalinnikov, and @sumeerbhola)
pkg/kv/kvserver/replica_application_cmd.go
line 45 at r10 (raw file):
Previously, pavelkalinnikov (Pavel Kalinnikov) wrote…
How about removing the
decodedRaftEntry
wrapper, and directly inliningentry *raflog.Entry
? There seems to be no reason to keep it?
Well spotted, done!
pkg/kv/kvserver/replica_application_cmd.go
line 84 at r10 (raw file):
Previously, pavelkalinnikov (Pavel Kalinnikov) wrote…
ctx
seems unused all the way down the stack from here
Done.
pkg/kv/kvserver/raftlog/entry.go
line 86 at r10 (raw file):
Previously, pavelkalinnikov (Pavel Kalinnikov) wrote…
ConfChangeV1
The type is raftpb.ConfChange
, and that's what we're unmarshalling (into).
pkg/kv/kvserver/raftlog/iterator.go
line 89 at r10 (raw file):
Previously, pavelkalinnikov (Pavel Kalinnikov) wrote…
Can there be gaps in the log indices? I.e. isn't the first log index >= idx equal to idx (or none)?
I imagine a gap can happen when the index is compacted away? Will this be an error then, or depends on who iterates?
Maybe would be worth elaborating nuances around entries with consecutive indices, somewhere in the
Iterator
comment.
There are never any gaps. I'll make this clear on the Iterator comment.
pkg/kv/kvserver/raftlog/iterator.go
line 101 at r10 (raw file):
Previously, pavelkalinnikov (Pavel Kalinnikov) wrote…
Will iteration ever jump by more than +1 index? What precisely does "next entry" mean?
Index+1. I added a comment.
pkg/kv/kvserver/raftlog/iterator.go
line 130 at r10 (raw file):
Previously, pavelkalinnikov (Pavel Kalinnikov) wrote…
Is reading one extra entry unavoidable here? E.g. could the storage iterator have an upper bound that it more efficiently terminates at?
Yes, see the TODO in NewIterator
. Actually I went ahead and implemented it.
9092822
to
396d847
Compare
Added another commit with unit testing, which can also be reviewed. |
We should no longer need it. Also, it introduces the complication of "empty but ID'ed raft commands", which is annoying. We could probably change this to propose a truly `nil` command (mimicking what raft does internally) but it is better to avoid this problem altogether. If there truly is a case in which we need to nudge raft, it likely also exists outside of the `splitDelayHelper`, and the sooner we find out about it the better. Extracted from cockroachdb#76126. Release note: None
87710: kvserver: remove ProposeEmptyCommand r=erikgrinaker a=tbg We should no longer need it. Also, it introduces the complication of "empty but ID'ed raft commands", which is annoying. We could probably change this to propose a truly `nil` command (mimicking what raft does internally) but it is better to avoid this problem altogether. If there truly is a case in which we need to nudge raft, it likely also exists outside of the `splitDelayHelper`, and the sooner we find out about it the better. Extracted from #76126. Epic: None Release note: None Co-authored-by: Tobias Grieger <[email protected]>
This commit introduces a new package `raftlog`. Aspirationally, this will at some point become the de facto way of encapsulating the raft log encoding and may play a role in programmatically constructing and inspecting raft logs (e.g. for testing). For now, we introduce two concepts: - `raftlog.Entry`, which wraps a `raftpb.Entry` and all of the information derived from it, such as the command ID, the `kvserverpb.RaftCommand`, the configuration change (if any), etc. - `raftlog.Iterator`, a way to iterate over the raft log in terms of `raftlog.Entry` (as opposed to `raftpb.Entry` which requires lots of manual processing). Both are then applied across the codebase, concretely: - `loqrecovery` is simplified via `raftlog.Iterator` to pull commit triggers out of the raft log. - debug pretty-printing is simpler thanks to use of `raftlog.Entry`. - `decodedRaftEntry` is now structurally a `raftpb.Entry`, and again lots manual custom unmarshaling code evaporates. It's currently difficult to create "interesting" raft log entries if trying to stay away from manual population of large datastructures (which is prone to rotting), so there's zero unit testing of `raftlog.Iterator`. However, the code is not new, instead it was deduplicated from a few places, and is now tested through all of them; so I don't feel to bad about it. I still think it is a priority to be able to "comfortably" create at least "simple" raft logs, meaning we need to be able to string together `batcheval` and entry creation at least in a rudimentary fashion. I intend to look into this next and add comprehensive unit tests for `raftlog.{Entry,Iterator}`. Touches cockroachdb#75729. Release note: None
This is cockroachdb#76126, not to be reviewed in this PR. It will be rebased away once that PR has merged.
Protobuf makes it hard to do well here. ``` BenchmarkNewEntry/fromRawValue=false,release=false-10 1319 ns/op 4890 B/op 10 allocs/op BenchmarkNewEntry/fromRawValue=false,release=true-10 1128 ns/op 4442 B/op 9 allocs/op BenchmarkNewEntry/fromRawValue=true,release=false-10 2611 ns/op 13213 B/op 13 allocs/op BenchmarkNewEntry/fromRawValue=true,release=true-10 2536 ns/op 12766 B/op 12 allocs/op ``` Release note: None
TFTR! bors r=pavelkalinnikov |
Build succeeded: |
This commit introduces a new package
raftlog
. Aspirationally, thiswill at some point become the de facto way of encapsulating the raft log
encoding and may play a role in programmatically constructing and
inspecting raft logs (e.g. for testing).
For now, we introduce two concepts:
raftlog.Entry
, which wraps araftpb.Entry
and all of theinformation derived from it, such as the command ID, the
kvserverpb.RaftCommand
, the configuration change (if any), etc.raftlog.Iterator
, a way to iterate over the raft log in terms ofraftlog.Entry
(as opposed toraftpb.Entry
which requires lots ofmanual processing).
Both are then applied across the codebase, concretely:
loqrecovery
is simplified viaraftlog.Iterator
to pull committriggers out of the raft log.
raftlog.Entry
.decodedRaftEntry
is now structurally araftpb.Entry
, and againlots manual custom unmarshaling code evaporates.
It's currently difficult to create "interesting" raft log entries if
trying to stay away from manual population of large datastructures
(which is prone to rotting), so there's zero unit testing of
raftlog.Iterator
. However, the code is not new, instead it wasdeduplicated from a few places, and is now tested through all of them;
so I don't feel to bad about it. I still think it is a priority to be
able to "comfortably" create at least "simple" raft logs, meaning we
need to be able to string together
batcheval
and entry creation atleast in a rudimentary fashion. I intend to look into this next and
add comprehensive unit tests for
raftlog.{Entry,Iterator}
.Touches #75729.
Epic: CRDB-220
Release note: None