-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
span: Rewrite span frontier to use btree #110516
Conversation
503008a
to
b061ed4
Compare
|
Not really, other than we have implementation that's already used in crdb.
…On Wed, Sep 13, 2023, 8:03 AM Pavel Kalinnikov ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/util/span/frontierentry_interval_btree.go
<#110516 (comment)>
:
> +func (b keyBound) contains(a *btreeFrontierEntry) bool {
+ c := bytes.Compare(a.Key(), b.key)
+ if c == 0 {
+ return b.inc
+ }
+ return c < 0
+}
+
+func upperBound(c *btreeFrontierEntry) keyBound {
+ if len(c.EndKey()) != 0 {
+ return keyBound{key: c.EndKey()}
+ }
+ return keyBound{key: c.Key(), inc: true}
+}
+
+type leafNode struct {
Any reason not to use existing implementations of the tree, e.g.
https://github.com/google/btree?
—
Reply to this email directly, view it on GitHub
<#110516 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ANA4FVBIA4TCZWU7L6OXBILX2GOJXANCNFSM6AAAAAA4VRLHDQ>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
It looks like |
It's a generated code: https://github.com/cockroachdb/cockroach/blob/master/pkg/util/interval/generic/doc.go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skimmed this at a high level, LGTM overall. I'll leave the detailed review to CDC and DR.
Reviewed 1 of 1 files at r1, 23 of 24 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @jayshrivastava, @miretskiy, and @stevendanna)
pkg/ccl/changefeedccl/changefeed_processors.go
line 1772 at r2 (raw file):
} // Forward forwards span timestamp.
nit: consider embedding the frontier, so these are automatically forwarded?
pkg/util/span/frontier.go
line 35 at r2 (raw file):
// can be used to make thread safe btreeFrontier. type Frontier interface { // AddSpansAt adds the provided spans to the btreeFrontier at the provided timestamp.
nit: the interface probably shouldn't reference concrete implementations.
pkg/util/span/frontier.go
line 191 at r2 (raw file):
// NB: It is *extremely* important for the caller to guarantee that the passed // in spans (the underlying Key/EndKey []byte slices) are not modified in any // way after this call. If modifications are made to the underlying key slices
As we discussed, if we want to derisk including this in 23.2, we could consider copying here for now and then remove the copying for 24.1. How much of a perf hit would we take in that case? If we can copy into reused slices then the cost shouldn't be terrible since we avoid the allocation?
pkg/util/span/frontier.go
line 324 at r2 (raw file):
func (f *btreeFrontier) mergeEntries(e *btreeFrontierEntry) (*btreeFrontierEntry, error) { defer func() { f.mergeAlloc = f.mergeAlloc[:0]
We may want to nil out the pointers first, so we don't hold onto the entries in memory until a later merge replaces them. Then again, I guess these go back into the pool anyway.
Do we want to release this if this ever becomes very large, e.g. if we occasionally merge 100k spans? Maybe hard to say, since we could just come back around and need 100k again soon.
pkg/util/span/frontier.go
line 714 at r2 (raw file):
*e = btreeFrontierEntry{ Start: start, End: end,
Hm, we're not actually pooling the byte slices here at all? If we did, then copying the input slices would possibly have a fairly minor cost. Re: my other comment above. We'd have to be careful when using these e.g. during merges, in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, @jayshrivastava, @pavelkalinnikov, and @stevendanna)
pkg/ccl/changefeedccl/changefeed_processors.go
line 1772 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
nit: consider embedding the frontier, so these are automatically forwarded?
Ahh... good idea; but I needed to add a type alias (because of Frontier() -- you can't have
span.Frontier embedded, with the same named method).
pkg/util/span/frontier.go
line 35 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
nit: the interface probably shouldn't reference concrete implementations.
Argh... goland rename refactoring also changes comments. Fixed.
pkg/util/span/frontier.go
line 191 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
As we discussed, if we want to derisk including this in 23.2, we could consider copying here for now and then remove the copying for 24.1. How much of a perf hit would we take in that case? If we can copy into reused slices then the cost shouldn't be terrible since we avoid the allocation?
Yeah; I was thinking about this. It's totally fine to copy here; the problem lies in Forward call which would also need to copy whenever we split existing entry based on the caller provided span boundaries.
And because of that, I decided not to copy at all, but to add sanity checks (in test).
Now, I have reviewed every use of the frontier. It might be common to re-use the same roachpb.Span; but I think that any place where somebody changes underlying []byte slice -- is probably very broken (keys are supposed to be immutable).
It should be noted that the pre-historic frontier implementation never copied slices either -- it's only when I added a benchmark, that I made a mistake (in the benchmark) of mutating underlying byte slices. And that made me afraid that these failures could occur. In retrospect, I should have never added key copiying in the first place, and a check (under test) like I have here would have been preferable.
pkg/util/span/frontier.go
line 324 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
We may want to nil out the pointers first, so we don't hold onto the entries in memory until a later merge replaces them. Then again, I guess these go back into the pool anyway.
Do we want to release this if this ever becomes very large, e.g. if we occasionally merge 100k spans? Maybe hard to say, since we could just come back around and need 100k again soon.
I added setting nil pts in the loop that deletes merge entries.
I don't think we need to worry about this becoming large. As a matter of fact, most of the times merge entries will only contain 2 elements (the e itself, and the element to the right). The hypothetical scenario you are worried about is when all but 1 span are at the same timestamp (T), and then we advance the 1 span to that timestamp. However, note that merging happens on every call to forward frontier, and thus all of those other spans will have been merged in just 2 spans (one to the left and one to the right of the span we are forwarding) when we forwarded those other spans to T.
pkg/util/span/frontier.go
line 714 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Hm, we're not actually pooling the byte slices here at all? If we did, then copying the input slices would possibly have a fairly minor cost. Re: my other comment above. We'd have to be careful when using these e.g. during merges, in that case.
See answer on your other comment.
pkg/util/span/frontierentry_interval_btree.go
line 102 at r2 (raw file):
Previously, pavelkalinnikov (Pavel Kalinnikov) wrote…
Any reason not to use existing implementations of the tree, e.g. https://github.com/google/btree?
Done.
d2c6930
to
b6b371c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changefeedccl changes LGTM 👍
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, @miretskiy, @pavelkalinnikov, and @stevendanna)
pkg/ccl/changefeedccl/changefeed_processors.go
line 1739 at r3 (raw file):
latestKV: timeutil.Now(), } scf.spanFrontier = span.MakeConcurrentFrontier(sf)
Does it need to be concurrent?
Yes; because it is shared as a "timestamp oracle" with the underlying sink. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @jayshrivastava, @miretskiy, @msbutler, @pavelkalinnikov, and @stevendanna)
pkg/util/span/frontier.go
line 191 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Yeah; I was thinking about this. It's totally fine to copy here; the problem lies in Forward call which would also need to copy whenever we split existing entry based on the caller provided span boundaries.
And because of that, I decided not to copy at all, but to add sanity checks (in test).Now, I have reviewed every use of the frontier. It might be common to re-use the same roachpb.Span; but I think that any place where somebody changes underlying []byte slice -- is probably very broken (keys are supposed to be immutable).
It should be noted that the pre-historic frontier implementation never copied slices either -- it's only when I added a benchmark, that I made a mistake (in the benchmark) of mutating underlying byte slices. And that made me afraid that these failures could occur. In retrospect, I should have never added key copiying in the first place, and a check (under test) like I have here would have been preferable.
Do we ever pass in slices that might originate from Pebble's UnsafeKey()
? Those are owned by Pebble and may be mutated by it. I'm pretty sure anything that crosses RPC boundaries is safe though, due to Protobuf encoding/decoding.
pkg/util/span/frontier.go
line 324 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
I added setting nil pts in the loop that deletes merge entries.
I don't think we need to worry about this becoming large. As a matter of fact, most of the times merge entries will only contain 2 elements (the e itself, and the element to the right). The hypothetical scenario you are worried about is when all but 1 span are at the same timestamp (T), and then we advance the 1 span to that timestamp. However, note that merging happens on every call to forward frontier, and thus all of those other spans will have been merged in just 2 spans (one to the left and one to the right of the span we are forwarding) when we forwarded those other spans to T.
Good point.
457d62f
to
5823a97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! I left some non-blocking suggestions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, @jayshrivastava, @miretskiy, @pavelkalinnikov, and @stevendanna)
pkg/ccl/streamingccl/streamproducer/event_stream.go
line 132 at r5 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
because frontier is used throughout the lifetime of this stream.
So, if start returns nil error, the ownership of the frontier (and responsibility for closing) transferred to start stream processor)
makes sense!
pkg/util/span/frontier.go
line 305 at r5 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
i mean, it's generated code... i'll leave it for now because I'm not even sure I'll wind up keeping this.
Sounds good
pkg/util/span/frontier.go
line 368 at r5 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
You'll see that if leftMost != e above, then leftMost is removed from the merge alloc, and e itself added instead of leftmost.
ahhh. mind adding a comment that explains this for the dull?
pkg/util/span/frontier.go
line 529 at r5 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
yes we can, but I felt it's cleaner and easier to follow if the code doesn't have many breaks in it and just keeps advancing todoEntry (like an iterator).
good point. don't be mvcc.go.
pkg/util/span/frontier.go
line 538 at r5 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
no, I don't think I want to do that. forwarding invalidates any existing iterators and also possibly overalap itself. I updated comments to this effect.
sounds good
pkg/util/span/frontier.go
line 109 at r6 (raw file):
var useBtreeFrontier = envutil.EnvOrDefaultBool("COCKROACH_BTREE_SPAN_FRONTIER_ENABLED", true) // DNM: Re-enable metamorphic prior to merge.
highlighting so you don't forget :)
pkg/util/span/frontier_test.go
line 310 at r6 (raw file):
var expectedRanges []string for r := 'A'; r <= 'Z'-1; r++ {
nice runes!!
pkg/util/span/frontier_test.go
line 402 at r6 (raw file):
} func checkContiguousFrontier(f Frontier) (startKey, endKey []byte, retErr error) {
nit: one other check you do in the scan below or in a separate helper: verify that f.Frontier(), f.Len(), and f.PeekFrontierSpan return the correct results
pkg/util/span/frontier_test.go
line 622 at r6 (raw file):
seed := randutil.NewPseudoSeed() rnd := rand.New(rand.NewSource(seed)) f.Logf("seed %d", seed)
maybe i'm missing something, but I think it's unecessary to reseed and log it in this test https://github.com/miretskiy/cockroach/blob/spanfrontier/pkg/util/randutil/rand.go#L247
pkg/util/span/frontier_test.go
line 627 at r6 (raw file):
const corpusSize = 2 << 10 for i := 0; i < corpusSize; i++ { s := spanMaker.rndSpan(false)
why don't you allow for start key reuse here?
pkg/util/span/frontier_test.go
line 628 at r6 (raw file):
for i := 0; i < corpusSize; i++ { s := spanMaker.rndSpan(false) f.Add([]byte(s.Key), []byte(s.EndKey), i)
could add a random wall time too, so some elements in the corpus have the same wall time?
5823a97
to
3e5b1dc
Compare
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, @jayshrivastava, @msbutler, @pavelkalinnikov, and @stevendanna)
pkg/ccl/streamingccl/streamproducer/event_stream.go
line 132 at r5 (raw file):
Previously, msbutler (Michael Butler) wrote…
makes sense!
Done.
pkg/util/span/frontier.go
line 368 at r5 (raw file):
Previously, msbutler (Michael Butler) wrote…
ahhh. mind adding a comment that explains this for the dull?
Expanded comment above, where I clear out left most from mergeAlloc.
pkg/util/span/frontier.go
line 109 at r6 (raw file):
Previously, msbutler (Michael Butler) wrote…
highlighting so you don't forget :)
Thanks.
pkg/util/span/frontier_test.go
line 310 at r6 (raw file):
Previously, msbutler (Michael Butler) wrote…
nice runes!!
Done.
pkg/util/span/frontier_test.go
line 402 at r6 (raw file):
Previously, msbutler (Michael Butler) wrote…
nit: one other check you do in the scan below or in a separate helper: verify that f.Frontier(), f.Len(), and f.PeekFrontierSpan return the correct results
Not sure about Len (I honestly don't know what the right length is; and LLRB frontier has some bugs w/ merging, as I discovered), but an excellent idea re Frontier/PeekFrontierSpan
pkg/util/span/frontier_test.go
line 622 at r6 (raw file):
Previously, msbutler (Michael Butler) wrote…
maybe i'm missing something, but I think it's unecessary to reseed and log it in this test https://github.com/miretskiy/cockroach/blob/spanfrontier/pkg/util/randutil/rand.go#L247
Sure. Gone.
pkg/util/span/frontier_test.go
line 627 at r6 (raw file):
Previously, msbutler (Michael Butler) wrote…
why don't you allow for start key reuse here?
Good question; took a while to remember. I removed this flag. (if you're curious: the problem I was debugging was with test fuzzer; fuzzer mutates its inputs, so if you allowed reuse of the keys, then the whole tree would
get out of whack because subsequent fuzz calls may mutate byte slices used before.
At any rate, I added explicit copy to the fuzz function for this very reason, and thus I no longer need to have this flag here.
pkg/util/span/frontier_test.go
line 628 at r6 (raw file):
Previously, msbutler (Michael Butler) wrote…
could add a random wall time too, so some elements in the corpus have the same wall time?
Fuzzer is pretty good at this ; having said that I extended it so that invalid/negative timestamp can
also be added.
3e5b1dc
to
7362fe6
Compare
You don't say :) |
7e6c8db
to
693aade
Compare
Rewrite span frontier to use btree instead of LLRB (left leaning red black trees). The old implementation was very CPU intensive, and very ineficient in memory allocation. Btree based implementation corrects these deficience. Both implementation are available, with the ability to revert to old LLRB based implementation via setting `COCKROACH_BTREE_SPAN_FRONTIER_ENABLED` to false. In addition to performance enhancements, the following changes were made: * New implementation assumes that spans passed to the frontier during construction or Forward() are never mutated by the caller. This assumption is verified under unit tests. * Default Frontier implementation is no longer thread safe. Instead, the caller may explicitly construct thread safe frontier via `MakeConcurrentFrontier` Enahnce frontier tests by implementing fuzz tests. Epic: CRDB-26372 Release note: None
693aade
to
c726497
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, @jayshrivastava, @pavelkalinnikov, and @stevendanna)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amazing work!
bors r+ |
1 similar comment
bors r+ |
Already running a review |
Build succeeded: |
In cockroachdb#110516, the code started used a generated btree. Unfortunately, the code did not adopt bazel code generation, so missed a regeneration when the changes in cockroachdb#116663 merged. Also, it seems, the change rotted such that regenerating didn't work because the entry was renamed from `frontierEntry` to `btreeFrontierEntry` without updating the go:generate comment. I imagine somebody did a manual rename. This fixes the problem by adopting bazel generation. Epic: None Release note: None
99884: kvserver: remove unused snapshot fields r=kvoli a=andrewbaptist This commit removes a number of unused fields in snapshot messages. Release note: None Epic: none 118490: util/span: use bazel for codegen, regenerate r=miretskiy a=ajwerner In #110516, the code started used a generated btree. Unfortunately, the code did not adopt bazel code generation, so missed a regeneration when the changes in #116663 merged. Also, it seems, the change rotted such that regenerating didn't work because the entry was renamed from `frontierEntry` to `btreeFrontierEntry` without updating the go:generate comment. I imagine somebody did a manual rename. This fixes the problem by adopting bazel generation. Epic: None Release note: None Co-authored-by: Andrew Baptist <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
In cockroachdb#110516, the code started used a generated btree. Unfortunately, the code did not adopt bazel code generation, so missed a regeneration when the changes in cockroachdb#116663 merged. Also, it seems, the change rotted such that regenerating didn't work because the entry was renamed from `frontierEntry` to `btreeFrontierEntry` without updating the go:generate comment. I imagine somebody did a manual rename. This fixes the problem by adopting bazel generation. Epic: None Release note: None
Rewrite span frontier to use btree instead of
LLRB (left leaning red black trees).
The old implementation was very CPU intensive, and
very inefficient in memory allocation.
Btree based implementation corrects these deficiencies.
Both implementation are available, with the ability
to revert to old LLRB based implementation via setting
COCKROACH_BTREE_SPAN_FRONTIER_ENABLED
to false.In addition to performance enhancements, the following
changes were made:
during construction or Forward() are never mutated by the caller.
This assumption is verified under unit tests.
the caller may explicitly construct thread safe frontier via
MakeConcurrentFrontier
Enhance frontier tests by implementing fuzz tests.
New implementation ~50% faster and amortized 0 allocation.
Epic: CRDB-26372
Release note: None