-
Notifications
You must be signed in to change notification settings - Fork 472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
manifest: add range annotations #3759
manifest: add range annotations #3759
Conversation
d1f0f5b
to
35aa22e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind extracting the conversion to generics into its own commit/PR so that it can be reviewed independently?
Reviewable status: 0 of 11 files reviewed, all discussions resolved (waiting on @itsbilal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely - that PR is now at #3760. I'll rebase this one to just include the range annotation changes once that one is merged.
Reviewable status: 0 of 11 files reviewed, all discussions resolved (waiting on @itsbilal)
35aa22e
to
79bdbc1
Compare
b046563
to
282a3b1
Compare
cockroachdb#3760 contained a bug which causes Annotator values to be incorrectly aggregated when pointer values should be overwritten. This is because the `vtyped` variable was not being modified due to being on the stack. This change fixes this and adds a unit test for `PickFileAggregator` to catch this issue in the future. cockroachdb#3759 should already not be affected by this due to the different way it handles aggregation.
cockroachdb#3760 contained a bug which causes Annotator values to be incorrectly aggregated when pointer values should be overwritten. This is because the `vtyped` variable was not being modified due to being on the stack. This change fixes this and adds a unit test for `PickFileAggregator` to catch this issue in the future. cockroachdb#3759 should already not be affected by this due to the different way it handles aggregation.
#3760 contained a bug which causes Annotator values to be incorrectly aggregated when pointer values should be overwritten. This is because the `vtyped` variable was not being modified due to being on the stack. This change fixes this and adds a unit test for `PickFileAggregator` to catch this issue in the future. #3759 should already not be affected by this due to the different way it handles aggregation.
282a3b1
to
2d16b80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @anish-shanbhag and @itsbilal)
internal/manifest/annotator.go
line 104 at r3 (raw file):
lowerBound []byte, // upperBound is a UserKeyBoundary that may be inclusive or exclusive. upperBound *base.UserKeyBoundary,
what impact does this have on the existing annotation benchmarks without any ranges?
I expect there is some overhead to combining these two routines, and we might be better off duplicating the function (using mutual recursion when an entire subtree is contained within the bounds). We're also less likely to accidentally begin caching an annotation that does not apply across the node's width.
I think we could also avoid adding a new scratch
field to every annotation
, and have the caller pass in a *T
into which they want the value accumulated. Callers can avoid an allocation by allocating the T
with their own data types.
2d16b80
to
5761b73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)
internal/manifest/annotator.go
line 104 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
what impact does this have on the existing annotation benchmarks without any ranges?
I expect there is some overhead to combining these two routines, and we might be better off duplicating the function (using mutual recursion when an entire subtree is contained within the bounds). We're also less likely to accidentally begin caching an annotation that does not apply across the node's width.
I think we could also avoid adding a new
scratch
field to everyannotation
, and have the caller pass in a*T
into which they want the value accumulated. Callers can avoid an allocation by allocating theT
with their own data types.
You're right, there was some extra overhead introduced:
goos: darwin
goarch: arm64
pkg: github.com/cockroachdb/pebble/internal/manifest
│ old │ new │
│ sec/op │ sec/op vs base │
NumFilesAnnotator-10 1.536µ ± 1% 1.664µ ± 1% +8.33% (p=0.000 n=10)
│ old │ new │
│ B/op │ B/op vs base │
NumFilesAnnotator-10 536.0 ± 0% 537.0 ± 0% +0.19% (p=0.000 n=10)
│ old │ new │
│ allocs/op │ allocs/op vs base │
NumFilesAnnotator-10 7.000 ± 0% 7.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
Agreed that it's better to separate the two functions - I've updated with that change, and the logic is definitely looking cleaner.
I realized that we can just use a single scratch
field for the entire Annotator, rather than one for each annotation. My updated implementation takes that route, but let me know if you think there's still a benefit to allowing callers to pass a custom *T
.
5761b73
to
e375244
Compare
This change adds a "range annotation" feature to Annotators , which are computations that aggregate some value over a specific key range within a level. Range annotations use the same B-tree caching behavior as regular annotations, so queries remain fast even with thousands of tables because they avoid a sequential iteration over a level's files. This PR only sets up range annotations without changing any existing behavior. See cockroachdb#3793 for some potential use cases. `BenchmarkNumFilesRangeAnnotation` shows that range annotations are significantly faster than using `version.Overlaps` to aggregate over a key range: ``` pkg: github.com/cockroachdb/pebble/internal/manifest BenchmarkNumFilesRangeAnnotation/annotator-10 306010 4015 ns/op 48 B/op 6 allocs/op BenchmarkNumFilesRangeAnnotation/overlaps-10 2223 513519 ns/op 336 B/op 8 allocs/op ```
e375244
to
4776fa3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Reviewed 1 of 13 files at r2, 1 of 2 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: 3 of 13 files reviewed, all discussions resolved (waiting on @itsbilal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 3 of 13 files reviewed, all discussions resolved (waiting on @itsbilal)
manifest: add range annotations
This change adds a "range annotation" feature to Annotators , which are computations that aggregate some value over a specific key range within a level. Range annotations use the same B-tree caching behavior as regular annotations, so queries remain fast even with thousands of tables because they avoid a sequential iteration over a level's files.
This PR only sets up range annotations without changing any existing behavior. See #3793 for some potential use cases.
BenchmarkNumFilesRangeAnnotation
shows that range annotations are significantly faster than usingversion.Overlaps
to aggregate over a key range: