util: support 128 FastIntSet elements without allocation #72798
Conversation
Force-pushed from 5158448 to 9cbef7c.
Reviewed 2 of 2 files at r1, 2 of 2 files at r2, 2 of 2 files at r3, 13 of 13 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @nvanbenschoten, and @RaduBerinde)
pkg/sql/opt/xform/testdata/rules/join, line 185 at r4 (raw file):
```
SELECT * FROM abc, stu, xyz WHERE abc.a=stu.s AND stu.s=xyz.x
----
memo (optimized, ~42KB, required=[presentation: a:1,b:2,c:3,s:7,t:8,u:9,x:12,y:13,z:14])
```
Adding 8 bytes to FastIntSet increases memo size by 13.5%... Shows you how much we use FastIntSet.
pkg/util/fast_int_set.go, line 309 at r2 (raw file):
```go
	return (s.small & rhs.small) == s.small
}
if rhs.large != nil {
```
nit: should/could this be `if !rhs.large.fitsInSmall() {`?
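For context, the fast path being quoted reduces subset testing to a single mask comparison. A minimal sketch (the function name `subsetSmall` is mine, not from the patch):

```go
package main

import "fmt"

// subsetSmall sketches the quoted fast path: when both sets fit in their
// inline bitmaps, "s is a subset of rhs" is one mask-and-compare.
func subsetSmall(s, rhs uint64) bool {
	return (s & rhs) == s
}

func main() {
	fmt.Println(subsetSmall(0b0101, 0b1101)) // every bit of s is set in rhs
	fmt.Println(subsetSmall(0b0111, 0b1101)) // bit 1 of s is missing from rhs
}
```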
pkg/util/fast_int_set.go, line 322 at r2 (raw file):
```go
// Fast path.
if delta > 0 {
	if bits.LeadingZeros64(s.small)-(64-smallCutoff) >= delta {
```
A mysterious `-(64-smallCutoff)` => `-(0)`
😆
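For readers following along: the guard checks that the highest set bit leaves at least `delta` bits of headroom before shifting left, so no elements fall off the top of the bitmap. A standalone sketch, assuming a 64-bit small set so the subtraction vanishes as noted:

```go
package main

import (
	"fmt"
	"math/bits"
)

const smallCutoff = 64 // assumed for this sketch; the PR grows the bitmap past this

// canShiftInPlace sketches the guard: shifting the small bitmap left by
// delta is safe only if the highest set bit has delta leading zero bits
// of headroom above the cutoff.
func canShiftInPlace(small uint64, delta int) bool {
	return bits.LeadingZeros64(small)-(64-smallCutoff) >= delta
}

func main() {
	fmt.Println(canShiftInPlace(1<<10, 20)) // plenty of headroom
	fmt.Println(canShiftInPlace(1<<60, 10)) // would overflow the bitmap
}
```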
pkg/util/fast_int_set.go, line 343 at r2 (raw file):
```go
// If the set fits in s.small, we encode a 0 followed by the bitmap values.
//
// Otherwise, we encode a length followed by each element.
```
nit: mention that Encode errors if the set contains negative integers.
pkg/util/fast_int_set.go, line 43 at r4 (raw file):
```go
// bitmap implements a bitmap of size smallCutoff.
type bitmap [bitmapWords]uint64
```
This blog post mentions that the compiler doesn't optimize `[2]uint64` as well as `struct { a, b uint64 }` due to golang/go#24416. Maybe their case was special and we won't run into the same problem, but it might be worth checking. The `[2]uint64` approach is certainly preferable because the implementation is cleaner and it'll be easy to grow the small set in the future if needed.
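The two layouts under comparison behave identically; the only question is codegen. A sketch (types and method bodies are illustrative, not the PR's code):

```go
package main

import "fmt"

// bitmapArr is the [2]uint64 layout; bitmapStruct is the two-field
// alternative. Semantics are the same; golang/go#24416 only concerns
// whether the compiler keeps the array form in registers as readily.
type bitmapArr [2]uint64

type bitmapStruct struct{ lo, hi uint64 }

func (b bitmapArr) IsSet(i int) bool { return b[i>>6]&(1<<uint64(i&63)) != 0 }

func (b bitmapStruct) IsSet(i int) bool {
	w := b.lo
	if i >= 64 {
		w = b.hi
	}
	return w&(1<<uint64(i&63)) != 0
}

func main() {
	a := bitmapArr{1 << 5, 1 << 6}
	s := bitmapStruct{lo: 1 << 5, hi: 1 << 6}
	fmt.Println(a.IsSet(5), a.IsSet(70), s.IsSet(5), s.IsSet(70))
}
```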
pkg/util/fast_int_set.go, line 164 at r4 (raw file):
```go
// ForEach calls a function for each value in the set (in increasing order).
func (s FastIntSet) ForEach(f func(i int)) {
	if s.large != nil {
```
Is small set iteration generally faster than large set iteration? If so, we should change this to `if s.fitsInSmall() {`.
pkg/util/fast_int_set.go, line 341 at r4 (raw file):
```go
}
for _, v := range s.small {
	if err := writeUint(v); err != nil {
```
Currently `Encode` is only used for plan gists. Are gists ever persisted to disk? Are they ever encoded by one node and decoded by another? Both cases would require handling this new encoding format carefully.
pkg/util/fast_int_set.go, line 395 at r4 (raw file):
```go
func (v bitmap) IsSet(i int) bool {
	return v[i>>6]&(1<<uint64(i&63)) != 0
}
```
😳
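For anyone decoding the admired one-liner: `i>>6` selects the 64-bit word and `i&63` selects the bit within it. A tiny worked example:

```go
package main

import "fmt"

func main() {
	// For i = 70: word index 70>>6 = 1, bit index 70&63 = 6,
	// so IsSet(70) tests bit 6 of the second word.
	i := 70
	fmt.Println(i>>6, i&63)
}
```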
pkg/util/fast_int_set_test.go, line 71 at r1 (raw file):
```go
// Cross-check Ordered and Next().
var vals []int
// Start at the minimum vale, or before.
```
nit: vale => value
Force-pushed from 9cbef7c to ca7db83.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @mgartner, and @nvanbenschoten)
pkg/sql/opt/xform/testdata/rules/join, line 185 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Adding 8 bytes to FastIntSet increases memo size by 13.5%... Shows you how much we use FastIntSet.
Yeah, it's kind of unfortunate it increases this much...
pkg/util/fast_int_set.go, line 322 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
A mysterious `-(64-smallCutoff)` => `-(0)` 😆
I think it was valid to set `smallCutoff` to something smaller (for testing purposes).
pkg/util/fast_int_set.go, line 43 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This blog post mentions that the compiler doesn't optimize `[2]uint64` as well as `struct { a, b uint64 }` due to golang/go#24416. Maybe their case was special and we won't run into the same problem, but it might be worth checking. The `[2]uint64` approach is certainly preferable because the implementation is cleaner and it'll be easy to grow the small set in the future if needed.
I've been meaning to look at the generated assembly. I'll also check out some opt benchmarks.
pkg/util/fast_int_set.go, line 164 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is small set iteration generally faster than large set iteration? If so, we should change this to `if s.fitsInSmall() {`.
Probably very similar, but I changed it.
pkg/util/fast_int_set.go, line 341 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Currently `Encode` is only used for plan gists. Are gists ever persisted to disk? Are they ever encoded by one node and decoded by another? Both cases would require handling this new encoding format carefully.
@cucaroach can correct me but storing gists is WIP right now. I agree we definitely need some kind of versioning.
Reviewed 2 of 2 files at r1, 12 of 14 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @mgartner, @nvanbenschoten, and @RaduBerinde)
pkg/util/fast_int_set.go, line 341 at r4 (raw file):
Previously, RaduBerinde wrote…
@cucaroach can correct me but storing gists is WIP right now. I agree we definitely need some kind of versioning.
I think that's still TBD at this point but they probably will at some point. I think the versioning provided by gists should be enough though, if gists were already released we'd need to bump the gist versioning for this I think.
Force-pushed from ca7db83 to b21034f.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @mgartner, and @nvanbenschoten)
pkg/util/fast_int_set.go, line 341 at r4 (raw file):
Previously, cucaroach (Tommy Reilly) wrote…
I think that's still TBD at this point but they probably will at some point. I think the versioning provided by gists should be enough though, if gists were already released we'd need to bump the gist versioning for this I think.
I bumped the version to set a good precedent (and added a comment for Encode).
Reviewed 1 of 14 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @mgartner, @nvanbenschoten, and @RaduBerinde)
pkg/util/fast_int_set.go, line 341 at r4 (raw file):
Previously, RaduBerinde wrote…
I bumped the version to set a good precedent (and added a comment for Encode).
👍
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @mgartner, @nvanbenschoten, and @RaduBerinde)
bors r+
Build failed:
Did you get a chance to look at the assembly or run some opt benchmarks? No worries if not, but I'm curious to know the findings if there are any.
Ah, it slipped my mind. Fortunately bors failed :) Adding the do-not-merge label so I don't forget again.
This commit updates `NewMemBatchWithCapacity` to batch allocate all necessary `memColumn` objects in a single chunk. Outside of cockroachdb#72798, this was the single largest source of heap allocations by object count (`alloc_objects`) in TPC-E. With cockroachdb#72798 applied, calls to `NewMemColumn` were responsible for **3.18%** of total heap allocations in the workload:

```
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                 355202097 97.92% | github.com/cockroachdb/cockroach/pkg/col/coldata.NewMemBatchWithCapacity /go/src/github.com/cockroachdb/cockroach/pkg/col/coldata/batch.go:123
                                   7561654  2.08% | github.com/cockroachdb/cockroach/pkg/sql/colmem.(*Allocator).NewMemColumn /go/src/github.com/cockroachdb/cockroach/pkg/sql/colmem/allocator.go:217
 362763751  3.18%  3.18%  362763751  3.18%        | github.com/cockroachdb/cockroach/pkg/col/coldata.NewMemColumn /go/src/github.com/cockroachdb/cockroach/pkg/col/coldata/vec.go:202
----------------------------------------------------------+-------------
```

Lower in the heap profile, we see that each of the other heap allocations that are performed once per call to `NewMemBatchWithCapacity` were observed 39625411 times. So we can estimate that the average batch contains `362763751 / 39625411 = 9.15` columns. This lines up with the improvement we see due to this change.

Heap allocations due to `memColumn` objects drop by about a factor of 9, down to **0.33%** of all heap allocations:

```
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                  12082615   100% | github.com/cockroachdb/cockroach/pkg/sql/colmem.(*Allocator).NewMemBatchWithFixedCapacity /go/src/github.com/cockroachdb/cockroach/pkg/sql/colmem/allocator.go:131
  12082615  0.33% 38.80%   12082615  0.33%        | github.com/cockroachdb/cockroach/pkg/col/coldata.NewMemBatchWithCapacity /go/src/github.com/cockroachdb/cockroach/pkg/col/coldata/batch.go:122
----------------------------------------------------------+-------------
```

Despite this change, we will still probably want to address Jordan's TODO a few lines earlier about pooling all allocations at this level: https://github.com/cockroachdb/cockroach/blob/28bb1ea049da5bfb6e15a7003cd7b678cbc4b67f/pkg/col/coldata/batch.go#L113
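The chunked-allocation technique the commit describes can be sketched generically. Here `column` is a stand-in for `memColumn`, and both constructor names are mine:

```go
package main

import "fmt"

// column is a stand-in for coldata.memColumn.
type column struct{ data []int64 }

// newBatchNaive allocates each column separately: n heap objects
// just for the column structs.
func newBatchNaive(n, capacity int) []*column {
	cols := make([]*column, n)
	for i := range cols {
		cols[i] = &column{data: make([]int64, capacity)}
	}
	return cols
}

// newBatchChunked mirrors the commit's approach: one backing slice of
// column structs, with the pointer slice indexing into it, so the
// column objects cost a single allocation regardless of n.
func newBatchChunked(n, capacity int) []*column {
	backing := make([]column, n)
	cols := make([]*column, n)
	for i := range cols {
		backing[i].data = make([]int64, capacity)
		cols[i] = &backing[i]
	}
	return cols
}

func main() {
	cols := newBatchChunked(9, 1024)
	fmt.Println(len(cols), len(cols[0].data))
}
```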
This commit updates `DistSQLTypeResolver` to implement all interfaces on a pointer receiver instead of a value receiver. Implementing interfaces on values is rarely the right choice, as it forces a heap allocation whenever the object is boxed into an interface, instead of just forcing the pointer onto the heap once (on its own or as part of a larger object) and then storing the pointer in the interface header.

Before this commit, the use of value receivers was causing `HydrateTypeSlice` to allocate. Outside of cockroachdb#72798 and cockroachdb#72961, this was the single largest source of heap allocations in TPC-E. With those two PRs applied, `HydrateTypeSlice` was accounting for **2.30%** of total heap allocations in the workload:

```
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                  27722149 32.66% | github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).InitWithEvalCtx /go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:790
                                  27460002 32.36% | github.com/cockroachdb/cockroach/pkg/sql/colflow.(*vectorizedFlowCreator).setupFlow.func1 /go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:1097
                                  21266755 25.06% | github.com/cockroachdb/cockroach/pkg/sql/colflow.(*vectorizedFlowCreator).setupInput /go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:818
                                   8421503  9.92% | github.com/cockroachdb/cockroach/pkg/sql/colfetcher.populateTableArgs /go/src/github.com/cockroachdb/cockroach/pkg/sql/colfetcher/cfetcher_setup.go:174
  84870409  2.30%  2.30%   84870409  2.30%        | github.com/cockroachdb/cockroach/pkg/sql/catalog/descs.DistSQLTypeResolver.HydrateTypeSlice /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/dist_sql_type_resolver.go:134
----------------------------------------------------------+-------------
```

With this change, the heap allocation in `HydrateTypeSlice` disappears. With these three PRs combined, the largest source of heap allocations in the workload is `context.WithValue`, like the Go gods intended.

```
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                  16723340 38.17% | github.com/cockroachdb/logtags.WithTags /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/logtags/context.go:34
                                   7405899 16.90% | google.golang.org/grpc/peer.NewContext /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/peer/peer.go:44
                                   3910493  8.93% | google.golang.org/grpc.NewContextWithServerTransportStream /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1672
                                   3702950  8.45% | github.com/cockroachdb/cockroach/pkg/util/tracing.maybeWrapCtx /go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/context.go:80
                                   3560952  8.13% | google.golang.org/grpc/metadata.NewIncomingContext /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/metadata/metadata.go:152
                                   3342479  7.63% | google.golang.org/grpc.newContextWithRPCInfo /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/rpc_util.go:791
                                   2938326  6.71% | google.golang.org/grpc/internal/credentials.NewRequestInfoContext /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/internal/credentials/credentials.go:29
                                   1387235  3.17% | github.com/cockroachdb/cockroach/pkg/util/grpcutil.NewLocalRequestContext /go/src/github.com/cockroachdb/cockroach/pkg/util/grpcutil/grpc_util.go:39
                                    655388  1.50% | github.com/cockroachdb/cockroach/pkg/sql.withStatement /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:3197
                                    185693  0.42% | google.golang.org/grpc/metadata.NewOutgoingContext /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/metadata/metadata.go:159
  43812755  2.20%  2.20%   43812755  2.20%        | context.WithValue /usr/local/go/src/context/context.go:533
----------------------------------------------------------+-------------
```
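The value-vs-pointer receiver cost described above can be observed directly with `testing.AllocsPerRun`. The types here are stand-ins, not the real `DistSQLTypeResolver`:

```go
package main

import (
	"fmt"
	"testing"
)

// resolver mirrors the shape of the problem: a struct of non-trivial
// size implementing an interface on its value receiver, so every boxing
// into the interface copies the whole struct to the heap.
type resolver struct{ cache [16]uint64 }

func (r resolver) Hydrate() {}

// pResolver implements the same interface on a pointer receiver; only
// the pointer is stored in the interface header.
type pResolver struct{ cache [16]uint64 }

func (r *pResolver) Hydrate() {}

type hydrator interface{ Hydrate() }

var sink hydrator // package-level, so the conversions are not optimized away

func main() {
	r := resolver{}
	r.cache[0] = 1
	p := &pResolver{}
	valAllocs := testing.AllocsPerRun(100, func() { sink = r }) // copies the struct each time
	ptrAllocs := testing.AllocsPerRun(100, func() { sink = p }) // just stores the pointer
	fmt.Println(valAllocs > ptrAllocs)
}
```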
72841: sql: cleanup r=andreimatei a=andreimatei

This code was unnecessarily constructing a struct.

Release note: None

72961: col/coldata: batch allocate memColumn objects r=nvanbenschoten a=nvanbenschoten

72962: sql/catalog/descs: implement interfaces on *DistSQLTypeResolver r=nvanbenschoten a=nvanbenschoten

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
The optimizer benchmarks don't look good... Somehow a few queries show more allocations (which I can't explain right now), and even tests that don't involve `FastIntSet` regress.
These are all with GOMAXPROCS=1.
Force-pushed from b21034f to d26223c.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @cucaroach, @mgartner, @nvanbenschoten, and @RaduBerinde)
pkg/util/fast_int_set.go, line 43 at r4 (raw file):
Previously, RaduBerinde wrote…
I've been meaning to look at the generated assembly. I'll also check out some opt benchmarks.
The benchmarks suggest we are indeed hitting a problem like that. I will try the two-field `a, b uint64` approach when I have some time.
Very nice find by the way!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @cucaroach, @mgartner, @nvanbenschoten, and @RaduBerinde)
pkg/util/fast_int_set.go, line 43 at r4 (raw file):
Previously, RaduBerinde wrote…
The benchmarks suggest we are indeed hitting a problem like that. I will try the two-field `a, b uint64` approach when I have some time. Very nice find by the way!
Darn... That's unfortunate. Hopefully `a, b uint64` works.
Force-pushed from d26223c to 7b1482f.
I tried replacing the array with two uint64 fields (see the latest revision). Still seeing significant regressions.
Part of the regression is due to `Encode`. I updated the change to keep the old encoding (only use 64 bits). What I find baffling is that even benchmarks that don't actually involve that function are improved. I looked at the assembly difference of `Encode` and it's very small, simply one more function call. Maybe it's simply luck in terms of how the parsing code (which must be sizable) is laid out (e.g. w.r.t. cache lines).
cf871ea
to
3f47291
Compare
I separated out the other changes and fixes and those are already merged. Here are the benchmarks for the current version (on a gceworker). Again, we see some regression in benchmarks that don't even use FastIntSets (like Parse).
Force-pushed from 3f47291 to dccedad.
The worst latency regressions are ops that take < 1ms. And the tests that use between 64 and 128 column IDs show a nice improvement:
Maybe the regressions are acceptable?
Yes, I agree, the regressions seem more like a benchmarking artifact rather than something concerning. I would like to merge this if folks agree.
Reviewed 8 of 17 files at r13, 5 of 10 files at r15, 1 of 1 files at r16, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @nvanbenschoten, and @RaduBerinde)
pkg/util/fast_int_set.go, line 181 at r16 (raw file):
```go
		f(64 + i)
		v &^= 1 << uint(i)
	}
```
nit: You've abstracted all the bit manipulation into bitmap methods elsewhere, so to follow that pattern you could move these loops into a `bitmap.ForEach` method.
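The suggested `bitmap.ForEach` could look roughly like this (a sketch, not the committed code): walk each word and peel off set bits with `bits.TrailingZeros64`:

```go
package main

import (
	"fmt"
	"math/bits"
)

type bitmap [2]uint64

// ForEach visits each set bit in increasing order: for each word, it
// finds the lowest set bit, reports its global index, and clears it.
func (bm bitmap) ForEach(f func(i int)) {
	for w, v := range bm {
		base := w << 6 // word index times 64
		for v != 0 {
			i := bits.TrailingZeros64(v)
			f(base + i)
			v &^= 1 << uint(i)
		}
	}
}

func main() {
	var got []int
	bitmap{1<<3 | 1<<40, 1 << 6}.ForEach(func(i int) { got = append(got, i) })
	fmt.Println(got)
}
```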
pkg/util/fast_int_set_testonly.go, line 34 at r16 (raw file):
```go
type FastIntSet struct {
	// Used to keep the size of the struct the same.
	_ [2]uint64
```
nit: Doesn't make a difference in the size, but using `_ struct{ lo, hi uint64 }` instead might be more clear to someone who comes along in the future.
This commit increases the size of the "small set" bitmap from 64 bits to 128 bits. The main goal is to avoid the slow path in optimizer ColSets for more queries.

Fixes cockroachdb#72733.

Release note: None
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach and @nvanbenschoten)
pkg/util/fast_int_set.go, line 181 at r16 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: You've abstracted all the bit manipulation into bitmap methods elsewhere, so to follow that pattern you could move these loops into a `bitmap.ForEach` method.
I'm mildly worried that it might reduce inlining; I'd want to run all the benchmarks again, so I'd rather keep it as is.
Force-pushed from dccedad to be0aa93.
Reviewed 3 of 3 files at r17, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @cucaroach and @nvanbenschoten)
pkg/util/fast_int_set.go, line 181 at r16 (raw file):
Previously, RaduBerinde wrote…
I'm mildly worried that it might reduce inlining; I'd want to run all the benchmarks again, so I'd rather keep it as is.
Makes sense. No need to go through the trouble of re-running benchmarks.
TFTR! bors r+
Build succeeded:
util: support 128 FastIntSet elements without allocation
This commit increases the size of the "small set" bitmap from 64 bits
to 128 bits. The main goal is to avoid the slow path in optimizer
ColSets for more queries.
Fixes #72733.
Release note: None