colexec: some optimizations #47942
Conversation
The third commit removes the allocations I mistakenly introduced:
Compare it to the benchmarks of the logical types plumbing PR from here:
Thanks @jordanlewis
❌ The GitHub CI (Cockroach) build has failed on c4c176f8. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Looks good. I don't think we should forgo the small group size stuff, but I understand that with the current algorithm it would be hard to get small groups right. We need to think more about this though.
I have some more improvements coming up :) (nothing drastic though)
Force-pushed from 4aa9c80 to 3baf549.
I removed the comments that were describing the benchmarks of my WIP. Some observations from those comments:
The second commit removes the buffering stage which reduces the maximum length of
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 7 of 7 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @yuzefovich)
pkg/sql/colexec/hash_aggregator.go, line 195 at r1 (raw file):
// We picked this value as the result of our benchmark.
tupleLimit := coldata.BatchSize() * 2
This is interesting. Based on previous benchmarks, it seemed like it was good to have a buffering stage. What changed?
pkg/sql/colexec/hash_aggregator.go, line 374 at r1 (raw file):
for selIdx, hashCode := range hashBuffer {
	selsSlot := -1
	for slot, hash := range op.scratch.hashCodeForSelsSlot {
maybe add a comment as to why we're not using a map
pkg/sql/colexec/hash_aggregator.go, line 447 at r2 (raw file):
}
const hashAggFuncsAllocSize = 128
What's the perf+mem difference with using 1024?
This commit switches usage of `map` to iteration over `[]uint64` when building selection vectors in the hash aggregator. This is a lot more efficient when group sizes are relatively large, with a moderate hit when group sizes are small. This hit is reduced in a follow-up commit. Release note: None
Force-pushed from 49101c6 to d24f5d3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @Azhng)
pkg/sql/colexec/hash_aggregator.go, line 195 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
This is interesting. Based on previous benchmarks, it seemed like it was good to have a buffering stage. What changed?
Not sure if it is the only reason, but the first commit here removes the lookup in a map (which has amortized O(1) cost) in favor of linear search in a slice (which has O(distinct buffered tuples) cost), so if we buffer up several batches, that cost will increase, especially in the case of small group sizes.
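To illustrate the tradeoff being discussed, here is a minimal, self-contained sketch of the two lookup strategies; apart from `hashBuffer` and `hashCodeForSelsSlot`, the names are made up for the example:

```go
// Sketch only: two ways of mapping each buffered tuple's hash code to a
// selection-vector slot. Names other than hashCodeForSelsSlot and
// hashBuffer are illustrative, not taken from the actual code.
package main

import "fmt"

func main() {
	hashBuffer := []uint64{7, 3, 7, 9, 3} // hash codes of the buffered tuples
	sels := [][]int{}                     // sels[slot] = indices of tuples in that group

	// Map-based approach (before this PR): amortized O(1) per lookup, but
	// with hashing overhead and allocations for the map itself.
	slotForHash := make(map[uint64]int)
	for selIdx, hashCode := range hashBuffer {
		slot, ok := slotForHash[hashCode]
		if !ok {
			slot = len(sels)
			slotForHash[hashCode] = slot
			sels = append(sels, nil)
		}
		sels[slot] = append(sels[slot], selIdx)
	}
	fmt.Println("map:", sels)

	// Slice-based approach (this PR): linear search over the distinct hash
	// codes seen so far, which is cheap when groups are large because the
	// slice stays short.
	sels = sels[:0]
	var hashCodeForSelsSlot []uint64
	for selIdx, hashCode := range hashBuffer {
		selsSlot := -1
		for slot, hash := range hashCodeForSelsSlot {
			if hashCode == hash {
				selsSlot = slot
				break
			}
		}
		if selsSlot == -1 {
			hashCodeForSelsSlot = append(hashCodeForSelsSlot, hashCode)
			sels = append(sels, nil)
			selsSlot = len(hashCodeForSelsSlot) - 1
		}
		sels[selsSlot] = append(sels[selsSlot], selIdx)
	}
	fmt.Println("slice:", sels)
}
```

Both variants produce the same per-group selection vectors; the slice variant avoids map overhead as long as the number of distinct buffered groups stays small.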
pkg/sql/colexec/hash_aggregator.go, line 374 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
maybe add a comment as to why we're not using a map
Added.
pkg/sql/colexec/hash_aggregator.go, line 447 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
What's the perf+mem difference with using 1024?
The comparison of 128 (old) vs 1024 (new) is here. Seems like there is no noticeable change in memory allocations, but there is a minor hit in performance with small group sizes (which is somewhat surprising to me).
I ran a few other comparisons: 128 vs 32 and 128 vs 64.
Maybe 64 would be best? I feel like it is a nicer number than 128, but there is not much difference between the two in the benchmarks.
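For context on what this constant controls, here is a rough sketch of the batch-allocation pattern it tunes; the `hashAggFuncs` struct and the allocator shape are assumptions for illustration, only the constant's name and the values come from the review:

```go
// Sketch only: hand out per-group aggregation state from a slice that is
// allocated hashAggFuncsAllocSize entries at a time. A larger constant
// means fewer allocations when there are many groups, at the cost of more
// unused memory when there are only a few.
package main

const hashAggFuncsAllocSize = 64

type hashAggFuncs struct {
	// per-group aggregation state would live here (assumed)
}

type hashAggFuncsAlloc struct {
	buf []hashAggFuncs
}

// newHashAggFuncs returns one struct at a time from the batch-allocated
// slice, amortizing the allocation over hashAggFuncsAllocSize groups.
func (a *hashAggFuncsAlloc) newHashAggFuncs() *hashAggFuncs {
	if len(a.buf) == 0 {
		a.buf = make([]hashAggFuncs, hashAggFuncsAllocSize)
	}
	ret := &a.buf[0]
	a.buf = a.buf[1:]
	return ret
}

func main() {
	var alloc hashAggFuncsAlloc
	_ = alloc.newHashAggFuncs()
}
```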
Reviewed 9 of 9 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @Azhng, and @yuzefovich)
pkg/sql/colexec/hash_aggregator.go, line 447 at r2 (raw file):
Previously, yuzefovich wrote…
The comparison of 128 (old) vs 1024 (new) is here. Seems like there is no noticeable change in memory allocations, but there is a minor hit in performance with small group sizes (which is somewhat surprising to me).
I ran a few other comparisons: 128 vs 32 and 128 vs 64
Maybe 64 would be best? I feel like it is a nicer number than 128, but there is not much difference between the two in the benchmarks.
Let's do 64 if there's not a big boost to using 128. Maybe also add a comment about how you got to this number.
This commit removes the buffering stage of the hash aggregator as well as the "append only" scratch batch that we're currently using. The removal of the buffering stage allows us to have smaller buffers without sacrificing performance. The removal of the scratch batch allows us to avoid copying over the data from the input batch and to use that input batch directly. We will be destructively modifying the selection vector on that batch, but such behavior is acceptable because the hash aggregator owns the output batch, and the input batch will not be propagated further. This commit also bumps `hashAggFuncsAllocSize` from 16 to 64, which gives us a minor performance improvement in the case of small group sizes. Release note: None
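As an illustration of this commit's idea, here is a hedged sketch of reusing the input batch's selection vector in place of a scratch batch; the `batch` type below is a stand-in, not the actual `coldata.Batch` interface:

```go
// Sketch only: instead of copying a group's tuples into a scratch batch,
// point the input batch's selection vector at that group and hand the
// batch to the aggregate function directly.
package main

import "fmt"

type batch struct {
	sel    []int // selection vector: indices of the tuples to process
	length int
}

// computeForGroup destructively repoints the input batch at one group's
// tuples. This is acceptable here because the hash aggregator owns its
// output batch and the (modified) input batch is not propagated further.
func computeForGroup(b *batch, groupSel []int, compute func(*batch)) {
	b.sel = append(b.sel[:0], groupSel...)
	b.length = len(groupSel)
	compute(b)
}

func main() {
	in := &batch{sel: []int{0, 1, 2, 3, 4}, length: 5}
	computeForGroup(in, []int{0, 2}, func(b *batch) {
		fmt.Println("aggregating tuples", b.sel[:b.length])
	})
}
```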
In a recent PR (for logical types plumbing) I introduced some unnecessary allocations in the unhandled type case by taking a pointer to a value in a `[]types.T` slice. This commit fixes that. Release note: None
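For illustration, a hedged sketch of the kind of allocation described here; the exact call site is not shown in this thread, so the helpers and the `typeT` stand-in are hypothetical:

```go
// Sketch only: typeT stands in for types.T to keep the example
// self-contained.
package main

type typeT struct{ id int }

// Taking the address of the range variable makes that per-iteration copy
// escape to the heap, i.e. an allocation every time the branch is taken.
func findWithAlloc(typs []typeT, id int) *typeT {
	for _, t := range typs {
		if t.id == id {
			return &t // pointer to a copy of the slice element
		}
	}
	return nil
}

// Taking the address of the element in place points into the slice's
// backing array and does not allocate.
func findWithoutAlloc(typs []typeT, id int) *typeT {
	for i := range typs {
		if typs[i].id == id {
			return &typs[i]
		}
	}
	return nil
}

func main() {
	typs := []typeT{{id: 1}, {id: 2}}
	_ = findWithAlloc(typs, 2)
	_ = findWithoutAlloc(typs, 2)
}
```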
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @Azhng)
pkg/sql/colexec/hash_aggregator.go, line 447 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Let's do 64 if there's not a big boost to using 128. Maybe also add a comment about how you got to this number.
Done.
Build succeeded
colexec: remove one of the Go maps from hash aggregator
This commit switches usage of `map` to iteration over `[]uint64` when
building selection vectors in the hash aggregator. This is a lot more
efficient when group sizes are relatively large, with a moderate hit when
group sizes are small. This hit is reduced in a follow-up commit.

Release note: None
colexec: more improvements to hash aggregator
This commit removes the buffering stage of the hash aggregator as well
as the "append only" scratch batch that we're currently using. The
removal of the buffering stage allows us to have smaller buffers without
sacrificing performance. The removal of the scratch batch allows us to
avoid copying over the data from the input batch and to use that input
batch directly. We will be destructively modifying the selection vector
on that batch, but such behavior is acceptable because the hash
aggregator owns the output batch, and the input batch will not be
propagated further.

This commit also bumps `hashAggFuncsAllocSize` from 16 to 64, which
gives us a minor performance improvement in the case of small group
sizes.

Release note: None
colexec: remove some allocations
In a recent PR (for logical types plumbing) I introduced some
unnecessary allocations in the unhandled type case by taking a pointer
to a value in a `[]types.T` slice. This commit fixes that.

Release note: None