colexecagg: reduce the size of hash aggregates #74437
21f1ccf
to
e1e6d11
Compare
As expected, this shows a small improvement.
When we introduced the hash aggregation with partial order support, we mistakenly removed the ordered aggregation from `aggTypes` slice that is used in some tests as well as in the benchmarks. This is now fixed. Release note: None
e1e6d11
to
a6ec0aa
Compare
Found a recent minor bug with |
a6ec0aa
to
ff0442b
Compare
Thanks for fixing the bug! Nice work.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)
-- commits, line 35 at r4:
nit: s/implimentation/implementation
pkg/sql/colexec/execgen/cmd/execgen/sum_agg_gen.go, line 31 at r2 (raw file):
InputVecMethod string
RetGoType string
RetGoTypeSlice string
Should the changes in this file be in the third commit?
This commit replaces concrete slices (like `[]int64`) with the corresponding native type aliases (like `coldata.Int64s`). This allows us to use inlined `Set` methods. Release note: None
This commit removes several `execgen.COPYVAL` calls that were redundant because the first and the second argument are the same. These calls are redundant because we already performed the same call right after calling `Get` from the original vector and we will perform a deep copy when calling `Set` next. Release note: None
This commit reduces the size of the hash aggregates by removing the reference to the well-typed column (i.e. a concrete unwrapped `coldata.Vec`, something like `[]int64`). This is possible because the hash aggregates only access the concrete column once, in `Flush`, so there is no point in storing the concrete column as we do for the ordered aggregates. We still perform the interface dispatch call only once - previously it was in `SetOutput`, now it is in `Flush`. This should be a non-trivial improvement since the hash aggregation uses a separate aggregation function object for each bucket. This change also allows us to remove the overriding of `SetOutput` method implementation provided by the base struct from the hash and window aggregates. Release note: None
ff0442b
to
2f50c4c
Compare
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rharding6373 and @rytaft)
pkg/sql/colexec/execgen/cmd/execgen/sum_agg_gen.go, line 31 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Should the changes in this file be in the third commit?
The idea behind these changes was to allow the use of `Set` methods from the native type aliases in order to make the work in #74469 easier. It didn't fit well into any of the existing commits, so I extracted it into a new, separate commit.
Build succeeded.
colexec: fix a recent bug with aggTypes
When we introduced the hash aggregation with partial order support, we mistakenly removed the ordered aggregation from `aggTypes` slice that is used in some tests as well as in the benchmarks. This is now fixed.
Release note: None
colexecagg: replace concrete slices with native type aliases
This commit replaces concrete slices (like `[]int64`) with the corresponding native type aliases (like `coldata.Int64s`). This allows us to use inlined `Set` methods.
Release note: None
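To illustrate the commit above, here is a minimal sketch of a named slice type with `Get` and `Set` methods. The `Int64s` type below is a local stand-in for `coldata.Int64s`, and `sumInto` is a hypothetical helper invented for this example; neither is taken from the actual generated code.

```go
package main

import "fmt"

// Int64s is a local stand-in for a native type alias such as coldata.Int64s.
// It is defined here only so the example is self-contained.
type Int64s []int64

// Get returns the element at idx.
func (c Int64s) Get(idx int) int64 { return c[idx] }

// Set stores val at idx. A method this small is a candidate for inlining by
// the Go compiler, so calling it need not cost more than a direct index write.
func (c Int64s) Set(idx int, val int64) { c[idx] = val }

// sumInto is a hypothetical helper: it sums in and writes the result into
// out[outIdx] using only the Get/Set methods instead of raw indexing.
func sumInto(out Int64s, outIdx int, in Int64s) {
	var sum int64
	for i := 0; i < len(in); i++ {
		sum += in.Get(i)
	}
	out.Set(outIdx, sum)
}

func main() {
	out := make(Int64s, 2)
	sumInto(out, 1, Int64s{1, 2, 3})
	fmt.Println(out)
}
```

Because the named type shares its underlying type with `[]int64`, code written against `Get` and `Set` stays uniform across element types, including those for which a plain index assignment would not be enough.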
colexecagg: remove some redundant COPYVAL calls
This commit removes several `execgen.COPYVAL` calls that were redundant because the first and the second argument are the same. These calls are redundant because we already performed the same call right after calling `Get` from the original vector and we will perform a deep copy when calling `Set` next.
Release note: None
colexecagg: reduce the size of hash aggregates
This commit reduces the size of the hash aggregates by removing the reference to the well-typed column (i.e. a concrete unwrapped `coldata.Vec`, something like `[]int64`). This is possible because the hash aggregates only access the concrete column once, in `Flush`, so there is no point in storing the concrete column as we do for the ordered aggregates. We still perform the interface dispatch call only once - previously it was in `SetOutput`, now it is in `Flush`. This should be a non-trivial improvement since the hash aggregation uses a separate aggregation function object for each bucket.
This change also allows us to remove the overriding of `SetOutput` method implementation provided by the base struct from the hash and window aggregates.
Release note: None
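The size reduction described above can be sketched roughly as follows. All names here (`Vec`, `int64Vec`, `baseAgg`, `orderedSumAgg`, `hashSumAgg`, `aggSizes`) are hypothetical stand-ins written for this example and are not the actual colexecagg types; the sketch only assumes that the base struct keeps the wrapped vector and that the ordered variant additionally caches the concrete slice.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Vec is a hypothetical stand-in for coldata.Vec: an interface that wraps a
// typed column and can hand out the concrete slice.
type Vec interface {
	Int64() []int64
}

// int64Vec is a toy Vec implementation backed by a plain slice.
type int64Vec struct{ data []int64 }

func (v *int64Vec) Int64() []int64 { return v.data }

// baseAgg holds what every aggregate needs: the wrapped output vector.
type baseAgg struct{ vec Vec }

func (b *baseAgg) SetOutput(vec Vec) { b.vec = vec }

// orderedSumAgg caches the concrete column because it writes to it repeatedly.
type orderedSumAgg struct {
	baseAgg
	col    []int64 // cached concrete column (an extra slice header per object)
	curAgg int64
}

// SetOutput overrides the base method to unwrap the vector once up front.
func (a *orderedSumAgg) SetOutput(vec Vec) {
	a.baseAgg.SetOutput(vec)
	a.col = vec.Int64()
}

func (a *orderedSumAgg) Flush(outputIdx int) { a.col[outputIdx] = a.curAgg }

// hashSumAgg writes its result exactly once, so it skips the cached column
// and uses the base SetOutput as-is.
type hashSumAgg struct {
	baseAgg
	curAgg int64
}

// Flush performs the single interface dispatch at the point of use.
func (a *hashSumAgg) Flush(outputIdx int) { a.vec.Int64()[outputIdx] = a.curAgg }

// aggSizes reports the in-memory size of each toy aggregate struct.
func aggSizes() (ordered, hash uintptr) {
	return unsafe.Sizeof(orderedSumAgg{}), unsafe.Sizeof(hashSumAgg{})
}

func main() {
	out := &int64Vec{data: make([]int64, 2)}
	h := &hashSumAgg{curAgg: 42}
	h.SetOutput(out) // promoted from baseAgg, no override needed
	h.Flush(1)
	ordered, hash := aggSizes()
	fmt.Println(out.data, ordered, hash)
}
```

In this toy version the saving per object is one slice header. Since the hash aggregator allocates one aggregate function object per bucket, a per-object saving of that kind is multiplied by the number of buckets.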