Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexecagg: reduce the size of hash aggregates #74437

Merged
merged 4 commits into from
Jan 5, 2022

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jan 5, 2022

colexec: fix a recent bug with aggTypes

When we introduced the hash aggregation with partial order support, we
mistakenly removed the ordered aggregation from aggTypes slice that is
used in some tests as well as in the benchmarks. This is now fixed.

Release note: None

colexecagg: replace concrete slices with native type aliases

This commit replaces concrete slices (like []int64) with the
corresponding native type aliases (like coldata.Int64s). This allows
us to use inlined Set methods.

Release note: None

colexecagg: remove some redundant COPYVAL calls

This commit removes several execgen.COPYVAL calls that were redundant
because the first and the second argument are the same. These calls are
redundant because we already performed the same call right after
calling Get from the original vector and we will perform a deep copy
when calling Set next.

Release note: None

colexecagg: reduce the size of hash aggregates

This commit reduces the size of the hash aggregates by removing the
reference to the well-typed column (i.e. a concrete unwrapped
coldata.Vec, something like []int64). This is possible because the
hash aggregates only access the concrete column once, in Flush, so
there is no point in storing the concrete column as we do for the
ordered aggregates. We still perform the interface dispatch call only
once - previously it was in SetOutput, now it is in Flush. This
should be a non-trivial improvement since the hash aggregation uses
a separate aggregation function object for each bucket.

This change also allows us to remove the overriding of SetOutput
method implementation provided by the base struct from the hash and
window aggregates.

Release note: None

@yuzefovich yuzefovich requested review from rytaft, rharding6373 and a team January 5, 2022 00:35
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich
Copy link
Member Author

As expected, this shows small improvement.

When we introduced the hash aggregation with partial order support, we
mistakenly removed the ordered aggregation from `aggTypes` slice that is
used in some tests as well as in the benchmarks. This is now fixed.

Release note: None
@yuzefovich
Copy link
Member Author

Found a recent minor bug with aggTypes. RFAL.

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the bug! Nice work. :lgtm:

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)


-- commits, line 35 at r4:
nit: s/implimentation/implementation


pkg/sql/colexec/execgen/cmd/execgen/sum_agg_gen.go, line 31 at r2 (raw file):

	InputVecMethod string
	RetGoType      string
	RetGoTypeSlice string

Should the changes in this file be in the third commit?

This commit replaces concrete slices (like `[]int64`) with the
corresponding native type aliases (like `coldata.Int64s`). This allows
us to use inlined `Set` methods.

Release note: None
This commit removes several `execgen.COPYVAL` calls that were redundant
because the first and the second argument are the same. These calls are
redundant because we already performed the same call right after
calling `Get` from the original vector and we will perform a deep copy
when calling `Set` next.

Release note: None
This commit reduces the size of the hash aggregates by removing the
reference to the well-typed column (i.e. a concrete unwrapped
`coldata.Vec`, something like `[]int64`). This is possible because the
hash aggregates only access the concrete column once, in `Flush`, so
there is no point in storing the concrete column as we do for the
ordered aggregates. We still perform the interface dispatch call only
once - previously it was in `SetOutput`, now it is in `Flush`. This
should be a non-trivial improvement since the hash aggregation uses
a separate aggregation function object for each bucket.

This change also allows us to remove the overriding of `SetOutput`
method implementation provided by the base struct from the hash and
window aggregates.

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rharding6373 and @rytaft)


pkg/sql/colexec/execgen/cmd/execgen/sum_agg_gen.go, line 31 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Should the changes in this file be in the third commit?

The idea behind these changes was to allow the usage of Set methods from the native type aliases in order to make the work in #74469 easier, and it didn't belong to any of the existing commits well, so I extracted a new separate commit for it.

@craig
Copy link
Contributor

craig bot commented Jan 5, 2022

Build succeeded:

@craig craig bot merged commit 00cb5d1 into cockroachdb:master Jan 5, 2022
@yuzefovich yuzefovich deleted the reduce-agg-size branch January 5, 2022 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants