colexec: fix hash aggregator when spilling to disk #63372
Reviewed 16 of 16 files at r1, 3 of 3 files at r2, 11 of 11 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
This commit introduces nicer aliases for the specification of the aggregate functions and uses the aliases throughout the code base.

Release note: None
This commit is only a test change. It cleans up the aggregator test cases in the following ways:
- removing some of the defaults in favor of explicit settings (easier to read each test case in isolation)
- reordering the fields to have a uniform assignment order
- inserting any_not_null aggregates for the cases when the input is ordered (this will be needed by the follow-up commit that will enforce a particular order on the output). This change simulates how specs are created in production.
- removing a couple of test cases that are impossible in production (when some columns are unused).

Release note: None
In some cases the aggregation is expected to maintain the required ordering in order to eliminate an explicit sort afterwards. It is always the case that the required ordering is a prefix of ordered grouping columns. With the introduction of disk spilling for the vectorized hash aggregator in the 21.1 release, the ordering was no longer maintained if spilling occurred. In all previous cases (row-by-row processors and the in-memory columnar operator) the ordering was maintained by construction, but with `hashBasedPartitioner` the ordering can be arbitrary.

In order to fix this issue we now do what we did for the external distinct - we plan an external sort on top of the external hash aggregator to restore the required ordering. Note that this will only kick in if spilling to disk occurred. This required changes to the AggregatorSpec to propagate the required output ordering.

Release note (bug fix): In 21.1 alpha and beta releases CockroachDB could, in some cases, return the output in an incorrect order if a query containing hash aggregation was executed via the vectorized engine and spilling to temporary storage was required.
Thanks for a quick review!
bors r+
This PR was included in a batch that was canceled, it will be automatically retried
Build succeeded
The third commit is responsible for a noticeable regression in a micro-benchmark.
execinfrapb: introduce aliases for agg funcs and use everywhere
This commit introduces nicer aliases for the specification of the
aggregate functions and uses the aliases throughout the code base.
Release note: None
colexec: clean up aggregator test cases
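This commit is only a test change. It cleans up the aggregator test
cases in the following ways:
- removing some of the defaults in favor of explicit settings (easier to
read each test case in isolation)
- reordering the fields to have a uniform assignment order
- inserting any_not_null aggregates for the cases when the input is
ordered (this will be needed by the follow-up commit that will enforce
a particular order on the output). This change simulates how specs are
created in production.
- removing a couple of test cases that are impossible in production
(when some columns are unused).
Release note: None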
colexec: fix hash aggregator when spilling to disk
In some cases the aggregation is expected to maintain the required
ordering in order to eliminate an explicit sort afterwards. It is always
the case that the required ordering is a prefix of ordered grouping
columns. With the introduction of disk spilling for the vectorized hash
aggregator in the 21.1 release, the ordering was no longer maintained if
spilling occurred. In all previous cases (row-by-row processors and the
in-memory columnar operator) the ordering was maintained by
construction, but with `hashBasedPartitioner` the ordering can be
arbitrary.
In order to fix this issue we now do what we did for the external
distinct - we plan an external sort on top of the external hash
aggregator to restore the required ordering. Note that this will only
kick in if the spilling to disk occurred. This required changes to the
AggregatorSpec to propagate the required output ordering.
Fixes: #63159.
Release note (bug fix): In 21.1 alpha and beta releases CockroachDB
could, in some cases, return the output in an incorrect order if a query
containing hash aggregation was executed via the vectorized engine and
spilling to temporary storage was required.