Fix hash agg #76421

Closed
wants to merge 3 commits into from

Conversation

@yuzefovich (Member) commented Feb 11, 2022

colexec: deeply reset datum-backed vectors in ResetInternalBatch

Previously, we would keep references to the old datums set in the
datum-backed vectors until those datums were overwritten, unnecessarily
extending the period during which the no-longer-used datums stayed
live. We don't have to do this for any other type because for the other
types we can actually reuse the old space for new elements, so we want
to keep them around (e.g. decimals can reuse the non-inlined
coefficient). This commit makes it so that we deeply unset the datums
in `ResetInternalBatch`.

Care had to be taken to keep the memory accounting up to date. In
particular, after deeply resetting a datum-backed vector, we want to
release the memory allocation that was taken up by the actual datums
while keeping the overhead of the `tree.Datum` interface accounted for
(since the `[]tree.Datum` slice is still fully live).

Release note: None
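
As a rough, self-contained illustration of the deep reset (the `Datum`
and `datumVec` types below are hypothetical stand-ins, not the actual
`tree.Datum`/`coldata` implementation): the datums themselves are unset
so the garbage collector can reclaim them, while the slice that held
them stays allocated and therefore stays accounted for.

```go
package main

import "fmt"

// Datum and datumVec are simplified stand-ins for tree.Datum and the
// datum-backed vector used in this sketch.
type Datum interface {
	// Size returns the datum's memory footprint in bytes, excluding the
	// interface-header overhead of storing it in a []Datum slice.
	Size() int64
}

type datumVec struct {
	data []Datum
}

// deepReset nils out every stored datum so that the old values can be
// garbage collected, and returns how many bytes of accounted memory were
// released. The []Datum slice itself (and hence the per-element interface
// overhead) stays allocated and stays accounted for by the caller.
func (v *datumVec) deepReset() (releasedBytes int64) {
	for i, d := range v.data {
		if d != nil {
			releasedBytes += d.Size()
			v.data[i] = nil
		}
	}
	return releasedBytes
}

type strDatum string

func (s strDatum) Size() int64 { return int64(len(s)) }

func main() {
	v := &datumVec{data: []Datum{strDatum("hello"), strDatum("world!"), nil}}
	fmt.Println("released", v.deepReset(), "bytes") // released 11 bytes
}
```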

colexec: fix limiting the output batch size in several operators

In the cFetcher, the hash aggregator, and the ordered synchronizer we
are performing careful memory accounting in order to limit the size of
the output batches.

However, we had an incorrect assumption that could lead to batches
being larger than desired. Previously, the first time a batch exceeded
the target size, we would remember its "capacity" (i.e. the number of
rows that fit into that batch), and from then on we would always put up
to that number of rows into the batch. That works well when each row is
of roughly the same size throughout the lifetime of an operator;
however, that is not always the case. Imagine we have 1000 small rows
followed by 1000 large rows - previously, all 1000 large rows would be
put into a single batch, significantly exceeding the target size.

This commit makes the limiting much saner - it removes the notion of
"max capacity" and, instead, checks after each row is set whether the
batch has exceeded the target size.

Release note: None
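
A toy sketch of the new control flow (none of the names below are the
actual cFetcher/hash aggregator APIs; they are assumptions for
illustration): the size check happens after every row is set, rather
than relying on a capacity remembered from an earlier batch.

```go
package main

import "fmt"

type row struct{ sizeBytes int64 }

type batch struct {
	rows      []row
	sizeBytes int64
}

// emitBatches cuts a batch as soon as its footprint reaches targetBytes (or
// maxRows rows have been set), so a run of large rows no longer inherits a
// "max capacity" learned from a run of small rows.
func emitBatches(rows []row, targetBytes int64, maxRows int) []*batch {
	var out []*batch
	cur := &batch{}
	for _, r := range rows {
		cur.rows = append(cur.rows, r)
		cur.sizeBytes += r.sizeBytes
		// Check the accumulated size after every row, not just once.
		if cur.sizeBytes >= targetBytes || len(cur.rows) == maxRows {
			out = append(out, cur)
			cur = &batch{}
		}
	}
	if len(cur.rows) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	var rows []row
	for i := 0; i < 1000; i++ {
		rows = append(rows, row{sizeBytes: 8}) // 1000 small rows...
	}
	for i := 0; i < 1000; i++ {
		rows = append(rows, row{sizeBytes: 1 << 10}) // ...followed by 1000 large rows
	}
	batches := emitBatches(rows, 64<<10 /* 64KiB target */, 1024)
	// The large rows are now spread across many batches instead of being
	// crammed into a single oversized batch.
	fmt.Println("number of batches:", len(batches))
}
```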

@cockroach-teamcity (Member) commented:

This change is Reviewable

The problem with the previous commit is that, if bytes-like and/or
decimal vectors are present, once the footprint of the batch exceeds
the target size because of those vectors, then after a
`ResetMaybeReallocate` call `AccountForSet` will only allow a single
row.

The problem is that we hold onto the old bytes/decimal values in order
to reuse their space, so our heuristic of not allowing any more Sets
once the reported usage exceeds the target size no longer works.

One idea is to not account for the bytes-like vectors in
`ResetMaybeReallocate` and to instead use the proportional size. The
problem with this is that we are, obviously, not accounting for
possibly large allocations (for some time).
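
A hypothetical sketch of what such proportional accounting could look
like (the helper name and signature are assumptions for illustration,
not an actual cockroachdb API):

```go
package sketch

// proportionalFootprint attributes only the fraction of the retained
// bytes-like/decimal buffers that is backed by rows currently set in the
// batch, rather than charging the full retained allocation against the
// target right after a reset.
func proportionalFootprint(retainedBytes int64, length, capacity int) int64 {
	if capacity == 0 {
		return 0
	}
	return retainedBytes * int64(length) / int64(capacity)
}
```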

Another idea is to only mark the batch as full when the reported memory
usage exceeds the target size if there was also an increase in memory
usage. This works well for datum-backed vectors. For bytes-like/decimal
vectors, however, we could see pathological behavior where the sizes of
the batches keep growing until the `coldata.BatchSize()` capacity is
reached.
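
For concreteness, that second heuristic might be sketched as follows
(the function and its parameters are hypothetical):

```go
package sketch

// batchDoneAfterSet declares the batch full only if the accounted usage
// exceeds the target and the last Set actually grew the usage. Reusing
// previously retained bytes-like/decimal space reports no growth, so it
// never trips the check; that is why batch sizes could keep creeping up
// until the coldata.BatchSize() row limit is reached.
func batchDoneAfterSet(prevUsedBytes, usedBytes, targetBytes int64) bool {
	return usedBytes > prevUsedBytes && usedBytes > targetBytes
}
```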