
colfetcher: limit batches by memory footprint #68084

Merged (1 commit) on Jul 27, 2021

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Jul 27, 2021

Previously, the cFetcher performed memory accounting for the output batch
at batch granularity. In some cases this led to batches that significantly
exceeded the target memory limit. One notable example is the case of very
wide rows (say, a large blob column) combined with an estimated row count:
we would allocate the batch based on the estimated row count and only later
fail to notice that each row was large.

This commit refactors the memory accounting to work at row granularity.
Whenever a row is finished, we perform the accounting for the last row set.
We have to be careful that this accounting does not incur a significant
performance hit, so a new Allocator.AccountForSet method is introduced. It
assumes that all fixed-length vectors have already been accounted for,
handles bytes-like vectors in a special manner (with the help of the
caller), and, if there were any decimals or datum-backed values, updates
the account only for the last row.

Addresses: #68008.

Release note: None
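The per-row accounting idea described above can be illustrated with a small, self-contained sketch. This is not the actual cFetcher code: `memAccount`, `batch`, and `accountForSet` are hypothetical stand-ins for CockroachDB's memory account, coldata batch, and `Allocator.AccountForSet`, and the sketch rescans the bytes column on every call where the real method avoids that with the caller's help.

```go
package main

import "fmt"

// memAccount is a hypothetical stand-in for a memory account: it only
// tracks the number of bytes registered with it.
type memAccount struct{ used int64 }

func (a *memAccount) Grow(delta int64) { a.used += delta }

// batch is a simplified columnar batch with one fixed-length int64 column
// and one variable-length bytes column.
type batch struct {
	ints  []int64
	bytes [][]byte
}

// accountForSet mirrors the idea behind the PR's Allocator.AccountForSet:
// fixed-length vectors are accounted for once, when the batch is allocated,
// so after each row is set we only need to register the growth of the
// variable-length data. prevBytesSize is the footprint we had already
// accounted for; the new footprint is returned so the caller can pass it
// back on the next row.
func accountForSet(acc *memAccount, b *batch, prevBytesSize int64) int64 {
	var curBytesSize int64
	for _, v := range b.bytes {
		curBytesSize += int64(len(v))
	}
	acc.Grow(curBytesSize - prevBytesSize)
	return curBytesSize
}

func main() {
	var acc memAccount
	b := &batch{}
	var bytesSize int64
	rows := [][]byte{[]byte("small"), make([]byte, 1<<20)} // second row is wide
	for i, blob := range rows {
		b.ints = append(b.ints, int64(i))
		b.bytes = append(b.bytes, blob)
		bytesSize = accountForSet(&acc, b, bytesSize)
		// A real fetcher would compare acc.used against the target memory
		// limit here and finish the batch early once it is exceeded, which
		// is what keeps a wide-row batch from blowing past the limit.
	}
	fmt.Println(acc.used)
}
```

Because the check runs after every row rather than after the whole batch, a batch of wide rows can be cut off as soon as the limit is hit instead of growing to the full estimated row count.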

@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich
Member Author

With this PR, for the two queries in the linked issue, max memory allocated went down from 1.5 GiB to 480 MiB and from 518 MiB to 139 MiB. For the query with the sort, we still need to update the sorter to respect the memory limits better (this will require changing the contract of ResetMaybeReallocate), and I'll do it in a follow-up PR which will close the linked issue.

@yuzefovich yuzefovich marked this pull request as ready for review July 27, 2021 04:27
@yuzefovich yuzefovich requested a review from a team as a code owner July 27, 2021 04:27
@yuzefovich yuzefovich requested a review from DrewKimball July 27, 2021 04:27
Collaborator

@DrewKimball DrewKimball left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colfetcher/cfetcher.go, line 335 at r1 (raw file):

	var minCapacity int
	if rf.memAccounting.maxCapacity > 0 {
		// If we have already exceeded the memory limit by the output batch, we

[nit] by -> for (or maybe in?)


pkg/sql/colfetcher/cfetcher.go, line 825 at r1 (raw file):

// NextBatch processes keys until we complete one batch of rows (subject to the
// limit hint, memory limit, max coldata.BatchSize() in length), which are

[nit] limit hint, memory limit, and max coldata.BatchSize()
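The snippet quoted at line 335 hints at how the fetcher caps future batches once a limit has been hit. The following is an editorial sketch of that idea under stated assumptions: `nextCapacity` and its parameters are hypothetical names, not the actual cFetcher API.

```go
package main

import "fmt"

// nextCapacity sketches the capacity decision hinted at by the quoted
// snippet: maxCapacity, if positive, records the row count at which an
// earlier output batch exceeded the memory limit, and later batches are
// not allowed to grow past that point. batchSizeLimit plays the role of
// coldata.BatchSize(), the hard cap on batch length.
func nextCapacity(maxCapacity, estimatedRowCount, batchSizeLimit int) int {
	minCapacity := estimatedRowCount
	if maxCapacity > 0 {
		// We have already exceeded the memory limit with an output batch
		// of maxCapacity rows, so do not allocate anything larger.
		minCapacity = maxCapacity
	}
	if minCapacity > batchSizeLimit {
		minCapacity = batchSizeLimit
	}
	return minCapacity
}

func main() {
	// Before the limit is ever hit, the estimate drives the capacity.
	fmt.Println(nextCapacity(0, 100, 1024))
	// After a batch of 16 rows exceeded the limit, stay at 16.
	fmt.Println(nextCapacity(16, 100, 1024))
}
```

The effect is that a single over-limit batch permanently shrinks the fetcher's appetite, complementing the per-row accounting from the commit message.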

Member Author

@yuzefovich yuzefovich left a comment


TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/sql/colfetcher/cfetcher.go, line 335 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] by -> for (or maybe in?)

Done.


pkg/sql/colfetcher/cfetcher.go, line 825 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] limit hint, memory limit, and max coldata.BatchSize()

Rephrased a bit.

@craig
Contributor

craig bot commented Jul 27, 2021

Build succeeded:

@craig craig bot merged commit b077a2f into cockroachdb:master Jul 27, 2021
@yuzefovich yuzefovich deleted the cfetcher-batch branch July 27, 2021 18:31