
colfetcher: limit batches by memory footprint #68084

Merged (1 commit) on Jul 27, 2021

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Jul 27, 2021

Previously, the cFetcher performed memory accounting for the output batch
at batch granularity. In some cases this led to batches that significantly
exceeded the target memory limit. One notable example is the case of very
wide rows (say, a large blob column) combined with an estimated row count:
we would allocate the batch based on the estimated row count and only later
fail to notice that each row was large.

This commit refactors the memory accounting to work at row granularity.
Whenever a row is finished, we perform the accounting for the last row set.
We have to be careful that this accounting does not incur a significant
performance hit, so a new Allocator.AccountForSet method is introduced. It
assumes that all fixed-length vectors have already been accounted for,
handles bytes-like vectors in a special manner (with the help of the
caller), and, if there were any decimals or datum-backed values, updates
the account only for the last row.

Addresses: #68008.

Release note: None
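The per-row accounting idea described above can be illustrated with a small, self-contained sketch. This is not the actual cFetcher code: `memAccount`, `batch`, and `accountForSet` are hypothetical stand-ins for CockroachDB's memory account, coldata batch, and `Allocator.AccountForSet`, and the sketch rescans the bytes column on every call where the real method avoids that with the caller's help.

```go
package main

import "fmt"

// memAccount is a hypothetical stand-in for a memory account: it only
// tracks the number of bytes registered with it.
type memAccount struct{ used int64 }

func (a *memAccount) Grow(delta int64) { a.used += delta }

// batch is a simplified columnar batch with one fixed-length int64 column
// and one variable-length bytes column.
type batch struct {
	ints  []int64
	bytes [][]byte
}

// accountForSet mirrors the idea behind the PR's Allocator.AccountForSet:
// fixed-length vectors are accounted for once, when the batch is allocated,
// so after each row is set we only need to register the growth of the
// variable-length data. prevBytesSize is the footprint we had already
// accounted for; the new footprint is returned so the caller can pass it
// back on the next row.
func accountForSet(acc *memAccount, b *batch, prevBytesSize int64) int64 {
	var curBytesSize int64
	for _, v := range b.bytes {
		curBytesSize += int64(len(v))
	}
	acc.Grow(curBytesSize - prevBytesSize)
	return curBytesSize
}

func main() {
	var acc memAccount
	b := &batch{}
	var bytesSize int64
	rows := [][]byte{[]byte("small"), make([]byte, 1<<20)} // second row is wide
	for i, blob := range rows {
		b.ints = append(b.ints, int64(i))
		b.bytes = append(b.bytes, blob)
		bytesSize = accountForSet(&acc, b, bytesSize)
		// A real fetcher would compare acc.used against the target memory
		// limit here and finish the batch early once it is exceeded, which
		// is what keeps a wide-row batch from blowing past the limit.
	}
	fmt.Println(acc.used)
}
```

Because the check runs after every row rather than after the whole batch, a batch of wide rows can be cut off as soon as the limit is hit instead of growing to the full estimated row count.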

@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich
Member Author

With this PR, for the two queries in the linked issue, max memory allocated went down from 1.5 GiB to 480 MiB and from 518 MiB to 139 MiB. For the query with the sort, we still need to update the sorter to respect the memory limits better (this will require changing the contract of ResetMaybeReallocate), and I'll do it in a follow-up PR which will close the linked issue.

@yuzefovich yuzefovich marked this pull request as ready for review July 27, 2021 04:27
@yuzefovich yuzefovich requested a review from a team as a code owner July 27, 2021 04:27
@yuzefovich yuzefovich requested a review from DrewKimball July 27, 2021 04:27
Collaborator

@DrewKimball DrewKimball left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colfetcher/cfetcher.go, line 335 at r1 (raw file):

	var minCapacity int
	if rf.memAccounting.maxCapacity > 0 {
		// If we have already exceeded the memory limit by the output batch, we

[nit] by -> for (or maybe in?)


pkg/sql/colfetcher/cfetcher.go, line 825 at r1 (raw file):

// NextBatch processes keys until we complete one batch of rows (subject to the
// limit hint, memory limit, max coldata.BatchSize() in length), which are

[nit] limit hint, memory limit, and max coldata.BatchSize()
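The snippet quoted at line 335 hints at how the fetcher caps future batches once a limit has been hit. The following is an editorial sketch of that idea under stated assumptions: `nextCapacity` and its parameters are hypothetical names, not the actual cFetcher API.

```go
package main

import "fmt"

// nextCapacity sketches the capacity decision hinted at by the quoted
// snippet: maxCapacity, if positive, records the row count at which an
// earlier output batch exceeded the memory limit, and later batches are
// not allowed to grow past that point. batchSizeLimit plays the role of
// coldata.BatchSize(), the hard cap on batch length.
func nextCapacity(maxCapacity, estimatedRowCount, batchSizeLimit int) int {
	minCapacity := estimatedRowCount
	if maxCapacity > 0 {
		// We have already exceeded the memory limit with an output batch
		// of maxCapacity rows, so do not allocate anything larger.
		minCapacity = maxCapacity
	}
	if minCapacity > batchSizeLimit {
		minCapacity = batchSizeLimit
	}
	return minCapacity
}

func main() {
	// Before the limit is ever hit, the estimate drives the capacity.
	fmt.Println(nextCapacity(0, 100, 1024))
	// After a batch of 16 rows exceeded the limit, stay at 16.
	fmt.Println(nextCapacity(16, 100, 1024))
}
```

The effect is that a single over-limit batch permanently shrinks the fetcher's appetite, complementing the per-row accounting from the commit message.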

Member Author

@yuzefovich yuzefovich left a comment


TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/sql/colfetcher/cfetcher.go, line 335 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] by -> for (or maybe in?)

Done.


pkg/sql/colfetcher/cfetcher.go, line 825 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] limit hint, memory limit, and max coldata.BatchSize()

Rephrased a bit.

@craig
Contributor

craig bot commented Jul 27, 2021

Build succeeded:

@craig craig bot merged commit b077a2f into cockroachdb:master Jul 27, 2021
@yuzefovich yuzefovich deleted the cfetcher-batch branch July 27, 2021 18:31