colfetcher: limit batches by memory footprint #68084
Conversation
Force-pushed from dd557b6 to deef296.
With this PR, for the two queries in the linked issue, …
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colfetcher/cfetcher.go, line 335 at r1 (raw file):
var minCapacity int
if rf.memAccounting.maxCapacity > 0 {
	// If we have already exceeded the memory limit by the output batch, we
[nit] by -> for (or maybe in?)
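For context, a minimal Go sketch of the capacity-limiting idea in the quoted snippet; the names (`nextBatchCapacity`, `alreadyExceededLimit`) are illustrative assumptions, not the actual cfetcher fields:

```go
package main

import "fmt"

// nextBatchCapacity is illustrative: once the output batch has already
// exceeded the memory limit, fall back to the minimum capacity of one
// row so that any further overshoot is bounded by a single row.
func nextBatchCapacity(alreadyExceededLimit bool, estimatedRowCount int) int {
	const minCapacity = 1
	if alreadyExceededLimit {
		return minCapacity
	}
	if estimatedRowCount > minCapacity {
		return estimatedRowCount
	}
	return minCapacity
}

func main() {
	fmt.Println(nextBatchCapacity(false, 1024)) // 1024: trust the estimate
	fmt.Println(nextBatchCapacity(true, 1024))  // 1: already over the limit
}
```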
pkg/sql/colfetcher/cfetcher.go, line 825 at r1 (raw file):
// NextBatch processes keys until we complete one batch of rows (subject to the
// limit hint, memory limit, max coldata.BatchSize() in length), which are
[nit] limit hint, memory limit, and max coldata.BatchSize()
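The comment above describes three independent conditions that can finish a batch. Here is a hedged, self-contained sketch of such a loop; the names and the fixed `maxBatchSize` constant stand in for the real cfetcher state and `coldata.BatchSize()`:

```go
package main

import "fmt"

const maxBatchSize = 1024 // stand-in for coldata.BatchSize()

// fetchBatch appends rows until one of three limits is reached: the
// maximum batch length, the optional row limit hint, or the memory
// limit. Checking after every row bounds the overshoot to one row.
func fetchBatch(rows [][]byte, rowLimitHint int, memLimit int64) ([][]byte, int64) {
	var batch [][]byte
	var footprint int64
	for _, row := range rows {
		batch = append(batch, row)
		footprint += int64(len(row))
		if len(batch) == maxBatchSize ||
			(rowLimitHint > 0 && len(batch) == rowLimitHint) ||
			footprint >= memLimit {
			break
		}
	}
	return batch, footprint
}

func main() {
	rows := [][]byte{make([]byte, 400), make([]byte, 400), make([]byte, 400)}
	batch, fp := fetchBatch(rows, 0 /* no limit hint */, 700 /* bytes */)
	fmt.Println(len(batch), fp) // 2 800: the memory limit finished the batch
}
```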
Force-pushed from deef296 to e2f145d.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
pkg/sql/colfetcher/cfetcher.go, line 335 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] by -> for (or maybe in?)
Done.
pkg/sql/colfetcher/cfetcher.go, line 825 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] limit hint, memory limit, and max coldata.BatchSize()
Rephrased a bit.
Build succeeded.
Previously, in the cFetcher we performed the memory accounting for the
output batch at batch granularity. In some cases this led to batches
that significantly exceeded the target memory limit. One notable
example is the case where we have very wide rows (say, a large blob
column) and an estimated row count: we would allocate the batch based
on the estimated row count and would not realize until later that each
row was large.
This commit refactors the memory accounting to be at row granularity.
Whenever a row is finished, we perform the accounting for the row that
was just set. We have to be a bit careful for this accounting not to
have a significant performance hit, so a new `Allocator.AccountForSet`
method is introduced. It assumes that all fixed-length vectors have
already been accounted for, handles bytes-like vectors in a special
manner (with the help of the caller), and, if there are any decimals or
datum-backed values, updates the account for the last row only.
Addresses: #68008.
Release note: None
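To make the row-granularity idea concrete, here is a minimal, self-contained sketch. The toy `Account` type and this `AccountForSet` signature are assumptions for illustration only, not the actual `Allocator.AccountForSet` API:

```go
package main

import "fmt"

// Account is a toy memory account; the real code uses an Allocator
// wired to a memory monitor.
type Account struct{ used int64 }

func (a *Account) Grow(delta int64) { a.used += delta }

// AccountForSet illustrates the contract described above: fixed-length
// vectors are assumed to be accounted for when the batch is allocated,
// so after each row only the variable-length bytes just written (e.g.
// into bytes-like vectors) are registered.
func AccountForSet(acc *Account, varLenRowBytes int64) {
	acc.Grow(varLenRowBytes)
}

func main() {
	var acc Account
	const memLimit = int64(1 << 10) // 1 KiB target limit
	rowSizes := []int64{300, 300, 600}
	rowsInBatch := 0
	for _, sz := range rowSizes {
		AccountForSet(&acc, sz)
		rowsInBatch++
		// Per-row accounting notices the overshoot right after the
		// wide third row, so the batch can be finished early instead
		// of growing far past the limit.
		if acc.used >= memLimit {
			break
		}
	}
	fmt.Println(rowsInBatch, acc.used) // 3 1200
}
```

With batch-granularity accounting, the overshoot in this example would only be discovered after the whole batch was filled; with per-row accounting it is bounded by a single row.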