colmem: improve memory accounting when memory limit is exceeded #86357

yuzefovich · 2022-08-17T23:54:45Z

colmem: improve memory accounting when memory limit is exceeded

This commit improves the memory accounting when memory limit is
exceeded. Previously, in several operators we could run into a situation
where we perform some allocations and run into a memory limit error
later, which results in those allocations being unaccounted for. In some
cases this is acceptable (when the query results in an error), but in
others the memory error is caught and spilling to disk occurs. In the
latter scenarios we would under-account, and this commit fixes most of
such situations.

Now, each disk-spilling operator instantiates a "limited" allocator that
will grow an unlimited memory account when a memory error is
encountered. The idea is that even though the denied allocations cannot
be registered with the main memory account (meaning the operator has
exceeded its memory limit), we still will draw from the
--max-sql-memory pool since the allocations can be live for
non-trivial amount of time. If an error occurs when growing the
unlimited memory account, then that error is returned (not the original
memory error) so that the disk spiller doesn't catch it.

This commit audits all operators in execplan to use the limited
allocator where appropriate. The new accounting method is only used in
a handful of places which cover most of the use cases. The goal is to
make this commit backportable whereas the follow-up commit will audit
usages of AdjustMemoryUsage and will not be backported.

Addresses: #64906.
Fixes: #86351.
Addresses: https://github.com/cockroachlabs/support/issues/1762.

Release justification: bug fix.

Release note: None

colmem: audit callers of AdjustMemoryUsage

This commit audits all callers of Allocator.AdjustMemoryUsage to use
the newly-exported AdjustMemoryUsageAfterAllocation where applicable
(meaning that if an allocation occurs before the method is called, then
the new method is now used). In many cases this won't result in a change
in the behavior since the allocators are not instantiated with limited
memory accounts, but in some cases it is still useful.

Release justification: bug fix.

Release note: None

cockroach-teamcity · 2022-08-17T23:54:52Z

This change is

michae2

WDYT of pushing this down into BoundAccount and BytesMonitor? Seems like there could be other cases where we're accounting after-the-fact and would want to both throw an error and keep track of the overflow.

Reviewed 1 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)

yuzefovich · 2022-08-19T19:35:02Z

Hm, I'm quite certain there are other cases like this, but that would be too large of a refactor and would make the API of mon package less clear.

Currently, the contract is such that you're supposed to Grow an account before making any allocation that you want to track against that account, and that seems reasonable. I think it should be up to the user to respect that contract or - like this PR does - perform some additional actions when the contract is consciously broken. In the vectorized engine we do this "after the fact" accounting out of convenience as well as because of unpredictability of some allocations. I think the better approach would be to make the allocations deterministic as Drew suggested here, and then we'd be able to always adhere to the contract.

michae2

Yes, I think it would mean expanding the contract a bit to include the possibility of debt.

It might be as simple as a change to BytesMonitor.reserveBytes to include the possibility of BytesMonitor.mu.curAllocated > BytesMonitor.limit.

I think we could keep the refactor small, by only doing it with new methods and/or new types and leaving the existing code alone. I.e. keep BoundAccount.Grow the same, and add BoundAccount.GrowWithOverdraft (or some other better name) that can put the account / monitor into debt. (Or maybe add a new type BoundOverdraftAccount or BoundCreditCard that has its own Grow if a type change would be simpler than a method change.) Something like this:

diff --git a/pkg/util/mon/bytes_usage.go b/pkg/util/mon/bytes_usage.go
index 4d2f9246eb..f328ef3165 100644
--- a/pkg/util/mon/bytes_usage.go
+++ b/pkg/util/mon/bytes_usage.go
@@ -710,6 +710,25 @@ func (b *BoundAccount) Grow(ctx context.Context, x int64) error {
        return nil
 }

+func (b *BoundAccount) GrowWithOverdraft(ctx context.Context, x int64) error {
+       if b == nil {
+               return nil
+       }
+       if b.Mu != nil {
+               b.Mu.Lock()
+               defer b.Mu.Unlock()
+       }
+       var err error
+       if b.reserved < x {
+               minExtra := b.mon.roundSize(x - b.reserved)
+               err = b.mon.reserveBytes(ctx, minExtra, true)
+               b.reserved += minExtra
+       }
+       b.reserved -= x
+       b.used += x
+       return err
+}
+
 // Shrink releases part of the cumulated allocations by the specified size.
 //
 // If Mu is set, it is safe for use by concurrent goroutines.
@@ -738,7 +757,7 @@ func (b *BoundAccount) Shrink(ctx context.Context, delta int64) {
 // reserveBytes declares an allocation to this monitor. An error is returned if
 // the allocation is denied.
 // x must be a multiple of `poolAllocationSize`.
-func (mm *BytesMonitor) reserveBytes(ctx context.Context, x int64) error {
+func (mm *BytesMonitor) reserveBytes(ctx context.Context, x int64, overdraft bool) error {
        mm.mu.Lock()
        defer mm.mu.Unlock()
        // Check the local limit first. NB: The condition is written in this manner
@@ -747,10 +766,14 @@ func (mm *BytesMonitor) reserveBytes(ctx context.Context, x int64) error {
        //
        // TODO(knz): make the monitor name reportable in telemetry, after checking
        // that the name is never constructed from user data.
+       var err error
        if mm.mu.curAllocated > mm.limit-x {
-               return errors.Wrapf(
+               err = errors.Wrapf(
                        mm.resource.NewBudgetExceededError(x, mm.mu.curAllocated, mm.limit), "%s", mm.name,
                )
+               if !overdraft {
+                       return err
+               }
        }
        // Check whether we need to request an increase of our budget.
        if mm.mu.curAllocated > mm.mu.curBudget.used+mm.reserved.used-x {
@@ -786,7 +809,7 @@ func (mm *BytesMonitor) reserveBytes(ctx context.Context, x int64) error {
                log.Infof(ctx, "%s: now at %d bytes (+%d) - %s",
                        mm.nameWithPointer, mm.mu.curAllocated, x, util.GetSmallTrace(3))
        }
-       return nil
+       return err
 }

 // releaseBytes releases bytes previously successfully registered via

Is that too scary? Maybe I'm missing something? I don't want to block your work, but it also seems a little silly to allocate another monitor and account for something that the original monitor and account could track without very many changes. But maybe I'm minimizing it or glossing over some complexity?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)

DrewKimball

Another possibility to consider would be to add a method to the BufferingInMemoryOperator interface that releases the memory reserved in the limited account and accounts for all memory used by the in-memory operator in an unlimited disk-spiller account, since the disk-spiller should "know" how long the in-memory operator's allocations will live.

If we want to stick with this approach though, the changes

Reviewed 10 of 10 files at r1, 42 of 42 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

pkg/sql/colexec/sort.go line 341 at r3 (raw file):

		// Finalize the accounting in case the capacity of the new slice is
		// larger than we asked for.
		extraCap := cap(p.order) - len(p.order)

Is it possible for the capacity to be larger than the length requested with make?

This commit improves the memory accounting when memory limit is exceeded. Previously, in several operators we could run into a situation where we perform some allocations and run into a memory limit error later, which results in those allocations being unaccounted for. In some cases this is acceptable (when the query results in an error), but in others the memory error is caught and spilling to disk occurs. In the latter scenarios we would under-account, and this commit fixes most of such situations. Now, each disk-spilling operator instantiates a "limited" allocator that will grow an unlimited memory account when a memory error is encountered. The idea is that even though the denied allocations cannot be registered with the main memory account (meaning the operator has exceeded its memory limit), we still will draw from the `--max-sql-memory` pool since the allocations can be live for non-trivial amount of time. If an error occurs when growing the unlimited memory account, then that error is returned (not the original memory error) so that the disk spiller doesn't catch it. This commit audits all operators in `execplan` to use the limited allocator where appropriate. The new accounting method is only used in a handful of places which cover most of the use cases. The goal is to make this commit backportable whereas the follow-up commit will audit usages of `AdjustMemoryUsage` and will not be backported. Release justification: bug fix. Release note: None

This commit audits all callers of `Allocator.AdjustMemoryUsage` to use the newly-exported `AdjustMemoryUsageAfterAllocation` where applicable (meaning that if an allocation occurs before the method is called, then the new method is now used). In many cases this won't result in a change in the behavior since the allocators are not instantiated with limited memory accounts, but in some cases it is still useful. Release justification: bug fix. Release note: None

michae2 · 2022-08-22T17:18:36Z

(@yuzefovich to be clear, I can be argued out of this idea, if you don't want to do it. I just wanted to flesh it out a bit.)

yuzefovich

It might be as simple as a change to BytesMonitor.reserveBytes to include the possibility of BytesMonitor.mu.curAllocated > BytesMonitor.limit.

Hm, it's a good point. It's a bit difficult to reason about whether anything will break because of it (e.g. are all of the bytes correctly released when the account is closed if the overdraft occurred).

I started going down this route, and TestAdjustMemoryUsage (that is being added in this PR) does expose a problem - allowing the monitor going into debt without making some adjustments to the memory account that is putting the monitor in debt breaks the synchronization between the two. In other words, we can now arrive at a situation where the account thinks that it is using 64MiB but it has consumed more from the monitor, so when the account is closed, not all of the bytes are correctly released, so we have a "memory accounting leak". If we do make the account update its used field even when the monitor returns an error, then we end up in a problematic situation when the root monitor returns the error. We would have to examine the error message and update the limited account only if the error message comes from the limited monitor, and not the unlimited one.

So after all I think this idea makes the implementation more complicated and more error-prone, but offers somewhat cleaner API. I'm inclined to keep my original approach since it seems safer and thus can be backported.

Another possibility to consider would be to add a method to the BufferingInMemoryOperator interface that releases the memory reserved in the limited account and accounts for all memory used by the in-memory operator in an unlimited disk-spiller account, since the disk-spiller should "know" how long the in-memory operator's allocations will live.

That would only solve the problem this PR attempts to solve if the disk-backed operator also recomputed the size of all live allocations. In other words, if the limited memory account under-accounts, then it would not matter if we transfer that under-accounted reservation to the unlimited account of the disk-backed operator. We would need to ask the disk-backed op to calculate the precise footprint of all allocations it is taking over and clear the limited account.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)

pkg/sql/colexec/sort.go line 341 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Is it possible for the capacity to be larger than the length requested with make?

Yeah, you're right - it shouldn't be possible, removed this commit. My mind was in the "append-only" mode when I wrote that commit :)

yuzefovich · 2022-08-22T20:49:36Z

I see two thumbs-ups, so I'm assuming we're all ok with this change.

TFTRs!

bors r+

craig · 2022-08-22T21:40:46Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2022-08-23T01:42:13Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2022-08-23T06:06:51Z

Build succeeded:

Bazel Essential CI (Cockroach)

yuzefovich force-pushed the allocator branch 6 times, most recently from 30269ce to 984487b Compare August 18, 2022 21:17

yuzefovich marked this pull request as ready for review August 18, 2022 21:22

yuzefovich requested a review from a team as a code owner August 18, 2022 21:22

yuzefovich requested review from DrewKimball and michae2 August 18, 2022 21:22

michae2 reviewed Aug 19, 2022

View reviewed changes

DrewKimball approved these changes Aug 19, 2022

View reviewed changes

yuzefovich added 2 commits August 22, 2022 08:57

yuzefovich force-pushed the allocator branch from 984487b to 256223f Compare August 22, 2022 17:53

yuzefovich commented Aug 22, 2022

View reviewed changes

craig bot merged commit cf533b0 into cockroachdb:master Aug 23, 2022

yuzefovich deleted the allocator branch August 23, 2022 13:55

yuzefovich mentioned this pull request Aug 23, 2022

release-22.1: colmem: improve memory accounting when memory limit is exceeded #86663

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

colmem: improve memory accounting when memory limit is exceeded #86357

colmem: improve memory accounting when memory limit is exceeded #86357

yuzefovich commented Aug 17, 2022 •

edited

Loading

cockroach-teamcity commented Aug 17, 2022

michae2 left a comment

yuzefovich commented Aug 19, 2022

michae2 left a comment

DrewKimball left a comment

michae2 commented Aug 22, 2022

yuzefovich left a comment

yuzefovich commented Aug 22, 2022

craig bot commented Aug 22, 2022

craig bot commented Aug 23, 2022

craig bot commented Aug 23, 2022

colmem: improve memory accounting when memory limit is exceeded #86357

colmem: improve memory accounting when memory limit is exceeded #86357

Conversation

yuzefovich commented Aug 17, 2022 • edited Loading

cockroach-teamcity commented Aug 17, 2022

michae2 left a comment

Choose a reason for hiding this comment

yuzefovich commented Aug 19, 2022

michae2 left a comment

Choose a reason for hiding this comment

DrewKimball left a comment

Choose a reason for hiding this comment

michae2 commented Aug 22, 2022

yuzefovich left a comment

Choose a reason for hiding this comment

yuzefovich commented Aug 22, 2022

craig bot commented Aug 22, 2022

craig bot commented Aug 23, 2022

craig bot commented Aug 23, 2022

yuzefovich commented Aug 17, 2022 •

edited

Loading