Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colmem: improve memory accounting when memory limit is exceeded #86357

Merged
merged 2 commits into from
Aug 23, 2022

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Aug 17, 2022

colmem: improve memory accounting when memory limit is exceeded

This commit improves the memory accounting when memory limit is
exceeded. Previously, in several operators we could run into a situation
where we perform some allocations and run into a memory limit error
later, which results in those allocations being unaccounted for. In some
cases this is acceptable (when the query results in an error), but in
others the memory error is caught and spilling to disk occurs. In the
latter scenarios we would under-account, and this commit fixes most of
such situations.

Now, each disk-spilling operator instantiates a "limited" allocator that
will grow an unlimited memory account when a memory error is
encountered. The idea is that even though the denied allocations cannot
be registered with the main memory account (meaning the operator has
exceeded its memory limit), we still will draw from the
--max-sql-memory pool since the allocations can be live for
non-trivial amount of time. If an error occurs when growing the
unlimited memory account, then that error is returned (not the original
memory error) so that the disk spiller doesn't catch it.

This commit audits all operators in execplan to use the limited
allocator where appropriate. The new accounting method is only used in
a handful of places which cover most of the use cases. The goal is to
make this commit backportable whereas the follow-up commit will audit
usages of AdjustMemoryUsage and will not be backported.

Addresses: #64906.
Fixes: #86351.
Addresses: https://github.com/cockroachlabs/support/issues/1762.

Release justification: bug fix.

Release note: None

colmem: audit callers of AdjustMemoryUsage

This commit audits all callers of Allocator.AdjustMemoryUsage to use
the newly-exported AdjustMemoryUsageAfterAllocation where applicable
(meaning that if an allocation occurs before the method is called, then
the new method is now used). In many cases this won't result in a change
in the behavior since the allocators are not instantiated with limited
memory accounts, but in some cases it is still useful.

Release justification: bug fix.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the allocator branch 6 times, most recently from 30269ce to 984487b Compare August 18, 2022 21:17
@yuzefovich yuzefovich marked this pull request as ready for review August 18, 2022 21:22
@yuzefovich yuzefovich requested a review from a team as a code owner August 18, 2022 21:22
Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT of pushing this down into BoundAccount and BytesMonitor? Seems like there could be other cases where we're accounting after-the-fact and would want to both throw an error and keep track of the overflow.

Reviewed 1 of 10 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)

@yuzefovich
Copy link
Member Author

Hm, I'm quite certain there are other cases like this, but that would be too large of a refactor and would make the API of mon package less clear.

Currently, the contract is such that you're supposed to Grow an account before making any allocation that you want to track against that account, and that seems reasonable. I think it should be up to the user to respect that contract or - like this PR does - perform some additional actions when the contract is consciously broken. In the vectorized engine we do this "after the fact" accounting out of convenience as well as because of unpredictability of some allocations. I think the better approach would be to make the allocations deterministic as Drew suggested here, and then we'd be able to always adhere to the contract.

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it would mean expanding the contract a bit to include the possibility of debt.

It might be as simple as a change to BytesMonitor.reserveBytes to include the possibility of BytesMonitor.mu.curAllocated > BytesMonitor.limit.

I think we could keep the refactor small, by only doing it with new methods and/or new types and leaving the existing code alone. I.e. keep BoundAccount.Grow the same, and add BoundAccount.GrowWithOverdraft (or some other better name) that can put the account / monitor into debt. (Or maybe add a new type BoundOverdraftAccount or BoundCreditCard that has its own Grow if a type change would be simpler than a method change.) Something like this:

diff --git a/pkg/util/mon/bytes_usage.go b/pkg/util/mon/bytes_usage.go
index 4d2f9246eb..f328ef3165 100644
--- a/pkg/util/mon/bytes_usage.go
+++ b/pkg/util/mon/bytes_usage.go
@@ -710,6 +710,25 @@ func (b *BoundAccount) Grow(ctx context.Context, x int64) error {
        return nil
 }

+func (b *BoundAccount) GrowWithOverdraft(ctx context.Context, x int64) error {
+       if b == nil {
+               return nil
+       }
+       if b.Mu != nil {
+               b.Mu.Lock()
+               defer b.Mu.Unlock()
+       }
+       var err error
+       if b.reserved < x {
+               minExtra := b.mon.roundSize(x - b.reserved)
+               err = b.mon.reserveBytes(ctx, minExtra, true)
+               b.reserved += minExtra
+       }
+       b.reserved -= x
+       b.used += x
+       return err
+}
+
 // Shrink releases part of the cumulated allocations by the specified size.
 //
 // If Mu is set, it is safe for use by concurrent goroutines.
@@ -738,7 +757,7 @@ func (b *BoundAccount) Shrink(ctx context.Context, delta int64) {
 // reserveBytes declares an allocation to this monitor. An error is returned if
 // the allocation is denied.
 // x must be a multiple of `poolAllocationSize`.
-func (mm *BytesMonitor) reserveBytes(ctx context.Context, x int64) error {
+func (mm *BytesMonitor) reserveBytes(ctx context.Context, x int64, overdraft bool) error {
        mm.mu.Lock()
        defer mm.mu.Unlock()
        // Check the local limit first. NB: The condition is written in this manner
@@ -747,10 +766,14 @@ func (mm *BytesMonitor) reserveBytes(ctx context.Context, x int64) error {
        //
        // TODO(knz): make the monitor name reportable in telemetry, after checking
        // that the name is never constructed from user data.
+       var err error
        if mm.mu.curAllocated > mm.limit-x {
-               return errors.Wrapf(
+               err = errors.Wrapf(
                        mm.resource.NewBudgetExceededError(x, mm.mu.curAllocated, mm.limit), "%s", mm.name,
                )
+               if !overdraft {
+                       return err
+               }
        }
        // Check whether we need to request an increase of our budget.
        if mm.mu.curAllocated > mm.mu.curBudget.used+mm.reserved.used-x {
@@ -786,7 +809,7 @@ func (mm *BytesMonitor) reserveBytes(ctx context.Context, x int64) error {
                log.Infof(ctx, "%s: now at %d bytes (+%d) - %s",
                        mm.nameWithPointer, mm.mu.curAllocated, x, util.GetSmallTrace(3))
        }
-       return nil
+       return err
 }

 // releaseBytes releases bytes previously successfully registered via

Is that too scary? Maybe I'm missing something? I don't want to block your work, but it also seems a little silly to allocate another monitor and account for something that the original monitor and account could track without very many changes. But maybe I'm minimizing it or glossing over some complexity?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility to consider would be to add a method to the BufferingInMemoryOperator interface that releases the memory reserved in the limited account and accounts for all memory used by the in-memory operator in an unlimited disk-spiller account, since the disk-spiller should "know" how long the in-memory operator's allocations will live.

If we want to stick with this approach though, the changes :lgtm:

Reviewed 10 of 10 files at r1, 42 of 42 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/sort.go line 341 at r3 (raw file):

		// Finalize the accounting in case the capacity of the new slice is
		// larger than we asked for.
		extraCap := cap(p.order) - len(p.order)

Is it possible for the capacity to be larger than the length requested with make?

This commit improves the memory accounting when memory limit is
exceeded. Previously, in several operators we could run into a situation
where we perform some allocations and run into a memory limit error
later, which results in those allocations being unaccounted for. In some
cases this is acceptable (when the query results in an error), but in
others the memory error is caught and spilling to disk occurs. In the
latter scenarios we would under-account, and this commit fixes most of
such situations.

Now, each disk-spilling operator instantiates a "limited" allocator that
will grow an unlimited memory account when a memory error is
encountered. The idea is that even though the denied allocations cannot
be registered with the main memory account (meaning the operator has
exceeded its memory limit), we still will draw from the
`--max-sql-memory` pool since the allocations can be live for
non-trivial amount of time. If an error occurs when growing the
unlimited memory account, then that error is returned (not the original
memory error) so that the disk spiller doesn't catch it.

This commit audits all operators in `execplan` to use the limited
allocator where appropriate. The new accounting method is only used in
a handful of places which cover most of the use cases. The goal is to
make this commit backportable whereas the follow-up commit will audit
usages of `AdjustMemoryUsage` and will not be backported.

Release justification: bug fix.

Release note: None
This commit audits all callers of `Allocator.AdjustMemoryUsage` to use
the newly-exported `AdjustMemoryUsageAfterAllocation` where applicable
(meaning that if an allocation occurs before the method is called, then
the new method is now used). In many cases this won't result in a change
in the behavior since the allocators are not instantiated with limited
memory accounts, but in some cases it is still useful.

Release justification: bug fix.

Release note: None
@michae2
Copy link
Collaborator

michae2 commented Aug 22, 2022

(@yuzefovich to be clear, I can be argued out of this idea, if you don't want to do it. I just wanted to flesh it out a bit.)

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be as simple as a change to BytesMonitor.reserveBytes to include the possibility of BytesMonitor.mu.curAllocated > BytesMonitor.limit.

Hm, it's a good point. It's a bit difficult to reason about whether anything will break because of it (e.g. are all of the bytes correctly released when the account is closed if the overdraft occurred).

I started going down this route, and TestAdjustMemoryUsage (that is being added in this PR) does expose a problem - allowing the monitor going into debt without making some adjustments to the memory account that is putting the monitor in debt breaks the synchronization between the two. In other words, we can now arrive at a situation where the account thinks that it is using 64MiB but it has consumed more from the monitor, so when the account is closed, not all of the bytes are correctly released, so we have a "memory accounting leak". If we do make the account update its used field even when the monitor returns an error, then we end up in a problematic situation when the root monitor returns the error. We would have to examine the error message and update the limited account only if the error message comes from the limited monitor, and not the unlimited one.

So after all I think this idea makes the implementation more complicated and more error-prone, but offers somewhat cleaner API. I'm inclined to keep my original approach since it seems safer and thus can be backported.

Another possibility to consider would be to add a method to the BufferingInMemoryOperator interface that releases the memory reserved in the limited account and accounts for all memory used by the in-memory operator in an unlimited disk-spiller account, since the disk-spiller should "know" how long the in-memory operator's allocations will live.

That would only solve the problem this PR attempts to solve if the disk-backed operator also recomputed the size of all live allocations. In other words, if the limited memory account under-accounts, then it would not matter if we transfer that under-accounted reservation to the unlimited account of the disk-backed operator. We would need to ask the disk-backed op to calculate the precise footprint of all allocations it is taking over and clear the limited account.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/sql/colexec/sort.go line 341 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Is it possible for the capacity to be larger than the length requested with make?

Yeah, you're right - it shouldn't be possible, removed this commit. My mind was in the "append-only" mode when I wrote that commit :)

@yuzefovich
Copy link
Member Author

I see two thumbs-ups, so I'm assuming we're all ok with this change.

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 22, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 23, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 23, 2022

Build succeeded:

@craig craig bot merged commit cf533b0 into cockroachdb:master Aug 23, 2022
@yuzefovich yuzefovich deleted the allocator branch August 23, 2022 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

colmem: improve accounting when exceeding the limit
4 participants