Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: add way for applications to respond to GC backpressure #29696

Open
josharian opened this issue Jan 11, 2019 · 34 comments
Open

Comments

@josharian
Copy link
Contributor

josharian commented Jan 11, 2019

This is marked as a proposal, but I'd like it to start out as a discussion. Although I believe the need is clear, the correct design is not at all obvious.

Many applications have caches that they could easily drop in response to GC backpressure. However, there is currently no good way to do this.

One obvious example is sync.Pool. If we also had per-P storage, we could rebuild sync.Pool outside the standard library using GC backpressure. Better, folks could customize the sync.Pool emptying strategy in response to their own needs.

Another example is interning strings.

As to design, there are a few aspects. What does the API look like? When and how does the runtime trigger it? And what system dynamics might result.

One API ideas:

package runtime

// The API below needs better names. And the docs are a mix of user-facing and internal details.

// GCBackpressureC returns a channel that the runtime will close to indicate GC backpressure.
// One the channel has been closed, another call to GCBackpressureC will return a new channel.
// There is only a single, shared channel across all calls to GCBackpressureC.
// This makes the runtime side cheap: a single, lazily allocated channel, and a single close call per round of backpressure.
// The downside is that it could cause a stampede of memory releases.
// Another downside is that there could be logical races in which clients miss backpressure notifications by not obtaining a new channel quickly enough.
func GCBackpressureC() <-chan struct{}

// GCBackpressureC returns a channel that the runtime will send values on to indicate GC backpressure.
// The values have no meaning here, but maybe there's something useful we could send?
// The runtime side here is more expensive. It has O(n) channels to keep track of, and does O(n) work to send signals to everyone.
// On the other hand, by not sending signals to everyone simultaneously it can help avoid stampedes.
func GCBackpressureC() <-chan struct{}

As to when the runtime should indicate backpressure, there are several options. It could do it once per GC cycle, as a matter of course. (This matches existing sync.Pool behavior.) It could do it in response to heap growth. Or maybe something else or some combination.

It's unclear what effect this might have on system dynamics. To avoid messy situations like #22950, someone should probably do some hysteresis and/or incremental work. It is unclear whether that should reside in the runtime or in user code.

cc @aclements @rsc @dvyukov @dsnet @cespare

@gopherbot gopherbot added this to the Proposal milestone Jan 11, 2019
@ianlancetaylor
Copy link
Member

See also #16843.

@josharian
Copy link
Contributor Author

Thanks, Ian. Looks like the primary difference between this request and that one is that #16843 seems focused on dealing with heap limits, whereas I mostly had in mind improving the management of long-lived shared caches.

Another obvious use case is monitoring and metrics.

Another observation: you don’t want the runtime to block on these. So for the queue-per-consumer model you probably need it to be best effort only and use a buffered channel. Also, for the queue-per-consumer model, you could use the sent value to provide information about the memory event (GC cycle? Regular backpressure? > 50% on our way to heap growth? > 90% on our way to heap growth? Etc.)

cc @RLH

@CAFxX
Copy link
Contributor

CAFxX commented Jan 12, 2019

@josharian the closest thing I managed to do is https://github.com/CAFxX/gcnotifier, but it fires a notification at the end of a GC cycle, not when it starts (as is the case of the clearpools hook).

@dvyukov
Copy link
Member

dvyukov commented Jan 14, 2019

One thought about the interface until it's too late:

func GCBackpressureC() <-chan struct{}

Async interface based on chans can limit its usefulness. It won't be possible to understand when the callbacks have actually finished to do the scanning. The previous round may not have finished when we start another one. Sync callbacks may be more useful in preventing heap growth.

@aclements
Copy link
Member

My big question is what does "GC backpressure" mean? In the context of SetMaxHeap, this is fairly clear: the GC is under pressure when it switches from the amortized proportional regime to the heap-limited regime. But outside of that context, the garbage collector is always under the same amount of pressure, by design.

If the point is to release caches, what you really want to know is what the trade-off is between using that memory for your caches versus giving the GC less memory to mark. I have no idea how you compute the trade-off.

In the case of sync.Pool, the current behavior leads to weird GC dynamics and I'm not sure I want to let users construct that sort of behavior without any runtime visibility into what's going on.

Related to @dvyukov's comment, it's not clear when you actually invoke this signal, but maybe defining backpressure would clarify that. Calling it when GC starts is too late for that GC cycle to collect anything that gets released. So you have to call it before GC starts, but how long before? You probably don't want to block GC until it's done (that would cause either deadlocks or unbounded heap growth). You could call it right after marking finishes, when the GC knows the targets for the next GC, but maybe that's too early.

FWIW, we've been thinking of splitting up SetMaxHeap and its notification channel because almost everyone who's tried SetMaxHeap has ignored or misused the channel and becayse tying the channel to SetMaxHeap may limit future evolution that could need the same type of notification for different reasons.

@rsc
Copy link
Contributor

rsc commented Aug 13, 2019

@aclements, what is the state of this proposal?
Is it the same as the current SetMaxHeap implementation sketch?
Something different?
And what is the relationship between this issue and #23044 and #16843?
Thanks.

@changkun

This comment has been minimized.

@tv42
Copy link

tv42 commented Mar 7, 2020

I don't see why mixing this with operating system signals would make sense.

Returning a channel is ambiguous to its caller. For instance: what happens if I close the channel?

Those channels are receive-only channels, <-chan struct{}, thus you can't close them.

@changkun

This comment has been minimized.

@urjitbhatia
Copy link

urjitbhatia commented Mar 23, 2020

If there's an application where new memory allocation is fairly restricted to a few locations like an http/rpc server entry-point after initial warm-up, can I use a manually-instrumented hack like this for a similar effect while GC support is still being worked on?
https://github.com/chronomq/chronomq/blob/master/internal/monitor/memory.go

@networkimprov
Copy link

@aclements, should https://golang.org/cl/227767 reference this issue?

It refs #16843 which was closed in favor of this one.

@aclements
Copy link
Member

@networkimprov, ah, yes. Thanks.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/227767 mentions this issue: [release-branch.go1.14] runtime/debug: add SetMaxHeap API

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/46751 mentions this issue: runtime/debug: add SetMaxHeap API

@CAFxX
Copy link
Contributor

CAFxX commented Apr 12, 2020

@aclements would it be possible to merge this as a GOEXPERIMENT (maybe with the public API bits in x/exp?) so that it can see wider testing?

You could call it right after marking finishes, when the GC knows the targets for the next GC, but maybe that's too early.

I don't think it's too early. That would be arguably the best moment for user code to perform housekeeping, since IIUC it's when most memory is available and housekeeping activities may directly or indirectly cause allocations.

FWIW, I agree with dvyukov's suggestion that a synchronous interface may be better as it would allow the runtime some degree of control over the pace of housekeeping (e.g. as done for finalizers).

@neild
Copy link
Contributor

neild commented Apr 12, 2020

I think that CL 46751 should split out the notification channel from SretMaxHeap, as @aclements suggests in #29696 (comment). And that we should then go ahead and add SetMaxHeap without the notification channel.

We've been using SetMaxHeap for some time, and so far as I know it has performed well. The API, sans notification channel, is simple. The rationale for it (staying within container memory limits) is clear and useful. I think it should graduate from experimental status.

The notification channel is a different question. Not only is it not clear that a channel is the correct API, I'm not convinced that there is a correct API to be had. I have not personally seen a successful attempt at reducing memory pressure by dropping existing work. (Perhaps there is one, in which case I'm interested to hear about it.) The idea is compelling, but I think any signal provided by the GC simply comes too late--the way to avoid memory overload is to gate the amount of work accepted, not to react after too much has already been accepted.

@josharian
Copy link
Contributor Author

I have not personally seen a successful attempt at reducing memory pressure by dropping existing work. (Perhaps there is one, in which case I'm interested to hear about it.) The idea is compelling, but I think any signal provided by the GC simply comes too late--the way to avoid memory overload is to gate the amount of work accepted, not to react after too much has already been accepted.

You can drop existing caches instead of work. And you could theoretically signal before there's too much, and there's still time to slow down. In my first comment above, brainstorming, I mentioned "> 50% on our way to heap growth? > 90% on our way to heap growth?"

@neild
Copy link
Contributor

neild commented Apr 12, 2020

One big problem with dropping caches in response to pressure is that it increases load at exactly the point when you're already overloaded. It isn't even a reliable tradeoff of CPU for memory, since filling the cache often has a memory cost as well. I would expect a better strategy to be to set a maximum cache size, to keep memory usage from the cache predictable and consistent.

It's possible that dropping caches can work as a response to memory pressure, but I haven't seen it in practice. Examples of it being made to work would be useful in validating the backpressure API.

In contrast, SetMaxHeap (ignoring the backpressure channel) is demonstrably useful for staying within system memory limits. You still need a mechanism to avoid taking on too much concurrent work, but it is an important part of a complete solution.

@CAFxX
Copy link
Contributor

CAFxX commented Apr 12, 2020

The rationale for it (staying within container memory limits) is clear and useful. I think it should graduate from experimental status.

💯

One big problem with dropping caches in response to pressure is that it increases load at exactly the point when you're already overloaded. It isn't even a reliable tradeoff of CPU for memory, since filling the cache often has a memory cost as well. I would expect a better strategy to be to set a maximum cache size, to keep memory usage from the cache predictable and consistent.

This is heavily workload-dependent, but there are ways to mitigate that problem (e.g. drop N% of the cache)

The notification channel is a different question. Not only is it not clear that a channel is the correct API, I'm not convinced that there is a correct API to be had. I have not personally seen a successful attempt at reducing memory pressure by dropping existing work. (Perhaps there is one, in which case I'm interested to hear about it.)

My main use-case for this is long-lived stateless services, so what I was thinking of is triggering a graceful shutdown of the process, and letting the supervisor deal with it. You'd probably need some kind of random backoff to avoid stampeding the supervisor, but it would still arguably be an easy, correct, and pretty generic solution for that class of services. (sure, it won't save you in case you're trying to deal with a seriously oversized request - but that's not something the runtime can deal with anyway)

Anyway agree on splitting it off and thinking about it more.

@SerialVelocity
Copy link

The rationale for it (staying within container memory limits) is clear and useful. I think it should graduate from experimental status.

Another thing to think about here is that there's quite a few Go programs that others have written that run in containers. It would be nice to be able to set the max heap size without modifying and recompiling their programs. Maybe another environment variables or an extension to GOGC would be good?

@bradfitz
Copy link
Contributor

@aclements, any update on this? In particular, on SetMaxHeap (https://go-review.googlesource.com/c/go/+/46751)?

@rsc
Copy link
Contributor

rsc commented Jun 10, 2020

My memory from a while back is that we were having problems with getting programs that found out that they were close to running out of memory to do something useful about that. Without the ability to adapt, there's not much point to SetMaxHeap. (If it were strictly enforced as a max without backpressure, you'd just make the programs slower and slower and slower as the GC ran more and more often.)

It would be helpful to get a status report about our experiments with SetMaxHeap inside Google. Maybe @neild knows?

It would also be helpful to hear from people who want to use SetMaxHeap in their own programs. Would those programs be able to respond effectively in practice to finding out that they are getting close to the max? I realize Josh suggested that programs might drop caches up above, but it would be nice to have confirmation that would work in practice. One potential problem is that if you are getting close to the max, it may be hard to drop enough memory to get back down - being close to max means you need to drop a substantial fraction of your in-use memory. A few sync.Pools may not be enough.

Given that the CL is public, it would be great to hear from people who have patched it in and used it successfully.

@andreimatei
Copy link
Contributor

One thing I'd be interested in having CockroachDB do on a signal that the runtime provides is have it dump a heap profile. We're having trouble with OOMs - not helped by the fact that Go generally doesn't help us do good memory accounting to keep track of how much memory different modules are taking - and so seeing where the memory went when it blows up should help us keep things under control.
We are collecting profiles periodically, and are also investigating collecting them on OS-level signals but at least so far we've frequently been unsuccessful in getting a profile at the right moment before the OOM killer comes along.

@neild
Copy link
Contributor

neild commented Jun 10, 2020

I don't know of any success stories of applying backpressure based on current memory usage. I'm willing to believe that it's possible, but in every case I've seen the delay is too long--by the time you can detect that you're using too much memory, it's too late to do anything about it.

That doesn't mean SetMaxHeap isn't useful, though.

Let's say you have a working set size of 5.1GiB, and a container memory limit of 10GiB. With GOGC=100, the runtime requests about twice the working set from the OS (yes, I know that this isn't a perfect description), so we use 10.2GiB and OOM. In this case, we don't need backpressure; we just need to tell the GC to stay within the container limit. SetMaxHeap is perfect for this job.

We're using SetMaxHeap inside Google right now when running the linker in Blaze builds. Build actions have a memory limit, and the Go linker's memory consumption is high enough to exceed it on some very large targets. Using SetMaxHeap helps the linker stay within those limits. Backpressure isn't meaningful here at all.

SetMaxHeap isn't a panacea. It's not an overload protection mechanism, and it won't save you if your live heap size approaches your heap limit. It's still very useful in the right circumstances.

@rsc

This comment has been minimized.

@mvdan

This comment has been minimized.

@ajwerner
Copy link

I'll pile on to what Andrei said. Not just have we had a hard time detecting grabbing memory profiles. We've had a very hard time picking apart heap dumps or core dumps. We suspect that some ways in which allocations miss the painstaking attempts we make to track memory allocation is by taking pointers out of bigger slices. We have a hard time tracking down not where the memory was allocated but rather what is keeping it alive.

I think @andreimatei's point to some extent is that the signal would be helpful in a first-order response as merely a mechanism to trigger dumping of a heap profile.

I'll add to this that cockroach seems like the type of relatively advanced user which could respond to such a signal by potentially canceling contexts and shrinking memory budgets which we use to try to track memory usage. In short, I suspect with a signal that we could meaningfully shed memory load.

Furthermore, we've recently switched over to using a pure go storage engine https://github.com/cockroachdb/pebble/ from https://github.com/facebook/rocksdb but still found ourselves needing to allocate the block cache in C to avoid its impact on GC (see cockroachdb/pebble#523).

@andybons

This comment has been minimized.

@seebs
Copy link
Contributor

seebs commented Jun 11, 2020

It's very hard to make a design for this that allows immediate responses, but there's definitely circumstances under which I would benefit from a slower process -- not necessarily anywhere near fast enough to help with responding to a specific incoming memory request -- that allowed me to say "oh, well, if we're running low on memory we could drop these cached things".

The right size for a cache is usually "as much as you can get away with". In most cases, it'd be really easy to free up huge amounts of memory quite quickly -- if I knew it were useful to do so.

So, a solution to "there's no memory now, you have to free something right this instant for us to finish satisfying a request" is probably impossible and a bad idea. But a way to find out that, hey, memory usage is a bit larger than we'd like it to be, maybe trim the caches some? That'd be lovely.

We've been working on the costs of our caches quite a bit, but we've had cases where the cache was, say, over 80% of heap usage, and lots of cases where the cache was well over 20-30%.

@rsc
Copy link
Contributor

rsc commented Jun 17, 2020

The original proposal here was "add way for applications to respond to GC backpressure". We designed and built SetMaxHeap to do that, but it sounds like the people who have tried it and used it are all not watching for or responding to backpressure.

@andreimatei and @ajwerner, SetMaxHeap only tracks usage of the heap itself, not things like goroutine stacks or other system memory, but if you think heap growth is the source of the OOMs, it seems like you should be able to use it to get a signal (a note on a channel) when the heap reaches a certain point. It would be helpful to us in understanding whether to officially adopt SetMaxHeap to hear whether it actually works for that use case. The CL implementing it (which you should be able to patch into your Go release) is https://go-review.googlesource.com/c/go/+/46751.

In any event, it sounds like we have a potential mechanism for "add way for applications to respond to GC backpressure", but we don't have any evaluations of that mechanism to actually apply backpressure. That's what we'd need to understand whether it is any good. In the absence of people who can try it and report back, it's hard to say what we should do. (And if there are no such people maybe we don't need this feature after all.)

Placing this on hold until someone who wants this feature can report back about how well SetMaxHeap does or does not work for them. Or until someone proposes a different feature that they've used and can report on.

@seebs
Copy link
Contributor

seebs commented Jun 17, 2020

So, as one of the people who would sort of like a functionality like this, the reason I am not yet testing SetMaxHeap, even though I've heard of it now, is that using it seems like it would require non-trivial (though not huge) architectural work, in order to make a program which can be run only with a local custom-patched Go compiler/runtime.

But I have spent some time thinking about trying to use this, and a couple of things stand out to me.

First, I am pretty sure we want/need a way to have more than one thing registered as wanting to know about memory backpressure. If this feature exists, it's "obvious" that you'd want to have a cache listen for such signals. But what if you have two cache implementations? Each independently wants to do that. But also, you can't express "I want to know when there's backpressure on memory" separately from "I know what is a good limit for the heap to have".

Even without a specific max heap size, it might be desireable for a program to get hints about memory usage -- possibly, say, doing so if the GC has to start throttling other activity to keep up with the amount of GC it needs to do.

I was going to also say that it'd be nice to have a way to get a sense for the severity of the GC backpressure, but actually it looks like that's already present.

So imagine that we had:

debug.NotifyGCPressure(ctx context.Context, avail int, notify chan struct{})

This would produce edge-triggered notifications whenever AvailGCPercent crosses above the given value, until ctx is canceled. This would let any individual author pick suitable behavior and/or threshholds, possibly multiple threshholds (requesting notification at 30%, and again at 10%), and prevent them from colliding, but at some expense in additional work to track this.

(The context parameter is possibly the wrong solution, but I'm not sure what to do if anything for "let me cancel this notification", apart from "close the channel and just expect runtime to catch the panic and clear it up" and that seems awful.)

@shabbyrobe
Copy link

Hello! I have an experience report for this.

I have a data processing program that needs to work with a much bigger dataset than my main memory can reasonably hold. It's a CLI tool, not a server. My approach is to fill the memory with as much of the results as I can, flush to disk, then fill the memory again, and merge it all together at the end. Unfortunately, this involves a horribly high number of unavoidable small append operations.

I struggled a lot with OOM errors with this, which seemed to happen at unexpected times, and as with some of the other comments in this thread, I couldn't really gain any insights from the available details (heap profiles, etc). It took me a week of tinkering to get something that allowed the program to run to the end, though I had no confidence in its reliability. This was made more complex by the fact that the program takes ages to run, and the OOMs could happen at minute 5 or hour 7 of a run.

I tried a bunch of different things to work around it. I started out querying the OS for free memory, which worked OK but I still got some unexpected OOMs from time to time unless I set the threshhold far lower than I wanted. I looked around in ReadMemStats for something reliable but came up empty handed. Finally, I tried experimenting with custom allocation strategies for my critical data structures, which seemed to work very well but required an extraordinary amount of extra work and ultimately felt like I was just re-inventing malloc in Go, badly. I've kept some of that as it does seem to help reduce the GC burden from all those tiny appends quite considerably, but pushing any further down that path was going to leave me coughing yak hair for the next "forever" or so.

I had all but given up on this before finding this GitHub issue. I applied the SetMaxHeap CL and modified my program to use the new notification channel (removing an awful lot of really rotten code in the process, which was quite satisfying), and managed to successfully run the program to completion on the first real attempt (after I was done poking and prodding). I could process about 20% more of the data for each flush, which saves my shoddy little tool quite a bit of work at the merge stage.

My strategy while using the custom allocation stuff was to wait until AvailGCPercent fell below a configurable threshold (I had it as low as 10% in the successful run), then to evict some stuff from my allocator and run runtime.GC(), sleep for a bit, and flush to disk if AvailGCPercent was still below the threshold. I'm not entirely sure if this is the right approach for a program of this kind, but it did seem to work. The 'evict, gc, sleep' approach helped avoid what felt like "false positive" notifications from the channel: I expected to be able to process about 1 million inputs per flush, but without this step, I would occasionally see flushes after only about 300,000, with no observable pattern from run to run.

I did find that disabling my custom allocation code and relying entirely on the Go GC led to the OOM errors coming back and a radically lower number of inputs per flush (approx 200,000), which I attribute to the high likelihood that I'm "holding it wrong" in the way I'm checking AvailGCPercent, and also the really crazy amount of pressure I'm putting on the GC with all those tiny appends, but I'm not really sure about this, I'm just guessing at this stage.

I hope this report is useful. I think I'd like to see some more concrete explanations and examples for what "holding this right" would look like in different scenarios, but for me, having access to this feature would make what I'm doing here substantially less problematic. I can't say for certain whether this is the right implementation (not being too well versed in the internals), but I do feel like this is a worthwhile feature.

bradfitz pushed a commit to tailscale/go that referenced this issue Jul 27, 2020
DO NOT SUBMIT. This is an experiment to get some experience with the
API and figure out if this is even a reasonable primitive.

This adds an API to set a soft limit on the heap size. This augments
the existing GOGC-based GC policy by using the lower of the
GOGC-computed GC target and the heap limit.

When the garbage collector is bounded by the heap limit, it can no
longer amortize the cost of garbage collection against the cost of
growing the heap. Hence, callers of this API are required to register
for notifications of when the garbage collector is under pressure and
are strongly encouraged/expected to use this signal to shed load.

This CL incorporates fixes from CL 151540, CL 156917, and CL 183317 by
[email protected].

Updates golang#29696.

Change-Id: I82e6df42ada6bd77f0a998a8ac021058996a8d65
chaochn47 pushed a commit to chaochn47/go that referenced this issue Apr 21, 2021
DO NOT SUBMIT. This is an experiment to get some experience with the
API and figure out if this is even a reasonable primitive.

This adds an API to set a soft limit on the heap size. This augments
the existing GOGC-based GC policy by using the lower of the
GOGC-computed GC target and the heap limit.

When the garbage collector is bounded by the heap limit, it can no
longer amortize the cost of garbage collection against the cost of
growing the heap. Hence, callers of this API are required to register
for notifications of when the garbage collector is under pressure and
are strongly encouraged/expected to use this signal to shed load.

This CL incorporates fixes from CL 151540, CL 156917, and CL 183317 by
[email protected].

Updates golang#29696.

Change-Id: I82e6df42ada6bd77f0a998a8ac021058996a8d65
@kent-h
Copy link

kent-h commented Oct 25, 2021

Does anyone know the status of this? It appears that there was a working PR in progress (at least for SetMaxHeap), but that it hasn't been worked on in about a year.

There appear to be two separate issues discussed here, with one blocking the other:

  1. Allow the MaxHeap to be set (and possibly try to get it from the environment), to allow programs to run the GC sooner rather than getting OOM killed.
  2. Provide a way to report memory/GC pressure to the application, to try and drop caches, etc. when memory begins to become constrained.
    There appears to be an experimental PR for (1), would it be worthwhile to clean up & merge this, and treat (2) as a separate issue?

My personal interest is that I have a memory-constrained application, in which my process occasionally gets OOM killed. Just MaxHeap would likely be sufficient for non-caching situations like this.

@seankhliao
Copy link
Member

I believe the current iteration is #48409 which provides a soft memory limit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Hold
Development

No branches or pull requests