-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: mechanism for monitoring heap size #16843
Comments
Can you please expand on what you have in mind? |
I have nothing specific in mind. This bug was filed as part of a triage meeting with a bunch of us. One bug (#5049) was ancient with no activity and one bug (#14162) proposed a solution instead of discussing the problem. This bug is recognition that there is a problem, and we've heard problem statements and potential solutions (and sometimes both) from a number of people. The reality is that there are always memory limits, and it'd be nice for the Go runtime to help applications stay within them, through perhaps some combination of limiting itself, and/or helping the application apply backpressure when resources are getting tight. That might involve new runtime API surface to help applications know when things are getting tight. /cc @nictuku also. |
Btw, there was lots of good conversation at #14162 and it wasn't our intention to kill it or devalue it. It just didn't fit the proposal process, and we also didn't want to decline it, nor close it as a dup of #5049. Changing the language is out of scope, so all discussions of things like catching memory allocation failures, language additions like "trymake" or "tryappend", etc, are all not going to happen. But we can add runtime APIs to help out. That's what this bug is tracking. /cc @matloob @aclements |
Agreed. "try*" isn't practical. It would require changing too make call-sites and even then would not catch all allocations. Adding runtime.SetSoftMemoryLimit() still seems like the best approach. |
It would be nice to have the ability to set a limit to the memory usage. After a limit is set, perhaps the runtime could provide a clear indication that we're under memory pressure and that the application should avoid creating new allocations. Example new runtime APIs that would help:
That would provide a clear signal to the application. How exactly that's decided should be an internal implementation decision and not part of the API. An example implementation, to illustrate: if we limit ourselves to the heap size specified by the user, we could trigger GC whenever the used heap is close to the limit. Then we could enter pushback whenever the GC performance (latency or CPU overhead) is outside certain bounds. Apply smoothing as needed. The approach suggested by this API has limitations. For example, it's still possible for an application that is behaving well to do one monstrous allocation after it has checked for the pushback state. This would be common for HTTP and RPC servers that do admittance control at the beginning of the request processing. If the monstrous allocation would bring the memory heap above the limit, Go should probably panic. Since we don't want to change the language to add memory allocation error checks, I think this is fine. And we have no other option :). Another problem is that deciding what is the right time to pushback can be hard. Whatever the runtime implements, some folks may find it too aggressive (pushing back too much, leading to poor resource utilization) or too conservative (pushing back too late, leading to high latency due to excessive GC). I guess the go team could provide a knob similar to GOGC to control the pushbackiness of the runtime, if folks are really paranoid about it. |
The runtime could set up a channel and send a message whenever it completes Trying to pick up the pieces after a failure does not seem doable. Likewise I believe this is where we were headed in #14162 I would be interested in what useful policy could not be implemented using On Tue, Aug 23, 2016 at 1:43 AM, Yves Junqueira [email protected]
|
I previously gave the reasoning why using a channel or a callback to receive memory exceeded events won't work: #14162 To robustly handling exceeding a memory limit the check for the limit has to be part of the allocator, not done after a GC run. This is because you can't afford to wait. If you wait for the next GC run, it may be too late. Consider a single large slice allocation that would put you over the soft limit and would exceed the hard memory limit. You'll get an OOM panic. The same applies to a callback function. You need to immediately stop the code which is doing the heavy allocating. To do that you need a check in the allocator and you need to send a panic(). It's up to the application to set the soft memory limit at which these optional, catchable panics are sent. Please, before rehashing old suggestions or coming up with new variants, read through #14162 where I gave the reasoning why a panic and a check in the allocator is needed. Otherwise we keep covering the same old ground. |
@rgooch If you are allocating giant arrays, you probably know exactly where in your code that is happening, and you can add code there to first check if there is enough memory available. You can even do that using the GC information we're discussing passing down a channel. I do think there is a race here, but in the opposite case - if code is sitting in a tight loop making many small allocations, your channel read/callback might not run in time to actually trigger a new GC soon enough without OOMing. |
I discussed all this in #14162: you can be reading GOB-encoded data from a network connection. No way to know ahead of time how big it's going to be. Or it can be some other library you don't control where a lot of data are allocated, whether a single huge slice or a lot of small allocations. The point is, you don't know how much will be allocated before you enter the library code and you've got no way to reach in there and stop things if you hit some pre-defined limit. And, as you say, if you're in a loop watching allocations, even if you could stop things, you may not get there in time. Spinning in a loop watching the memory level is grossly expensive. This needs to be tied to the allocator. |
This does not propose a callback or channel for delivering a memory One suggestion was The application's heap monitor goroutine, HMG, could initially allocate a If ReserveOOMBuffer is the API that some Go application needs then this On Tue, Aug 23, 2016 at 11:13 AM, rgooch [email protected] wrote:
|
As I read this, #14162 describes a workload where (analogy follows) sometimes the python attempts to swallow a rhino, and if the attempt is not halted ASAP it is guaranteed to end badly. Is it in fact the case that the rhino will never be successfully swallowed? (I can imagine DOS attacks on servers where this might be the case.) I think that the periodic notification scheme is intended to deal with a python diet of a large number of smaller prey; if an application has the two constraints of m=memory < M and l=latency < L, and if m is affine in workload W (reasonable assumption) and l is also affine in workload W (semi-reasonable), then simply comparing observed m with limit M and observed l with limit L tells you how much more work can be admitted (W' = W * min(M/m, L/l)), with the usual handwaving around unlucky variations in the input and lag in the measurement. It's possible to adjust GOGC up or down if M/m and L/l are substantially different, so as to maximize the workload within constraints -- this however also requires knowledge of the actual GC overhead imposed on the actual application (supposed to be 25% during GC, but high allocation rates change this). One characteristic of this approach is that a newly started application might not snap online immediately at full load, but would increase its intake as it figured out what load it could handle. But this is no help for intermittent rhino-swallowing. |
As long as the proposal isn't to "make it possible to catch failed memory allocations", which I'm pretty sure everybody agrees isn't going to happen. But any proposal should address or at least consider the whole range of related issues in this space. (back pressure, runtime & applications being aware of limits & usage levels) |
I was thinking a couple additions to the runtime package to expose information that might be useful for applications like you said in #16843 (comment) |
Is there any decision about how this would be properly implemented? In Perl there is documented a notorious $^M global variable that user code could initialize to some lengthy string, that in case of Out of memory error could be used as an emergency memory pool after die()ing. However I couldn't find a working example and it seems that feature was never implemented. However it seems logical approach. Since you are most probably in multitenancy environment, sharing memory with other go/non-go programs, so the only buffer that you can rely on is the emergency one allocated by yourself. Using that memory by go runtime in case of low memory and immediately notifying the subscribed process that you are running out of memory seems like a good measure to prevent pure go programs panic. |
My proposal is here: https://docs.google.com/document/d/1zn4f3-XWmoHNj702mCCNvHqaS7p9rzqQGa74uOwOBKM/edit I hope to have an implementation open sourced soon. I don't know if it could be included in the standard libraries. I would like to make it as robust as possible, so if you'd like to test it, please drop me an email (see my github profile) and I'll contact you later. Thanks! |
This proposal looks interesting. I made a couple of comments in the document:
|
Added feedback to optionally trigger orderly application shutdown when GC pacing fails to keep memory below the set maximum. |
I'm dealing with an app that runs out of memory (on a 16GB box) and that eventually lead me here. Some of the notes I took along the way are below, apologies if these fall into a "yeah, we know" category.
Overall I concur with the sentiment that most apps that run out of memory will run out of memory regardless of how fancy a mechanism is added to the current situation. For this reason if I had a vote I would vote for adding some additional simple hooks so one can do some tuning and foremost troubleshoot when an app does run out of memory. |
I'm not sure what you're suggesting, exactly.
Assuming you mean runtime.MemStats.HeapInUse (and friends), note that this can vary depending on where you are in a GC cycle. Perhaps more interesting is MemStats.NextGC, which tells you what heap size this GC cycle is trying to keep you below. This changes only once per GC cycle.
runtime/debug.SetGCPercent lets you change this. Right now this triggers a full STW GC, but in Go 1.9 this operation will let you change GOGC on the fly without triggering a GC (unless you set it low enough that you have to immediately start a GC, of course :) |
|
Nice long proposal write-up :-). I'm trying to understand the tl;dr; ... The proposal seems to come down do "periodically measure live data size and set GCPercent such that GC is triggered before the desired total heap size is reached". As mentioned in the proposal, this can be done/approximated today in the app itself using runtime.MemStats and debug.SetGCPercent. As far as I can tell the following changes to the runtime would be desirable to improve this:
As a user I'm still left wondering a bit what a reasonable goal in all of this is. I'm imagining something like "for the vast majority of Go apps the tuning of GCPercent allows 80% of memory to be used for live data with moderate GC overhead and 90% with high to very high GC overhead". Maybe someone in the Go community has informed intuition about specific numbers. The answer to requests to have some callback or rescue option when memory allocation fails would be that instead GC overhead exceeding N% or GCPercent falling below below M% should be used to trigger said rescue action. |
I did an experiment to use GCPercent to constrain heap size and while the principle works as expected, it does look sufficient to me. I'm working on an app that digests some giant CSVs where memory consumption is an issue. I'm running with GCPercent=25 to try and contain the memory overhead. I'm running with gctrace=1 and the highest heap size number I see is 797MB:
A little later after some memory has been freed I grab MemStats and get the following HeapXxx stats which show 1.2GB of heap (all gctrace outputs since the above were lower):
Data grabbed from top at about that time seems to agree with the heap stats (code/stack size are not significant):
I was trying to keep the memory used by my process to 613MB*1.25=767MB using GCPercent but clearly that's not really working. |
I have an urgent need in a tool that helps to understand what's actually going in a Go process address space, why RSS keeps growing despite of memory limitations and so on. Also I need to use cgo, which makes problem even more complicated. Currently I have to use a set of tools like For example, I see that process has 5GB |
@vitalyisaev2, please open a new issue or send an email to [email protected]. In it, please elaborate on what you mean by "Go runtime says that it takes 2GB". The runtime exports many different statistics, and it's important to know which one you're talking about. I would start by looking closely at all of the |
@aclements, what do you think the decision is here that NeedsDecision refers to? |
For the record, there's a change to improve how memory is restored to the OS that is scheduled to be released on Go 1.12 for Linux and Go 1.13 on iOS, preventing OOM: #29844. |
This slightly rearranges gcSetTriggerRatio to compute the goal before computing the other controls. This will simplify implementing the heap limit, which needs to control the absolute goal and flow the rest of the control parameters from this. For #16843. Change-Id: I46b7c1f8b6e4edbee78930fb093b60bd1a03d75e Reviewed-on: https://go-review.googlesource.com/c/go/+/46750 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> Reviewed-by: Rick Hudson <[email protected]>
Closing as a duplicate of #29696. |
Change https://golang.org/cl/227767 mentions this issue: |
Tracking bug for some way for applications to monitor memory usage, apply backpressure, stay within limits, etc.
Related previous issues: #5049, #14162
The text was updated successfully, but these errors were encountered: