proposal: runtime: GC Callback to handle CPU deathspirals with GOMEMLIMIT #59324
Comments
cc @mknyszek @golang/runtime
Because there's no guarantee finalizers will run, I don't think this is actually true. Also, they tend to trigger too slowly in practice to be useful for this. On the other hand your proposal introduces a way to impact and observe GC behavior much more immediately and I would like to avoid opening that door. Specifically, I think this part is problematic:
That being said, the runtime does already have a mechanism to limit GC CPU time close to the memory limit. The CPU time limit is set to 50% over a 1 second window. I would like to collect more evidence and data on program behavior near the limit before we try to add an API to work around it. It may be that the 50% limit is just not aggressive enough, or that the time window is too small or too large to be a good default. It's true that this will be application dependent in general, but we may just have a poor default here.

The Go runtime should be successfully evading a "true" death spiral, with room for the application to still fall over. It's definitely more difficult to deal with the case where memory goes high and then stays there for a long time (e.g. in the case of a slow memory leak), but I don't think there's a good general solution there without also losing the ability to handle transient spikes in memory use effectively, or making the GC behavior of applications even more complex and likely more fragile to code changes. In sum, I think one of these three things has to give:

There may be a way out, but I don't see it yet.

Also, a feedback mechanism is theoretically useful, but I must point out that when we've experimented with a feedback mechanism in the past, nobody used it in practice. Instead, where feedback mechanisms are appropriate, applications tended to use a mix of other signals, such as the memory metrics obtained from

For instance, if your use-case would be to panic inside the callback passed to

I think in #58106 the original poster mentioned tracking
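The idea of relying on memory metrics rather than a GC callback could be sketched roughly as below. This is only an illustration, not what the comment author had in mind: the helper names (`limitedBytes`, `overThreshold`, `watch`) and the polling interval are hypothetical. The 90% threshold comes from the original post, and the metric pair follows the accounting documented for `debug.SetMemoryLimit`. Note that this polls asynchronously, so unlike the proposed callback it can miss short spikes.

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
	"runtime/metrics"
	"time"
)

// limitedBytes returns the memory the runtime counts against the memory
// limit: total mapped memory minus heap memory already released to the OS.
func limitedBytes() uint64 {
	samples := []metrics.Sample{
		{Name: "/memory/classes/total:bytes"},
		{Name: "/memory/classes/heap/released:bytes"},
	}
	metrics.Read(samples)
	return samples[0].Value.Uint64() - samples[1].Value.Uint64()
}

// overThreshold reports whether used exceeds fraction*limit.
// A limit of math.MaxInt64 means that no memory limit is set.
func overThreshold(used uint64, limit int64, fraction float64) bool {
	if limit <= 0 || limit == math.MaxInt64 {
		return false
	}
	return float64(used) > fraction*float64(limit)
}

// watch polls memory use until stop is closed and calls onExceed whenever
// use is above fraction of the current memory limit (hypothetical helper).
func watch(stop <-chan struct{}, interval time.Duration, fraction float64, onExceed func(used uint64, limit int64)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			limit := debug.SetMemoryLimit(-1) // a negative input only reads the limit
			if used := limitedBytes(); overThreshold(used, limit, fraction) {
				onExceed(used, limit)
			}
		}
	}
}

func main() {
	debug.SetMemoryLimit(1 << 30) // 1 GiB, an arbitrary value for the demo

	stop := make(chan struct{})
	go watch(stop, 10*time.Millisecond, 0.9, func(used uint64, limit int64) {
		// A real service might panic or shed load here.
		fmt.Printf("memory use %d is within 10%% of limit %d\n", used, limit)
	})

	time.Sleep(50 * time.Millisecond)
	close(stop)
	fmt.Printf("currently counted against the limit: %d bytes\n", limitedBytes())
}
```

Whether to panic, shed load, or just log inside `onExceed` is an application decision; the sketch only shows where such a decision could hook in.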
Interesting, thanks. It sounds like I have an issue in my testing methodology. I'll try to implement my 90% of GOMEMLIMIT panic using
After retrying, the runtime does indeed limit itself and does not run the GC continuously.
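One way to check whether the GC CPU limiter has kicked in during such a test is to read the runtime's own metrics. The sketch below is an assumption about how one might do that, not something taken from this thread: `readUint64` is a hypothetical helper, and the limiter metric is only available in Go versions that ship the memory limit (1.19 and later), which is why the helper reports unsupported metrics instead of panicking.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readUint64 returns the value of a uint64 runtime metric. The second
// result is false if the running Go version does not expose the metric
// as a uint64 (unknown metrics are reported with KindBad).
func readUint64(name string) (uint64, bool) {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return s[0].Value.Uint64(), true
}

func main() {
	total, okTotal := readUint64("/gc/cycles/total:gc-cycles")
	last, okLast := readUint64("/gc/limiter/last-enabled:gc-cycle")
	if !okTotal || !okLast {
		fmt.Println("GC limiter metrics are not available in this Go version")
		return
	}
	if last == 0 {
		fmt.Printf("%d GC cycles so far, limiter never enabled\n", total)
		return
	}
	fmt.Printf("%d GC cycles so far, limiter last enabled at cycle %d\n", total, last)
}
```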
I have been experimenting with using GOMEMLIMIT to increase how much load my mostly IO-bound service is able to handle.

It works great. It also works great when used together with a very high GOGC to reduce CPU utilisation (if I allocate 32GiB to a service, I don't care that it is only using 4 and thus running the GC once it reaches 8; I might as well wait until the 32 have been used and run the GC less often).

Sadly, if I am about to OOM, instead of crashing it will try to run the GC close to continuously (see #58106), which is undesirable.
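For reference, the setup described above (a memory limit combined with a very high GOGC) can also be applied from code instead of environment variables. This is a minimal sketch: `configureGC` is a hypothetical wrapper, the 32 GiB figure is the one from the text, and the GOGC value of 1000 is an arbitrary stand-in for "very high".

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// configureGC sets the GC percentage (the GOGC equivalent) and the soft
// memory limit (the GOMEMLIMIT equivalent), returning the previous values.
func configureGC(gcPercent int, limitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(gcPercent)
	prevLimit = debug.SetMemoryLimit(limitBytes)
	return prevPercent, prevLimit
}

func main() {
	// Roughly equivalent to running with GOGC=1000 GOMEMLIMIT=32GiB.
	prevPercent, prevLimit := configureGC(1000, 32<<30)
	fmt.Printf("previous GOGC=%d, previous memory limit=%d bytes\n", prevPercent, prevLimit)
}
```

With this configuration the heap is allowed to grow well past its live size between collections, and the memory limit becomes the main trigger for GC as usage approaches 32 GiB.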
I think the right solution is very application dependent, which is why I think a function like this would be helpful:
Rationale behind the different points:
This allows this to be an optional feature that may or may not be implemented by the various go implementations.
However this forces an implementation to either implement this or not, you can't be inconsistent about this (within the lifetime of one program).
The goal of this is to allow programs to handle the CPU death spiral.
Let's assume my solution is that if sizeAfterGc is within 90% of the GOMEMLIMIT I have set, I panic (which is most likely what I would want to use in my case).

If another GC is triggered, followed by a stop-the-world event, my panic attempt may be stopped by the next GC.

But this is also flexible enough to allow for more options: maybe instead of panicking I time.Sleep for 3 seconds. If I run out of RAM, well, too bad, the program crashes, but maybe this is the peak of a transient event that I can live through if whatever IO is pending is allowed to progress.

It's probably unhealthy if multiple callbacks are registered, as they may fight each other trying to slow down the runtime.
It may look like this will now allow users to implement their own GC logic by purposely setting an extremely low GOMEMLIMIT (like 1 byte) and then blocking in the callback. And yes, it does in fact allow this, but I don't think this is something new.

You can already use finalizers to get a callback from the GC, and combined with an extremely high GOGC and calling runtime.GC in your code, you can implement your own custom GC logic. However, unlike finalizers, a synchronous callback allows you to effectively handle the CPU death spiral.

Related:
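The finalizer-based GC notification mentioned in the post can be sketched as follows. This is an illustration of the general trick under stated assumptions, not code from the proposal: `gcSentinel`, `notifyOnGC`, and `gcCallbackFires` are hypothetical names. The callback runs asynchronously on the finalizer goroutine, and the runtime does not guarantee that finalizers run at all, which is exactly the limitation the post contrasts with a synchronous callback.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcSentinel is an otherwise unused heap object. It is padded to 32 bytes
// so it is not packed together with other tiny allocations, which could
// delay or prevent its finalizer from running.
type gcSentinel struct{ pad [32]byte }

// notifyOnGC arranges for fn to be called some time after a GC cycle finds
// the sentinel unreachable, then arms a fresh sentinel for the next cycle.
func notifyOnGC(fn func()) {
	s := &gcSentinel{}
	runtime.SetFinalizer(s, func(*gcSentinel) {
		fn()
		notifyOnGC(fn)
	})
}

// gcCallbackFires forces GC cycles and reports whether the callback was
// observed within roughly timeoutMillis milliseconds.
func gcCallbackFires(timeoutMillis int) bool {
	fired := make(chan struct{}, 1)
	notifyOnGC(func() {
		select {
		case fired <- struct{}{}:
		default: // a notification is already pending
		}
	})
	deadline := time.After(time.Duration(timeoutMillis) * time.Millisecond)
	for {
		runtime.GC()
		select {
		case <-fired:
			return true
		case <-deadline:
			return false
		case <-time.After(10 * time.Millisecond):
			// Not yet; force another cycle.
		}
	}
}

func main() {
	if gcCallbackFires(2000) {
		fmt.Println("finalizer-based GC callback ran")
	} else {
		fmt.Println("finalizer did not run in time (there is no guarantee it ever does)")
	}
}
```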