VPA: Protect against runaway pods #2359
Comments
/assign bskiba
3 seems a bit problematic, as it effectively means that VPA would start changing the pod's QoS class, which we are explicitly trying to avoid. My recommendation is a variation of 2: remove the resource limit for CPU on the pod. We do recommend limits on memory, but with memory you will get out-of-memory errors in the situations you described, which will cause VPA to bump the memory recommendation up.
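To make that suggestion concrete, a container spec along these lines would keep the memory limit (so a runaway is OOM-killed and feeds back into the recommendation) while leaving CPU without a limit. This is only an illustrative sketch; the container name, image and values are placeholders, not taken from this issue:

```yaml
# Illustrative pod-template fragment: CPU has a request but no limit,
# memory keeps a limit so runaway usage is OOM-killed and the memory
# recommendation gets bumped. All names and values are placeholders.
containers:
- name: app
  image: example.com/app:latest
  resources:
    requests:
      cpu: 500m
      memory: 256Mi
    limits:
      memory: 512Mi   # memory limit kept; CPU limit intentionally omitted
```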
I have been trying to understand the implications of letting VPA manage limits, and I have come to a similar conclusion: since CPU is "elastic" and cgroups can provide effective isolation, removing the CPU limit so it can't be changed in surprising ways seems to make sense. On the other hand, runaway memory usage can cause the entire system to OOM, trigger an expensive reaping process, and cause other issues, so limits on memory must remain, at least to reduce that risk. However, I also want to suggest that managing limits should be made optional, if it is not already. I think the main use case of VPA is to bring requested resources closer to actual usage so that scheduling and load balancing can be efficient. Limits do not play a role there, AFAIK, and add complexity that users of the main use case may not want to carry.
Instead of 3) I would also be happy if the proportional scaling of requests and limits could be disabled when limit != requested. I see this as definitely useful when both are equal, to remain inside the same QoS class, but what is the benefit of changing the limit proportionally in other cases? I found that fairly surprising.
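For reference, the proportional scaling discussed here preserves the limit-to-request ratio from the original spec. With the numbers reported in this issue (request 500m, limit 1000m, a 2:1 ratio), a 25m target yields a 50m limit. A rough before/after sketch, not an actual manifest:

```yaml
# Before VPA update: 2:1 limit/request ratio as specified in the workload.
before:
  requests: {cpu: 500m}
  limits:   {cpu: 1000m}
# After VPA lowers the target to 25m, the 2:1 ratio is preserved,
# so any spike above 50m is throttled immediately.
after:
  requests: {cpu: 25m}
  limits:   {cpu: 50m}
```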
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Closing this in favour of #2387, which seems to be asking for the same thing (configuring whether limits should be modified).
/close
@bskiba: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
After #2049 VPA is adjusting the specified limit proportionally to the requested resources. This seems problematic when the recommended resources become fairly low. E.g. in a StatefulSet we ran with limit: 1000m and requested: 500m, and once the recommended target got lowered to 25m (due to low usage), the limit became 50m. While this works normally, any spike in consumption now means that the pod is immediately throttled, the health check fails and the pod is recreated, but this never seems to cause an increase of the recommended value.

I would see three ways to prevent this; the third would be to use MaxAllowed as the pod limits if no other limit is specified, which would allow sudden resource changes within certain boundaries.

Overall I like 3) the most. Would you accept a PR implementing this change? Is there anything I'm missing?
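To sketch what option 3 could look like from the user's side: the maxAllowed bound already exists in the VPA resourcePolicy, and the proposal is to treat it as the pod limit when the pod spec specifies none. The manifest below only shows the existing bound; the "use it as the limit" behaviour is the proposed change, and the target name and values are placeholders:

```yaml
apiVersion: autoscaling.k8s.io/v1beta2    # VPA API version in common use around this issue
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa                        # placeholder name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: my-app                          # placeholder StatefulSet
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      maxAllowed:                         # under option 3, this bound would also be
        cpu: 1000m                        # applied as the container limit when the
        memory: 1Gi                       # pod spec does not specify one
```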