VPA Not honoring maxAllowed Memory Limit #6996
Comments
/area vertical-pod-autoscaler
Hi @kmsarabu, you need to define containerPolicies (minAllowed and maxAllowed) in the CRD file (vpa-v1-crd.yaml) instead of the VPA definition file, because it provides clear guidelines for how VerticalPodAutoscalers should manage resource scaling.
Also, could you share the logs of the VPA pod?
You can also take the CustomResourceDefinition (CRD) file from this page as a reference: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/deploy/vpa-v1-crd.yaml
/assign @raywainman |
@raywainman @voelzmo The StatefulSet's resource requests and limits:
limits:
cpu: "2"
memory: 8Gi
requests:
cpu: 500m
memory: 3Gi
The VPA:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
annotations:
meta.helm.sh/release-name: sel-telemetry
meta.helm.sh/release-namespace: sel-telemetry
creationTimestamp: "2024-09-12T17:45:53Z"
generation: 1
labels:
app.kubernetes.io/managed-by: Helm
name: sel-telemetry-tempo-ingester-vpa
namespace: sel-telemetry
resourceVersion: "693765"
uid: ead8e8de-edd6-46e9-9aff-477640b450b2
spec:
resourcePolicy:
containerPolicies:
- containerName: '*'
controlledResources:
- cpu
- memory
controlledValues: RequestsAndLimits
maxAllowed:
cpu: 2000m
memory: 6Gi
minAllowed:
cpu: 1000m
memory: 4Gi
targetRef:
apiVersion: apps/v1
kind: StatefulSet
name: sel-telemetry-tempo-ingester
updatePolicy:
updateMode: Auto
status:
conditions:
- lastTransitionTime: "2024-09-12T17:46:53Z"
status: "True"
type: RecommendationProvided
recommendation:
containerRecommendations:
- containerName: ingester
lowerBound:
cpu: "1"
memory: 4Gi
target:
cpu: "1"
memory: 4Gi
uncappedTarget:
cpu: 25m
memory: 262144k
upperBound:
cpu: "1"
memory: 4Gi
VPA is straight up ignoring these constraints. Resultant Pod YAML resources:
limits:
cpu: "4"
memory: "11453246122"
requests:
cpu: "1"
memory: 4Gi The problem can also be seen in Grafana, notice the 4 CPU cores Admission Controller Logs Containing "ingester"*
There were no logs in the updater containing "ingester" Updater Logs Containing "ingester"*
Deleting the VPA for this resource and restarting the StatefulSet results in the correct configuration, as set by the StatefulSet's resource requests and limits. This makes using VPA untenable for us, as it's doing the opposite of what we want it to do. It's reserving all the resources of our cluster so new apps can't be deployed due to lack of available CPU, when in reality only a tiny bit of CPU is actually being used on the cluster. Is there a fix, or do we need to abandon using VPA?
My understanding is that VPA focuses on requests only. Limits are set based on the request-to-limit ratio the Pod had before it was processed by the VPA. See https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#limits-control There are methods to keep the limit lower, but they aren't in the VPA object itself.
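To make the proportional scaling concrete, here is a worked example using the numbers already posted in this thread (the StatefulSet originally requested 500m CPU / 3Gi memory with limits of 2 CPU / 8Gi memory, and VPA's capped target was 1 CPU / 4Gi memory). This is an editorial illustration of the mechanism, not output from VPA itself:

# Original ratios: cpu limit/request = 2 / 0.5 = 4; memory limit/request = 8Gi / 3Gi = 8/3
# The admission controller sets requests to the (capped) recommendation and
# scales limits to preserve those original ratios:
resources:
  requests:
    cpu: "1"               # target, capped by maxAllowed cpu: 2000m
    memory: 4Gi            # target, within maxAllowed memory: 6Gi
  limits:
    cpu: "4"               # 1 CPU x 4
    memory: "11453246122"  # 4Gi (4294967296 bytes) x 8/3, matching the Pod YAML above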
I do not believe this is correct.
We are setting a resource policy; it says it will conform to its limits, but it does not. Its output even says what it will set it to, but then it doesn't do that and sets it to something much higher. The same goes for the comments in the code:
// Controls how the autoscaler computes recommended resources.
// The resource policy may be used to set constraints on the recommendations
// for individual containers.
// If any individual containers need to be excluded from getting the VPA recommendations, then
// it must be disabled explicitly by setting mode to "Off" under containerPolicies.
// If not specified, the autoscaler computes recommended resources for all containers in the pod,
// without additional constraints.
// +optional
Focus on "If not specified, the autoscaler computes recommended resources for all containers in the pod, without additional constraints." This says that if you don't specify a resource policy, the autoscaler computes recommendations without constraints, but if you do, it will use those constraints. Lastly, if it's not going to obey its own resource constraints for CPU and memory upper bounds, then why are they a thing? Why can I set them if it is, as you say, designed to ignore them?
The logic that does the capping is here:
The way I'm (quickly) reading the code here, it should actually be capping the actual requests and limits. Is there possibly a bug here? (I'll try and spend a bit more time looking at this)
Take a look at Lines 49 to 103 in a2b793d
That seems to be where the VPA grabs the requests and limits recommendations. Another reference is the GKE docs (which aren't the same as this project, but close enough as a useful datapoint):
Thanks for digging that up, Adrian, that makes sense. I found this old issue talking about it: #2359 (comment). The recommendation there is to remove the CPU limit altogether and let the pod burst as needed; VPA will adjust the CPU request from there (up to the cap). For memory, it recommends keeping the limit and letting OOMs trigger scale-up by VPA. That ultimately spun out #2387, asking for a way to give users more control over limit scaling, which led to #3028. This introduced the ability to disable limit scaling altogether by setting `controlledValues` to `RequestsOnly`.
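For reference, a minimal sketch of that option applied to the VPA object shown earlier in this thread (only `controlledValues` changes; the other values are taken from the manifest above):

resourcePolicy:
  containerPolicies:
  - containerName: '*'
    controlledResources:
    - cpu
    - memory
    # RequestsOnly disables proportional limit scaling; the admission
    # controller then only rewrites requests and leaves limits untouched.
    controlledValues: RequestsOnly
    maxAllowed:
      cpu: 2000m
      memory: 6Gi
    minAllowed:
      cpu: 1000m
      memory: 4Gi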
Do we need to take action here?
@adrianmoisey I would argue there are bugs here. The VPA should not be setting CPU limits higher than what the resource policy stipulates and what it says it is going to set. You can say it's a documentation issue if you want, but I really don't think it is.
In any case, the product just doesn't feel ready to me. It's doing unexpected things, which is not what I want from a system that is actively changing deployments and restarting them in my k8s clusters. The purpose of a tool like this is to decrease the work we admins have to do to keep workloads healthy; this feels like it's doing the opposite.
I'd love to know what the team is planning. From what I've read in other issues, it seems like there is very little bandwidth to improve VPA and it's kind of stagnating. Knowing what the future looks like would help us decide whether we put it on hold or rip it out of our automation completely (right now we have it off).
Limits are not a VPA feature, they are a kubernetes feature. The general consensus seems to be that most workload owners shouldn't be setting CPU limits. There are valid use-cases for CPU limits, but if you don't have this very particular use-case: don't use them. When using CPU limits, you're basically sacrificing performance for predictability, so this isn't what most people want.
Can you help me understand the harm that the current VPA behavior is causing? A CPU limit which is ridiculously high shouldn't cause any harm and should be very comparable to not having a CPU limit at all (if no limit is present, the effective limit is the Node's capacity). I agree that it isn't adding any value over not specifying the CPU limit – but that is then a configuration issue on the user's end. Depending on your use-case, this thread already contains the possible options to get the desired behavior:
So maybe the way to start here is: what's your motivation for setting limits for CPU and memory? And: are the limits for your workload static, or should they change depending on the load on your workload? This should lead you to the correct way to configure your workload and its VPA settings. Regarding this specific issue: this works as designed. I don't think there's anything to do here from the VPA side.
@voelzmo
Is the above correct? These are not bugs?
If the VPA sets the requests to the same value as the max, it is not a bug.
If the VPA sets the requests of the Pod to the same value as the recommendation, it is not a bug. The primary use of the VPA is for it to set requests on Pods.
@adrianmoisey I would also recommend updating the docs to inform users, clearly and succinctly, what VPA is for as you see it, because that is not what the docs currently indicate, and we've wasted a LOT of time on this for a product that does not do what the docs led us to believe. Part of that was probably me going in with a preconceived notion of what something called "Vertical Pod Autoscaler" would do, and maybe that's my bad. So from what you are saying, VPA is designed for setting requests and therefore for helping the scheduler know where to place workloads.
To answer one of your previous questions (which I didn't do in the last response because I didn't want to pollute the point of my question): yes, limits are very necessary in most workloads due to the noisy-neighbor issue. We have Jenkins build agents that will take every single CPU core you throw at them if you don't set a limit. If a workload has issues and gets into a loop, it can do the same thing. It's the same principle as setting a maximum for RAM, which should be a requirement for any sane k8s admin IMHO. We have a Kyverno policy for that.
I'll take this back to the team to figure out what we want to do. I'm not sure we will see the point in using it, as our goal was to have the request and limit values change as app requirements grow without us having to babysit workloads.
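(For context on that kind of guardrail: a Kyverno policy requiring memory limits typically looks roughly like the sketch below. This is a generic illustration, not the actual policy referenced in this comment; the policy name and message are invented.)

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-memory-limits     # hypothetical name, for illustration only
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-memory-limits
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Every container must declare a memory limit."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "?*"      # '?*' requires the field to be present and non-empty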
@sarg3nt I see that you're frustrated with the VPA behavior and feel like you wasted time because it doesn't do what you feel you need for your workload. To take a step back and look at your requirements: I think there might be a misunderstanding about what (CPU) limits are used for in the context of k8s. What you're writing regarding resource starvation in the absence of CPU limits, and also in an earlier comment about VPA setting high limits leading to new apps not getting any resources, leads me to believe that there is a misconception behind your judgement about VPA's usefulness. Let me try to give a short summary of CPU limits and requests (source, and probably lots of other blog posts about this).
The scenario you're describing, in which the lack of CPU limits causes processes to be starved or unable to be placed on a Node, doesn't exist: scheduling and guaranteed CPU shares are driven by requests, not limits. CPU limits are a means to ensure that if you saw some batch process take 5 hours in the last run, you can be sure it will take 5 hours in the next run and not finish in 10 hours, because it was only that "fast" last time thanks to free CPU resources that happened to be available. When you're arguing here about how VPA promises a thing that it doesn't do and that this should be considered a bug, let me point you to literally the very first paragraph in the project's README:
This is what it does and nothing more: setting requests based on usage and (if not disabled) adjusting limits proportionally. So to answer your questions: this behavior is as designed and not a bug. However, I also don't think this is a very useful discussion to have. With an understanding of how CPU limits and requests work, I still think you can configure VPA to achieve what you need for your workload; I outlined this in an earlier comment. We're also more than happy to accept a PR which clarifies the README section around limits a bit, so others don't run into the same misconception! I hope that helps, and maybe VPA will be of use to you in the future!
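Putting the advice from this thread together, a minimal sketch of the workload side (values are illustrative, not taken from any manifest here): keep a memory limit, drop the CPU limit, and let a VPA with limit scaling disabled (as sketched earlier) manage the requests.

resources:
  requests:
    cpu: 500m      # VPA raises or lowers this based on observed usage
    memory: 3Gi
  limits:
    memory: 8Gi    # keep a memory limit; OOM kills feed back into VPA scale-up
  # no CPU limit: the container can burst into otherwise idle CPU on the node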
/kind documentation
I am encountering an issue with the Vertical Pod Autoscaler (VPA) where it does not honor the maxAllowed resource limits for memory. Below is the VPA definition I am using:
After running a CPU stress test, the resulting resource limits observed on the pods are:
Despite setting the maxAllowed memory limit to 16Gi, the VPA scaled the memory up to 32Gi.
Steps to Reproduce:
Expected Behavior: The memory limit should not exceed the maxAllowed value of 16Gi.
Actual Behavior: The memory limit scales up to 32Gi, exceeding the maxAllowed value.
Could there be any known issues or configurations that might lead to this behavior? Thank you in advance for your help!
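Based on the limit-scaling behavior discussed above, one possible explanation (hedged, since the full VPA definition and original container spec are not shown here): maxAllowed caps the recommended request, and the limit is then scaled to preserve the container's original request-to-limit ratio. Assuming an original memory request:limit ratio of 1:2, the observed 32Gi would follow:

# Assumed original container spec (illustrative 1:2 memory ratio):
#   requests.memory: 8Gi, limits.memory: 16Gi
# VPA recommendation capped by maxAllowed.memory (16Gi) applies to the request:
#   requests.memory: 16Gi
# Limit scaled to keep the 1:2 ratio, so it is not capped by maxAllowed:
#   limits.memory: 32Gi
# Setting controlledValues: RequestsOnly (or adjusting the limit) avoids this.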