
VPA Not honoring maxAllowed Memory Limit #6996

Open
kmsarabu opened this issue Jul 2, 2024 · 19 comments

Labels
area/vertical-pod-autoscaler  kind/bug  kind/documentation

@kmsarabu
kmsarabu commented Jul 2, 2024

I am encountering an issue with the Vertical Pod Autoscaler (VPA) where it does not honor the maxAllowed resource limits for memory. Below is the VPA definition I am using:

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: test-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: test-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        controlledResources: ["cpu", "memory"]
        minAllowed:
          cpu: 500m
          memory: 4Gi
        maxAllowed:
          cpu: 4
          memory: 16Gi

After running a CPU stress test, the resulting resource limits observed on the pods are:

Limits:
  cpu:     4
  memory:  32Gi   <- this is more than VPA Object's MaxAllowed->memory?
Requests:
  cpu:      500m
  memory:   4Gi

Despite setting the maxAllowed memory limit to 16Gi, the VPA scaled the memory up to 32Gi.

Steps to Reproduce:

  1. Deploy a VPA with the provided configuration.
  2. Apply a CPU stress test to the target deployment.
  3. Observe the memory and CPU limits on the autoscaled pods.

Expected Behavior: The memory limit should not exceed the maxAllowed value of 16Gi.
Actual Behavior: The memory limit scales up to 32Gi, exceeding the maxAllowed value.

Could there be any known issues or configurations that might lead to this behavior? Thank you in advance for your help!

@kmsarabu added the kind/bug label Jul 2, 2024
@adrianmoisey
Member

/area vertical-pod-autoscaler

@Adarsh-verma-14

Hi @kmsarabu, you need to define containerPolicies (minAllowed and maxAllowed) in the CRD file (vpa-v1-crd.yaml) instead of the VPA definition file, because it provides clear guidelines for how VerticalPodAutoscalers should manage resource scaling.

@Adarsh-verma-14

Also, could you share the logs of the VPA pod(s)?

@Adarsh-verma-14

You can also take the CustomResourceDefinition (CRD) file as a reference from this page: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/deploy/vpa-v1-crd.yaml

@voelzmo
Contributor

voelzmo commented Jul 15, 2024

/assign @raywainman

@sarg3nt

sarg3nt commented Sep 12, 2024

@raywainman @voelzmo
I'm running into the same thing but with CPU.
We are trying to use VPA to help tune apps on our cluster, especially Open Telemetry, i.e. Prometheus, Grafana, Tempo, etc.
Let's use the Tempo Ingester as an example: I'm setting requests and limits on the Tempo Ingester StatefulSet and creating a VPA as follows.

The StatefulSet resources request and limits

resources:       
  limits:        
    cpu: "2"     
    memory: 8Gi  
  requests:      
    cpu: 500m    
    memory: 3Gi  

The VPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
    meta.helm.sh/release-name: sel-telemetry
    meta.helm.sh/release-namespace: sel-telemetry
  creationTimestamp: "2024-09-12T17:45:53Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: sel-telemetry-tempo-ingester-vpa
  namespace: sel-telemetry
  resourceVersion: "693765"
  uid: ead8e8de-edd6-46e9-9aff-477640b450b2
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources:
      - cpu
      - memory
      controlledValues: RequestsAndLimits
      maxAllowed:
        cpu: 2000m
        memory: 6Gi
      minAllowed:
        cpu: 1000m
        memory: 4Gi
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: sel-telemetry-tempo-ingester
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2024-09-12T17:46:53Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: ingester
      lowerBound:
        cpu: "1"
        memory: 4Gi
      target:
        cpu: "1"
        memory: 4Gi
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: "1"
        memory: 4Gi

VPA is straight up ignoring the maxAllowed CPU of 2 and setting the pod to 4 CPUs.

Resultant Pod YAML

resources:                  
  limits:                   
    cpu: "4"                
    memory: "11453246122"   
  requests:                 
    cpu: "1"                
    memory: 4Gi             

The problem can also be seen in Grafana, notice the 4 CPU cores

Admission Controller Logs Containing "ingester"

I0912 18:10:56.111158       1 matcher.go:73] Let's choose from 25 configs for pod sel-telemetry/sel-telemetry-tempo-ingester-1 
I0912 18:10:56.111178       1 recommendation_provider.go:91] updating requirements for pod sel-telemetry-tempo-ingester-1.     
I0912 18:12:00.372082       1 matcher.go:73] Let's choose from 25 configs for pod sel-telemetry/sel-telemetry-tempo-ingester-0 
I0912 18:12:00.372103       1 recommendation_provider.go:91] updating requirements for pod sel-telemetry-tempo-ingester-0.     

There were no logs in the updater containing "ingester"

Updater Logs Containing "ingester"

I0912 18:11:58.094754       1 update_priority_calculator.go:146] pod accepted for update sel-telemetry/sel-telemetry-tempo-ingester-0 with priority 1.3333333333333333 -
 processed recommendations:                                                                                                                                             
ingester: target: 4294968k 1000m; uncappedTarget: 262144k 25m;                                                                                                          
I0912 18:11:58.094796       1 updater.go:228] evicting pod sel-telemetry/sel-telemetry-tempo-ingester-0                                                                 

Deleting the VPA for this resource and restarting the StatefulSet results in the correct configuration as set by the StatefulSet's resource requests and limits.

This makes using VPA untenable for us, as it's doing the opposite of what we want it to: it's reserving all the resources of our cluster so new apps can't be deployed due to lack of available CPU, when in reality only a tiny bit of CPU is actually being used on the cluster.

Is there a fix or do we need to abandon using VPA?

@adrianmoisey
Member

My understanding is that VPA focuses on requests only. Limits are set as a ratio between the requests/limits before the Pod is processed by the VPA. See https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#limits-control

There are methods to keep the limit lower, but they aren't in the VPA object itself.

@sarg3nt

sarg3nt commented Sep 16, 2024

My understanding is that VPA focuses on requests only. Limits are set as a ratio between the requests/limits before the Pod is processed by the VPA. See https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#limits-control

There are methods to keep the limit lower, but they aren't in the VPA object itself.

I do not believe this is correct.
The docs you linked to say:

When setting limits VPA will conform to resource policies.

We are setting a resource policy; the docs say VPA will conform to it, but it is not. Its output even says what it will set the resources to, but then it doesn't do that and sets them to something much higher.

Same with the comments in the code:

// Controls how the autoscaler computes recommended resources.
// The resource policy may be used to set constraints on the recommendations
// for individual containers.
// If any individual containers need to be excluded from getting the VPA recommendations, then
// it must be disabled explicitly by setting mode to "Off" under containerPolicies.
// If not specified, the autoscaler computes recommended resources for all containers in the pod,
// without additional constraints.
// +optional

Focus on "If not specified, the autoscaler computes recommended resources for all containers in the pod, without additional constraints." That is saying that if you don't specify a resource policy it will compute recommendations without constraints, but if you do, it will use those constraints.

Lastly, if it's not going to obey its own resource constraints for CPU and memory upper bounds, then why are they a thing? Why can I set them if it is, as you say, designed to ignore them?

@raywainman
Contributor

The logic that does the capping is here:

// applyVPAPolicy updates recommendation if recommended resources are outside of limits defined in VPA resources policy

The way I'm (quickly) reading the code here - it should actually be capping the actual requests and limits. Is there possibly a bug here?

(I'll try and spend a bit more time looking at this)

@adrianmoisey
Member

Take a look at

// GetContainersResources returns the recommended resources for each container in the given pod in the same order they are specified in the pod.Spec.
// If addAll is set to true, containers w/o a recommendation are also added to the list (and their non-recommended requests and limits will always be preserved if present),
// otherwise they're skipped (default behaviour).
func GetContainersResources(pod *core.Pod, vpaResourcePolicy *vpa_types.PodResourcePolicy, podRecommendation vpa_types.RecommendedPodResources, limitRange *core.LimitRangeItem,
	addAll bool, annotations vpa_api_util.ContainerToAnnotationsMap) []vpa_api_util.ContainerResources {
	resources := make([]vpa_api_util.ContainerResources, len(pod.Spec.Containers))
	for i, container := range pod.Spec.Containers {
		recommendation := vpa_api_util.GetRecommendationForContainer(container.Name, &podRecommendation)
		if recommendation == nil {
			if !addAll {
				klog.V(2).Infof("no matching recommendation found for container %s, skipping", container.Name)
				continue
			}
			klog.V(2).Infof("no matching recommendation found for container %s, using Pod request", container.Name)
			resources[i].Requests = container.Resources.Requests
		} else {
			resources[i].Requests = recommendation.Target
		}
		defaultLimit := core.ResourceList{}
		if limitRange != nil {
			defaultLimit = limitRange.Default
		}
		containerControlledValues := vpa_api_util.GetContainerControlledValues(container.Name, vpaResourcePolicy)
		if containerControlledValues == vpa_types.ContainerControlledValuesRequestsAndLimits {
			proportionalLimits, limitAnnotations := vpa_api_util.GetProportionalLimit(container.Resources.Limits, container.Resources.Requests, resources[i].Requests, defaultLimit)
			if proportionalLimits != nil {
				resources[i].Limits = proportionalLimits
				if len(limitAnnotations) > 0 {
					annotations[container.Name] = append(annotations[container.Name], limitAnnotations...)
				}
			}
		}
		// If the recommendation only contains CPU or Memory (if the VPA was configured this way), we need to make sure we "backfill" the other.
		// Only do this when the addAll flag is true.
		if addAll {
			cpuRequest, hasCpuRequest := container.Resources.Requests[core.ResourceCPU]
			if _, ok := resources[i].Requests[core.ResourceCPU]; !ok && hasCpuRequest {
				resources[i].Requests[core.ResourceCPU] = cpuRequest
			}
			memRequest, hasMemRequest := container.Resources.Requests[core.ResourceMemory]
			if _, ok := resources[i].Requests[core.ResourceMemory]; !ok && hasMemRequest {
				resources[i].Requests[core.ResourceMemory] = memRequest
			}
			cpuLimit, hasCpuLimit := container.Resources.Limits[core.ResourceCPU]
			if _, ok := resources[i].Limits[core.ResourceCPU]; !ok && hasCpuLimit {
				resources[i].Limits[core.ResourceCPU] = cpuLimit
			}
			memLimit, hasMemLimit := container.Resources.Limits[core.ResourceMemory]
			if _, ok := resources[i].Limits[core.ResourceMemory]; !ok && hasMemLimit {
				resources[i].Limits[core.ResourceMemory] = memLimit
			}
		}
	}
	return resources
}

That seems to be where the VPA grabs the requests and limits recommendations.
The limits come from vpa_api_util.GetProportionalLimit(). If you follow that code path, it ends up as a calculation based on the request/limit ratio of the incoming Pod.
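
To make the ratio arithmetic concrete, here is a worked illustration (not actual VPA output) using the Tempo Ingester numbers from this thread. The original spec requested cpu: 500m / memory: 3Gi with limits cpu: 2 / memory: 8Gi, so the limit:request ratios are 4:1 for CPU and 8:3 for memory; the recommendation (capped by maxAllowed) is cpu: 1 / memory: 4Gi.

# Worked illustration of proportional-limit scaling (arithmetic only, not VPA output)
resources:
  requests:
    cpu: "1"               # recommendation, capped by maxAllowed cpu (2000m)
    memory: 4Gi            # recommendation, within minAllowed/maxAllowed memory
  limits:
    cpu: "4"               # 1 CPU × (2 / 0.5)   = 4 CPU   -- original 4:1 ratio preserved
    memory: "11453246122"  # 4Gi   × (8Gi / 3Gi) ≈ 10.67Gi -- original 8:3 ratio preserved

This matches the resultant Pod YAML reported above: maxAllowed capped the requests, while the limits were scaled to preserve the original request:limit ratios.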

Another reference is the GKE docs (which aren't the same as this project, but close enough as a useful datapoint):
https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler#containerresourcepolicy_v1_autoscalingk8sio

maxAllowed | ResourceList
Specifies the maximum CPU request and memory request allowed for the container. By default, there is no maximum applied.

@raywainman
Contributor

Thanks for digging that up Adrian, that makes sense.

I found this old issue talking about this: #2359 (comment)

The recommendation there is to actually remove the CPU limit altogether and let the pod burst as needed; VPA will adjust the CPU request from there (up to the cap).

For memory, it recommends keeping the limit and then letting the OOMs trigger scale up by VPA.

And that ultimately spun out #2387 asking for a way to give users more control over limit scaling which led to #3028.

This ultimately introduced the ability to disable limit scaling altogether by changing RequestsAndLimits to RequestsOnly - does that help in this case? Then you could set your pod limits to the max allowed values and let VPA handle the requests?
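
For reference, a minimal sketch of that approach for the StatefulSet above (illustrative only, reusing the values already posted in this thread): switch the VPA to RequestsOnly so it never rewrites limits, and keep the limits you want as static values in the workload spec.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sel-telemetry-tempo-ingester-vpa
  namespace: sel-telemetry
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: sel-telemetry-tempo-ingester
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsOnly   # VPA manages requests; limits in the StatefulSet stay as written
      minAllowed:
        cpu: 1000m
        memory: 4Gi
      maxAllowed:
        cpu: 2000m
        memory: 6Gi

With this, the StatefulSet's own limits act as the hard ceiling, while VPA only moves the requests between minAllowed and maxAllowed.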

@adrianmoisey
Member

Do we need to take action here?
I assume clarifying the documentation may make sense? Specifically https://github.com/kubernetes/autoscaler/tree/a2b793d530022f7ffb96d26a47995a08d5ae343e/vertical-pod-autoscaler#limits-control

@sarg3nt

sarg3nt commented Sep 18, 2024

@adrianmoisey I would argue there are bugs here. The VPA should not be setting CPU limits higher than the resource policy stipulates and higher than what it says it is going to set. You can say it's a documentation issue if you want, but I really don't think it is.
If the developers do not believe that VPA should be setting limits, and doing it correctly, then why have it? Rip it out and make it clear it is for requests only. Having it halfway working and, IMHO, actively harming the cluster is MUCH worse than not having it at all.
Please correct me if I'm missing something obvious here.

In any case, the product just doesn't feel ready to me. It's doing unexpected things, which is not what I want from a system that is actively changing deployments and restarting them in my k8s clusters. The purpose of a tool like this is to decrease the work we admins have to do to keep workloads healthy; this feels like it's doing the opposite.

I'd love to know what the team is planning. From what I've read in other issues, it seems like there is very little bandwidth to improve VPA and it's kind of stagnating. Knowing what the future looks like would help us decide whether to put it on hold or rip it out of our automation completely (right now we have it off).

@voelzmo
Contributor

voelzmo commented Sep 30, 2024

If the developers do not believe that VPA should be setting limits, and doing it correctly, then why have it?

Limits are not a VPA feature, they are a kubernetes feature. The general consensus seems to be that most workload owners shouldn't be setting CPU limits. There are valid use-cases for CPU limits, but if you don't have this very particular use-case: don't use them. When using CPU limits, you're basically sacrificing performance for predictability, so this isn't what most people want.
For memory limits, the consensus seems to be different: If you know your workload well enough, you can use them to detect memory leaks and other unwanted behavior.

Having it half way working and IMHO actively harming the cluster is MUCH worse than not having it at all.

Can you help me understand the harm that the current VPA behavior is causing? A CPU limit which is ridiculously high shouldn't cause any harm and should be very comparable to not having a CPU limit at all (if not present, the limit is equal to the Node's capacity). I agree that it isn't adding any value over not specifying the CPU limit – but that is then a configuration issue on the user's end.

Depending on your use-case, this thread already contains the possible options to get the desired behavior:

  • If you need limits for your workload: set them. Depending on your workload's behavior, they should either stay static, because they're hard boundaries, or they should scale together with the requests (which are adjusted based on the load on your workload). This can be done using either RequestsOnly or the default RequestsAndLimits.
  • If you don't need limits (or have no good means of identifying reasonable values for your workload): don't set them.
  • maxAllowed is a mechanism that's mainly used to prevent Pods from becoming un-schedulable. There might be other use-cases for specifying this, I'm not sure. Scheduling only considers requests, not limits, so you would probably set it in a way that ensures your Pods still fit on your nodes (see the sketch after this list).
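
As a concrete, hypothetical illustration of that last point (values assumed, not from this thread): if your largest nodes have 8 allocatable CPUs and 32Gi of allocatable memory, keeping maxAllowed below those values guarantees the recommended requests can always be scheduled somewhere.

containerPolicies:
- containerName: '*'
  maxAllowed:
    cpu: "7"        # below the node's 8 allocatable CPUs, leaving headroom for daemonsets
    memory: 28Gi    # below the node's 32Gi allocatable memory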

So maybe the way to start here is: what's your motivation for setting limits for CPU and memory? And: are the limits for your workload static, or should they change depending on the load on your workload? This should lead you to the correct way to configure your workload and its VPA settings.

Regarding this specific issue: This works as designed. I don't think there's anything to do here from the VPA side.
There is no simple solution here, as kubernetes and also VPA are quite complex, because there are so many different use-cases and scenarios out there. All these settings exist for a reason.

@sarg3nt

sarg3nt commented Sep 30, 2024

@voelzmo
Thank you for getting back to me.
I just want to clarify the following two items.

  1. Setting a VPA resource policy maxAllowed value is, by design, NOT a cap on the values VPA will set for that resource; this is by design and not a bug.
  2. The VPA output indicating it will set it to 2 CPU cores, and then it setting it to 4, is also by design and not a bug.

Is the above correct? These are not bugs?
Thank you.

@adrianmoisey
Member

Setting a VPA resource policy max allowed value is designed to NOT set the maximum allowed values for VPA to set that resource too. That this is by design and not a bug.

If the VPA sets the requests to the same value as the max, it is not a bug.
If the VPA sets the limits to a value higher than the max, it is also not a bug.

The VPA output indicating it will set it to 2 CPU cores and then sets it to 4 is also by design and not a bug.

If the VPA sets the requests of the Pod to the same value as the recommendation, it is not a bug.
If the VPA sets the limits of the Pod to a higher value than the recommendation and RequestsAndLimits is set (which is the default), it is also not a bug.

The primary use of the VPA is for it to set requests on Pods.

@sarg3nt

sarg3nt commented Sep 30, 2024

@adrianmoisey
Then I would recommend removing the ability to set the max values in the policy, as they are VERY confusing to an end user and, from what I gather you are saying, don't seem to do anything.

I would also recommend updating the docs to inform users, clearly and succinctly, what VPA is for as you see it, because that is not what the docs currently indicate, and we've wasted a LOT of time on this for a product that does not do what the docs led us to believe. Part of that was probably me going in with a preconceived notion of what something called "Vertical Pod Autoscaler" would do, and maybe that's my bad.

So from what you are saying, VPA is designed for setting requests, and therefore to help the scheduler know where to schedule workloads.

To answer one of your previous questions (which I didn't do in the last response because I didn't want to pollute the point of my question): yes, limits are very necessary in most workloads due to the noisy-neighbor issue. We have build agents for Jenkins that will take every single CPU core you throw at them if you don't set a limit. If a workload has issues and gets into a loop it can do the same thing. It's the same principle as setting a max on RAM, which should be a requirement for any sane k8s admin IMHO. We have a Kyverno policy for that.

I'll take this back to the team to figure out what we want to do. I'm not sure we will see the point in using it as our goal was to help change the request and limits values as app requirements grow without us having to baby sit workloads.

@voelzmo
Contributor

voelzmo commented Oct 7, 2024

@sarg3nt I see that you're frustrated with the VPA behavior and feel like you wasted time, because it doesn't do what you feel you need for your workload.

To take a step back and look at your requirements: I think there might be a misunderstanding about what (CPU) limits are used for in the context of k8s. What you're writing about resource starvation in the absence of CPU limits, and in an earlier comment about VPA setting high limits leading to new apps not getting any resources, leads me to believe that there is a misconception behind your judgement of VPA's usefulness.

Let me try to give a short summary of CPU limits and requests (source and probably lots of other blogposts about this.)

  • CPU limits determine a hard upper bound on how much CPU time a container can get, even when there is no contender for CPU on a Node. That's why I wrote earlier: with CPU limits you're trading performance for predictability. Your workload will never use more cycles than you defined as a limit, even if there were 20 idling CPU cores.
  • CPU requests are used by the Operating System's scheduler to determine the relative weight of a container compared to the other containers on that Node. If container A has twice the CPU requests of container B, this means that container A is guaranteed twice as many CPU cycles as container B (see the sketch after this list).
    • In case there is no contention for the CPU resource, this will still allow container B to consume more CPU cycles than container A, if e.g. container A doesn't have any load and returns its assigned CPU slices immediately to the OS scheduler.
    • In case of CPU contention (the containers on the Node want to consume more CPU cycles than there are available at a given slice), the OS CFS scheduler will still assign CPU slices based on the relative weight between the containers, as determined by the CPU requests
    • CPU requests are also used by the k8s scheduler to determine if a new workload can still fit on a Node. Note that this doesn't consider how much CPU is currently used by any component. So even when a container is currently using all of the available CPU (because there are no contenders for it), this doesn't mean that no workload will be scheduled on that Node.
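
A small sketch of the relative-weight point above (hypothetical container names and values, not from this issue):

# Two containers with different CPU requests. Under CPU contention, app-a gets
# roughly twice the CPU time of app-b, because the cgroup weight is derived from
# the request (cgroup v1: cpu.shares ≈ millicores × 1024 / 1000). Without
# contention, either container may use the idle cycles regardless of these values.
spec:
  containers:
  - name: app-a
    resources:
      requests:
        cpu: "1"      # ~1024 cpu.shares
  - name: app-b
    resources:
      requests:
        cpu: 500m     # ~512 cpu.shares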

The scenario you're describing, in which the lack of CPU limits causes processes to be starved or unable to be placed on a Node, doesn't exist. CPU limits are a means to ensure that if you saw some batch process take 5 hours in the last run, you can be sure it will take 5 hours in the next run and not 10, which could happen if it was only that "fast" last time because there happened to be free CPU resources available.

When you're arguing here about how VPA promises a thing that it doesn't do and this should be considered a bug, let me point you to literally the very first paragraph in the project's README:

Vertical Pod Autoscaler (VPA) frees users from the necessity of setting up-to-date resource limits and requests for the containers in their pods. When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in initial containers configuration.

This is what it does and nothing more. Setting requests based on usage, and (if not disabled) adjusting limits proportionally. So to answer your questions: this behavior is as designed and not a bug.

However, I also don't think this is a very useful discussion to have. The question is: with an understanding of how CPU limits and requests work, I still think you can configure VPA to achieve what you need for your workload. I outlined this in an earlier comment.

We're also more than happy to accept a PR which clarifies the README section around limits a bit, so others don't run into the same misconception!

I hope that helps and maybe VPA will be of use for you in the future!

@adrianmoisey
Member

/kind documentation

@k8s-ci-robot added the kind/documentation label Dec 3, 2024