
Support for autoscaling/v2-based HorizontalPodAutoscaler on Kubernetes v1.23+ #2462

Closed
tomkerkhove opened this issue Jan 11, 2022 · 19 comments · Fixed by #3606
Labels: feature · help wanted · needs-discussion · stale-bot-ignore

Comments

@tomkerkhove
Member

Proposal

The autoscaling/v2 API for HorizontalPodAutoscaler has graduated to GA, and we should use autoscaling/v2 for all the HPAs that we create on Kubernetes v1.23+.

https://kubernetes.io/blog/2021/12/07/kubernetes-1-23-release-announcement/#horizontalpodautoscaler-v2-graduates-to-ga
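
For reference, a minimal autoscaling/v2 HPA is sketched below. The spec is essentially the same as autoscaling/v2beta2; only the apiVersion changes (object names here are illustrative, not what KEDA emits):

```yaml
# Illustrative only: the GA API group for HPAs on Kubernetes v1.23+.
apiVersion: autoscaling/v2        # previously autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa               # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment      # illustrative scale target
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Which autoscaling API groups a given cluster serves can be checked with `kubectl api-versions | grep autoscaling`.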

Use-Case

Provide alignment with Kubernetes upstream.

Anything else?

No response

@tomkerkhove added the needs-discussion and feature-request labels on Jan 11, 2022
@zroubalik added the stale-bot-ignore label on Jan 11, 2022
@JorTurFer
Member

In this case, what approach are we going to follow? Supporting both depending on the cluster?

@tomkerkhove
Member Author

Based on the Kubernetes version KEDA is installed on, I'd try to use the latest API version available.

@tomkerkhove added the feature label and removed the feature-request label on Jan 13, 2022
@tomkerkhove
Member Author

We should plan for this, as autoscaling/v2beta1 for HorizontalPodAutoscaler will be removed in Kubernetes 1.25 and autoscaling/v2beta2 in Kubernetes 1.26.

https://kubernetes.io/blog/2022/04/07/upcoming-changes-in-kubernetes-1-24/#looking-ahead

@tomkerkhove added the help wanted label on May 31, 2022
@beingashwin

Hi,
Can someone please point me to an updated Helm chart that I can refer to?
After upgrading AKS to v1.25.6 I am facing issues. ArgoCD is pointing me to use autoscaling/v2 for the HPA, and I tried that but am still getting errors. The Helm chart syntax has probably changed in some way; using an {{ if }} block in the metrics section no longer seems to be supported, or has changed.

@JorTurFer
Member

This change is applied in the KEDA operator code and affects how it creates the HPA. From ArgoCD's point of view, you shouldn't see any change, because the generated HPA isn't managed by ArgoCD.
For reference, the first KEDA version with this change applied is v2.9.

@beingashwin

Well, I just want the updated schema... I thought I could get an updated schema for autoscaling/v2.

@JorTurFer
Member

want the updated schema

What do you mean? autoscaling/v2 schema? I don't get your question, sorry

@beingashwin

Yes, I mean that... It's probably working now, though the memory metric is not being fetched correctly; I need to check on that. ArgoCD is syncing every 5 seconds due to the autoscaling... Thanks anyway.

@JorTurFer
Member

Yes, I mean that... It's probably working now, though the memory metric is not being fetched correctly; I need to check on that. ArgoCD is syncing every 5 seconds due to the autoscaling... Thanks anyway.

I still don't get you, sorry. What problem are you seeing? Could you describe it?

@beingashwin

@JorTurFer Well, sure, let me try.
The issue is that we have ArgoCD deployed on the Kubernetes cluster, and when I configure the HPA with both CPU and memory metrics, ArgoCD tries to sync every 5 seconds.
With CPU only it seems to work fine, as the average utilisation is well below the target.
I am assuming the utilisation is what makes it constantly sync and doesn't allow enough time for the HPA to reach a synced status.

@JorTurFer
Member

JorTurFer commented Jun 2, 2023

The issue is that we have ArgoCD deployed on the Kubernetes cluster, and when I configure the HPA with both CPU and memory metrics, ArgoCD tries to sync every 5 seconds.

Do you mean a ScaledObject with a CPU/memory trigger, or just an HPA directly?

We also use ArgoCD and I haven't seen this behavior; could you share your ScaledObject so I can try to replicate it locally?

What ArgoCD version are you using?

@beingashwin

beingashwin commented Jun 2, 2023

@JorTurFer The ArgoCD version is 2.7, AKS is 1.25.6, and I have used the manifest below:

```yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 2
  maxReplicas: 5
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 300
      {{- if .Values.autoscaling.metricInterval }}
      periodicity:
        intervalSeconds: 600
      {{- end }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}
```

and in the values.yaml file I provided the values below:

```yaml
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage: 80
metricInterval: 300
```

@JorTurFer
Member

JorTurFer commented Jun 2, 2023

But is that related to KEDA somehow? I mean, that's a plain HPA and KEDA doesn't manage it; KEDA generates its own HPAs for ScaledObjects, and the CPU and memory metrics are served by the Kubernetes metrics server, not by KEDA.
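
For context, the KEDA-native way to express the same scaling would be a ScaledObject with cpu and memory triggers, from which KEDA generates and owns the autoscaling/v2 HPA itself. A rough sketch, assuming KEDA's documented cpu/memory trigger format (resource names are illustrative):

```yaml
# Illustrative sketch only: a ScaledObject using KEDA's cpu and memory triggers;
# KEDA then creates and manages the corresponding autoscaling/v2 HPA.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaledobject      # illustrative name
spec:
  scaleTargetRef:
    name: example-deployment      # illustrative target Deployment
  minReplicaCount: 2
  maxReplicaCount: 5
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "80"
    - type: memory
      metricType: Utilization
      metadata:
        value: "80"
```

If you go that route, the HPA in the cluster is created by KEDA, so ArgoCD would only track the ScaledObject, not the generated HPA.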

@JorTurFer
Member

JorTurFer commented Jun 2, 2023

BTW, what is this for?
[screenshot: the periodicity / intervalSeconds block from the manifest above]

I have checked the HPA spec and I haven't found that property xD

@beingashwin

beingashwin commented Jun 2, 2023

@JorTurFer Well, I added it only after I saw that ArgoCD was syncing every 5 seconds; otherwise it was pretty straightforward without the behavior block. Did you try on your local setup with both CPU and memory?
I suspected that the memory metric is too volatile and that is probably making replicas spin up too quickly, so I thought of introducing parameters to slow it down. periodicity is described as something that makes the HPA wait before checking metrics.

@JorTurFer
Member

JorTurFer commented Jun 2, 2023

periodicity is described as something that makes the HPA wait before checking metrics

Do you have a link about that? I thought the HPA controller has a fixed sync period :)

Did you try on your local setup with both CPU and memory?

No, we use CPU + Prometheus or CPU + NATS.

I suspected that the memory metric is too volatile and that is probably making replicas spin up too quickly

Based on your manifest, the HPA controller won't scale out/in more than once every 300 seconds, so I don't think that's the reason behind the syncing, to be honest. Do you have autosync enabled?
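
For what it's worth, autoscaling/v2 has no periodicity or intervalSeconds field; the supported way to slow scaling down is the behavior block, using stabilizationWindowSeconds and policies[].periodSeconds. A rough sketch of what the intent above could map to (values are illustrative, not a recommendation):

```yaml
# Illustrative only: dampening scaling with fields that actually exist in
# autoscaling/v2, instead of the non-existent "periodicity" block.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 300   # consider the last 5 minutes of recommendations
    policies:
      - type: Pods
        value: 1
        periodSeconds: 300            # add at most 1 pod per 300-second window
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 50
        periodSeconds: 300            # remove at most 50% of replicas per window
```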

@JorTurFer reopened this on Jun 2, 2023
@beingashwin

beingashwin commented Jun 2, 2023

Do you have a link about that? I thought the HPA controller has a fixed sync period :)

Well, I don't have it handy; I merged my code with just CPU plus the new API version and schema. Next sprint I will start hunting for this again.

No, we use CPU + Prometheus or CPU + NATS.

What's NATS? You said you use ArgoCD, so you don't see this sync issue? Scaling down also doesn't seem to work correctly.

Based on your manifest, the HPA controller won't scale out/in more than once every 300 seconds, so I don't think that's the reason behind the syncing, to be honest. Do you have autosync enabled?

Yes, we do have autosync enabled, and it retries every 5 seconds... but why does it only happen when I include memory + CPU and not with CPU only?

@JorTurFer
Member

JorTurFer commented Jun 2, 2023

Hi,
NATS is a messaging/streaming system used for queues. My ScaledObjects use CPU and NATS JetStream as triggers, so my HPAs use them. I use ArgoCD to deploy the applications and no, I don't see the problem you are referring to.

If you have autosync every 5 seconds, maybe there is something modifying your workload automatically and ArgoCD is trying to reconcile it, but I don't know what it could be, since the only manifest you have shared is a plain HPA and KEDA doesn't manage those (so I'm not sure how your problem is related to this issue or to KEDA; that's what I'm trying to figure out 😄).

I'd suggest disabling autosync for a while and checking which differences ArgoCD detects. If you don't autosync the Application every 5 seconds and there are mismatches between the live state and ArgoCD's desired state, you will see the differences and then know which resources are triggering the sync all the time.
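
As a concrete sketch of that suggestion (application name, repository, and paths are hypothetical): commenting out the automated block in the Application's syncPolicy stops ArgoCD from reconciling automatically, so the pending diff stays visible.

```yaml
# Hypothetical ArgoCD Application: autosync disabled by commenting out
# syncPolicy.automated, so ArgoCD only reports drift instead of syncing it.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app                          # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/charts.git  # illustrative repository
    path: charts/example
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: example
  # Autosync temporarily disabled: re-enable by restoring the block below.
  # syncPolicy:
  #   automated:
  #     prune: true
  #     selfHeal: true
```

With autosync off, the app simply shows OutOfSync, and `argocd app diff example-app` (or the diff view in the UI) shows exactly which fields keep changing.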

@beingashwin

OK, thanks for the hint. Have a good day!
