Cluster autoscaling for AKS: Request Rate Throttling has been detected for your Cluster #1432

kwaazaar · 2020-12-17T12:36:32Z

kwaazaar
Dec 17, 2020

Out of the box on my first attempt to investigate Keda I run into scaling issues with my cluster.

Expected Behavior

Cluster autoscaling should keep working. It's the only reason why running functions on AKS (or other auto-scaling clusters) makes sense.

Actual Behavior

Keda is scaling to 100 replicas. My cluster needs to autoscale to provide the power to run these replicas (from 1 until its maximum of 10 nodes). This all works, but after a few of these 'bursts' (the queue gets filled up quickly periodically), I get these errors on the cluster/VMSS it's not scaling anymore.
Maybe the solution would be to not use 100 replicas as the default max. 10 maybe makes more sense.
And maybe advice to use resource limits in the deployment of the function, because by default it scheduled all the replicas on a single node, which caused my whole cluster to 'stumble'.

Steps to Reproduce the Problem

See above.

Logs from KEDA operator

No errors. Problem is AKS, but indirectly caused by Keda behavior.

Specifications

KEDA Version: 1.1.0
Platform & Version: AKS
Kubernetes Version: 1.18.10
Scaler(s): azure-servicebus

tomkerkhove · 2020-12-17T15:01:37Z

tomkerkhove
Dec 17, 2020
Collaborator

Thanks for the report but can you explain a bit further please? This sounds more like an AKS problem rather than KEDA?

0 replies

mboutet · 2020-12-18T22:58:11Z

mboutet
Dec 18, 2020

You don't give enough information on your deployment config, but some points to consider:

Make sure to set adequate resources request for your deployment. Kubernetes uses these to calculate if a pod can be scheduled on a node. If you don't have resources request or if they are too low, k8s will schedule too much pods per node which can lead to resources starvation on that node.
You can use the topologySpreadConstraints feature in the pod spec to tell Kubernetes to spread the pods across multiple topology domains. In your case the topology domain would be the hostname.

Resources limit could help prevent your pods from overusing resources, but Kubernetes won't use them to schedule your pods.

0 replies

kwaazaar · 2020-12-24T08:57:22Z

kwaazaar
Dec 24, 2020
Author

The generated deployment (by func kubernetes deploy) does not contain resource requests. I had to add those indeed to prevent the node to become not ready. But my point was about cluster autoscaling of AKS: it's so slow and Keda is creating many replicas (100), that I run into throtling issues with Azure. So it's indeed not a bug in Keda, but a bit more guidance would be nice, so the cluster doesn't 'collapse' on my first attempt to use Keda.
So this is probably a documentation issue. Just like those resource requests that should be added to the generated yaml. Or maybe add them commented out (because it's hard to guess the required resources) to at least inform the developer that it's important to add.

0 replies

tomkerkhove · 2020-12-24T09:06:57Z

tomkerkhove
Dec 24, 2020
Collaborator

Thanks for letting us know, would you mind opening an issue on http://github.com/azure/azure-functions-core-tools for the func kubernetes deploy please?

I've opened #1450 & kedacore/charts#112 to provide these resources out-of-the-box as guidance

0 replies

mboutet · 2020-12-24T15:35:08Z

mboutet
Dec 24, 2020

@kwaazaar, you could also configure the scaling behaviour to prevent too much pods from being created in a short amount of time. See support-for-configurable-scaling-behavior which can be configured in the ScaledObject spec. For instance, you could limit the rate at which the pods are scaled up as well as decrease the stabilizationWindowSeconds for the scale down. That way, the HPA won't overreact to small spikes and will more often re-evaluate if the deployments can be scaled down. In the end, this will result in less node being created by the cluster autoscaler and less calls to the AKS backend.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cluster autoscaling for AKS: Request Rate Throttling has been detected for your Cluster #1432

{{title}}

Replies: 5 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Cluster autoscaling for AKS: Request Rate Throttling has been detected for your Cluster #1432

kwaazaar Dec 17, 2020

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

Logs from KEDA operator

Specifications

Replies: 5 comments

tomkerkhove Dec 17, 2020 Collaborator

mboutet Dec 18, 2020

kwaazaar Dec 24, 2020 Author

tomkerkhove Dec 24, 2020 Collaborator

mboutet Dec 24, 2020

kwaazaar
Dec 17, 2020

tomkerkhove
Dec 17, 2020
Collaborator

mboutet
Dec 18, 2020

kwaazaar
Dec 24, 2020
Author

tomkerkhove
Dec 24, 2020
Collaborator

mboutet
Dec 24, 2020