This tutorial will walk you through how to set up cluster scaling using cluster-autoscaler to enable worker node autoscaling in your Kubernetes cluster.
All configuration files for this chapter are in the cluster-scaling directory.
Check the size of your worker node Auto Scaling group and make sure that MaxSize is greater than 2. You can do this in the AWS Console, or use the CLI command below:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names nodes.<cluster-name>
For example:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names nodes.cluster.k8s.local
{
"AutoScalingGroups": [
{
"AutoScalingGroupARN": "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:xxx:autoScalingGroupName/nodes.cluster.k8s.local",
.
.
"MaxSize": 10,
}
The exact value can be identified using the command:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names nodes.cluster.k8s.local | jq .AutoScalingGroups[0].MaxSize
If the value of MaxSize is 2 (which means you can't scale beyond 2 worker nodes), then update it with kops using the following command:
kops edit ig nodes
It opens the instance group configuration in an editor, as shown:
Using cluster from kubectl context: example.cluster.k8s.local
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-10-29T01:02:55Z
  labels:
    kops.k8s.io/cluster: example.cluster.k8s.local
  name: nodes
spec:
  image: kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: t2.medium
  maxSize: 5
  minSize: 5
  role: Node
  subnets:
  - eu-central-1a
  - eu-central-1b
  - eu-central-1c
Change the value of maxSize to 10, then save the configuration file and exit the editor.
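After the edit, the relevant lines under spec in the InstanceGroup configuration should read (minSize keeps its original value):
  maxSize: 10
  minSize: 5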
Update the cluster configuration:
kops update cluster --name example.cluster.k8s.local --yes
It shows output similar to:
$ kops update cluster --name example.cluster.k8s.local --yes
I1029 14:09:37.853640 15831 apply_cluster.go:420] Gossip DNS: skipping DNS validation
I1029 14:09:38.925971 15831 executor.go:91] Tasks: 0 done / 81 total; 36 can run
I1029 14:09:40.200162 15831 executor.go:91] Tasks: 36 done / 81 total; 15 can run
I1029 14:09:41.293803 15831 executor.go:91] Tasks: 51 done / 81 total; 22 can run
I1029 14:09:43.034786 15831 executor.go:91] Tasks: 73 done / 81 total; 5 can run
I1029 14:09:43.868003 15831 executor.go:91] Tasks: 78 done / 81 total; 3 can run
I1029 14:09:44.655604 15831 executor.go:91] Tasks: 81 done / 81 total; 0 can run
I1029 14:09:44.991551 15831 update_cluster.go:247] Exporting kubecfg for cluster
Kops has set your kubectl context to example.cluster.k8s.local
Cluster changes have been applied to the cloud.
Changes may require instances to restart: kops rolling-update cluster
Check that there is no need to do a rolling update of the cluster:
$ kops rolling-update cluster
Using cluster from kubectl context: example.cluster.k8s.local
NAME STATUS NEEDUPDATE READY MIN MAX NODES
master-eu-central-1a Ready 0 1 1 1 1
master-eu-central-1b Ready 0 1 1 1 1
master-eu-central-1c Ready 0 1 1 1 1
nodes Ready 0 5 5 10 5
No rolling-update required.
Each worker node running in your Kubernetes cluster MUST have the necessary IAM policy attached. If you previously set up your cluster using kops, you can modify the existing permissions on your worker IAM role, or you can manually create a new policy and attach it to the IAM role. Both methods are shown below.
Use kops to edit the cluster configuration:
kops edit cluster --name example.cluster.k8s.local
This opens an editor showing the configuration of the cluster. Add the additionalPolicies section shown below to the end of the Cluster spec; your spec should look something like this:
spec:
  .
  .
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
          ],
          "Resource": ["*"]
        }
      ]
Note that the first few lines are shown only for continuity.
Update the cluster:
$ kops update cluster --name example.cluster.k8s.local --yes
I1029 15:25:24.068325 21411 apply_cluster.go:420] Gossip DNS: skipping DNS validation
I1029 15:25:25.002684 21411 executor.go:91] Tasks: 0 done / 81 total; 36 can run
I1029 15:25:26.359336 21411 executor.go:91] Tasks: 36 done / 81 total; 15 can run
I1029 15:25:27.378808 21411 executor.go:91] Tasks: 51 done / 81 total; 22 can run
I1029 15:25:29.512767 21411 executor.go:91] Tasks: 73 done / 81 total; 5 can run
I1029 15:25:30.338608 21411 executor.go:91] Tasks: 78 done / 81 total; 3 can run
I1029 15:25:31.189236 21411 executor.go:91] Tasks: 81 done / 81 total; 0 can run
I1029 15:25:31.586799 21411 update_cluster.go:247] Exporting kubecfg for cluster
Kops has set your kubectl context to example.cluster.k8s.local
Cluster changes have been applied to the cloud.
Changes may require instances to restart: kops rolling-update cluster
There is no need to do a rolling update of the cluster.
The policy below must be attached to the role assigned to the Kubernetes worker nodes. The policy definition exists in the file templates/asg-policy.json:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeAutoScalingInstances", "autoscaling:SetDesiredCapacity", "autoscaling:TerminateInstanceInAutoScalingGroup" ], "Resource": "*" } ] }
To configure these permissions, you need to create the policy using the command below.
aws iam create-policy --policy-document file://templates/asg-policy.json --policy-name ClusterAutoScaling
You will see a response similar to this:
$ aws iam create-policy --policy-document file://templates/asg-policy.json --policy-name ClusterAutoScaling
{
"Policy": {
"PolicyName": "ClusterAutoScaling",
"PolicyId": "ANPAJVCFZ6I4OL6BGFGD2",
"Arn": "arn:aws:iam::<account-id>:policy/ClusterAutoScaling",
"Path": "/",
"DefaultVersionId": "v1",
"AttachmentCount": 0,
"IsAttachable": true,
"CreateDate": "2017-10-05T20:35:54.964Z",
"UpdateDate": "2017-10-05T20:35:54.964Z"
}
}
Then attach the policy to the role assigned to the Kubernetes worker nodes. To attach the policy to the IAM role, you first need the name of the role; if you set up your cluster using kops, this will be nodes.[DOMAIN], such as nodes.cluster.k8s.local.
From the output of the create-policy command, get the .Policy.Arn attribute and use that to attach the policy to the role. Alternatively, you can use this convenience command, which retrieves your AWS account ID using the AWS CLI:
aws iam attach-role-policy --role-name nodes.cluster.k8s.local --policy-arn arn:aws:iam::`aws sts get-caller-identity --output text --query 'Account'`:policy/ClusterAutoScaling
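The same command can be split into two steps for readability; ACCOUNT_ID here is just a local shell variable:
# Look up your AWS account ID, then build the policy ARN from it
ACCOUNT_ID=$(aws sts get-caller-identity --output text --query 'Account')
aws iam attach-role-policy \
  --role-name nodes.cluster.k8s.local \
  --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/ClusterAutoScaling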
Before running the command below, update the following attributes in the file templates/2-10-autoscaler.yaml (a sketch of the relevant fragment is shown after the ASG name lookup below):
- command --nodes: set it to the name of your nodes ASG
- env.value: set it to the name of your region
You can find the name of the nodes ASG using this command:
$ aws autoscaling describe-auto-scaling-groups --query 'AutoScalingGroups[].AutoScalingGroupName'
[
    "master-eu-central-1a.masters.cluster.k8s.local",
    "master-eu-central-1b.masters.cluster.k8s.local",
    "master-eu-central-1c.masters.cluster.k8s.local",
    "nodes.cluster.k8s.local"
]
The last value in this output is the name of the nodes ASG. If the default cluster name of example.cluster.k8s.local was used to create the cluster, then there is no need to make any changes to the configuration file.
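For reference, here is a sketch of the relevant fragment of the cluster-autoscaler container spec in templates/2-10-autoscaler.yaml; the actual file may contain additional flags, and the ASG name and region shown here are the defaults used in this chapter:
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        # min:max:ASG-name -- replace with the name of your nodes ASG
        - --nodes=2:10:nodes.cluster.k8s.local
      env:
        - name: AWS_REGION
          # replace with the region your cluster runs in
          value: eu-central-1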
Now, install the cluster-autoscaler with a configuration of min: 2, max: 10, and name: cluster-autoscaler:
$ kubectl apply -f templates/2-10-autoscaler.yaml
deployment "cluster-autoscaler" created
Once this is deployed, you can view the logs by running:
kubectl logs deployment/cluster-autoscaler --namespace=kube-system
The output looks like:
I1029 22:49:19.880269 1 main.go:225] Cluster Autoscaler 0.6.0
I1029 22:49:19.995396 1 leaderelection.go:179] attempting to acquire leader lease...
I1029 22:49:20.075665 1 leaderelection.go:189] successfully acquired lease kube-system/cluster-autoscaler
I1029 22:49:20.075796 1 event.go:218] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"cluster-autoscaler", UID:"6677810d-bcfb-11e7-a483-0681c180117e", APIVersion:"v1", ResourceVersion:"140681", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' cluster-autoscaler-33142225-z150r became leader
I1029 22:49:20.076730 1 reflector.go:198] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:144
. . .
I1029 22:50:21.488144 1 cluster.go:89] Fast evaluation: node ip-172-20-109-10.eu-central-1.compute.internal cannot be removed: non-daemonset, non-mirrored, non-pdb-assigned kube-system pod present: kube-dns-autoscaler-4184363331-jh7jb
I1029 22:50:21.488152 1 cluster.go:75] Fast evaluation: ip-172-20-75-132.eu-central-1.compute.internal for removal
I1029 22:50:21.488172 1 cluster.go:89] Fast evaluation: node ip-172-20-75-132.eu-central-1.compute.internal cannot be removed: non-daemonset, non-mirrored, non-pdb-assigned kube-system pod present: kube-dns-729475360-z4d1r
I1029 22:50:23.324479 1 leaderelection.go:204] successfully renewed lease kube-system/cluster-autoscaler
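The log is verbose; to focus on scaling decisions, you can filter it, for example by searching for scale-up messages (the exact message text may vary between cluster-autoscaler versions):
kubectl logs deployment/cluster-autoscaler --namespace=kube-system | grep -i 'scale-up'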
To validate that the cluster-autoscaler is working properly, you can use the aws CLI to request the current DesiredCapacity of your ASG:
export ASG_NAME=nodes.cluster.k8s.local
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].DesiredCapacity'
You should see a result of 5, or whatever the initial size of your cluster was.
Check the max size of your cluster:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].MaxSize'
10
This correctly shows 10, as was set earlier in this chapter.
Then you can deploy an application that requests more resources than your cluster has available; see templates/dummy-resource-offers.yaml for reference.
Note: Depending on the size of your cluster, this might not trigger autoscaling. Increase the replicas: 10 count to whatever is needed to fill your cluster's resources.
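The exact manifest lives in templates/dummy-resource-offers.yaml; a minimal sketch of what such a deployment might look like is shown below (the image and the resource requests are illustrative assumptions, and the accompanying greeter Service is omitted):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: greeter
spec:
  replicas: 10
  selector:
    matchLabels:
      app: greeter
  template:
    metadata:
      labels:
        app: greeter
    spec:
      containers:
      - name: greeter
        image: nginx                 # illustrative image; the repository uses its own greeter image
        resources:
          requests:
            cpu: 500m                # illustrative request, sized to exhaust spare capacity
            memory: 512Mi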
$ kubectl apply -f templates/dummy-resource-offers.yaml
service "greeter" created
deployment "greeter" created
After this deploys, you can use the describe-auto-scaling-groups command again to see the DesiredCapacity change:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].DesiredCapacity'
If you have deployed Heapster, as described in the Cluster Monitoring lab, you can use this command to see the resource usage of your nodes:
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-172-20-109-10.eu-central-1.compute.internal 543m 27% 1722Mi 44%
ip-172-20-44-33.eu-central-1.compute.internal 125m 12% 2120Mi 57%
ip-172-20-75-132.eu-central-1.compute.internal 607m 30% 1733Mi 44%
ip-172-20-41-77.eu-central-1.compute.internal 450m 22% 1703Mi 44%
ip-172-20-85-128.eu-central-1.compute.internal 86m 8% 2049Mi 55%
ip-172-20-93-108.eu-central-1.compute.internal 534m 26% 1747Mi 45%
ip-172-20-106-93.eu-central-1.compute.internal 522m 26% 1734Mi 44%
ip-172-20-101-20.eu-central-1.compute.internal 101m 5% 2046Mi 55%
Once auto-scaling triggers, you should see a higher number of nodes than the original count; this may take a few minutes:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].DesiredCapacity'
5
If auto-scaling does not trigger, then you can increase the number of replicas using the command:
$ kubectl scale --replicas=30 deployment/greeter
deployment "greeter" scaled
Now, auto-scaling should trigger, depending on your cluster configuration. The updated ASG query looks like this:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].DesiredCapacity'
10
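While the new instances boot, you can watch them register with the cluster:
kubectl get nodes --watch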
It takes a few minutes for the additional worker nodes to start and join the cluster. The updated node information is now shown:
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-172-20-85-128.eu-central-1.compute.internal 74m 7% 2088Mi 56%
ip-172-20-93-108.eu-central-1.compute.internal 25m 1% 1734Mi 44%
ip-172-20-109-10.eu-central-1.compute.internal 26m 1% 1716Mi 44%
ip-172-20-86-51.eu-central-1.compute.internal 24m 1% 1075Mi 27%
ip-172-20-51-221.eu-central-1.compute.internal 21m 1% 1074Mi 27%
ip-172-20-61-253.eu-central-1.compute.internal 22m 1% 1075Mi 27%
ip-172-20-41-77.eu-central-1.compute.internal 31m 1% 1716Mi 44%
ip-172-20-106-93.eu-central-1.compute.internal 27m 1% 1745Mi 45%
ip-172-20-101-20.eu-central-1.compute.internal 94m 4% 2078Mi 56%
ip-172-20-44-33.eu-central-1.compute.internal 112m 11% 2148Mi 58%
ip-172-20-116-218.eu-central-1.compute.internal 22m 1% 1070Mi 27%
ip-172-20-44-50.eu-central-1.compute.internal 18m 0% 1076Mi 27%
ip-172-20-75-132.eu-central-1.compute.internal 27m 1% 1723Mi 44%
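When you are done experimenting, you can remove the test workload; with the greeter pods gone, the cluster-autoscaler will mark the extra nodes as unneeded and scale the ASG back down (by default this takes on the order of ten minutes per unneeded node):
kubectl delete -f templates/dummy-resource-offers.yaml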