This tutorial will walk you through how to set up cluster scaling using cluster-autoscaler to enable worker node autoscaling in your Kubernetes cluster.
All configuration files for this chapter are in the cluster-scaling directory.
Check the size of your worker node Auto Scaling group and make sure that MaxSize is greater than 2. You can do this in the AWS Console, or use the CLI command below:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names nodes.<cluster-name>
For example:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names nodes.cluster.k8s.local
{
"AutoScalingGroups": [
{
"AutoScalingGroupARN": "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:xxx:autoScalingGroupName/nodes.cluster.k8s.local",
.
.
"MaxSize": 10,
}
The exact value can be identified using the command:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names nodes.cluster.k8s.local | jq .AutoScalingGroups[0].MaxSize
If the value of MaxSize is 2 (which means you can't scale beyond 2 worker nodes), then update it with kops using the following command:
kops edit ig nodes
It opens the instance group configuration in an editor, as shown:
Using cluster from kubectl context: example.cluster.k8s.local
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-10-29T01:02:55Z
  labels:
    kops.k8s.io/cluster: example.cluster.k8s.local
  name: nodes
spec:
  image: kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: t2.medium
  maxSize: 5
  minSize: 5
  role: Node
  subnets:
  - eu-central-1a
  - eu-central-1b
  - eu-central-1c
Change the value of maxSize to 10, then save the configuration file and exit the editor.
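After the edit, the relevant lines under spec in the InstanceGroup configuration should read (minSize keeps its original value):
  maxSize: 10
  minSize: 5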
Update the cluster configuration:
kops update cluster --name example.cluster.k8s.local --yes
It shows output similar to:
$ kops update cluster --name example.cluster.k8s.local --yes
I1029 14:09:37.853640 15831 apply_cluster.go:420] Gossip DNS: skipping DNS validation
I1029 14:09:38.925971 15831 executor.go:91] Tasks: 0 done / 81 total; 36 can run
I1029 14:09:40.200162 15831 executor.go:91] Tasks: 36 done / 81 total; 15 can run
I1029 14:09:41.293803 15831 executor.go:91] Tasks: 51 done / 81 total; 22 can run
I1029 14:09:43.034786 15831 executor.go:91] Tasks: 73 done / 81 total; 5 can run
I1029 14:09:43.868003 15831 executor.go:91] Tasks: 78 done / 81 total; 3 can run
I1029 14:09:44.655604 15831 executor.go:91] Tasks: 81 done / 81 total; 0 can run
I1029 14:09:44.991551 15831 update_cluster.go:247] Exporting kubecfg for cluster
Kops has set your kubectl context to example.cluster.k8s.local
Cluster changes have been applied to the cloud.
Changes may require instances to restart: kops rolling-update cluster
Check that there is no need to do a rolling update of the cluster:
$ kops rolling-update cluster
Using cluster from kubectl context: example.cluster.k8s.local
NAME STATUS NEEDUPDATE READY MIN MAX NODES
master-eu-central-1a Ready 0 1 1 1 1
master-eu-central-1b Ready 0 1 1 1 1
master-eu-central-1c Ready 0 1 1 1 1
nodes Ready 0 5 5 10 5
No rolling-update required.
Each worker node running in your Kubernetes cluster MUST have the necessary IAM policy attached. If you previously set up your cluster using kops, you can modify the existing permissions on your worker IAM role, or you can manually create a new policy and attach it to the IAM role. Both methods are shown below.
Use kops to edit the cluster configuration:
kops edit cluster --name example.cluster.k8s.local
This opens an editor showing the configuration of the cluster. Add the additionalPolicies section shown below to the end of the Cluster spec; your spec should look something like this:
spec:
  .
  .
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
          ],
          "Resource": ["*"]
        }
      ]
Note that the first few lines are shown only for continuity.
Update the cluster:
$ kops update cluster --name example.cluster.k8s.local --yes
I1029 15:25:24.068325 21411 apply_cluster.go:420] Gossip DNS: skipping DNS validation
I1029 15:25:25.002684 21411 executor.go:91] Tasks: 0 done / 81 total; 36 can run
I1029 15:25:26.359336 21411 executor.go:91] Tasks: 36 done / 81 total; 15 can run
I1029 15:25:27.378808 21411 executor.go:91] Tasks: 51 done / 81 total; 22 can run
I1029 15:25:29.512767 21411 executor.go:91] Tasks: 73 done / 81 total; 5 can run
I1029 15:25:30.338608 21411 executor.go:91] Tasks: 78 done / 81 total; 3 can run
I1029 15:25:31.189236 21411 executor.go:91] Tasks: 81 done / 81 total; 0 can run
I1029 15:25:31.586799 21411 update_cluster.go:247] Exporting kubecfg for cluster
Kops has set your kubectl context to example.cluster.k8s.local
Cluster changes have been applied to the cloud.
Changes may require instances to restart: kops rolling-update cluster
There is no need to do a rolling update of the cluster.
The policy below must be attached to the role assigned to the Kubernetes worker nodes. The policy definition exists in the file templates/asg-policy.json:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeAutoScalingInstances", "autoscaling:SetDesiredCapacity", "autoscaling:TerminateInstanceInAutoScalingGroup" ], "Resource": "*" } ] }
To configure these permissions, you need to create the policy using the command below.
aws iam create-policy --policy-document file://templates/asg-policy.json --policy-name ClusterAutoScaling
You will see a response similar to this:
$ aws iam create-policy --policy-document file://templates/asg-policy.json --policy-name ClusterAutoScaling
{
"Policy": {
"PolicyName": "ClusterAutoScaling",
"PolicyId": "ANPAJVCFZ6I4OL6BGFGD2",
"Arn": "arn:aws:iam::<account-id>:policy/ClusterAutoScaling",
"Path": "/",
"DefaultVersionId": "v1",
"AttachmentCount": 0,
"IsAttachable": true,
"CreateDate": "2017-10-05T20:35:54.964Z",
"UpdateDate": "2017-10-05T20:35:54.964Z"
}
}
Then attach the policy to the role assigned to the Kubernetes worker nodes. To attach the policy to the IAM role, you first need the name of the role; if you set up your cluster using kops, this will be nodes.[DOMAIN], such as nodes.cluster.k8s.local.
From the output of the create-policy command, get the .Policy.Arn attribute and use that to attach the policy to the role. Alternatively, you can use this convenience command, which retrieves your AWS account ID using the AWS CLI:
aws iam attach-role-policy --role-name nodes.cluster.k8s.local --policy-arn arn:aws:iam::`aws sts get-caller-identity --output text --query 'Account'`:policy/ClusterAutoScaling
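The same command can be split into two steps for readability; ACCOUNT_ID here is just a local shell variable:
# Look up your AWS account ID, then build the policy ARN from it
ACCOUNT_ID=$(aws sts get-caller-identity --output text --query 'Account')
aws iam attach-role-policy \
  --role-name nodes.cluster.k8s.local \
  --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/ClusterAutoScaling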
Before running the command below, update the following attributes in the file templates/2-10-autoscaler.yaml (a sketch of the relevant fragment is shown after the ASG name lookup below):
- command --nodes: set it to the name of your nodes ASG
- env.value: set it to the name of your region
You can find the name of the nodes ASG using this command:
$ aws autoscaling describe-auto-scaling-groups --query 'AutoScalingGroups[].AutoScalingGroupName'
[
    "master-eu-central-1a.masters.cluster.k8s.local",
    "master-eu-central-1b.masters.cluster.k8s.local",
    "master-eu-central-1c.masters.cluster.k8s.local",
    "nodes.cluster.k8s.local"
]
The last value in this output is the name of the nodes ASG. If the default cluster name of example.cluster.k8s.local was used to create the cluster, then there is no need to make any changes to the configuration file.
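For reference, here is a sketch of the relevant fragment of the cluster-autoscaler container spec in templates/2-10-autoscaler.yaml; the actual file may contain additional flags, and the ASG name and region shown here are the defaults used in this chapter:
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        # min:max:ASG-name -- replace with the name of your nodes ASG
        - --nodes=2:10:nodes.cluster.k8s.local
      env:
        - name: AWS_REGION
          # replace with the region your cluster runs in
          value: eu-central-1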
Now, install the cluster-autoscaler with a configuration of min: 2, max: 10, and name: cluster-autoscaler:
$ kubectl apply -f templates/2-10-autoscaler.yaml
deployment "cluster-autoscaler" created
Once this is deployed, you can view the logs by running:
kubectl logs deployment/cluster-autoscaler --namespace=kube-system
The output looks like:
I1029 22:49:19.880269 1 main.go:225] Cluster Autoscaler 0.6.0
I1029 22:49:19.995396 1 leaderelection.go:179] attempting to acquire leader lease...
I1029 22:49:20.075665 1 leaderelection.go:189] successfully acquired lease kube-system/cluster-autoscaler
I1029 22:49:20.075796 1 event.go:218] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"cluster-autoscaler", UID:"6677810d-bcfb-11e7-a483-0681c180117e", APIVersion:"v1", ResourceVersion:"140681", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' cluster-autoscaler-33142225-z150r became leader
I1029 22:49:20.076730 1 reflector.go:198] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:144
. . .
I1029 22:50:21.488144 1 cluster.go:89] Fast evaluation: node ip-172-20-109-10.eu-central-1.compute.internal cannot be removed: non-daemonset, non-mirrored, non-pdb-assigned kube-system pod present: kube-dns-autoscaler-4184363331-jh7jb
I1029 22:50:21.488152 1 cluster.go:75] Fast evaluation: ip-172-20-75-132.eu-central-1.compute.internal for removal
I1029 22:50:21.488172 1 cluster.go:89] Fast evaluation: node ip-172-20-75-132.eu-central-1.compute.internal cannot be removed: non-daemonset, non-mirrored, non-pdb-assigned kube-system pod present: kube-dns-729475360-z4d1r
I1029 22:50:23.324479 1 leaderelection.go:204] successfully renewed lease kube-system/cluster-autoscaler
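The log is verbose; to focus on scaling decisions, you can filter it, for example by searching for scale-up messages (the exact message text may vary between cluster-autoscaler versions):
kubectl logs deployment/cluster-autoscaler --namespace=kube-system | grep -i 'scale-up'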
To validate that the cluster-autoscaler is working properly, you can use the aws CLI to request the current DesiredCapacity of your ASG:
export ASG_NAME=nodes.cluster.k8s.local
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].DesiredCapacity'
You should see a result of 5, or whatever the initial size of your cluster was.
Check the max size of your cluster:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].MaxSize'
10
This correctly shows 10, as was set earlier in this chapter.
Then you can deploy an application that requests more resources than your cluster has available; see templates/dummy-resource-offers.yaml for reference.
Note: Depending on the size of your cluster, this might not trigger autoscaling. Increase the replicas: 10 count to whatever is needed to fill your cluster's resources.
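The exact manifest lives in templates/dummy-resource-offers.yaml; a minimal sketch of what such a deployment might look like is shown below (the image and the resource requests are illustrative assumptions, and the accompanying greeter Service is omitted):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: greeter
spec:
  replicas: 10
  selector:
    matchLabels:
      app: greeter
  template:
    metadata:
      labels:
        app: greeter
    spec:
      containers:
      - name: greeter
        image: nginx                 # illustrative image; the repository uses its own greeter image
        resources:
          requests:
            cpu: 500m                # illustrative request, sized to exhaust spare capacity
            memory: 512Mi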
$ kubectl apply -f templates/dummy-resource-offers.yaml
service "greeter" created
deployment "greeter" created
After this deploys, you can use the describe-auto-scaling-groups command again to see the DesiredCapacity change:
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].DesiredCapacity'
If you have deployed Heapster, as described in the Cluster Monitoring lab, you can use this command to see the resource usage of your nodes:
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-172-20-109-10.eu-central-1.compute.internal 543m 27% 1722Mi 44%
ip-172-20-44-33.eu-central-1.compute.internal 125m 12% 2120Mi 57%
ip-172-20-75-132.eu-central-1.compute.internal 607m 30% 1733Mi 44%
ip-172-20-41-77.eu-central-1.compute.internal 450m 22% 1703Mi 44%
ip-172-20-85-128.eu-central-1.compute.internal 86m 8% 2049Mi 55%
ip-172-20-93-108.eu-central-1.compute.internal 534m 26% 1747Mi 45%
ip-172-20-106-93.eu-central-1.compute.internal 522m 26% 1734Mi 44%
ip-172-20-101-20.eu-central-1.compute.internal 101m 5% 2046Mi 55%
Once auto-scaling triggers, you should see a higher number of nodes than the original count; this may take a few minutes:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].DesiredCapacity'
5
If auto-scaling does not trigger, then you can increase the number of replicas using the command:
$ kubectl scale --replicas=30 deployment/greeter
deployment "greeter" scaled
Now, auto-scaling should trigger, depending on your cluster configuration. The updated ASG query looks like this:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names=$ASG_NAME --query 'AutoScalingGroups[0].DesiredCapacity'
10
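While the new instances boot, you can watch them register with the cluster:
kubectl get nodes --watch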
It takes a few minutes for the additional worker nodes to start and join the cluster. The updated node information is now shown:
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-172-20-85-128.eu-central-1.compute.internal 74m 7% 2088Mi 56%
ip-172-20-93-108.eu-central-1.compute.internal 25m 1% 1734Mi 44%
ip-172-20-109-10.eu-central-1.compute.internal 26m 1% 1716Mi 44%
ip-172-20-86-51.eu-central-1.compute.internal 24m 1% 1075Mi 27%
ip-172-20-51-221.eu-central-1.compute.internal 21m 1% 1074Mi 27%
ip-172-20-61-253.eu-central-1.compute.internal 22m 1% 1075Mi 27%
ip-172-20-41-77.eu-central-1.compute.internal 31m 1% 1716Mi 44%
ip-172-20-106-93.eu-central-1.compute.internal 27m 1% 1745Mi 45%
ip-172-20-101-20.eu-central-1.compute.internal 94m 4% 2078Mi 56%
ip-172-20-44-33.eu-central-1.compute.internal 112m 11% 2148Mi 58%
ip-172-20-116-218.eu-central-1.compute.internal 22m 1% 1070Mi 27%
ip-172-20-44-50.eu-central-1.compute.internal 18m 0% 1076Mi 27%
ip-172-20-75-132.eu-central-1.compute.internal 27m 1% 1723Mi 44%
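When you are done experimenting, you can remove the test workload; with the greeter pods gone, the cluster-autoscaler will mark the extra nodes as unneeded and scale the ASG back down (by default this takes on the order of ten minutes per unneeded node):
kubectl delete -f templates/dummy-resource-offers.yaml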