Merge pull request #1983 from drewhemm/mixed-instances-policy-documentation

Added documentation about AWS ASG MixedInstancesPolicy
kubernetes · May 10, 2019 · 750001f
2 parents e8b1d04 + e108a29
commit 750001f

Showing 2 changed files with 105 additions and 0 deletions.
diff --git a/cluster-autoscaler/cloudprovider/aws/MixedInstancePolicy.md b/cluster-autoscaler/cloudprovider/aws/MixedInstancePolicy.md
@@ -0,0 +1,69 @@
+The following is an excerpt from a CloudFormation template showing how a MixedInstancesPolicy can be used with ClusterAutoscaler:
+
+```json
+{
+    "Resources": {
+        "LaunchTemplate": {
+            "Type": "AWS::EC2::LaunchTemplate",
+            "Properties": {
+                "LaunchTemplateName": "memory-opt-2xlarge",
+                "LaunchTemplateData": {
+                    "InstanceType": "r5.2xlarge"
+                }
+            }
+        },
+        "ASGA": {
+            "Type": "AWS::AutoScaling::AutoScalingGroup",
+            "Properties": {
+                "MinSize": 1,
+                "MaxSize": 10,
+                "MixedInstancesPolicy": {
+                    "InstancesDistribution": {
+                        "OnDemandBaseCapacity": 0,
+                        "OnDemandPercentageAboveBaseCapacity": 0
+                    },
+                    "LaunchTemplate": {
+                        "LaunchTemplateSpecification": {
+                            "LaunchTemplateId": {
+                                "Ref": "LaunchTemplate"
+                            },
+                            "Version": {
+                                "Fn::GetAtt": [
+                                    "LaunchTemplate",
+                                    "LatestVersionNumber"
+                                ]
+                            }
+                        },
+                        "Overrides": [
+                            {
+                                "InstanceType": "r5.2xlarge"
+                            },
+                            {
+                                "InstanceType": "r5d.2xlarge"
+                            },
+                            {
+                                "InstanceType": "i3.2xlarge"
+                            },
+                            {
+                                "InstanceType": "r5a.2xlarge"
+                            },
+                            {
+                                "InstanceType": "r5ad.2xlarge"
+                            }
+                        ]
+                    }
+                },
+                "VPCZoneIdentifier": [
+                    "subnet-###############"
+                ]
+            }
+        },
+        "ASGB": {},
+        "ASGC": {}
+    }
+}
+```
+
+[r5.2xlarge](https://aws.amazon.com/ec2/instance-types/#Memory_Optimized) is the 'base' instance type, with overrides for r5d.2xlarge, i3.2xlarge, r5a.2xlarge and r5ad.2xlarge. 
+
+Note how one Auto Scaling Group is created per Availability Zone, since CA does not currently support ASGs that span multiple Availability Zones. See [Common Notes and Gotchas](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#common-notes-and-gotchas).
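+
+A minimal sketch of the per-zone layout, written in YAML for brevity. The subnet IDs are placeholders, and each group would carry the same `MixedInstancesPolicy` as `ASGA` in the excerpt above (omitted here), so this fragment is not deployable as written:
+
+```yaml
+Resources:
+  ASGA:
+    Type: AWS::AutoScaling::AutoScalingGroup
+    Properties:
+      MinSize: 1
+      MaxSize: 10
+      VPCZoneIdentifier:
+        - subnet-aaaaaaaa   # placeholder: subnet in the first Availability Zone
+  ASGB:
+    Type: AWS::AutoScaling::AutoScalingGroup
+    Properties:
+      MinSize: 1
+      MaxSize: 10
+      VPCZoneIdentifier:
+        - subnet-bbbbbbbb   # placeholder: subnet in the second Availability Zone
+  ASGC:
+    Type: AWS::AutoScaling::AutoScalingGroup
+    Properties:
+      MinSize: 1
+      MaxSize: 10
+      VPCZoneIdentifier:
+        - subnet-cccccccc   # placeholder: subnet in the third Availability Zone
+```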
diff --git a/cluster-autoscaler/cloudprovider/aws/README.md b/cluster-autoscaler/cloudprovider/aws/README.md
@@ -175,6 +175,42 @@ If you'd like to scale node groups from 0, an `autoscaling:DescribeLaunchConfigu
 }
 ```
 
+## Using AutoScalingGroup MixedInstancesPolicy
+
+It is possible to use Cluster Autoscaler with a [mixed instances policy](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-autoscaling-autoscalinggroup-mixedinstancespolicy.html) to diversify a single ASG across on-demand and spot instances of multiple instance types. When using spot instances, this makes it more likely that a spot instance can be launched to add the desired capacity to the cluster than with a single instance type, which may be in short supply.
+
+Note that the instance types should have the same amount of RAM and number of CPU cores, since this is fundamental to CA's scaling calculations. Using mismatched instance types can produce unintended results.
+
+Additionally, there are other factors which affect scaling, such as node labels. If you are currently using `nodeSelector` with the [beta.kubernetes.io/instance-type](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#interlude-built-in-node-labels) label, you will need to apply a common propagating label to the ASG and use that instead, since the instance-type label can no longer be relied upon. You may also use auto-generated tags such as `aws:cloudformation:stack-name` for this purpose. [Node affinity and anti-affinity](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) are not affected in the same way, since these selectors natively accept multiple values; you must add all the configured instance types to the list of values, for example:
+
+```yaml
+spec:
+  affinity:
+    nodeAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        nodeSelectorTerms:
+        - matchExpressions:
+          - key: beta.kubernetes.io/instance-type
+            operator: In
+            values:
+            - r5.2xlarge
+            - r5d.2xlarge
+            - i3.2xlarge
+            - r5a.2xlarge
+            - r5ad.2xlarge
+```
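+
+For the `nodeSelector` case, a minimal sketch, assuming every node in the ASG is registered with a common label. The `node-group` key and `memory-opt-2xlarge` value below are hypothetical, and how the label reaches the node (for example via kubelet's `--node-labels` flag) depends on your provisioning tool:
+
+```yaml
+spec:
+  nodeSelector:
+    # Hypothetical label shared by all instance types in the ASG;
+    # replaces a selector on beta.kubernetes.io/instance-type.
+    node-group: memory-opt-2xlarge
+```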
+
+### Example usage:
+
+* Create a [Launch Template](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-autoscaling-autoscalinggroup-launchtemplate.html) (LT) with an instance type, for example, r5.2xlarge. Consider this the 'base' instance type. Do not define any spot purchase options here.
+* Create an ASG with a MixedInstancesPolicy that refers to the newly-created LT.
+* Set LaunchTemplateOverrides to include the 'base' instance type r5.2xlarge and suitable alternatives, e.g. r5d.2xlarge, i3.2xlarge, r5a.2xlarge and r5ad.2xlarge. Differing processor types and speeds should be evaluated depending on your use-case(s).
+* Set [InstancesDistribution](https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_InstancesDistribution.html) according to your needs.
+* See [Allocation Strategies](https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-purchase-options.html#asg-allocation-strategies) for information about how the ASG fulfils capacity from the specified instance types.
+* Repeat by creating other LTs and ASGs, for example c5.18xlarge and c5n.18xlarge, or a set of similar burstable instance types.
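+
+The steps above can be sketched as a CloudFormation fragment, written here in YAML. This is an illustrative sketch rather than a recommended configuration: the `InstancesDistribution` values are assumptions, and `LaunchTemplate` refers to a launch template resource defined elsewhere in the same template.
+
+```yaml
+MixedInstancesPolicy:
+  InstancesDistribution:
+    # Assumed values: keep one on-demand instance as a base, then
+    # split additional capacity 25% on-demand / 75% spot.
+    OnDemandBaseCapacity: 1
+    OnDemandPercentageAboveBaseCapacity: 25
+    SpotAllocationStrategy: lowest-price
+  LaunchTemplate:
+    LaunchTemplateSpecification:
+      LaunchTemplateId: !Ref LaunchTemplate
+      Version: !GetAtt LaunchTemplate.LatestVersionNumber
+    Overrides:
+      - InstanceType: r5.2xlarge   # 'base' type from the launch template
+      - InstanceType: r5d.2xlarge
+      - InstanceType: i3.2xlarge
+      - InstanceType: r5a.2xlarge
+      - InstanceType: r5ad.2xlarge
+```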
+
+See the CloudFormation example [here](MixedInstancePolicy.md).
+
 ## Common Notes and Gotchas:
 - The `/etc/ssl/certs/ca-bundle.crt` file should exist by default on EC2 instances in your EKS cluster. If you use other cluster provisioning tools like [kops](https://github.com/kubernetes/kops) with operating systems other than Amazon Linux 2, please use `/etc/ssl/certs/ca-certificates.crt` or the correct path on your host instead for the volume hostPath in your cluster autoscaler manifest.
 - Cluster autoscaler does not support Auto Scaling Groups which span multiple Availability Zones; instead you should use an Auto Scaling Group for each Availability Zone and enable the [--balance-similar-node-groups](../../FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler) feature. If you do use a single Auto Scaling Group that spans multiple Availability Zones you will find that AWS unexpectedly terminates nodes without them being drained because of the [rebalancing feature](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html#arch-AutoScalingMultiAZ).