
AWS: Can't scale up from 0 #2418

Closed
mgalgs opened this issue Oct 2, 2019 · 32 comments

Labels: area/provider/aws (Issues or PRs related to aws provider)

mgalgs commented Oct 2, 2019

Possibly related: #1754

I recently added three new node groups to my cluster using AWS spot instances. I initially set the minSize on each of the three new groups to 0, but CA was refusing to scale them up from 0. If I go into the EC2 console and manually force the ASG minSize up to 1 then CA gets unstuck and will continue scaling the group up as new requests come in.

I'm attaching the following files:

  • ca_logs.txt :: At this point I had forced one of my ASGs to have a minSize of 1 and maxSize of 4. That group filled up so CA was unable to scale it up any further. At this point it should have been scaling up the other two node groups, but they still had minSize=0 and thus CA refused to scale them up.
  • ca_logs_after_setting_min.txt :: This is after manually forcing the two other ASGs to have minSize=1. At this point CA starts scaling them up as expected.
  • ca_pod.txt :: Full get pod -o yaml of my CA

Is it not supported to have minSize=0 on AWS?

I'm running CA v1.14.5.


mgalgs commented Oct 2, 2019

I should also mention that all of our workers do have the ec2:DescribeLaunchTemplateVersions permission. Our workers (including the one running CA) all have the following IAM policy attached:

[screenshot: IAM policy attached to the worker instance role]

So I think we're satisfying the requirements from Scaling a node group to 0 in the docs.
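For reference, the policy behind that screenshot boils down to something like the following (a sketch based on the permissions the "Scaling a node group to 0" docs call for; illustrative only, not an exact copy of the screenshot):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeTags",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*"
        }
    ]
}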

FWIW, the cluster was created with eksctl v0.5.2 with nodeGroups[].iam.withAddonPolicies.autoScaler = true for all nodegroups.


chnsh commented Oct 3, 2019

I'm facing this as well. Additionally, I have 2 nodegroups (one is an on-demand group on AWS running the autoscaler); the other group is supposed to be a spot group where I want to deploy jobs, and it does not scale up from 0.

I'm using the autodiscovery feature.


chnsh commented Oct 3, 2019

So I upgraded to k8s.gcr.io/cluster-autoscaler:v1.16.1 and it did trigger autoscaling. The problem now is that the spawned nodes never show up when I run kubectl get nodes.


mgalgs commented Oct 3, 2019

Interesting. Hopefully if there's a fix for the 1.16 line it can be cherry-picked back to 1.14 etc., since the docs recommend matching your CA version with your k8s version (maybe that's why yours isn't working, @chnsh).


chnsh commented Oct 4, 2019

aah - possibly, I'll try tomorrow and update the thread.

So, I too am on v1.14.5 now and it did trigger autoscaling for me - that bit works fine, but I still don't see those nodes in get nodes.


chnsh commented Oct 4, 2019

Okay so I am on CA v1.14.5 and I finally got it working - I had to upgrade eksctl from v0.5.2 to 0.6.0 - it scales from 0 now!


mgalgs commented Oct 7, 2019

Interesting. I just tried eksctl v0.6.0 as well and it's still not scaling up from 0...

As a workaround I guess I'll set the minimum on these guys to 1, but it would be great for one of the devs to take a look at this.


Jeffwan commented Oct 8, 2019

@mgalgs Can you share eksctl cluster config? I can help reproduce the issue on our side.
I assume you have one OnDemand instance group to host your CA? You'd like to scale up spot instance group? Did you use node affinity or node selector in your tests? Did you use MixedInstancePolicy for your instance group?


mgalgs commented Oct 8, 2019

@Jeffwan Sure, here's my config:

cluster.yml
# A simple example of ClusterConfig object:
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mycluster
  region: us-west-2

vpc: {cidr: 10.42.0.0/16}

# cluster AZs must be set explicitly for single AZ nodegroup example to
# work
# https://github.com/weaveworks/eksctl/blob/c37657c1f4ff55ffed40139cf74aa828b37c2a1b/examples/05-advanced-nodegroups.yaml#L43
availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]

# Need separate nodegroups for cluster-autoscaler to work reliably.
# See https://github.com/kubernetes/autoscaler/pull/1802#issuecomment-474295002
nodeGroups:
  - name: ng-2a-ami-038a987c6425a84ad-m5-2xlarge-v2
    taints:
      spotty: "false:PreferNoSchedule"
    instanceType: m5.2xlarge
    availabilityZones: ["us-west-2a"]
    ami: ami-038a987c6425a84ad
    minSize: 1
    maxSize: 15
    privateNetworking: true
    ssh:
      publicKeyName: mykey
    iam:
      withAddonPolicies:
        autoScaler: true
  - name: ng-2b-ami-038a987c6425a84ad-m5-2xlarge-v2
    taints:
      spotty: "false:PreferNoSchedule"
    instanceType: m5.2xlarge
    availabilityZones: ["us-west-2b"]
    ami: ami-038a987c6425a84ad
    minSize: 1
    maxSize: 15
    privateNetworking: true
    ssh:
      publicKeyName: mykey
    iam:
      withAddonPolicies:
        autoScaler: true
  - name: ng-2c-ami-038a987c6425a84ad-m5-2xlarge-v2
    taints:
      spotty: "false:PreferNoSchedule"
    instanceType: m5.2xlarge
    availabilityZones: ["us-west-2c"]
    ami: ami-038a987c6425a84ad
    minSize: 1
    maxSize: 15
    privateNetworking: true
    ssh:
      publicKeyName: mykey
    iam:
      withAddonPolicies:
        autoScaler: true
  - name: ng-2a-ami-038a987c6425a84ad-spotty-v1
    labels:
      spotty: "true"
    taints:
      spotty: "true:NoSchedule"
    availabilityZones: ["us-west-2a"]
    ami: ami-038a987c6425a84ad
    minSize: 1
    maxSize: 10
    privateNetworking: true
    ssh:
      publicKeyName: mykey
    iam:
      withAddonPolicies:
        autoScaler: true
    instancesDistribution:
      maxPrice: 0.2
      instanceTypes: ["m4.2xlarge", "m5.2xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0  # no on-demand! only spot!
      spotInstancePools: 2  # use the 2 lowest price spot pools
  - name: ng-2b-ami-038a987c6425a84ad-spotty-v1
    labels:
      spotty: "true"
    taints:
      spotty: "true:NoSchedule"
    availabilityZones: ["us-west-2b"]
    ami: ami-038a987c6425a84ad
    minSize: 1
    maxSize: 10
    privateNetworking: true
    ssh:
      publicKeyName: mykey
    iam:
      withAddonPolicies:
        autoScaler: true
    instancesDistribution:
      maxPrice: 0.2
      instanceTypes: ["m4.2xlarge", "m5.2xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0  # no on-demand! only spot!
      spotInstancePools: 2  # use the 2 lowest price spot pools
  - name: ng-2c-ami-038a987c6425a84ad-spotty-v1
    labels:
      spotty: "true"
    taints:
      spotty: "true:NoSchedule"
    availabilityZones: ["us-west-2c"]
    ami: ami-038a987c6425a84ad
    minSize: 1
    maxSize: 10
    privateNetworking: true
    ssh:
      publicKeyName: mykey
    iam:
      withAddonPolicies:
        autoScaler: true
    instancesDistribution:
      maxPrice: 0.2
      instanceTypes: ["m4.2xlarge", "m5.2xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0  # no on-demand! only spot!
      spotInstancePools: 2  # use the 2 lowest price spot pools

I assume you have one OnDemand instance group to host your CA?

I have three OnDemand instance groups, each with a handful of instances inside, and yes, CA is hosted there.

You'd like to scale up spot instance group?

Yes

Did you use node affinity or node selector in your tests?

I tainted the spot nodes and added tolerations to workloads that can run on spots. It's all working as expected (spot-tolerant workloads are scheduled on spot nodes, non-spot-tolerant workloads avoid spot nodes) as long as I set the minSize of the group to 1.
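For concreteness, the spot-tolerant workloads carry roughly this in their pod specs (a simplified sketch using the spotty label/taint from the cluster config above; the real manifests differ in detail):

# Tolerate the spotty=true:NoSchedule taint on the spot nodegroups
tolerations:
- key: spotty
  operator: Equal
  value: "true"
  effect: NoSchedule
# Pin the workload to spot nodes via the spotty=true node label
nodeSelector:
  spotty: "true"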

Did you use MixedInstancePolicy for your instance group?

I believe eksctl uses that under the hood, yes. The resulting groups do seem to be using the feature:

[screenshot: ASG configuration showing a MixedInstancesPolicy]

@faheem-nadeem

I get a similar error with kops provisioning mixed instance groups. I am scaling from zero.

Cluster-autoscaler: v1.14.5
kops: 1.14.0

Error:
Unable to build proper template node for <masked_asg_name>: Unable to get instance type from launch config or launch template

Kops instance group config:

kind: InstanceGroup
metadata:
  creationTimestamp: 2019-10-09T23:43:07Z
  generation: 4
  labels:
    kops.k8s.io/cluster: <masked_cluster_name>
  name: nodes-us-east-1a-gp-mix
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: "true"
    kubernetes.io/cluster/<masked_cluster_name>: "true"
  image: kope.io/k8s-1.14-debian-stretch-amd64-hvm-ebs-2019-08-16
  machineType: t3a.xlarge
  maxSize: 2
  minSize: 0
  mixedInstancesPolicy:
    instances:
    - t3a.xlarge
    - m5a.xlarge
    onDemandAboveBase: 50
    spotInstancePools: 2
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-us-east-1a-gp-mix
  role: Node
  rootVolumeSize: 50
  rootVolumeType: gp2
  subnets:
  - us-east-1a

IAM role policy attached to cluster autoscaler:

    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeTags",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*"
        }
    ]
}

If I can help with debugging or providing any further logs / configurations to resolve this, please let me know :) 


Jeffwan commented Oct 11, 2019

@mgalgs
I did one test and it can scale up from 0 with your cluster spec. I used one ASG with OnDemand instances and another with Spot.

Checking your logs, did you use any node selectors? It seems to fail on GeneralPredicates.
If you use a node selector, you need to tag your ASG to let CA know the node labels, since there are 0 nodes available to be used as a template.

I1002 20:08:23.584696       1 scale_up.go:411] No pod can fit to eksctl-streks3-nodegroup-ng-2b-ami-038a987c6425a84ad-spotty-v1-NodeGroup-GK80456OQZIA
I1002 20:08:23.584720       1 utils.go:237] Pod str-debug-pod-mgalgs-1570046860-579cb979f4-bgh25 can't be scheduled on eksctl-streks3-nodegroup-ng-2c-ami-038a987c6425a84ad-m5-2xlarge-NodeGroup-1HAI8RHKZ1X45, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector
I1002 20:08:23.584738       1 scale_up.go:411] No pod can fit to eksctl-streks3-nodegroup-ng-2c-ami-038a987c6425a84ad-m5-2xlarge-NodeGroup-1HAI8RHKZ1X45
I1002 20:08:23.584762       1 utils.go:237] Pod str-debug-pod-mgalgs-1570046860-579cb979f4-bgh25 can't be scheduled on eksctl-streks3-nodegroup-ng-2c-ami-038a987c6425a84ad-spotty-v1-NodeGroup-S9U82ABUOL3M, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector


Jeffwan commented Oct 11, 2019

@faheem-cliqz I have probably already fixed the issue you're hitting. Please check this: 58f3f23#diff-ade7b95627ea0dd6b6f4deee7f24fa7eR323-R331

We will have a release next week


Jeffwan commented Oct 11, 2019

/assign @Jeffwan


Jeffwan commented Oct 11, 2019

/area provider/aws

k8s-ci-robot added the area/provider/aws label Oct 11, 2019

mgalgs commented Oct 11, 2019

@Jeffwan
Hmm, if it was a nodeSelector problem why does it work fine if I put a minSize of 1 on the groups? Wouldn't it still refuse to schedule on that group if it was a nodeSelector problem?

Regarding the logs:

Checking your logs, did you use any node selectors? It seems to fail on GeneralPredicates.
If you use a node selector, you need to tag your ASG to let CA know the node labels, since there are 0 nodes available to be used as a template.

I1002 20:08:23.584696       1 scale_up.go:411] No pod can fit to eksctl-streks3-nodegroup-ng-2b-ami-038a987c6425a84ad-spotty-v1-NodeGroup-GK80456OQZIA
I1002 20:08:23.584720       1 utils.go:237] Pod str-debug-pod-mgalgs-1570046860-579cb979f4-bgh25 can't be scheduled on eksctl-streks3-nodegroup-ng-2c-ami-038a987c6425a84ad-m5-2xlarge-NodeGroup-1HAI8RHKZ1X45, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector

This one is expected since I had a nodeSelector on this pod to force it onto a node from the spot group (and this isn't the spot group).

I1002 20:08:23.584738       1 scale_up.go:411] No pod can fit to eksctl-streks3-nodegroup-ng-2c-ami-038a987c6425a84ad-m5-2xlarge-NodeGroup-1HAI8RHKZ1X45
I1002 20:08:23.584762       1 utils.go:237] Pod str-debug-pod-mgalgs-1570046860-579cb979f4-bgh25 can't be scheduled on eksctl-streks3-nodegroup-ng-2c-ami-038a987c6425a84ad-spotty-v1-NodeGroup-S9U82ABUOL3M, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector

This one is strange because that nodeGroup should have had the necessary labels to allow that pod (with its nodeSelector) to be scheduled on that node... Again, I don't see this problem when minSize of the group is set to 1, and if this was a nodeSelector problem it seems like I'd still have an issue scheduling the pod...

Are you testing with 1.14? If this is definitely fixed in 1.15 it might not even be worth troubleshooting here since we have a workaround (setting the group's minSize to 1).

@faheem-nadeem

@faheem-cliqz I have probably already fixed the issue you're hitting. Please check this: 58f3f23#diff-ade7b95627ea0dd6b6f4deee7f24fa7eR323-R331

We will have a release next week

Cool, I'll get back to you with updates once you release :)


Jeffwan commented Oct 12, 2019

@mgalgs
The difference between the two is:

  1. Scale from 0: CA builds the node template from the ASG's LaunchTemplate or LaunchConfiguration, so it won't know some of the Kubernetes node labels. To build the template properly, we need to add ASG tags, and CA converts those tags into labels when constructing the node template.
  2. Scale from 1: since a node already exists, CA uses a real node as the template.

Do you have tags on your ASG? Check here: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#scaling-a-node-group-to-0

If you still have the issue, I'll look into whether anything is wrong in 1.14.
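For example (a sketch following the tag format in that doc), the spotty groups from the config above would need ASG tags along these lines:

# Tag keys follow the documented node-template format; values mirror the
# nodegroup's label and taint so CA can build an accurate template node.
k8s.io/cluster-autoscaler/node-template/label/spotty: "true"
k8s.io/cluster-autoscaler/node-template/taint/spotty: "true:NoSchedule"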


mgalgs commented Oct 24, 2019

Do you have tags on your ASG?

I see... So node labels and taints need to be applied as tags on the ASGs themselves as well. Looking at my aws autoscaling describe-tags output, it appears that my ASGs were not tagged with corresponding tags for the labels and taints. If anything, it sounds like this might be a bug in eksctl. I'll file an issue over there.
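For illustration, expressed in the eksctl config this would presumably look something like the following (assuming eksctl's nodeGroups[].tags field propagates to the ASG; an untested sketch, not a confirmed fix):

  - name: ng-2a-ami-038a987c6425a84ad-spotty-v1
    labels:
      spotty: "true"
    taints:
      spotty: "true:NoSchedule"
    # Duplicate the labels/taints as ASG tags so CA can build a node
    # template when the group is scaled to 0
    tags:
      k8s.io/cluster-autoscaler/node-template/label/spotty: "true"
      k8s.io/cluster-autoscaler/node-template/taint/spotty: "true:NoSchedule"
    # ... remaining nodegroup fields unchanged ...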

mgalgs closed this as completed Oct 24, 2019

mgalgs commented Oct 24, 2019

doh... This has already been raised on the eksctl project with a solution proposed.

Thank you for your help!

mgalgs added a commit to mgalgs/eksctl that referenced this issue Oct 24, 2019
When the cluster-autoscaler adds a new node to a group, it grabs an
existing node in the group and builds a "template" to launch a new node
identical to the one it grabbed from the group.

However, when scaling up from 0 there aren't any live nodes to reference to
build this template.  Instead, the cluster-autoscaler relies on tags in the
ASG to build the new node template.  This can cause unexpected behavior if
the pods triggering the scale-out are using node selectors or taints; CA
doesn't have sufficient information to decide if a new node launched in the
group will satisfy the request.

The long and short of it is that for CA to do its job properly we must tag
our ASGs corresponding to our labels and taints.  Add a note in the docs
about this since scaling up from 0 is a fairly common use case.

References:

  - kubernetes/autoscaler#2418
  - eksctl-io#1066
mgalgs added a commit to mgalgs/eksctl that referenced this issue Oct 24, 2019
@d-baranowski

I can't get this working in k8s.gcr.io/cluster-autoscaler :(


onprema commented Dec 5, 2019

I can't get this working in k8s.gcr.io/cluster-autoscaler :(

What autoscaler version are you using, and what eksctl version?

@d-baranowski

Autoscaler v1.13.8. We don't use eksctl; we manage all the infra using Terraform.
Info from the EKS console: Kubernetes version 1.13, platform version eks.6.

Let me know if there is any additional info I can provide to help


onprema commented Dec 5, 2019

Autoscaler v1.13.8. We don't use eksctl; we manage all the infra using Terraform.
Info from the EKS console: Kubernetes version 1.13, platform version eks.6.

Let me know if there is any additional info I can provide to help

You may want to try upgrading CA to 1.15.


Jeffwan commented Dec 26, 2019

Scaling up from 0 needs tags on the ASG so CA can build the template node. Could someone who has the problem share their ASG settings?
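For example, the part that matters most is the tag set on the ASG, which can be dumped with something like this (standard AWS CLI; substitute the real group name):

aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=<your-asg-name>"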

@d-baranowski

[screenshot: ASG tags]
I presume you're after the tags. Let me know what other settings you'd like to see


jvaibhav123 commented Apr 29, 2020

Hi,

I have CA version 1.15.6 running in my kubeadm cluster. The ASG is tagged correctly, as below:

k8s.io/cluster-autoscaler/node-template/label/nodeclass | spot
k8s.io/cluster-autoscaler/node-template/taint/spotenabled | 'false':NoSchedule
k8s.io/cluster-autoscaler/node-template/label/asg_type | small
node-role.kubernetes.io/nodeclass | spot

CA is configured with ASG auto-discovery.

I have a sample deployment like below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: nginx
    nodeclass: spot
    project: test1
  name: nginx-asg-test
  namespace: test-monitoring
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: nginx
        nodeclass: spot
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/nodeclass
                operator: In
                values:
                - spot
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 200m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        asg_type: small
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: spotenabled
        operator: Equal
        value: "false"

However, CA doesn't scale up the ASG from 0. The pods always remain in Pending status with the message below.

Normal   NotTriggerScaleUp  8m13s (x8 over 12m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 7 node(s) didn't match node selector, 9 node(s) had taints that the pod didn't tolerate
  Normal   NotTriggerScaleUp  2m9s (x43 over 12m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 9 node(s) had taints that the pod didn't tolerate, 7 node(s) didn't match node selector
  Warning  FailedScheduling   69s (x18 over 12m)   default-scheduler   0/10 nodes are available: 10 node(s) didn't match node selector.

The CA logs show the following:

I0428 18:41:21.656806       1 utils.go:229] Pod nginx-asg-test-679df8c4f5-x4b6h can't be scheduled on dt-ue2-test-opa-resources-t3.medium-20200428T2358-0, predicate failed: PodToleratesNodeTaints predicate mismatch, reason: node(s) had taints that the pod didn't tolerate, taints on node: []v1.Taint{v1.Taint{Key:"spotenabled", Value:"'false'", Effect:"NoSchedule", TimeAdded:(*v1.Time)(nil)}}
I0428 18:41:21.656810       1 utils.go:221] Pod nginx-asg-test-679df8c4f5-nfq2x can't be scheduled on dt-ue2-test-opa-resources-t3.medium-20200428T2358-0. Used cached predicate check results

The toleration exists in the deployment and the taints are applied on the ASG. I am not sure what's missing. Could you please help here?


Jeffwan commented Apr 29, 2020

@jvaibhav123

node-role.kubernetes.io/nodeclass is the label key your pod affinity uses, but it seems your ASG tag uses the key nodeclass. Can you change that and try again?
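In other words, the label key encoded in the ASG tag and the key the pod selects on have to line up exactly, e.g. (illustrative only):

# ASG tag already on the group (what CA reads when the group is at 0):
#   k8s.io/cluster-autoscaler/node-template/label/nodeclass = spot
# so the pod should select on that same label key:
nodeSelector:
  nodeclass: spot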

@jvaibhav123

Hi @Jeffwan, thanks for your response. I did try this, however I got similar results.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: nodeclass
          operator: In
          values:
          - spot


jvaibhav123 commented Apr 29, 2020

@Jeffwan This has been tested against k8s version 1.15.10. We also tried with an older version of k8s and got a similar issue. The observation is that once I keep minSize at 1, it does find the correct node and deploy. Later on, if I reduce minSize to 0, it works as expected. However, our problem is that we do not want to keep minSize at 1 initially. Could you please help us here?


Jeffwan commented Apr 30, 2020

@jvaibhav123 Do you have any other restrictions on the pod? Does it request other resources?

@jvaibhav123

@Jeffwan No, there are no other restrictions. The example given above is the actual use case, except the image was different (it's our application image). The rest of the parameters are the same.


agconti commented Jul 23, 2020

@mgalgs thanks for documenting the solution to this. I just ran into this and you saved my day! 💖
