
CA does not scale up from zero nodes in group #903

Closed
wskinner opened this issue May 29, 2018 · 17 comments
Labels
area/cluster-autoscaler, area/provider/aws, kind/bug, lifecycle/rotten

Comments

@wskinner

What happened:
Cluster Autoscaler will not scale up from zero nodes. However, it will scale up from one node.
I have a node group whose template includes p2.xlarge GPU instances. With zero running instances in my gpu-nodes node group, I create a new Job that requests 2 pods, each with 1 GPU. The pods are unschedulable, and CA logs show:
I0524 15:30:32.066956 1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"engine", Name:"distributed-job-2xp8n", UID:"34fa9255-5f67-11e8-bede-068abf0075c0", APIVersion:"v1", ResourceVersion:"98300", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)
The pods never get scheduled, and no GPU instances get spun up.

What you expected to happen:
CA should scale up the cluster by adding two p2.xlarge instances to the gpu-nodes group.

How to reproduce it (as minimally and precisely as possible):
In a kops cluster on AWS:

  1. Create a node group which includes p2.xlarge as the instance type.
  2. Create a Job that creates 2 pods, each requesting a single nvidia.com/gpu.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.7", GitCommit:"b30876a5539f09684ff9fde266fda10b37738c9c", GitTreeState:"clean", BuildDate:"2018-01-16T21:52:38Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: AWS

  • OS (e.g. from /etc/os-release): k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08

  • Kernel (e.g. uname -a): Linux ip-172-20-40-189 4.4.115-k8s #1 SMP Thu Feb 8 15:37:40 UTC 2018 x86_64 GNU/Linux

  • Install tools: kops 1.8.1

  • Others:
    CA image: gcr.io/google_containers/cluster-autoscaler:v1.0.5

@bskiba added the bug, area/cluster-autoscaler, and area/provider/aws labels on May 29, 2018
@bskiba
Member

bskiba commented May 29, 2018

@mumoshu I'm not sure how GPUs are handled by scale-from-zero on AWS. Can you help?

@mumoshu
Contributor

mumoshu commented May 29, 2018

@bskiba Hi. Unfortunately I'm not familiar with this specific feature, but my understanding is that the aws provider needs node templates to support it, and according to the documentation that is implemented.

So,

@wskinner Did you add the required tags to the ASGs that back your node groups? You need those tags for the scale-from-zero feature to work. This part of the cluster-autoscaler doc should help!
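
For illustration only, here is a minimal sketch of adding those tags programmatically with aws-sdk-go. The ASG name is a placeholder, and the nvidia.com/gpu resources tag follows the node-template tag convention described in the AWS cloudprovider README; whether your CA version honors that tag is something to verify, so treat it as an assumption rather than a confirmed fix. The same tags can of course also be added via the AWS console, CLI, or kops cloudLabels.

package main

import (
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
    svc := autoscaling.New(session.Must(session.NewSession()))

    // Placeholder ASG name: replace with the ASG that backs your gpu-nodes group.
    asgName := "gpu-nodes.example-cluster.k8s.local"

    _, err := svc.CreateOrUpdateTags(&autoscaling.CreateOrUpdateTagsInput{
        Tags: []*autoscaling.Tag{
            {
                // Label tag: lets CA give the template node the kops instancegroup label.
                ResourceId:        aws.String(asgName),
                ResourceType:      aws.String("auto-scaling-group"),
                Key:               aws.String("k8s.io/cluster-autoscaler/node-template/label/kops.k8s.io/instancegroup"),
                Value:             aws.String("gpu-nodes"),
                PropagateAtLaunch: aws.Bool(true),
            },
            {
                // Resource tag: advertises one nvidia.com/gpu per node. Assumed tag format
                // from the AWS cloudprovider README; check it against your CA version.
                ResourceId:        aws.String(asgName),
                ResourceType:      aws.String("auto-scaling-group"),
                Key:               aws.String("k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu"),
                Value:             aws.String("1"),
                PropagateAtLaunch: aws.Bool(true),
            },
        },
    })
    if err != nil {
        log.Fatalf("failed to tag ASG: %v", err)
    }
}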

@wskinner
Author

wskinner commented May 29, 2018

@mumoshu I thought I had added those tags but I hadn't. Now that I have them, I am seeing a different issue. The logs say scale up failed due to insufficient GPUs.

I0529 18:18:13.297024 1 utils.go:432] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0529 18:18:13.297494 1 scale_up.go:54] Pod engine/distributed-job-9xd62 is unschedulable
I0529 18:18:13.297544 1 scale_up.go:54] Pod engine/distributed-job-4fkjq is unschedulable
I0529 18:18:13.553111 1 scale_up.go:146] Scale-up predicate failed: PodFitsResources predicate mismatch, cannot put engine/distributed-job-9xd62 on template-node-for-gpu-nodes.<my-cluster>-5817535016063941893, reason: Insufficient nvidia.com/gpu
I0529 18:18:13.553234 1 scale_up.go:146] Scale-up predicate failed: PodFitsResources predicate mismatch, cannot put engine/distributed-job-4fkjq on template-node-for-gpu-nodes.<my-cluster>-5817535016063941893, reason: Insufficient nvidia.com/gpu

The spec for gpu-nodes looks like this:

  image: <my image>
  kubelet:
    featureGates:
      DevicePlugins: "true"
  machineType: p2.xlarge
  maxSize: 4
  minSize: 0
  nodeLabels:
    can-scale-to-zero: "true"
    kops.k8s.io/instancegroup: gpu-nodes
  role: Node
  subnets:
  - us-west-2a

@wskinner
Author

wskinner commented Jun 1, 2018

@mumoshu Any idea what's going on here?

@mumoshu
Contributor

mumoshu commented Jun 5, 2018

@wskinner Hi. Thanks for the report!

I took some time to read the relevant code, and the aws provider does seem to support the scale-from-zero feature for GPU nodes. So my guess is that we're still missing something. Could it be that your cluster-autoscaler is outdated and missing a fix relevant to this feature? Which version of CA are you using?


Basically, cluster-autoscaler needs to build a "node template" that describes how many cores and GPUs, and how much memory, a node in the group would provide. For the aws provider we build it from the launch configuration of the relevant ASG:

func (m *AwsManager) getAsgTemplate(asg *asg) (*asgTemplate, error) {
    // Resolve the instance type from the ASG's launch configuration.
    instanceTypeName, err := m.service.getInstanceTypeByLCName(asg.LaunchConfigurationName)
    if err != nil {
        return nil, err
    }
    if len(asg.AvailabilityZones) < 1 {
        return nil, fmt.Errorf("Unable to get first AvailabilityZone for %s", asg.Name)
    }
    az := asg.AvailabilityZones[0]
    region := az[0 : len(az)-1]
    if len(asg.AvailabilityZones) > 1 {
        glog.Warningf("Found multiple availability zones, using %s\n", az)
    }
    // The template pairs the static per-instance-type data (CPU, memory, GPU)
    // with the ASG's tags; this is all CA has to go on when the group is at zero nodes.
    return &asgTemplate{
        InstanceType: InstanceTypes[instanceTypeName],
        Region:       region,
        Zone:         az,
        Tags:         asg.Tags,
    }, nil
}

Also:

template, err := ng.awsManager.getAsgTemplate(ng.asg)

node.Status.Capacity[gpu.ResourceNvidiaGPU] = *resource.NewQuantity(template.InstanceType.GPU, resource.DecimalSI)

And we do have the correct number of GPUs set for p2.xlarge:

"p2.xlarge": {
InstanceType: "p2.xlarge",
VCPU: 4,
MemoryMb: 62464,
GPU: 1,
},

So it should just work once things are set up correctly. But please feel free to ask me anything.
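
Putting the snippets above together, here is a rough sketch (simplified, not the exact CA source) of how the template node's capacity ends up carrying the GPU count. It assumes the asgTemplate and InstanceTypes definitions shown above (with int64 fields), plus the usual apiv1 (k8s.io/api/core/v1) and resource (k8s.io/apimachinery/pkg/api/resource) packages:

import (
    apiv1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// buildTemplateNodeSketch assembles a fake "template node" whose capacity is
// derived from the asgTemplate, so scheduler predicates such as PodFitsResources
// can be evaluated against it even when the group currently has zero nodes.
func buildTemplateNodeSketch(template *asgTemplate) *apiv1.Node {
    node := &apiv1.Node{}
    node.Status.Capacity = apiv1.ResourceList{
        apiv1.ResourceCPU:    *resource.NewQuantity(template.InstanceType.VCPU, resource.DecimalSI),
        apiv1.ResourceMemory: *resource.NewQuantity(template.InstanceType.MemoryMb*1024*1024, resource.DecimalSI),
        // For p2.xlarge this advertises nvidia.com/gpu: 1, which is what the
        // "Insufficient nvidia.com/gpu" predicate failure above compares against.
        "nvidia.com/gpu": *resource.NewQuantity(template.InstanceType.GPU, resource.DecimalSI),
    }
    return node
}

If nvidia.com/gpu never shows up in the template on your version, that would point at an outdated image rather than a tagging problem, which matches the upgrade recommendation below.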

@mumoshu
Contributor

mumoshu commented Jun 5, 2018

@wskinner And your CA looks outdated to me:

gcr.io/google_containers/cluster-autoscaler:v1.0.5

Can you upgrade it to 1.2.0?

@k8s-ci-robot added the kind/bug label and removed the bug label on Jun 5, 2018
@wskinner
Author

wskinner commented Jun 5, 2018

@mumoshu I'm on k8s 1.8, and this language scared me:

We strongly recommend using Cluster Autoscaler with version for which it was meant. We don't do ANY cross version testing so if you put the newest Cluster Autoscaler on an old cluster there is a big chance that it won't work as expected.

The compatibility matrix suggests only the 1.0.x branch is compatible with my Kubernetes version.
I did try 1.2.0 as you recommended, and encountered this:
I0605 18:08:10.826924 1 scale_up.go:59] Pod <podname1> is unschedulable
I0605 18:08:10.826977 1 scale_up.go:59] Pod <podname2> is unschedulable
I0605 18:08:11.533804 1 scale_up.go:186] No expansion options

@mumoshu
Copy link
Contributor

mumoshu commented Jun 7, 2018

@wskinner I understand your situation. Then my best recommendation is to cherry-pick 4eb8391 into CA v1.0.x and build your own Docker image from that.

@alexnederlof

I reported the same thing in #929:

I set up a GPU pool, and the autoscaler works fine scaling up from 1 to n nodes, but not from 0 to n nodes. The error message is:

I0605 11:27:29.865576       1 scale_up.go:54] Pod default/simple-gpu-test-6f48d9555d-l9822 is unschedulable
I0605 11:27:29.961051       1 scale_up.go:86] Upcoming 0 nodes
I0605 11:27:30.005163       1 scale_up.go:146] Scale-up predicate failed: PodFitsResources predicate mismatch, cannot put default/simple-gpu-test-6f48d9555d-l9822 on template-node-for-gpus.ci.k8s.local-5829202798403814789, reason: Insufficient nvidia.com/gpu
I0605 11:27:30.005262       1 scale_up.go:175] No pod can fit to gpus.ci.k8s.local
I0605 11:27:30.005324       1 scale_up.go:180] No expansion options
I0605 11:27:30.005393       1 static_autoscaler.go:299] Calculating unneeded nodes
I0605 11:27:30.008919       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"simple-gpu-test-6f48d9555d-l9822", UID:"3416d787-68b3-11e8-8e8f-0639a6e973b0", APIVersion:"v1", ResourceVersion:"12429157", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)
I0605 11:27:30.031707       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler

This is on Kubernetes 1.9.6 with autoscaler 1.1.2.

The nodes carry the label kops.k8s.io/instancegroup=gpus, which is also present as a tag on the Auto Scaling group in AWS:

{
    "ResourceType": "auto-scaling-group",
    "ResourceId": "gpus.ci.k8s.local",
    "PropagateAtLaunch": true,
    "Value": "gpus",
    "Key": "k8s.io/cluster-autoscaler/node-template/label/kops.k8s.io/instancegroup"
},

If I start a node, I see it has the required capacity:

Capacity:
 cpu:             4
 memory:          62884036Ki
 nvidia.com/gpu:  1
 pods:            110

This is the simple deployment I use to test it:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: simple-gpu-test
spec: 
  replicas: 1
  template:
    metadata:
      labels:
        app: "simplegputest"
    spec:
      containers: 
      - name: "nvidia-smi-gpu"
        image: "nvidia/cuda:8.0-cudnn5-runtime"
        resources: 
          limits: 
             nvidia.com/gpu: 1 # requesting 1 GPU
        volumeMounts:
        - mountPath: /usr/local/nvidia
          name: nvidia
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do nvidia-smi; sleep 5; done;" ]
      volumes:
      - hostPath:
          path: /usr/local/nvidia
        name: nvidia

@alexnederlof

Hmm, I'm not a Go developer, so I'm not sure how I can backport some of the improvements from 1.2 to 1.1, like the ones from #648.

@osterman

Fwiw, after upgrading to 1.2.0 it works for me. Not sure if/what other regressions were introduced, but autoscaling from zero works and I get pods scheduled on the instances.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Nov 27, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Dec 27, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@icy
Contributor

icy commented Dec 4, 2019

I'm using autoscaler 1.14.6 (k8s.gcr.io/cluster-autoscaler:v1.14.6), and I'm hitting this issue: the autoscaler doesn't scale out when there isn't any node in the group. The errors I found:

I1204 11:12:56.381397       1 utils.go:237] Pod circleci-2c9db67ab-elasticsearch-5cc75895f9-kmhvs can't be scheduled on eks-euw1-local20191203121554568900000005, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector
I1204 11:12:56.381493       1 utils.go:237] Pod circleci-2c9db67ab-postgres-5d79d447c7-6vtmh can't be scheduled on eks-euw1-local20191203121554568900000005, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector
I1204 11:12:56.381509       1 utils.go:237] Pod circleci-2c9db67ab-redis-848667559b-5z4p8 can't be scheduled on eks-euw1-local20191203121554568900000005, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector

@MaciekPytel
Contributor

@icy When scaling from 0 nodes, CA guesses what a new node would look like and checks whether the pending pods would be able to run on such a node. In your case the node predicted by CA doesn't have the label requested by the pods via nodeSelector or nodeAffinity.
The logic for guessing what labels the first node in a given node group would have is specific to each cloudprovider and (unless you're using hosted k8s such as GKE) requires some manual tagging of the underlying cloudprovider autoscaling group. The details are described in the README of each cloudprovider.
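
To make that concrete, the GeneralPredicates failure in the log above boils down to a check along these lines (a simplified sketch, not CA's actual code): every key/value pair in the pod's nodeSelector must be present on the guessed node's labels, which for the AWS provider come from the k8s.io/cluster-autoscaler/node-template/label/... tags on the ASG.

// nodeSelectorMatches is a simplified stand-in for the nodeSelector part of the
// GeneralPredicates check: a pending pod only "fits" the guessed first node of a
// group if that node's labels satisfy every entry in the pod's nodeSelector.
func nodeSelectorMatches(templateNodeLabels, podNodeSelector map[string]string) bool {
    for key, want := range podNodeSelector {
        if got, ok := templateNodeLabels[key]; !ok || got != want {
            return false
        }
    }
    return true
}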
