
Cluster Autoscaling Quickstart #881

Closed
qwinkler opened this issue Apr 26, 2020 · 7 comments
Labels
triage/support Indicates an issue that is a support question.

Comments

@qwinkler

Hello guys. Thanks for such a great project!

As I understand it, it is possible to integrate the cluster-autoscaler with machine-controller. Is there any guide on how to do it?

First of all, I created the cluster using this quickstart.
Then I installed the cluster-autoscaler with the clusterapi provider:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: autoscaler-cluster-api
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: autoscaler-cluster-api
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: autoscaler-cluster-api
  namespace: default
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: "autoscaler"
  name: autoscaler-cluster-api
spec:
  ports:
    - port: 8085
      protocol: TCP
      targetPort: 8085
      name: http
  selector:
    app: "autoscaler"
  type: "ClusterIP"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: "autoscaler"
  name: autoscaler-cluster-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "autoscaler"
  template:
    metadata:
      labels:
        app: "autoscaler"
    spec:
      containers:
        - name: autoscaler
          image: "k8s.gcr.io/cluster-autoscaler:v1.18.0"
          imagePullPolicy: "IfNotPresent"
          command:
            - ./cluster-autoscaler
            - --cloud-provider=clusterapi
            - --namespace=default
            - --logtostderr=true
            - --stderrthreshold=info
            - --v=4
          livenessProbe:
            httpGet:
              path: /health-check
              port: 8085
          ports:
            - containerPort: 8085
      serviceAccountName: autoscaler-cluster-api

To test the autoscaler, I also created some pods; they are stuck in the Pending state because there are no available worker nodes to schedule them on (a minimal sketch of such a test workload is shown after the MachineDeployment below). Since there are no available nodes, the autoscaler should create them. However, the MachineDeployment that I created with these annotations ignores my annotations: it just created 1 worker node and that's it. Here is my MachineDeployment:

apiVersion: "cluster.k8s.io/v1alpha1"
kind: MachineDeployment
annotations:
  cluster.k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
  cluster.k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
metadata:
  name: test-worker
  namespace: kube-system
spec:
  paused: false
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  minReadySeconds: 0
  selector:
    matchLabels:
      foo: bar
  template:
    metadata:
      labels:
        foo: bar
    spec:
      providerSpec:
        value:
          sshPublicKeys:
            - "my_ssh_key.pub here"
          cloudProvider: "hetzner"
          cloudProviderSpec:
            token:
              secretKeyRef:
                namespace: kube-system
                name: cloud-provider-credentials
                key: HZ_TOKEN
            serverType: "cx11"
            networks:
              - "network_created_in_tutorial"
          operatingSystem: "ubuntu"
          operatingSystemSpec:
            distUpgradeOnBoot: false
      versions:
        kubelet: "v1.16.1"
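
The Pending test pods themselves are nothing special; a minimal pod along these lines is what I mean (just a sketch; the pod name, image, and resource values are only placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: testpod-15
  namespace: default
spec:
  containers:
    - name: pause
      # any image works; the resource requests are what the autoscaler uses
      # to decide whether the pod would fit on a new node
      image: k8s.gcr.io/pause:3.2
      resources:
        requests:
          cpu: 200m
          memory: 256Mi
        limits:
          cpu: 200m
          memory: 256Mi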

Here are the cluster autoscaler logs:

I0426 07:55:24.738536       1 reflector.go:211] Listing and watching *v1.CSINode from k8s.io/client-go/informers/factory.go:135
E0426 07:55:24.741292       1 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSINode: the server could not find the requested resource
I0426 07:55:24.908025       1 reflector.go:211] Listing and watching *unstructured.Unstructured from k8s.io/client-go/dynamic/dynamicinformer/informer.go:91
E0426 07:55:24.910140       1 reflector.go:178] k8s.io/client-go/dynamic/dynamicinformer/informer.go:91: Failed to list *unstructured.Unstructured: the server could not find the requested resource
I0426 07:55:28.712184       1 reflector.go:211] Listing and watching *unstructured.Unstructured from k8s.io/client-go/dynamic/dynamicinformer/informer.go:91
E0426 07:55:28.714695       1 reflector.go:178] k8s.io/client-go/dynamic/dynamicinformer/informer.go:91: Failed to list *unstructured.Unstructured: the server could not find the requested resource
I0426 07:55:28.892196       1 reflector.go:211] Listing and watching *v1.CSINode from k8s.io/client-go/informers/factory.go:135
E0426 07:55:28.894531       1 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSINode: the server could not find the requested resource
I0426 07:55:30.304634       1 reflector.go:211] Listing and watching *unstructured.Unstructured from k8s.io/client-go/dynamic/dynamicinformer/informer.go:91
E0426 07:55:30.309674       1 reflector.go:178] k8s.io/client-go/dynamic/dynamicinformer/informer.go:91: Failed to list *unstructured.Unstructured: the server could not find the requested resource
I0426 07:55:30.518605       1 reflector.go:211] Listing and watching *unstructured.Unstructured from k8s.io/client-go/dynamic/dynamicinformer/informer.go:91
E0426 07:55:30.522108       1 reflector.go:178] k8s.io/client-go/dynamic/dynamicinformer/informer.go:91: Failed to list *unstructured.Unstructured: the server could not find the requested resource

What am I doing wrong? Maybe I need to reconfigure something?

@qwinkler qwinkler added the triage/support Indicates an issue that is a support question. label Apr 26, 2020
@kron4eg
Member

kron4eg commented Apr 26, 2020

Hi,

IIRC the cluster.k8s.io/cluster-api-autoscaler-node-group-min-size annotation cannot be < 1: https://github.com/kubernetes/autoscaler/blob/972e30a5d9eece175a54fa5dfc0ed902b34f02b1/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_utils.go#L92-L94
For additional debugging, please increase --v in the cluster-autoscaler deployment, and probably move it from the default namespace to kube-system.

@qwinkler
Author

Thank you for your help. I increased the log verbosity in the cluster-autoscaler (--v=7) and moved it to kube-system. I also increased the minimum node group size to 1. The behaviour is still the same.

I found out that the autoscaler is looking for the cluster.x-k8s.io/v1alpha2 API, while the machine-controller is using cluster.k8s.io/v1alpha1:

I0426 20:11:52.246854       1 reflector.go:211] Listing and watching *unstructured.Unstructured from k8s.io/client-go/dynamic/dynamicinformer/informer.go:91
I0426 20:11:52.247234       1 round_trippers.go:420] GET https://10.96.0.1:443/apis/cluster.x-k8s.io/v1alpha2/machinedeployments?limit=500&resourceVersion=0
I0426 20:11:52.247276       1 round_trippers.go:427] Request Headers:
I0426 20:11:52.247295       1 round_trippers.go:431]     Accept: application/json
I0426 20:11:52.247308       1 round_trippers.go:431]     User-Agent: cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format
I0426 20:11:52.247323       1 round_trippers.go:431]     Authorization: Bearer <masked>
I0426 20:11:52.251500       1 round_trippers.go:446] Response Status: 404 Not Found in 4 milliseconds
E0426 20:11:52.251699       1 reflector.go:178] k8s.io/client-go/dynamic/dynamicinformer/informer.go:91: Failed to list *unstructured.Unstructured: the server could not find the requested resource
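
For what it's worth, you can double-check which Cluster API group the apiserver actually serves with:

kubectl api-versions | grep cluster

This should list cluster.k8s.io/v1alpha1 (from machine-controller) but not cluster.x-k8s.io, which matches the 404 above.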

The rest of the logs look okay:

I0426 20:11:53.410020       1 round_trippers.go:420] GET https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cluster-autoscaler
I0426 20:11:53.410092       1 round_trippers.go:427] Request Headers:
I0426 20:11:53.410110       1 round_trippers.go:431]     User-Agent: cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format
I0426 20:11:53.410128       1 round_trippers.go:431]     Authorization: Bearer <masked>
I0426 20:11:53.410139       1 round_trippers.go:431]     Accept: application/json, */*
I0426 20:11:53.414364       1 round_trippers.go:446] Response Status: 200 OK in 4 milliseconds
I0426 20:11:53.414765       1 round_trippers.go:420] PUT https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cluster-autoscaler
I0426 20:11:53.414786       1 round_trippers.go:427] Request Headers:
I0426 20:11:53.414795       1 round_trippers.go:431]     User-Agent: cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format
I0426 20:11:53.414804       1 round_trippers.go:431]     Authorization: Bearer <masked>
I0426 20:11:53.414810       1 round_trippers.go:431]     Content-Type: application/json
I0426 20:11:53.414817       1 round_trippers.go:431]     Accept: application/json, */*
I0426 20:11:53.418191       1 round_trippers.go:446] Response Status: 200 OK in 3 milliseconds
I0426 20:11:53.418434       1 leaderelection.go:272] successfully renewed lease kube-system/cluster-autoscaler
I0426 20:11:54.806481       1 pathrecorder.go:240] cluster-autoscaler: "/health-check" satisfied by exact match
I0426 20:11:55.418854       1 round_trippers.go:420] GET https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cluster-autoscaler
I0426 20:11:55.418913       1 round_trippers.go:427] Request Headers:
I0426 20:11:55.418929       1 round_trippers.go:431]     Accept: application/json, */*
I0426 20:11:55.418941       1 round_trippers.go:431]     User-Agent: cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format
I0426 20:11:55.418956       1 round_trippers.go:431]     Authorization: Bearer <masked>
I0426 20:11:55.425058       1 round_trippers.go:446] Response Status: 200 OK in 6 milliseconds
I0426 20:11:55.425542       1 round_trippers.go:420] PUT https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cluster-autoscaler
I0426 20:11:55.425562       1 round_trippers.go:427] Request Headers:
I0426 20:11:55.425575       1 round_trippers.go:431]     Accept: application/json, */*
I0426 20:11:55.425589       1 round_trippers.go:431]     Authorization: Bearer <masked>
I0426 20:11:55.425601       1 round_trippers.go:431]     User-Agent: cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format
I0426 20:11:55.425612       1 round_trippers.go:431]     Content-Type: application/json
I0426 20:11:55.429767       1 round_trippers.go:446] Response Status: 200 OK in 4 milliseconds
I0426 20:11:55.429995       1 leaderelection.go:272] successfully renewed lease kube-system/cluster-autoscaler

@kron4eg
Member

kron4eg commented Apr 26, 2020

Looking at this:
https://github.com/kubernetes/autoscaler/blob/972e30a5d9eece175a54fa5dfc0ed902b34f02b1/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller.go#L44-L46

I think you need to set the CAPI_GROUP=cluster.k8s.io environment variable in the cluster-autoscaler deployment.
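
Something along these lines in the container spec should do it (just a sketch, adapt it to your deployment manifest):

          env:
            - name: CAPI_GROUP
              value: "cluster.k8s.io"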

@qwinkler
Author

qwinkler commented Apr 27, 2020

@kron4eg Thanks! I didn't find it because this was added after the v1.18.0 release. I'll have to build the image myself from the master branch.

I ran into another problem:

I0427 06:14:18.043715       1 scale_up.go:326] Pod default/testpod-15 is unschedulable
I0427 06:14:18.044563       1 scale_up.go:364] Upcoming 0 nodes
I0427 06:14:18.044623       1 scale_up.go:441] No expansion options

Also, I found this strange log line:

I0427 06:14:18.052827       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"testpod-15", UID:"32a02e16-6396-4bd3-978f-06f781fdd94a", APIVersion:"v1", ResourceVersion:"17940", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added):

It is strange, because here are the pod's requests and limits:

    Limits:
      cpu:     200m
      memory:  256Mi
    Requests:
      cpu:        200m
      memory:     256Mi

And I chose the cx31 server type (2 vCPU, 8 GB RAM).

And this is the only Pending pod in the whole cluster.

UPD: I tried to manually scale the nodes. Scaling down does not work either:

I0427 08:31:14.565562       1 pre_filtering_processor.go:57] Skipping sm-control-plane-2 - no node group config
I0427 08:31:14.565662       1 pre_filtering_processor.go:57] Skipping sm-control-plane-3 - no node group config
I0427 08:31:14.565889       1 pre_filtering_processor.go:57] Skipping sm-pool1-86c4c676b7-mxh5p - no node group config
I0427 08:31:14.566061       1 pre_filtering_processor.go:57] Skipping sm-test-5546dff48b-vb88h - no node group config
I0427 08:31:14.566236       1 pre_filtering_processor.go:57] Skipping sm-test-5546dff48b-vnlgb - no node group config
I0427 08:31:14.566375       1 pre_filtering_processor.go:57] Skipping sm-test-5546dff48b-lmpjx - no node group config
I0427 08:31:14.566523       1 pre_filtering_processor.go:57] Skipping sm-test-5546dff48b-7ql6c - no node group config
I0427 08:31:14.566557       1 pre_filtering_processor.go:57] Skipping sm-control-plane-1 - no node group config
I0427 08:31:14.566614       1 static_autoscaler.go:500] Scale down status: unneededOnly=false lastScaleUpTime=2020-04-27 06:14:08.038348682 +0000 UTC m=+18.287526670 lastScaleDownDeleteTime=2020-04-27 06:
14:08.038348923 +0000 UTC m=+18.287526907 lastScaleDownFailTime=2020-04-27 06:14:08.038349147 +0000 UTC m=+18.287527130 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I0427 08:31:14.566701       1 static_autoscaler.go:513] Starting scale down
I0427 08:31:14.566948       1 scale_down.go:867] No candidates for scale down

@kron4eg
Member

kron4eg commented Apr 27, 2020

To tell you the truth, I'm not really sure why it doesn't work; we have yet to test-drive the integration ourselves, see #391.

@qwinkler
Author

@kron4eg After some debugging, I found that I had put the annotations in the wrong place in my MachineDeployment (they were not under metadata) 🤦
I fixed it and now it works like a charm!

To make the autoscaler work you will need to:
Build your own Docker image, because these changes haven't been released yet (https://github.com/kubernetes/autoscaler/blob/972e30a5d9eece175a54fa5dfc0ed902b34f02b1/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller.go#L44-L46):

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/cluster-autoscaler
make build-in-docker && make make-image
docker tag staging-k8s.gcr.io/cluster-autoscaler:dev username/cluster-autoscaler:tag
docker push username/cluster-autoscaler:tag

Set the --cloud-provider flag and the CAPI_GROUP environment variable in the cluster-autoscaler deployment. Example:

    spec:
      containers:
        - image: "username/cluster-autoscaler:tag"
          command:
            - ./cluster-autoscaler
            - --cloud-provider=clusterapi
            - --namespace=kube-system
            - --logtostderr=true
            - --stderrthreshold=info
            - --v=4
          env:
            - name: CAPI_GROUP
              value: "cluster.k8s.io"

Create the new MachineDeployment with the correct annotations (under metadata):

apiVersion: "cluster.k8s.io/v1alpha1"
kind: MachineDeployment
metadata:
  name: autoscaling-pool
  namespace: kube-system
  annotations:
    cluster.k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
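
To verify that the autoscaler picks the node group up, checking the MachineDeployment and the autoscaler logs is enough (assuming the autoscaler deployment keeps the autoscaler-cluster-api name from the beginning of this issue):

kubectl -n kube-system get machinedeployment autoscaling-pool
kubectl -n kube-system logs deployment/autoscaler-cluster-api | grep -i "node group"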

@kron4eg
Member

kron4eg commented Apr 28, 2020

Oh... hehe. Thanks for the update @qwinkler!
