WIP: cluster autoscaler integration with machine API

kubernetes · Sep 9, 2018 · 46bd3ac · 46bd3ac
1 parent c86d0ab
commit 46bd3ac
Showing 1 changed file with 273 additions and 0 deletions.
diff --git a/keps/sig-autoscaling/0000-kep-template.md b/keps/sig-autoscaling/0000-kep-template.md
@@ -0,0 +1,273 @@
+---
+kep-number: 0
+title: My First KEP
+authors:
+  - "@janedoe"
+owning-sig: sig-xxx
+participating-sigs:
+  - sig-aaa
+  - sig-bbb
+reviewers:
+  - TBD
+  - "@alicedoe"
+approvers:
+  - TBD
+  - "@oscardoe"
+editor: TBD
+creation-date: yyyy-mm-dd
+last-updated: yyyy-mm-dd
+status: provisional
+see-also:
+  - KEP-1
+  - KEP-2
+replaces:
+  - KEP-3
+superseded-by:
+  - KEP-100
+---
+
+# Integrating cluster autoscaler (CA) with cluster-api
+
+The *filename* for the KEP should include the KEP number along with the title.
+The title should be lowercased and spaces/punctuation should be replaced with `-`.
+As the KEP is approved and an official KEP number is allocated, the file should be renamed.
+
+To get started with this template:
+1. **Pick a hosting SIG.**
+  Make sure that the problem space is something the SIG is interested in taking up.
+  KEPs should not be checked in without a sponsoring SIG.
+1. **Allocate a KEP number.**
+  Do this by (a) taking the next number in the `NEXT_KEP_NUMBER` file and (b) incrementing that number.
+  Include the updated `NEXT_KEP_NUMBER` file in your PR.
+1. **Make a copy of this template.**
+  Name it `NNNN-YYYYMMDD-my-title.md` where `NNNN` is the KEP number that was allocated.
+1. **Fill out the "overview" sections.**
+  This includes the Summary and Motivation sections.
+  These should be easy if you've preflighted the idea of the KEP with the appropriate SIG.
+1. **Create a PR.**
+  Assign it to folks in the SIG that are sponsoring this process.
+1. **Merge early.**
+  Avoid getting hung up on specific details and instead aim to get the goal of the KEP merged quickly.
+  The best way to do this is to just start with the "Overview" sections and fill out details incrementally in follow on PRs.
+  View anything marked as a `provisional` as a working document and subject to change.
+  Aim for single topic PRs to keep discussions focused.
+  If you disagree with what is already in a document, open a new PR with suggested changes.
+
+The canonical place for the latest set of instructions (and the likely source of this file) is [here](/keps/0000-kep-template.md).
+
+The `Metadata` section above is intended to support the creation of tooling around the KEP process.
+This will be a YAML section that is fenced as a code block.
+See the KEP process for details on each of these items.
+
+## Table of Contents
+
+TPB
+
+## Summary
+
+TPB
+
+## Motivation
+
+- delegate the responsibility of managing cloud providers out of the cluster autoscaler
+- in general, every tool has its own implementation of the cloud provider layer to communicate with cloud provider API (logic duplicated) => cluster-api project
+
+
+
+This section is for explicitly listing the motivation, goals and non-goals of this KEP.
+Describe why the change is important and the benefits to users.
+The motivation section can optionally provide links to [experience reports][] to demonstrate the interest in a KEP within the wider Kubernetes community.
+
+[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
+
+### Goals
+
+* cluster autoscaler is capable of scaling the cluster through machine API
+
+### Non-Goals
+
+* cluster autoscaler is able to autoprovision new node groups through machine API
+* cluster autoscaler is able to estimate node resource requirements through machine API
+* cluster autoscaler is able to use at least one pricing model on top of the machine API
+
+## Proposal
+
+Generalize the concept of node group autoscaling on the level of cloud provider
+by integrating with machine API layer of [sigs.k8s.io/cluster-api](https://github.com/kubernetes-sigs/cluster-api) project to
+build cloud provider-free implementation of cluster autoscaling mechanism
+([#1050](https://github.com/kubernetes/autoscaler/issues/1050)).
+
+Also suggest how to:
+* perform node autoprovision on the level of the machine API ([node autoprovisioning
+  upstream proposal](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/node_autoprovisioning.md)).
+* perform estimation of cluster autoscaling on the level of the machine API.
+* build various pricing models on top of the machine API.
+
+Cluster autoscaling builds on top of concept of node groups which effectively
+translates to cloud provider specific autoscaling groups, resp. autoscaling sets.
+In case of the machine API, the node group translates into machine set (living
+as native Kubernetes object).
+
+### Node group (scaling group)
+
+The autoscaler currently supports `AWS`, `Azure`, `GCE`, `GKE` providers.
+With `kubemark` as other cloud provider implementation that allows to create
+hollow kubernetes cluster with huge number of nodes over a small subset of real
+machines (for scaling testing purposes).
+Each of the currently supported cloud providers has its own concept of scaling groups:
+- AWS has autoscaling groups
+- Azure has autoscaling sets
+- GCE/GKE have autoscaling groups
+
+Autoscaling group allows to flexibly and natively increase/decrease number of instances
+within cloud provider based on actual cluster workload.
+
+TODO(jchaloup): describe advantages of auto scaling groups, why they are beneficial, etc.
+
+### Machine API
+
+Machine API (as part of the [cluster-api project](https://github.com/kubernetes-sigs/cluster-api/))
+allows to declaratively specify a machine without saying how to provision underlying instance.
+Machines can be grouped into a machine set that allows to specify a machine template
+and by setting a number replicas create required number of machines (followed
+by provisioning of underlying instances).
+With some mama's marmalade, one can represent an autoscaling group with a machine set.
+Though, the current implementation of the machineset controller needs to be improved
+to support autoscaling use cases. E.g. define delete strategy to pick appropriate machine
+to delete from the machine set ([#75](https://github.com/kubernetes-sigs/cluster-api/issues/75)).
+
+`Important`: future implementation of the autoscaling concept through machineset will not
+replace the current cloud specific implementation. The goal is to lay down a solution
+that will cover the most common use cases currently implemented by the autoscaler.
+It's important to draw a line into the sand so only necessary portion of the cloud
+autoscaling logic is re-implemented instead of re-inventing entire wheel.
+
+### Cloud provider interface
+The cluster autoscaler defines two interfaces (`NodeGroup` and `CloudProvider`
+from `k8s.io/autoscaler/cluster-autoscaler/cloudprovider` package)
+that a cloud provider needs to implemented in order to allow the cluster
+autoscaler to properly perform scaling operations.
+
+The `CloudProvider` interface allows to:
+- operate on top of node groups
+- work with pricing models (to estimate reasonable expenses)
+- check limit resources (check if maximum resources per node group are exceeded)
+
+The `NodeGroup` interface allows to operate with node groups themselves:
+- change size of a node group
+- list nodes belonging to a node group
+- autoprovision new node group
+- generate node template for a node group (to determine resource capacity of
+  new node in case a node group has zero size)
+
+In case of `AWS` the `NodeGroup` then corresponds to autoscaling groups,
+in case of `Azure` to autoscaling sets, etc.
+
+### Node template
+
+In case a node group has no node, the cluster autoscaler needs a way to generate
+a node template for a given node group before it can predict how much new resources
+will get allocated and thus how many new nodes are needed to scale up.
+
+Each of the mentioned cloud providers has its own implementation of constructing
+the node template with its own list of information pulled from cloud provider API:
+
+* Azure:
+  - instance type to determine resource capacity (CPU, GPU, Memory)
+  - instance tags to construct node labels
+  - generic node labels: Arch, OS, instance type, region, failure domain, hostname
+* AWS:
+  - instance type to determine resource capacity (CPU, GPU, Memory)
+  - instance tags to construct node labels and taints
+  - generic node labels: Arch, OS, instance type, region, failure domain, hostname
+* GCE:
+  - instance type to determine resource capacity (CPU, Memory)
+  - instance tags to construct node labels (e.g. gpu.ResourceNvidiaGPU)
+  - maximum number of pods: apiv1.ResourcePods
+  - taints (from instance metadata)
+  - kubelet system and reserved allocatable
+
+## Code status
+
+At this point the cluster autoscaler does not support node autoprovisioning (TODO(jchaloup): link to the proposal).
+Every implementation assumes every node group already exists in the cloud provider.
+Both `Create` and `Delete` operations return an error `ErrAlreadyExist` and the autoprovisioning is off.
+
+The autoscaler allows to either enumerate all node groups (via `--nodes` option)
+or automatically discover all node groups on-the-fly (via `--node-group-auto-discovery`).
+
+## Integration with machine API
+
+Given the scaling up/down operation corresponds to increasing/decreasing the number
+of nodes in a node group, it's sufficient to "just" change the number of replicas
+of corresponding machine set.
+
+### Node template
+
+The `machine` object holds all cloud provider specific information under
+`spec.providerConfig` field. Given only the actuator can interpret the provider
+config and there can be multiple actuator implementations for the same cloud provider,
+all information necessary to render the template must be contained in the machine spec (outside
+the provider config). This may lead to data duplication as one needs to provide
+a machine type in the config (e.g. to provision an instance) and cpu, gpu and memory
+requirements of the machine outside the config (e.g. to construct node object
+resource capacity).
+
+Implemented through labels the `machine` spec can look like:
+
+```yaml
+---
+apiVersion: cluster.k8s.io/v1alpha1
+kind: Machine
+metadata:
+  name: node-group-machine
+  namespace: application
+  labels:
+    sigs.k8s.io/cluster-autoscaler-resource-capacity: "cpu:200m,gpu:1,memory:4GB"
+    sigs.k8s.io/cluster-autoscaler-region: us-west-1
+...
+```
+
+Other option is to pull all the information from the `spec.providerConfig`.
+Which means the autoscaler needs to import actuator specific code, which forces
+to choose a single implementation of actuator for each cloud provider.
+That can be to restrictive.
+
+In both cases, if it can be assumed all machines/nodes in the same node group have the same
+system requirements, one can use generic kubelet's configuration discovery to get
+the kubelet's reserved and system requirements from the first machine in a given
+node group (given there is at least one node in the group).
+The same holds for node taints.
+
+
+### User Stories [optional]
+
+TPB
+
+### Implementation Details/Notes/Constraints [optional]
+
+TPB
+
+### Risks and Mitigations
+
+TPB
+
+## Graduation Criteria
+
+TPB
+
+## Implementation History
+
+TPB
+
+## Drawbacks [optional]
+
+TPB
+
+## Alternatives [optional]
+
+TPB
+
+## Infrastructure Needed [optional]
+
+TPB