Cluster API Provider AWS Feature Set

Introduction

We wish to build a feature model that allows us to see the common functionality that will need to be shared between AWS implementations.

Give each feature a unique number. Reference a feature by its number. If possible, provide a justification or requirement for the feature, which will help with prioritisation for a minimum viable product.

You can also write constraints, for example:

  • Feature B is an optional sub-feature of A
  • Feature C is a mandatory sub-feature of B
  • Feature E is an alternative feature to C and D
  • Feature F is mutually exclusive with Feature E
  • Feature G requires Feature B

A minimum viable product will be a configuration of features which, when all constraints are solved, provides the minimum list of features that need to be developed.

Different MVPs may be possible (e.g. EKS vs. non-EKS), but they may rely on shared components, which will become the critical path.
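
To make the constraint-solving idea concrete, here is a minimal sketch in Go of how such a feature model could be represented and checked mechanically. It is purely illustrative: the package, type names and the subset of relationship kinds encoded are assumptions of this sketch, not part of any provider code.

```go
package example

import "fmt"

// ConstraintKind captures a subset of the relationships used in this document.
// Optional sub-features and alternatives are omitted from this sketch for brevity.
type ConstraintKind int

const (
	MandatorySubFeature ConstraintKind = iota // A is a mandatory sub-feature of B
	MutuallyExclusive                         // A and B cannot be selected together
	Requires                                  // A requires B
)

// Constraint relates two features by number, e.g. {MandatorySubFeature, 2, 1}
// reads "feature 2 is a mandatory sub-feature of feature 1".
type Constraint struct {
	Kind ConstraintKind
	A, B int
}

// Selection is a candidate MVP: the set of selected feature numbers.
type Selection map[int]bool

// Violations returns human-readable descriptions of broken constraints so a
// candidate MVP can be checked mechanically.
func Violations(sel Selection, constraints []Constraint) []string {
	var out []string
	for _, c := range constraints {
		switch c.Kind {
		case MandatorySubFeature:
			if sel[c.B] && !sel[c.A] {
				out = append(out, fmt.Sprintf("feature %d needs its mandatory sub-feature %d", c.B, c.A))
			}
		case MutuallyExclusive:
			if sel[c.A] && sel[c.B] {
				out = append(out, fmt.Sprintf("features %d and %d are mutually exclusive", c.A, c.B))
			}
		case Requires:
			if sel[c.A] && !sel[c.B] {
				out = append(out, fmt.Sprintf("feature %d requires feature %d", c.A, c.B))
			}
		}
	}
	return out
}
```

For example, under this encoding a selection containing feature 1 (VPC selection) but not feature 2 would be reported as a violation of the mandatory sub-feature constraint.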

Feature Set

0: AWS Cluster Provider

1: VPC Selection

2: The provider will need the ability to create a new VPC

  • Constraint: Mandatory sub-feature of 1

3: The provider provisions in an existing VPC, selecting the default VPC if none is specified

  • Constraint: Optional alternative to 2
  • Requirement: Some customers may wish to reuse VPCs in which they have existing infrastructure.
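
As a hedged illustration of features 2 and 3 at the AWS API level, the sketch below resolves a VPC with the AWS SDK for Go: use an explicitly requested VPC ID if given, otherwise fall back to the account's default VPC, and create a new one only if neither exists. The function name, package, fallback ordering and CIDR block are assumptions made for this sketch, not provider behaviour.

```go
package example

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// resolveVPC is a hypothetical helper: reuse a requested VPC (feature 3),
// else the account's default VPC (feature 3), else create a new VPC (feature 2).
func resolveVPC(svc *ec2.EC2, requestedID string) (string, error) {
	if requestedID != "" {
		return requestedID, nil
	}
	// Look up the default VPC ("isDefault" is the EC2 DescribeVpcs filter name).
	out, err := svc.DescribeVpcs(&ec2.DescribeVpcsInput{
		Filters: []*ec2.Filter{{
			Name:   aws.String("isDefault"),
			Values: []*string{aws.String("true")},
		}},
	})
	if err != nil {
		return "", err
	}
	if len(out.Vpcs) > 0 {
		return aws.StringValue(out.Vpcs[0].VpcId), nil
	}
	// No default VPC: create one. The CIDR block here is purely illustrative.
	created, err := svc.CreateVpc(&ec2.CreateVpcInput{CidrBlock: aws.String("10.0.0.0/16")})
	if err != nil {
		return "", err
	}
	return aws.StringValue(created.Vpc.VpcId), nil
}
```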

45: Etcd location

46: The provider deploys etcd as part of the control plane

  • Constraint: Mandatory sub-feature of 45
  • Requirement: For simple clusters a colocated etcd is the easiest way to operate a cluster

47: The provider deploys etcd externally to the control plane

  • Constraint: Alternative to 46
  • Requirement: For larger clusters, placing etcd external to the control plane allows the control plane and the datastore to scale independently

48: The provider can connect to a pre-existing etcd cluster

  • Constraint: Optional sub-feature of 45
  • Requirement: An existing etcd store could be used to replace a cluster during an upgrade or for a complete cluster restore.

5: Control plane placement

6: The provider deploys the control plane in public subnets

  • Constraint: Mandatory sub-feature of 5
  • Requirement: For simple clusters without bastion hosts, allowing users to break-glass SSH to control plane nodes

7: The provider deploys the control plane in private subnets

  • Constraint: Alternative to 6

8: The provider deploys control plane components to a single AZ

  • Constraint: Mandatory sub-feature of 5
  • Requirement: Architectural requirement for a particular customer workload

9: The provider deploys control plane components across multiple AZs

  • Constraint: Alternative to 8
  • Requirement: Robustness of control plane components

10: Worker node placement

11: Provider deploys worker nodes to public subnets

  • Constraint: Mandatory sub-feature of 10
  • Requirement: For simple clusters without bastion hosts, allowing users to break-glass SSH to worker nodes

12: Provider deploys worker nodes to private subnets

  • Constraint: Alternative to 11
  • Requirement: AWS Well-Architected SEC 5; security requirements may require access via bastion hosts, VPN, or Direct Connect

13: Provider deploys worker nodes to single AZ

  • Constraint: Mandatory sub-feature of 10
  • Requirement: Architectural requirement for a particular customer workload

14: Provider deploys worker nodes across multiple AZs

  • Constraint: Alternative to 13
  • Requirement: Robustness of cluster

15: Deploy worker nodes to a placement group

  • Constraint: Optional sub-feature of 10
  • Requirement: HPC type workload that requires fast interconnect between nodes

16: The provider deploys worker nodes to shared instances

  • Constraint: Mandatory sub-feature of 10
  • Requirement: Default behaviour / cost / availability of instances

17: The provider deploys worker nodes to dedicated EC2 instances

  • Constraint: Optional alternative to 16
  • Requirement: License requirements for a particular workload (e.g. Oracle) may require a dedicated instance
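
For reference, features 15 and 17 above both surface through the Placement section of an EC2 RunInstances call, as in the hedged sketch below. The group name, AMI ID and instance type are placeholders, and a real implementation would make both settings optional.

```go
package example

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// runWorker is a hypothetical helper showing where features 15 (placement
// group) and 17 (dedicated tenancy) appear in the EC2 API.
func runWorker(svc *ec2.EC2) error {
	// Feature 15: a "cluster" strategy placement group gives low-latency interconnect.
	if _, err := svc.CreatePlacementGroup(&ec2.CreatePlacementGroupInput{
		GroupName: aws.String("example-workers"),
		Strategy:  aws.String("cluster"),
	}); err != nil {
		return err
	}
	_, err := svc.RunInstances(&ec2.RunInstancesInput{
		ImageId:      aws.String("ami-00000000000000000"), // placeholder AMI
		InstanceType: aws.String("c5.large"),              // placeholder instance type
		MinCount:     aws.Int64(1),
		MaxCount:     aws.Int64(1),
		Placement: &ec2.Placement{
			GroupName: aws.String("example-workers"), // feature 15
			Tenancy:   aws.String("dedicated"),       // feature 17
		},
	})
	return err
}
```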

18: Worker node scaling methodology

19: Worker nodes are deployed individually or in batches, without using auto-scaling groups

  • Constraint: Mandatory sub-feature of 18

20: Worker nodes are deployed via Auto-Scaling Groups using MachineSets

  • Constraint: Alternative to 19
  • Note: The implementation here would be significantly different to 19.
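
A hedged sketch of what 20 implies at the AWS API level, independent of how MachineSets would map onto it: worker capacity is owned by an Auto Scaling group rather than by individually launched instances. The names, sizes and subnets are placeholders.

```go
package example

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

// createWorkerASG is a hypothetical helper illustrating feature 20: worker
// nodes managed by an Auto Scaling group.
func createWorkerASG(svc *autoscaling.AutoScaling) error {
	_, err := svc.CreateAutoScalingGroup(&autoscaling.CreateAutoScalingGroupInput{
		AutoScalingGroupName:    aws.String("example-machineset-workers"),
		LaunchConfigurationName: aws.String("example-worker-launch-config"),
		MinSize:                 aws.Int64(1),
		MaxSize:                 aws.Int64(10),
		DesiredCapacity:         aws.Int64(3),
		// Comma-separated subnet IDs; spanning several subnets also relates to feature 14.
		VPCZoneIdentifier: aws.String("subnet-aaaa,subnet-bbbb"),
	})
	return err
}
```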

21: API Server Access

22: The API server is publicly accessible

  • Constraint: Mandatory sub-feature of 21
  • Requirement: Standard way of accessing k8s

23: The API server is not publicly accessible

  • Constraint: Alternative to 22
  • Requirement: A security requirement (e.g. someone’s interpretation of UK OFFICIAL) prohibits making the API server endpoint publicly accessible

31: The API server is connected to a VPC via PrivateLink

  • Constraint: Sub-feature of 23 & 25
  • Requirement: Compliance requirements for API traffic to not transit the public internet, e.g. UK OFFICIAL-SENSITIVE workloads. AWS recommends, for FedRAMP(?) and UK OFFICIAL, using VPC or PrivateLink endpoints to connect publicly accessible regional services to VPCs so that traffic does not exit the internal AWS network. The actual EKS endpoint, for example, may still present itself with a public load balancer endpoint even if it is connected by PrivateLink to the VPC.
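
To show what 31 implies on the AWS side, the sketch below creates an interface (PrivateLink) VPC endpoint with the AWS SDK for Go. The VPC, subnet, security group and endpoint service name are placeholders; the real service name would depend on how the API server endpoint is published (for example via an NLB-backed endpoint service).

```go
package example

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// createAPIEndpoint is a hypothetical helper for feature 31: an interface
// (PrivateLink) endpoint in the workload VPC pointing at an endpoint service
// that fronts the API server. Every literal below is a placeholder.
func createAPIEndpoint(svc *ec2.EC2) error {
	_, err := svc.CreateVpcEndpoint(&ec2.CreateVpcEndpointInput{
		VpcId:           aws.String("vpc-00000000"),
		VpcEndpointType: aws.String("Interface"),
		// Placeholder endpoint service name, e.g. one created from an NLB.
		ServiceName:      aws.String("com.amazonaws.vpce.eu-west-2.vpce-svc-00000000000000000"),
		SubnetIds:        []*string{aws.String("subnet-aaaa")},
		SecurityGroupIds: []*string{aws.String("sg-00000000")},
	})
	return err
}
```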

43: The API server is accessible via a load balancer

  • Constraint: Sub-feature of 22 & 23
  • Requirement: For potential HA access OR public/private subnet distinctions, the API server is accessed via an AWS load balancer.

44: The API server is accessed directly via the IP address of cluster nodes hosting the API server

  • Constraint: Alternative to 43
  • Requirement: The IP address of each node hosting an API server is registered in DNS
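
As an illustration of 43, the sketch below fronts the API server with a network load balancer using the AWS SDK for Go; whether the scheme is internet-facing or internal corresponds to 22 and 23. The name and subnets are placeholders, and 44 is the alternative where no load balancer is created at all.

```go
package example

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/elbv2"
)

// createAPILoadBalancer is a hypothetical helper for feature 43: the API
// server fronted by an AWS load balancer. Scheme "internet-facing" maps to
// feature 22 and "internal" to feature 23. All literals are placeholders.
func createAPILoadBalancer(svc *elbv2.ELBV2, public bool) error {
	scheme := "internal"
	if public {
		scheme = "internet-facing"
	}
	_, err := svc.CreateLoadBalancer(&elbv2.CreateLoadBalancerInput{
		Name:    aws.String("example-apiserver"),
		Type:    aws.String("network"), // an NLB passes the TLS API traffic straight through
		Scheme:  aws.String(scheme),
		Subnets: []*string{aws.String("subnet-aaaa"), aws.String("subnet-bbbb")},
	})
	return err
}
```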

24: Type of control plane

25: The control plane is EKS

26: The control plane is managed within the provider

  • Constraint: Alternative to 25
  • Requirement: Customer requires functionality not provided by EKS (e.g. admission controller, non-public API endpoint)

33: CRI

34: The provider deploys a credential helper for ECR

35: Container Hosts

36: The provider deploys to Amazon Linux 2

  • Constraint: Mandatory sub-feature of 35
  • Requirement: Parity with AWS recommendations

37: The provider deploys to CentOS / Ubuntu

  • Constraint: Alternative to 36
  • Requirement: Greater familiarity in the community (particularly Ubuntu), organisational requirements?

38: The provider deploys from arbitrary AMIs

  • Constraint: Alternative to 36. Sub-feature of 37
  • Requirement: Compliance requirements may require an AMI that passes the CIS Distribution Independent Linux Benchmark, or that EC2 instances have encrypted EBS root volumes for data loss prevention, which requires AMIs in the customer’s account.

39: The provider allows kubelet configuration to be customised, e.g. “--allow-privileged”

40: Arbitrary customisation of bootstrap script

  • Constraint: Sub-feature of 35
  • Requirement: Organisational or security requirements. For example, NIST 800-190 and AWS Well-Architected controls recommend installing additional file integrity tools such as OSSEC, Tripwire, etc., and some organisations may even mandate antivirus. We cannot encode all of this as additional options, so some mix of 38 plus the ability to customise the bootstrap script would satisfy this without bringing too much variability into scope.

41: API Server configuration

42: The provider allows customisation of API Server

  • Constraint: Sub-feature of 41
  • Requirement: Example: we would need to enable an admission controller for Istio automatic sidecar injection (note: EKS doesn’t allow customisation of webhooks at present, but may in the future).

TODO

  • HA / non-HA installs?

Out of Scope

Anything that can be applied with kubectl after the cluster has come up is not a cluster-api responsibility, including:

  • Monitoring / Logging
  • Many of the CNI options (at least Calico & AWS VPC CNI)
  • IAM identity for pods (e.g. kube2iam, kiam etc…)
  • ALB ingress

These should be addressed with documentation; we do not want the cluster-api provider to be a package manager for Kubernetes manifests. In addition, @roberthbailey stated on 2018/08/14 that Google is working on a declarative cluster add-on manager, to be presented to sig-cluster-lifecycle for discussion later.