# feat: Multi-cluster architecture to increase resiliency and reduce inter-az data transfer charges (#1802)
**New file (+47 lines):**

```hcl
provider "aws" {
  region = local.region
}

data "aws_availability_zones" "available" {}

locals {
  cluster_name = format("%s-%s", basename(path.cwd), "shared")
```

> **Reviewer:** Let's follow the current norm from what is used in other patterns:
>
> Suggested change
>
> **Author:** Done

```hcl
  region = "us-west-2"

  vpc_cidr = "10.0.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  tags = {
    Blueprint = local.cluster_name
```

> **Reviewer:** Suggested change
>
> **Author:** Done

```hcl
    GithubRepo = "github.com/aws-ia/terraform-aws-eks-blueprints"
  }
}

################################################################################
# VPC
################################################################################

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.cluster_name
```

> **Reviewer:** Suggested change
>
> **Author:** Done

```hcl
  cidr = local.vpc_cidr

  azs             = local.azs
  private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]

  enable_nat_gateway = true
  single_nat_gateway = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

  tags = local.tags
}
```
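As an aside on the two `cidrsubnet` expressions above: they carve the /16 VPC CIDR into one private /20 and one public /24 per AZ, with public networks offset to index 48 so the ranges never overlap. A quick Python stand-in for Terraform's `cidrsubnet` (illustration only, not part of the pattern) shows the values they evaluate to:

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Mimic Terraform's cidrsubnet(): extend the prefix length by
    `newbits` bits, then return the `netnum`-th resulting subnet."""
    net = ipaddress.ip_network(prefix)
    return str(list(net.subnets(prefixlen_diff=newbits))[netnum])

vpc_cidr = "10.0.0.0/16"
azs = ["us-west-2a", "us-west-2b", "us-west-2c"]

# Matches: [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
private = [cidrsubnet(vpc_cidr, 4, k) for k in range(len(azs))]
# Matches: [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]
public = [cidrsubnet(vpc_cidr, 8, k + 48) for k in range(len(azs))]

print(private)  # ['10.0.0.0/20', '10.0.16.0/20', '10.0.32.0/20']
print(public)   # ['10.0.48.0/24', '10.0.49.0/24', '10.0.50.0/24']
```

The /24 public blocks starting at index 48 sit just past the three /20 private blocks (which end at 10.0.48.0), so both subnet sets fit in the same /16 without collision.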
**New file (+14 lines):**

```hcl
output "vpc_id" {
  description = "Amazon EKS VPC ID"
  value       = module.vpc.vpc_id
}

output "subnet_ids" {
  description = "Amazon EKS Subnet IDs"
  value       = module.vpc.private_subnets
}

output "vpc_cidr" {
  description = "Amazon EKS VPC CIDR Block."
  value       = local.vpc_cidr
}
```
**New file (+17 lines):**

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.47"
    }
  }

  # ## Used for end-to-end testing on project; update to suit your needs
  # backend "s3" {
  #   bucket = "<BUCKET_NAME>"
  #   region = "<AWS_REGION>"
  #   key    = "e2e/istio-multi-cluster-vpc/terraform.tfstate"
  # }
}
```
**New file (+181 lines):**

# Cell-Based Architecture for Amazon EKS

This example shows how to provision a cell-based Amazon EKS cluster.

* Deploy an EKS cluster with one managed node group in a single VPC and AZ

> **Reviewer:** What is the motivation for mixing Fargate, managed nodegroup, and Karpenter in this design?
>
> **Author:** It was about showing how to use them in a single-AZ pattern. Removed the Fargate and using 1 managed node group + Karpenter now.

* Deploy Fargate profiles to run the `coredns`, `aws-load-balancer-controller`, and `karpenter` addons
* Deploy Karpenter `Provisioner` and `AWSNodeTemplate` resources and configure them to run in AZ1
* Deploy a sample deployment `inflate` with 0 replicas

Refer to the [AWS Solution Guidance](https://aws.amazon.com/solutions/guidance/cell-based-architecture-for-amazon-eks/) for more details.
## Prerequisites

Ensure that you have the following tools installed locally:

1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
2. [kubectl](https://kubernetes.io/docs/tasks/tools/)
3. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
4. [helm](https://helm.sh/docs/helm/helm_install/)

## Deploy

> **Reviewer:** Please see the other pattern readmes for the "standard" README structure
>
> **Author:** Done
To provision this example:

```sh
terraform init
terraform apply
```

Enter `yes` at the command prompt to apply.
## Validate

The following command will update the `kubeconfig` on your local machine and allow you to interact with your EKS cluster using `kubectl` to validate the deployment.

1. Run the `update-kubeconfig` command:

```sh
aws eks --region <REGION> update-kubeconfig --name <CLUSTER_NAME>
```
2. List the nodes currently running:

```sh
kubectl get node -o custom-columns='NODE_NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,INSTANCE-TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,AZ:.metadata.labels.topology\.kubernetes\.io/zone,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,INTERNAL-IP:.metadata.annotations.alpha\.kubernetes\.io/provided-node-ip'
```

```
# Output should look like below
NODE_NAME                                          READY   INSTANCE-TYPE   AZ           VERSION               OS-IMAGE         INTERNAL-IP
fargate-ip-10-0-13-93.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-14-95.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-15-86.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-178.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-254.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-73.us-west-2.compute.internal    True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
ip-10-0-12-14.us-west-2.compute.internal           True    m5.large        us-west-2a   v1.28.1-eks-43840fb   Amazon Linux 2   10.0.12.14
ip-10-0-14-197.us-west-2.compute.internal          True    m5.large        us-west-2a   v1.28.1-eks-43840fb   Amazon Linux 2   10.0.14.197
```
3. List out the pods currently running:

```sh
kubectl get pods,svc -n kube-system
```

```
# Output should look like below
NAME                                              READY   STATUS    RESTARTS   AGE
pod/aws-load-balancer-controller-776868b4fb-2j9t6   1/1     Running   0          13h
pod/aws-load-balancer-controller-776868b4fb-bzkrr   1/1     Running   0          13h
pod/aws-node-2zhpc                                  2/2     Running   0          16h
pod/aws-node-w897r                                  2/2     Running   0          16h
pod/coredns-5c9679c87-bp6ws                         1/1     Running   0          16h
pod/coredns-5c9679c87-lw468                         1/1     Running   0          16h
pod/kube-proxy-6wp2k                                1/1     Running   0          16h
pod/kube-proxy-n8qtq                                1/1     Running   0          16h

NAME                                        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
service/aws-load-balancer-webhook-service   ClusterIP   172.20.44.77   <none>        443/TCP         14h
service/kube-dns                            ClusterIP   172.20.0.10    <none>        53/UDP,53/TCP   17h
```
4. Verify all the Helm releases installed:

```sh
helm list -A
```

```
# Output should look like below
NAME                           NAMESPACE     REVISION   UPDATED                                STATUS     CHART                                APP VERSION
aws-load-balancer-controller   kube-system   2          2023-10-18 23:07:36.089372 -0400 EDT   deployed   aws-load-balancer-controller-1.6.1   v2.6.1
karpenter                      karpenter     14         2023-10-19 08:25:12.313094 -0400 EDT   deployed   karpenter-v0.30.0                    0.30.0
```
## Test

1. Verify that both the Fargate nodes and the EKS managed node group worker nodes are deployed to a single AZ:

```sh
kubectl get node -o custom-columns='NODE_NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,INSTANCE-TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,AZ:.metadata.labels.topology\.kubernetes\.io/zone,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,INTERNAL-IP:.metadata.annotations.alpha\.kubernetes\.io/provided-node-ip'
```

```
NODE_NAME                                          READY   INSTANCE-TYPE   AZ           VERSION               OS-IMAGE         INTERNAL-IP
fargate-ip-10-0-13-93.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-14-95.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-15-86.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-178.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-254.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-73.us-west-2.compute.internal    True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
ip-10-0-12-14.us-west-2.compute.internal           True    m5.large        us-west-2a   v1.28.1-eks-43840fb   Amazon Linux 2   10.0.12.14
ip-10-0-14-197.us-west-2.compute.internal          True    m5.large        us-west-2a   v1.28.1-eks-43840fb   Amazon Linux 2   10.0.14.197
```
2. Scale the `inflate` deployment to 20 replicas and watch Karpenter launch EKS worker nodes in the correct AZ:

```sh
kubectl scale deployment inflate --replicas 20
```

```
deployment.apps/inflate scaled
```
3. Wait for the pods to become ready:

```sh
kubectl wait --for=condition=ready pods --all --timeout 2m
```

```
pod/inflate-75d744d4c6-5r5cv condition met
pod/inflate-75d744d4c6-775wm condition met
pod/inflate-75d744d4c6-7t225 condition met
pod/inflate-75d744d4c6-945p4 condition met
pod/inflate-75d744d4c6-b52gp condition met
pod/inflate-75d744d4c6-d99fn condition met
pod/inflate-75d744d4c6-dmnwm condition met
pod/inflate-75d744d4c6-hrvvr condition met
pod/inflate-75d744d4c6-j4hkl condition met
pod/inflate-75d744d4c6-jwknj condition met
pod/inflate-75d744d4c6-ldwts condition met
pod/inflate-75d744d4c6-lqnr5 condition met
pod/inflate-75d744d4c6-pctjh condition met
pod/inflate-75d744d4c6-qdlkc condition met
pod/inflate-75d744d4c6-qnzc5 condition met
pod/inflate-75d744d4c6-r2cwj condition met
pod/inflate-75d744d4c6-srmkb condition met
pod/inflate-75d744d4c6-wf45j condition met
pod/inflate-75d744d4c6-x9mwl condition met
pod/inflate-75d744d4c6-xlbhl condition met
```
4. Check that all the nodes are in the correct AZ:

```sh
kubectl get node -o custom-columns='NODE_NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,INSTANCE-TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,AZ:.metadata.labels.topology\.kubernetes\.io/zone,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,INTERNAL-IP:.metadata.annotations.alpha\.kubernetes\.io/provided-node-ip'
```

```
NODE_NAME                                          READY   INSTANCE-TYPE   AZ           VERSION               OS-IMAGE         INTERNAL-IP
fargate-ip-10-0-13-93.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-14-95.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-15-86.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-178.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-254.us-west-2.compute.internal   True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
fargate-ip-10-0-8-73.us-west-2.compute.internal    True    <none>          us-west-2a   v1.28.2-eks-f8587cb   Amazon Linux 2   <none>
ip-10-0-12-14.us-west-2.compute.internal           True    m5.large        us-west-2a   v1.28.1-eks-43840fb   Amazon Linux 2   10.0.12.14
ip-10-0-14-197.us-west-2.compute.internal          True    m5.large        us-west-2a   v1.28.1-eks-43840fb   Amazon Linux 2   10.0.14.197
ip-10-0-3-161.us-west-2.compute.internal           True    c6gn.8xlarge    us-west-2a   v1.28.1-eks-43840fb   Amazon Linux 2   10.0.3.161
```
## Destroy

To tear down and remove the resources created in this example:

```sh
terraform destroy -target="module.eks_blueprints_addons" -auto-approve
terraform destroy -auto-approve
```
> **Reviewer:** I don't believe the cluster-per-AZ design requires splitting up the Terraform configurations into multiple directories. We should collapse this back down to a single directory, but have multiple cluster definitions - one for each AZ used. This can be shown with a set of definitions split into multiple files - for example:
>
> ```
> az1.tf
> az1.yaml
> az2.tf
> az2.yaml
> az3.tf
> az3.yaml
> ```
>
> Within each of these AZ-specific Terraform files we'll have:
>
> Then within each of the AZ-specific YAML files will be the Karpenter-specific manifests for that AZ and cluster within that AZ.
>
> Thoughts?
> **Author:** Done, made changes as suggested. We were following the istio multi-cluster pattern structure before.