Cluster autoscaler feature #1551

Merged: 6 commits, Aug 13, 2020

# AKS design document:
## Goals:
Provide AKS cluster setup from the Epiphany Platform, with the master (control plane) fully managed by Azure.
## Use cases:
Azure Kubernetes Service (AKS) is the service for deploying, managing, and scaling containerized applications with Kubernetes. This repo contains Terraform files (to be integrated with Epiphany soon) that automate AKS cluster deployment. You can use AKS to speed up development, increase security, use only the resources you need, and scale at speed.
## Nice to know:
- AKS requires two resource groups. The first, managed by the user, contains only the Kubernetes service resource. The second, known as the node resource group, contains all of the infrastructure resources associated with the cluster: the Kubernetes node VMs, virtual networking, and storage. AKS automatically removes the node resource group whenever the cluster is removed.
- AKS requires a service principal to create additional resources. Managed identities are essentially a wrapper around service principals that makes their management simpler, and they are preferable since they are passwordless.
- AKS requires a default node pool, whose primary purpose is hosting critical system pods such as CoreDNS and tunnelfront; its operating system must be Linux. In additional node pools the user can choose which operating system is installed (Linux/Windows).
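
For illustration, this is how the scripts below opt into a managed identity and name the node resource group (a trimmed sketch of azure-aks.tf; the names are placeholders):

```
resource "azurerm_kubernetes_cluster" "example" {
  # ...
  node_resource_group = "example-rg-worker" # AKS-managed group holding node VMs, networking, storage

  identity {
    type = "SystemAssigned" # managed identity instead of a service principal
  }
}
```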
## Features:
- Different operating systems (Linux/Windows)
- Different virtual machine specifications
- Different cluster specifications (size, location, name, public IP, Kubernetes version, etc.)
- Different network drivers (basic/advanced)
- Different resource groups
- Possibility to create a new network, or to use an existing network and resource group, in which case AKS simply joins the existing network (see the sketch below this list)
- Cluster autoscaling
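
For instance, joining an existing network and resource group only requires setting the two `existing_*` variables in terraform.tfvars (the subnet ID below is a placeholder):

```
existing_resource_group_name = "my-existing-rg"
existing_vnet_subnet_id      = "/subscriptions/<subscription-id>/resourceGroups/my-existing-rg/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/my-subnet"
```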
## Requirements:
- Azure account
- kubectl
- terraform
## Deployment:
- Initialize the working directory:
```terraform init```
- Create the execution plan:
```terraform plan```
- Create the cluster:
```terraform apply```
- Destroy the cluster:
```terraform destroy```
- Once the cluster is deployed you should already be connected to it via the kubeconfig, which you can get from the Terraform output. To save your kubeconfig, use the following command: ```echo "$(terraform output kubeconfig)" > ./kubeconfig```
- Remember to export the KUBECONFIG environment variable so the proper configuration file is used: ```export KUBECONFIG=./kubeconfig```
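
Putting the steps together, a typical session might look like this (a sketch; `kubectl get nodes` is just one way to confirm connectivity):

```
terraform init
terraform plan
terraform apply
echo "$(terraform output kubeconfig)" > ./kubeconfig
export KUBECONFIG=./kubeconfig
kubectl get nodes
```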
## Files overview:
- azure-aks.tf
Provisions all the resources (resource groups with the default and additional node pools, network profile, VM and cluster specification, etc.) required to set up an AKS cluster in the private subnets.
- azure-resource-groups.tf
Provisions the resource groups that neatly group the AKS resources.
- azure-security-groups.tf
Provisions the security groups used by all virtual machines in the new cluster. By default AKS creates its own customized security group, which exists only for the cluster's lifetime.
- azure-vnet.tf
Provisions a new network for the AKS cluster.
- azure-subnet.tf
Provisions a new subnet for the AKS cluster.
- main.tf
Sets up the Azure providers.
- output.tf
Defines the output configuration.
- vars.tf
Declares variables with descriptions and default values.
- terraform.tfvars
Sets the main component versions and defines variables used in other files.
## Design:
![AKS cluster](epiphany-aks.png)
#### Sources
- https://www.terraform.io/docs/providers/azurerm/r/kubernetes_cluster
- https://docs.microsoft.com/en-us/azure/aks/
53 changes: 0 additions & 53 deletions docs/design-docs/aks/aks.md

This file was deleted.

76 changes: 76 additions & 0 deletions docs/design-docs/aks/terraform-scripts/azure-aks.tf
resource "azurerm_kubernetes_cluster" "aks" {
name = "${var.prefix}-aks"
location = var.location
dns_prefix = var.prefix
resource_group_name = var.existing_resource_group_name != "" ? var.existing_resource_group_name : azurerm_resource_group.aks-rg[0].name
node_resource_group = "${var.prefix}-rg-worker"
kubernetes_version = var.kubernetes_version


default_node_pool {
name = substr(var.default_node_pool.name, 0, 12)
node_count = var.default_node_pool.node_count
vm_size = var.default_node_pool.vm_size
vnet_subnet_id = var.existing_vnet_subnet_id != "" ? var.existing_vnet_subnet_id : azurerm_subnet.subnet[0].id
orchestrator_version = var.kubernetes_version
os_disk_size_gb = var.default_node_pool.os_disk_size_gb
enable_node_public_ip = var.nodes_public_ip
type = var.default_node_pool.type
enable_auto_scaling = var.default_node_pool.enable_auto_scaling
min_count = var.default_node_pool.min_count
max_count = var.default_node_pool.max_count
}

identity {
type = "SystemAssigned"
}

linux_profile {
admin_username = var.linux_admin_username
ssh_key {
key_data = file(var.public_ssh_key_path)
}
}

network_profile {
network_plugin = var.network_policy == "azure" ? "azure" : var.network_plugin
network_policy = var.network_policy
}

addon_profile {
kube_dashboard {
enabled = true
}
}
auto_scaler_profile {
balance_similar_node_groups = var.auto_scaler_profile.balance_similar_node_groups
max_graceful_termination_sec = var.auto_scaler_profile.max_graceful_termination_sec
scale_down_delay_after_add = var.auto_scaler_profile.scale_down_delay_after_add
scale_down_delay_after_delete = var.auto_scaler_profile.scale_down_delay_after_delete
scale_down_delay_after_failure = var.auto_scaler_profile.scale_down_delay_after_failure
scan_interval = var.auto_scaler_profile.scan_interval
scale_down_unneeded = var.auto_scaler_profile.scale_down_unneeded
scale_down_unready = var.auto_scaler_profile.scale_down_unready
scale_down_utilization_threshold = var.auto_scaler_profile.scale_down_utilization_threshold
}

tags = {
Environment = var.prefix
}
}

resource "azurerm_kubernetes_cluster_node_pool" "aks" {
count = var.additional_cluster_node_pools.node_count > "0" ? 1 : 0
kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
name = substr(var.additional_cluster_node_pools.name, 0, 6)
node_count = var.additional_cluster_node_pools.node_count
vm_size = var.additional_cluster_node_pools.vm_size
vnet_subnet_id = var.existing_vnet_subnet_id != "" ? var.existing_vnet_subnet_id : azurerm_subnet.subnet[0].id
orchestrator_version = var.kubernetes_version
os_disk_size_gb = var.additional_cluster_node_pools.os_disk_size_gb
enable_node_public_ip = var.nodes_public_ip
os_type = var.additional_cluster_node_pools.os_type
enable_auto_scaling = var.additional_cluster_node_pools.enable_auto_scaling
min_count = var.additional_cluster_node_pools.min_count
max_count = var.additional_cluster_node_pools.max_count
}
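
The additional pool above is created only when `node_count` is greater than zero. For example, a single Windows node could be requested with tfvars like these (illustrative values mirroring the terraform.tfvars shown further down):

```
additional_cluster_node_pools = {
  name                = "windows"
  node_count          = 1
  vm_size             = "Standard_DS2_v2"
  os_type             = "Windows"
  os_disk_size_gb     = "50"
  type                = "VirtualMachineScaleSets"
  enable_auto_scaling = false
  min_count           = null
  max_count           = null
}
```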
5 changes: 5 additions & 0 deletions docs/design-docs/aks/terraform-scripts/azure-resource-groups.tf
resource "azurerm_resource_group" "aks-rg" {
count = var.existing_resource_group_name != "" ? 0 : 1
name = "${var.prefix}-rg"
location = var.location
}
docs/design-docs/aks/terraform-scripts/azure-security-groups.tf
resource "azurerm_subnet_network_security_group_association" "aks-nsg-association" {
count = "${var.existing_vnet_subnet_id != "" ? 0 : 1}"
subnet_id = "${azurerm_subnet.subnet[0].id}"
network_security_group_id = "${azurerm_network_security_group.security_group_epiphany[0].id}"
count = var.existing_vnet_subnet_id != "" ? 0 : 1
subnet_id = azurerm_subnet.subnet[0].id
network_security_group_id = azurerm_network_security_group.security_group_epiphany[0].id
}
resource azurerm_network_security_group "security_group_epiphany" {
count = "${var.existing_vnet_subnet_id != "" ? 0 : 1}"
count = var.existing_vnet_subnet_id != "" ? 0 : 1
name = "aks-1-nsg"
location = "${azurerm_resource_group.aks-rg[0].location}"
resource_group_name = "${azurerm_resource_group.aks-rg[0].name}"
location = azurerm_resource_group.aks-rg[0].location
resource_group_name = azurerm_resource_group.aks-rg[0].name

security_rule {
name = "ssh"
Expand Down
docs/design-docs/aks/terraform-scripts/azure-subnet.tf
resource "azurerm_subnet" "subnet" {
count = "${var.existing_vnet_subnet_id != "" ? 0 : 1}"
count = var.existing_vnet_subnet_id != "" ? 0 : 1
name = "${var.prefix}-subnet-aks"
address_prefixes = "${var.aks_address_prefix}"
address_prefixes = var.aks_address_prefix
resource_group_name = azurerm_resource_group.aks-rg[0].name
virtual_network_name = azurerm_virtual_network.vnet[0].name
}
docs/design-docs/aks/terraform-scripts/azure-vnet.tf
resource "azurerm_virtual_network" "vnet" {
count = "${var.existing_vnet_subnet_id != "" ? 0 : 1}"
count = var.existing_vnet_subnet_id != "" ? 0 : 1
name = "${var.prefix}-network-aks"
address_space = "${var.vnet_address_space}"
address_space = var.vnet_address_space
resource_group_name = azurerm_resource_group.aks-rg[0].name
location = azurerm_resource_group.aks-rg[0].location
}
8 changes: 8 additions & 0 deletions docs/design-docs/aks/terraform-scripts/main.tf
provider "azurerm" {
version = "=2.22.0"
features {}
}

terraform {
required_version = "=0.12.6"
}
12 changes: 12 additions & 0 deletions docs/design-docs/aks/terraform-scripts/output.tf
output "kubeconfig" {
sensitive = true
value = azurerm_kubernetes_cluster.aks.kube_config_raw
}

output "kubeconfig_export" {
value = "terraform output kubeconfig > ./kubeconfig_${var.prefix} && export KUBECONFIG=./kubeconfig_${var.prefix}"
}

output "cluster_endpoint" {
value = azurerm_kubernetes_cluster.aks.kube_config.0.host
}
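
One possible way to consume the `kubeconfig_export` helper from a shell (assuming Terraform 0.12's unquoted `terraform output <name>` behavior):

```
eval "$(terraform output kubeconfig_export)"
kubectl get nodes
```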
docs/design-docs/aks/terraform-scripts/terraform.tfvars
existing_vnet_subnet_id      = ""
existing_resource_group_name = ""
prefix                       = "ropu"
location                     = "North Central US"
kubernetes_version           = "1.17.7"
vnet_address_space           = ["10.1.1.0/24"]
aks_address_prefix           = ["10.1.1.0/24"]
public_ssh_key_path          = "~/.ssh/id_rsa.pub"
nodes_public_ip              = false
network_plugin               = "azure"
network_policy               = "azure"
linux_admin_username         = "operation"
default_node_pool = {
  name                = "linux"
  node_count          = 1
  vm_size             = "Standard_DS2_v2"
  os_disk_size_gb     = "50"
  type                = "VirtualMachineScaleSets"
  enable_auto_scaling = false
  min_count           = null
  max_count           = null
}
additional_cluster_node_pools = {
  name                = "windows"
  node_count          = 0
  vm_size             = "Standard_DS2_v2"
  os_type             = "Windows"
  os_disk_size_gb     = "50"
  type                = "VirtualMachineScaleSets"
  enable_auto_scaling = false
  min_count           = null
  max_count           = null
}
auto_scaler_profile = {
  balance_similar_node_groups      = false
  max_graceful_termination_sec     = "600"
  scale_down_delay_after_add       = "10m"
  scale_down_delay_after_delete    = "10s"
  scale_down_delay_after_failure   = "10m"
  scan_interval                    = "10s"
  scale_down_unneeded              = "10m"
  scale_down_unready               = "10m"
  scale_down_utilization_threshold = "0.5"
}
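
The defaults above leave autoscaling off. A minimal sketch of enabling the autoscaler introduced in this PR (illustrative values; when `enable_auto_scaling` is true, `min_count` and `max_count` must be set and `node_count` acts as the initial size):

```
default_node_pool = {
  name                = "linux"
  node_count          = 2                         # initial size
  vm_size             = "Standard_DS2_v2"
  os_disk_size_gb     = "50"
  type                = "VirtualMachineScaleSets" # autoscaling requires VM scale sets
  enable_auto_scaling = true
  min_count           = 1
  max_count           = 5
}
```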