
AKS enabling cilium support does not work due to not being able to set network_policy to cilium #23339

Closed
derek-andrews-work opened this issue Sep 20, 2023 · 4 comments · Fixed by #23342

Comments

@derek-andrews-work
Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

v1.1.9

AzureRM Provider Version

3.73.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

# Module
## main.tf
data "azurerm_user_assigned_identity" "this" {
  name                = var.umi_name    # Required
  resource_group_name = var.umi_rg_name # Required
}

resource "azurerm_kubernetes_cluster" "this" {
  name                              = var.cluster_name                          # Required
  location                          = var.location                              # Required
  resource_group_name               = var.resource_group_name                   # Required
  sku_tier                          = var.sku_tier                              # Optional, defaults to Paid
  dns_prefix                        = var.cluster_name                          # Required
  kubernetes_version                = var.kubernetes_version                    # Required
  private_cluster_enabled           = var.private_cluster_enabled               # Optional, defaults to true
  private_dns_zone_id               = var.private_dns_zone_id
  public_network_access_enabled     = var.public_network_access_enabled         # Optional, defaults to false
  local_account_disabled            = var.local_account_disabled                # Optional, defaults to true
  automatic_channel_upgrade         = var.automatic_channel_upgrade             # Optional, defaults to patch
  oidc_issuer_enabled               = var.oidc_issuer_enabled                   # Optional, defaults to true
  workload_identity_enabled         = var.workload_identity_enabled             # Optional, defaults to true
  azure_policy_enabled              = var.azure_policy_enabled                  # Optional, defaults to true
  http_application_routing_enabled  = var.http_application_routing_enabled      # Optional, defaults to false
  role_based_access_control_enabled = var.role_based_access_control_enabled     # Optional, defaults to true
  disk_encryption_set_id            = var.disk_encryption_set_id                # Optional, defaults to null
  tags                              = var.tags                                  # Required

  maintenance_window {
    allowed {
      day   = var.maintenance_window_day   # Optional, defaults to Monday
      hours = var.maintenance_window_hours # Optional, defaults to 1,3
    }
  }
  
  oms_agent {
    log_analytics_workspace_id = var.log_analytics_workspace_id
  }
  
  key_vault_secrets_provider {
    secret_rotation_enabled  = var.secret_rotation_enabled # Optional, defaults to true
    secret_rotation_interval = var.secret_roation_interval # Optional, defaults to 2m
  }

  storage_profile {
    blob_driver_enabled         = var.blob_driver_enabled         # Optional, defaults to false
    disk_driver_enabled         = var.disk_driver_enabled         # Optional, defaults to true
    disk_driver_version         = var.disk_driver_version         # Optional, defaults to v1
    file_driver_enabled         = var.file_driver_enabled         # Optional, defaults to false
    snapshot_controller_enabled = var.snapshot_controller_enabled # Optional, defaults to true
  }

  network_profile {
    network_plugin     = var.network_plugin     # Optional, defaults to azure
    network_policy     = var.network_policy     # Optional, defaults to calico
    service_cidr       = var.service_cidr       # Required
    dns_service_ip     = var.dns_service_ip     # Required (Might be able to set this)
    docker_bridge_cidr = var.docker_bridge_cidr # Optional, defaults to "172.17.0.1/16"
    load_balancer_sku  = var.load_balancer_sku  # Optional, defaults to standard
    outbound_type      = var.outbound_type      # Optional, defaults to userDefinedRouting
    ebpf_data_plane    = var.ebpf_data_plane
  }

  default_node_pool {
    name                          = var.systempool_name                          # Required
    type                          = var.systempool_type                          # Optional, defaults to VirtualMachineScaleSets
    capacity_reservation_group_id = var.systempool_capacity_reservation_group_id # Optional, defaults to null
    node_labels                   = var.systempool_node_labels                   # Optional, defaults to null
    node_taints                   = var.systempool_node_taints                   # Optional, defaults to null
    vm_size                       = var.systempool_vm_size                       # Optional, defaults to Standard_DS4_v2
    vnet_subnet_id                = var.node_subnet_id       # Required but pulled from data block
    zones                         = var.systempool_availability_zones            # Optional, defaults to 1,2,3
    enable_auto_scaling           = var.systempool_enable_auto_scaling           # Optional, defaults to true
    max_count                     = var.systempool_max_count                     # Optional, defaults to 2
    min_count                     = var.systempool_min_count                     # Optional, defaults to 1
    os_disk_type                  = var.systempool_os_disk_type                  # Optional, defaults to Ephemeral
    os_disk_size_gb               = var.systempool_os_disk_size_gb               # Optional, defaults to 128
    max_pods                      = var.systempool_max_pods                      # Optional, defaults to 30
    enable_node_public_ip         = var.systempool_enable_node_public_ip         # Optional, defaults to false
    pod_subnet_id                 = var.pod_subnet_id       # Required but pulled from data block
    only_critical_addons_enabled  = var.systempool_only_critical_addons_enabled  # Optional, defaults to true
    tags                          = var.tags

    upgrade_settings {
      max_surge = var.max_surge
    }
  }

  auto_scaler_profile {
    balance_similar_node_groups      = var.balance_similar_node_groups      # Optional, defaults to true
    expander                         = var.expander                         # Optional, defaults to random
    max_graceful_termination_sec     = var.max_graceful_termination_sec     # Optional, defaults to 600
    max_node_provisioning_time       = var.max_node_provisioning_time       # Optional, defaults to 15m
    max_unready_nodes                = var.max_unready_nodes                # Optional, defaults to 3
    max_unready_percentage           = var.max_unready_percentage           # Optional, defaults to 45
    new_pod_scale_up_delay           = var.new_pod_scale_up_delay           # Optional, defaults to 10s
    scale_down_delay_after_add       = var.scale_down_delay_after_add       # Optional, defaults to 10m
    scale_down_delay_after_delete    = var.scale_down_delay_after_delete    # Optional, defaults to 10s
    scale_down_delay_after_failure   = var.scale_down_delay_after_failure   # Optional, defaults to 3m
    scan_interval                    = var.scan_interval                    # Optional, defaults to 10s
    scale_down_unneeded              = var.scale_down_unneeded              # Optional, defaults to 10m
    scale_down_unready               = var.scale_down_unready               # Optional, defaults to 20m
    scale_down_utilization_threshold = var.scale_down_utilization_threshold # Optional, defaults to 0.5
    empty_bulk_delete_max            = var.empty_bulk_delete_max            # Optional, defaults to 10
    skip_nodes_with_local_storage    = var.skip_nodes_with_local_storage    # Optional, defaults to false
    skip_nodes_with_system_pods      = var.skip_nodes_with_system_pods      # Optional, defaults to false
  }

  workload_autoscaler_profile {
    keda_enabled = var.keda_enabled
  }

  identity {
    type         = var.identity_type                                  # Optional, defaults to UserAssigned
    identity_ids = [data.azurerm_user_assigned_identity.this.id] # Required but pulled from data block
  }

  kubelet_identity {
    client_id                 = data.azurerm_user_assigned_identity.this.client_id    # Required but pulled from data block
    object_id                 = data.azurerm_user_assigned_identity.this.principal_id # Required but pulled from data block
    user_assigned_identity_id = data.azurerm_user_assigned_identity.this.id           # Required but pulled from data block
  }

  # key_management_service {
  #   key_vault_key_id         = var.key_vault_key_id
  #   key_vault_network_access = var.key_vault_network_access
  # }

  azure_active_directory_role_based_access_control {
    managed                = var.aad_managed            # Optional, defaults to true
    admin_group_object_ids = var.admin_group_object_ids # Required
    azure_rbac_enabled     = var.aad_rbac_enabled       # Optional, defaults to true
  }

  lifecycle {
    ignore_changes = [
      kubernetes_version,
    ]
  }
}
## variables.tf
# required variables
variable cluster_name {}
variable resource_group_name {}
variable location {}
variable log_analytics_workspace_id {}
variable admin_group_object_ids {}
variable kubernetes_version {}
variable private_dns_zone_id {}
variable umi_name {}
variable umi_rg_name {}
variable node_subnet_id {}
variable pod_subnet_id {}
variable "tags" {
  type        = map(string)
  default     = null
}

# These variables have defaults

variable service_cidr {
  default = "cidr"
}
variable dns_service_ip {
  default = "cidr"
}

## access
variable public_network_access_enabled {
  default = true
}
variable role_based_access_control_enabled {
  default = true
}
variable aad_managed {
  default = true
}
variable aad_rbac_enabled {
  default = true
}
variable identity_type {
  default = "UserAssigned"
}

# storage
variable blob_driver_enabled {
  default = true
}
variable disk_driver_enabled {
  default = true
}
variable disk_driver_version {
  default = "v1"
}
variable file_driver_enabled {
  default = true
}
variable snapshot_controller_enabled {
  default = true
}

## systempool
variable systempool_capacity_reservation_group_id {
  default = null
}
variable systempool_node_labels {
  default = null
}
variable systempool_node_taints {
  default = null
}
variable systempool_enable_auto_scaling {
  default = true
}
variable systempool_max_count {
  default = 9
}
variable systempool_min_count {
  default = 3
}
variable systempool_os_disk_type {
  default = "Ephemeral"
}
variable systempool_os_disk_size_gb {
  default = 128
}
variable systempool_only_critical_addons_enabled {
  default = true
}
variable systempool_enable_node_public_ip {
  default = false
}
variable systempool_max_pods {
  default = 110
}
variable systempool_availability_zones {
  default = ["1", "2", "3"] 
}
variable systempool_name {
  default = "systempool"
}
variable systempool_type {
  default = "VirtualMachineScaleSets"
}
variable systempool_vm_size {
  default = "Standard_D8a_v4"
}

## auto-scaler
variable balance_similar_node_groups {
  default = true
}
variable expander {
  default = "random"
}
variable max_graceful_termination_sec {
  default = 600
}
variable max_node_provisioning_time {
  default = "15m"
}
variable max_unready_nodes {
  default = 3
}
variable max_unready_percentage {
  default = 45
}
variable new_pod_scale_up_delay {
  default = "10s"
}
variable scale_down_delay_after_add {
  default = "10m"
}
variable scale_down_delay_after_delete {
  default = "10s"
}
variable scale_down_delay_after_failure {
  default = "3m"
}
variable scan_interval {
  default = "10s"
}
variable scale_down_unneeded {
  default = "10m"
}
variable scale_down_unready {
  default = "20m"
}
variable scale_down_utilization_threshold {
  default = 0.5
}
variable empty_bulk_delete_max {
  default = 10
}
variable skip_nodes_with_local_storage {
  default = false
}
variable skip_nodes_with_system_pods {
  default = false
}
variable keda_enabled {
  default = true
}

## network
variable outbound_type {
  default = "userDefinedRouting"
}
variable docker_bridge_cidr {
  default = "100.68.152.0/21"
}
variable load_balancer_sku {
  default = "standard"
}
variable network_plugin {
  default = "azure"
}
variable network_policy {
  default = "calico"
}
variable http_application_routing_enabled {
  default = false
}
variable ebpf_data_plane {
  default = null
}

## secrets
variable secret_rotation_enabled {
  default = true
}
variable secret_roation_interval {
  default = "2m"
}

## upgrades
variable automatic_channel_upgrade {
  default = "patch"
}
variable maintenance_window_enabled {
  default = true
}
variable max_surge {
  default = 1
}

variable workload_identity_enabled {
  default = true
}

## maintenance
variable maintenance_window_day {
  default = "Tuesday"
}
variable maintenance_window_hours {
  default = [1,4]
}

## other
variable sku_tier {
  default = "Standard"
}
variable azure_policy_enabled {
  default = true
}
variable private_cluster_enabled {
  default = true
}
variable local_account_disabled {
  default = true
}
variable oidc_issuer_enabled {
  default = true
}
variable disk_encryption_set_id {
  default = null
}

module aks {
  source                     = "gitlab"
  cluster_name               = var.cluster_name
  resource_group_name        = azurerm_resource_group.aks.name
  location                   = azurerm_resource_group.aks.location
  umi_name                   = "umi-${var.cluster_name}"
  umi_rg_name                = lookup(local.app_context_map[var.cluster_name], "rg_name", null)  
  log_analytics_workspace_id = data.azurerm_log_analytics_workspace.this.id    
  disk_encryption_set_id     = data.azurerm_disk_encryption_set.this.id   
  kubernetes_version         = "1.26"
  private_dns_zone_id        = data.azurerm_private_dns_zone.aks.id
  admin_group_object_ids     = var.admin_group_object_ids
  # key_vault_key_id           = data.azurerm_key_vault_key.this.id
  tags                       = local.tags
  ebpf_data_plane            = "cilium"
  network_policy             = "cilium"

  # systempool
  node_subnet_id         = data.azurerm_subnet.node.id
  pod_subnet_id          = data.azurerm_subnet.pod.id
  systempool_os_disk_type = "Managed"
  systempool_node_labels = {}
  systempool_node_taints = []
  systempool_min_count   = 3
  systempool_max_count   = 9
  systempool_vm_size     = "Standard_D8a_v4"
}

Debug Output/Panic Output

Error: expected network_profile.0.network_policy to be one of ["calico" "azure"], got cilium
│ 
│   with module.aks.azurerm_kubernetes_cluster.this,
│   on .terraform/modules/aks/main.tf line 52, in resource "azurerm_kubernetes_cluster" "this":
│   52:     network_policy     = var.network_policy     # Optional, defaults to calico

Error: updating Kubernetes Cluster (Subscription: "sub_id"
│ Resource Group Name: "rg_name"
│ Kubernetes Cluster Name: "cluster-name"): managedclusters.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="BadRequest" Message="Cilium dataplane requires network policy cilium." Target="networkProfile.networkPolicy"

Expected Behaviour

The Cilium data plane should be set up.

Actual Behaviour

The documentation says support for Cilium was added in #22952.

However, when I set ebpf_data_plane = "cilium", it errors out saying that network_policy needs to be cilium, but the provider won't accept that as a value. I tried null as well, with the same error.

Steps to Reproduce

Build the cluster without ebpf_data_plane set and it builds fine. Add ebpf_data_plane = "cilium" and run apply again, and you get the error that network_policy must be set to cilium. Set network_policy to that value and you get the validation error that it only accepts calico or azure.
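A minimal configuration that reproduces the validation error (the resource names, location, and VM size here are hypothetical placeholders, trimmed down from the full module above):

```hcl
resource "azurerm_kubernetes_cluster" "repro" {
  # Hypothetical placeholder values; only network_profile matters for the repro
  name                = "cilium-repro"
  location            = "eastus"
  resource_group_name = "rg-cilium-repro"
  dns_prefix          = "ciliumrepro"

  default_node_pool {
    name       = "system"
    node_count = 1
    vm_size    = "Standard_D2s_v3"
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin  = "azure"
    ebpf_data_plane = "cilium"
    # Plan-time validation rejects this even though the AKS API requires it:
    # expected network_profile.0.network_policy to be one of ["calico" "azure"]
    network_policy  = "cilium"
  }
}
```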

Important Factoids

No response

References

No response

@github-actions github-actions bot added the v/3.x label Sep 20, 2023
@derek-andrews-work

Another note: in my example I'm trying to update an existing cluster, and that may or may not be supported. But even to build a new cluster, I need to pass cilium as the network_policy.
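For reference, this is the network_profile combination that the AKS API itself requires for the eBPF data plane, and which the provider should accept once cilium is added to the allowed network_policy values:

```hcl
network_profile {
  network_plugin  = "azure"
  ebpf_data_plane = "cilium"
  network_policy  = "cilium" # AKS: "Cilium dataplane requires network policy cilium."
}
```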

@rcskosir
Contributor

Thank you for taking the time to open this issue. Please subscribe to PR #23342 created by @ms-henglu for this issue.

github-actions bot commented May 6, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 6, 2024