Adding support for DWS for GKE nodepools (#2418)
* Adding TPU limits for GKE cluster node auto-provisioning (NAP)

* rework of the cluster autoscaling configuration

* updated README

* adding queued_provisioning (DWS) attribute

* Adding support for DWS for GKE nodepools

* typo

* adding test for DWS

---------

Co-authored-by: Wiktor Niesiobędzki <[email protected]>
aurelienlegrand and wiktorn authored Jul 10, 2024
1 parent 2a2c4a9 commit 78069ee
Showing 5 changed files with 104 additions and 9 deletions.
2 changes: 1 addition & 1 deletion modules/gke-cluster-standard/README.md
@@ -322,7 +322,7 @@ module "cluster-1" {
    subnetwork = var.subnet.self_link
    secondary_range_blocks = {
      pods     = ""
      services = "/20" # can be an empty string as well
      services = "/20"
    }
  }
  cluster_autoscaling = {
64 changes: 56 additions & 8 deletions modules/gke-nodepool/README.md
@@ -136,14 +136,62 @@ module "cluster-1-nodepool-gpu-1" {
}
# tftest modules=1 resources=2 inventory=guest-accelerator.yaml
```

### Dynamic Workload Scheduler (DWS) & node pool configuration

This example configures a GPU node pool that uses Dynamic Workload Scheduler (DWS) through queued provisioning.

```hcl
module "cluster-1-nodepool-dws" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west4-a"
  name         = "nodepool-dws"
  k8s_labels   = { environment = "dev" }
  service_account = {
    create       = true
    email        = "nodepool-gpu-1" # optional
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
  node_config = {
    machine_type        = "g2-standard-4"
    disk_size_gb        = 50
    disk_type           = "pd-ssd"
    ephemeral_ssd_count = 1
    gvnic               = true
    spot                = true
    guest_accelerator = {
      type  = "nvidia-l4"
      count = 1
      gpu_driver = {
        version = "LATEST"
      }
    }
  }
  nodepool_config = {
    autoscaling = {
      max_node_count = 10
      min_node_count = 0
    }
    queued_provisioning = true
  }
  node_count = {
    initial = 0
  }
  reservation_affinity = {
    consume_reservation_type = "NO_RESERVATION"
  }
}
# tftest modules=1 resources=2 inventory=dws.yaml
```
<!-- BEGIN TFDOC -->
## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [cluster_name](variables.tf#L23) | Cluster name. | <code>string</code> | ✓ |  |
| [location](variables.tf#L48) | Cluster location. | <code>string</code> | ✓ |  |
| [project_id](variables.tf#L181) | Cluster project id. | <code>string</code> | ✓ |  |
| [project_id](variables.tf#L182) | Cluster project id. | <code>string</code> | ✓ |  |
| [cluster_id](variables.tf#L17) | Cluster id. Optional, but providing cluster_id is recommended to prevent cluster misconfiguration in some of the edge cases. | <code>string</code> | | <code>null</code> |
| [gke_version](variables.tf#L28) | Kubernetes nodes version. Ignored if auto_upgrade is set in management_config. | <code>string</code> | | <code>null</code> |
| [k8s_labels](variables.tf#L34) | Kubernetes labels applied to each node. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
@@ -153,13 +201,13 @@ module "cluster-1-nodepool-gpu-1" {
| [node_config](variables.tf#L65) | Node-level configuration. | <code title="object&#40;&#123;&#10; boot_disk_kms_key &#61; optional&#40;string&#41;&#10; disk_size_gb &#61; optional&#40;number&#41;&#10; disk_type &#61; optional&#40;string&#41;&#10; ephemeral_ssd_count &#61; optional&#40;number&#41;&#10; gcfs &#61; optional&#40;bool, false&#41;&#10; guest_accelerator &#61; optional&#40;object&#40;&#123;&#10; count &#61; number&#10; type &#61; string&#10; gpu_driver &#61; optional&#40;object&#40;&#123;&#10; version &#61; string&#10; partition_size &#61; optional&#40;string&#41;&#10; max_shared_clients_per_gpu &#61; optional&#40;number&#41;&#10; &#125;&#41;&#41;&#10; &#125;&#41;&#41;&#10; local_nvme_ssd_count &#61; optional&#40;number&#41;&#10; gvnic &#61; optional&#40;bool, false&#41;&#10; image_type &#61; optional&#40;string&#41;&#10; kubelet_config &#61; optional&#40;object&#40;&#123;&#10; cpu_manager_policy &#61; string&#10; cpu_cfs_quota &#61; optional&#40;bool&#41;&#10; cpu_cfs_quota_period &#61; optional&#40;string&#41;&#10; pod_pids_limit &#61; optional&#40;number&#41;&#10; &#125;&#41;&#41;&#10; linux_node_config &#61; optional&#40;object&#40;&#123;&#10; sysctls &#61; optional&#40;map&#40;string&#41;&#41;&#10; cgroup_mode &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10; local_ssd_count &#61; optional&#40;number&#41;&#10; machine_type &#61; optional&#40;string&#41;&#10; metadata &#61; optional&#40;map&#40;string&#41;&#41;&#10; min_cpu_platform &#61; optional&#40;string&#41;&#10; preemptible &#61; optional&#40;bool&#41;&#10; sandbox_config_gvisor &#61; optional&#40;bool&#41;&#10; shielded_instance_config &#61; optional&#40;object&#40;&#123;&#10; enable_integrity_monitoring &#61; optional&#40;bool&#41;&#10; enable_secure_boot &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#41;&#10; spot &#61; optional&#40;bool&#41;&#10; workload_metadata_config_mode &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; disk_type &#61; &#34;pd-balanced&#34;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [node_count](variables.tf#L124) | Number of nodes per instance group. Initial value can only be changed by recreation, current is ignored when autoscaling is used. | <code title="object&#40;&#123;&#10; current &#61; optional&#40;number&#41;&#10; initial &#61; number&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; initial &#61; 1&#10;&#125;">&#123;&#8230;&#125;</code> |
| [node_locations](variables.tf#L136) | Node locations. | <code>list&#40;string&#41;</code> | | <code>null</code> |
| [nodepool_config](variables.tf#L142) | Nodepool-level configuration. | <code title="object&#40;&#123;&#10; autoscaling &#61; optional&#40;object&#40;&#123;&#10; location_policy &#61; optional&#40;string&#41;&#10; max_node_count &#61; optional&#40;number&#41;&#10; min_node_count &#61; optional&#40;number&#41;&#10; use_total_nodes &#61; optional&#40;bool, false&#41;&#10; &#125;&#41;&#41;&#10; management &#61; optional&#40;object&#40;&#123;&#10; auto_repair &#61; optional&#40;bool&#41;&#10; auto_upgrade &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#41;&#10; placement_policy &#61; optional&#40;object&#40;&#123;&#10; type &#61; string&#10; policy_name &#61; optional&#40;string&#41;&#10; tpu_topology &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10; upgrade_settings &#61; optional&#40;object&#40;&#123;&#10; max_surge &#61; number&#10; max_unavailable &#61; number&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [pod_range](variables.tf#L168) | Pod secondary range configuration. | <code title="object&#40;&#123;&#10; secondary_pod_range &#61; object&#40;&#123;&#10; name &#61; string&#10; cidr &#61; optional&#40;string&#41;&#10; create &#61; optional&#40;bool&#41;&#10; enable_private_nodes &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [reservation_affinity](variables.tf#L186) | Configuration of the desired reservation which instances could take capacity from. | <code title="object&#40;&#123;&#10; consume_reservation_type &#61; string&#10; key &#61; optional&#40;string&#41;&#10; values &#61; optional&#40;list&#40;string&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [service_account](variables.tf#L196) | Nodepool service account. If this variable is set to null, the default GCE service account will be used. If set and email is null, a service account will be created. If scopes are null a default will be used. | <code title="object&#40;&#123;&#10; create &#61; optional&#40;bool, false&#41;&#10; email &#61; optional&#40;string&#41;&#10; oauth_scopes &#61; optional&#40;list&#40;string&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [sole_tenant_nodegroup](variables.tf#L207) | Sole tenant node group. | <code>string</code> | | <code>null</code> |
| [tags](variables.tf#L213) | Network tags applied to nodes. | <code>list&#40;string&#41;</code> | | <code>null</code> |
| [taints](variables.tf#L219) | Kubernetes taints applied to all nodes. | <code title="map&#40;object&#40;&#123;&#10; value &#61; string&#10; effect &#61; string&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [nodepool_config](variables.tf#L142) | Nodepool-level configuration. | <code title="object&#40;&#123;&#10; autoscaling &#61; optional&#40;object&#40;&#123;&#10; location_policy &#61; optional&#40;string&#41;&#10; max_node_count &#61; optional&#40;number&#41;&#10; min_node_count &#61; optional&#40;number&#41;&#10; use_total_nodes &#61; optional&#40;bool, false&#41;&#10; &#125;&#41;&#41;&#10; management &#61; optional&#40;object&#40;&#123;&#10; auto_repair &#61; optional&#40;bool&#41;&#10; auto_upgrade &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#41;&#10; placement_policy &#61; optional&#40;object&#40;&#123;&#10; type &#61; string&#10; policy_name &#61; optional&#40;string&#41;&#10; tpu_topology &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10; queued_provisioning &#61; optional&#40;bool, false&#41;&#10; upgrade_settings &#61; optional&#40;object&#40;&#123;&#10; max_surge &#61; number&#10; max_unavailable &#61; number&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [pod_range](variables.tf#L169) | Pod secondary range configuration. | <code title="object&#40;&#123;&#10; secondary_pod_range &#61; object&#40;&#123;&#10; name &#61; string&#10; cidr &#61; optional&#40;string&#41;&#10; create &#61; optional&#40;bool&#41;&#10; enable_private_nodes &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [reservation_affinity](variables.tf#L187) | Configuration of the desired reservation which instances could take capacity from. | <code title="object&#40;&#123;&#10; consume_reservation_type &#61; string&#10; key &#61; optional&#40;string&#41;&#10; values &#61; optional&#40;list&#40;string&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [service_account](variables.tf#L197) | Nodepool service account. If this variable is set to null, the default GCE service account will be used. If set and email is null, a service account will be created. If scopes are null a default will be used. | <code title="object&#40;&#123;&#10; create &#61; optional&#40;bool, false&#41;&#10; email &#61; optional&#40;string&#41;&#10; oauth_scopes &#61; optional&#40;list&#40;string&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [sole_tenant_nodegroup](variables.tf#L208) | Sole tenant node group. | <code>string</code> | | <code>null</code> |
| [tags](variables.tf#L214) | Network tags applied to nodes. | <code>list&#40;string&#41;</code> | | <code>null</code> |
| [taints](variables.tf#L220) | Kubernetes taints applied to all nodes. | <code title="map&#40;object&#40;&#123;&#10; value &#61; string&#10; effect &#61; string&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
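
Spelled out for readability, the updated `nodepool_config` type from the row above (only `queued_provisioning` is new in this change; the authoritative definition lives in `variables.tf`):

```hcl
variable "nodepool_config" {
  description = "Nodepool-level configuration."
  type = object({
    autoscaling = optional(object({
      location_policy = optional(string)
      max_node_count  = optional(number)
      min_node_count  = optional(number)
      use_total_nodes = optional(bool, false)
    }))
    management = optional(object({
      auto_repair  = optional(bool)
      auto_upgrade = optional(bool)
    }))
    placement_policy = optional(object({
      type         = string
      policy_name  = optional(string)
      tpu_topology = optional(string)
    }))
    queued_provisioning = optional(bool, false)
    upgrade_settings = optional(object({
      max_surge       = number
      max_unavailable = number
    }))
  })
  default = null
}
```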

## Outputs

7 changes: 7 additions & 0 deletions modules/gke-nodepool/main.tf
@@ -137,6 +137,13 @@ resource "google_container_node_pool" "nodepool" {
    }
  }

  dynamic "queued_provisioning" {
    for_each = try(var.nodepool_config.queued_provisioning, false) ? [""] : []
    content {
      enabled = var.nodepool_config.queued_provisioning
    }
  }

  node_config {
    boot_disk_kms_key = var.node_config.boot_disk_kms_key
    disk_size_gb      = var.node_config.disk_size_gb
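For context, a minimal sketch of the provider-level configuration this dynamic block renders when `nodepool_config.queued_provisioning = true` (resource name and values are illustrative, assuming a `google` provider version that exposes the `queued_provisioning` block):

```hcl
resource "google_container_node_pool" "dws_example" {
  project            = "myproject"
  cluster            = "cluster-1"
  location           = "europe-west4-a"
  name               = "nodepool-dws"
  initial_node_count = 0

  autoscaling {
    min_node_count = 0
    max_node_count = 10
  }

  # Rendered by the dynamic block when the module flag is set.
  queued_provisioning {
    enabled = true
  }

  node_config {
    machine_type = "g2-standard-4"
    spot         = true

    guest_accelerator {
      type  = "nvidia-l4"
      count = 1
    }

    reservation_affinity {
      consume_reservation_type = "NO_RESERVATION"
    }
  }
}
```

With queued provisioning enabled, the pool starts empty and the cluster autoscaler only adds nodes in response to all-or-nothing provisioning requests, which is why the example combines it with scale-from-zero autoscaling and `NO_RESERVATION` affinity.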
1 change: 1 addition & 0 deletions modules/gke-nodepool/variables.tf
@@ -157,6 +157,7 @@ variable "nodepool_config" {
      policy_name  = optional(string)
      tpu_topology = optional(string)
    }))
    queued_provisioning = optional(bool, false)
    upgrade_settings = optional(object({
      max_surge       = number
      max_unavailable = number
39 changes: 39 additions & 0 deletions tests/modules/gke_nodepool/examples/dws.yaml
@@ -0,0 +1,39 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

values:
  module.cluster-1-nodepool-dws.google_container_node_pool.nodepool:
    cluster: cluster-1
    location: europe-west4-a
    name: nodepool-dws
    node_config:
    - boot_disk_kms_key: null
      disk_size_gb: 50
      disk_type: pd-ssd
      ephemeral_storage_config:
      - local_ssd_count: 1
      ephemeral_storage_local_ssd_config: []
      guest_accelerator:
      - count: 1
        gpu_driver_installation_config:
        - gpu_driver_version: LATEST
        gpu_partition_size: null
        gpu_sharing_config: null
        type: nvidia-l4
      gvnic: []
      machine_type: g2-standard-4
    project: myproject

counts:
  google_container_node_pool: 1
