Merge pull request #3458 from mr0re1/full_hd
Add full definition of `nodeset` to partition module
mr0re1 authored Dec 23, 2024
2 parents 6bd418d + 3f0b32d commit 8e28be6
Showing 3 changed files with 107 additions and 49 deletions.
@@ -85,7 +85,7 @@ No resources.
| <a name="input_exclusive"></a> [exclusive](#input\_exclusive) | Exclusive job access to nodes. When set to true nodes execute single job and are deleted<br/>after job exits. If set to false, multiple jobs can be scheduled on one node. | `bool` | `true` | no |
| <a name="input_is_default"></a> [is\_default](#input\_is\_default) | Sets this partition as the default partition by updating the partition\_conf.<br/>If "Default" is already set in partition\_conf, this variable will have no effect. | `bool` | `false` | no |
| <a name="input_network_storage"></a> [network\_storage](#input\_network\_storage) | DEPRECATED | <pre>list(object({<br/> server_ip = string,<br/> remote_mount = string,<br/> local_mount = string,<br/> fs_type = string,<br/> mount_options = string,<br/> client_install_runner = map(string)<br/> mount_runner = map(string)<br/> }))</pre> | `[]` | no |
| <a name="input_nodeset"></a> [nodeset](#input\_nodeset) | A list of nodesets.<br/>For type definition see community/modules/scheduler/schedmd-slurm-gcp-v6-controller/variables.tf::nodeset | `list(any)` | `[]` | no |
| <a name="input_nodeset"></a> [nodeset](#input\_nodeset) | A list of nodesets.<br/>For type definition see community/modules/scheduler/schedmd-slurm-gcp-v6-controller/variables.tf::nodeset | <pre>list(object({<br/> node_count_static = optional(number, 0)<br/> node_count_dynamic_max = optional(number, 1)<br/> node_conf = optional(map(string), {})<br/> nodeset_name = string<br/> additional_disks = optional(list(object({<br/> disk_name = optional(string)<br/> device_name = optional(string)<br/> disk_size_gb = optional(number)<br/> disk_type = optional(string)<br/> disk_labels = optional(map(string), {})<br/> auto_delete = optional(bool, true)<br/> boot = optional(bool, false)<br/> })), [])<br/> bandwidth_tier = optional(string, "platform_default")<br/> can_ip_forward = optional(bool, false)<br/> disable_smt = optional(bool, false)<br/> disk_auto_delete = optional(bool, true)<br/> disk_labels = optional(map(string), {})<br/> disk_size_gb = optional(number)<br/> disk_type = optional(string)<br/> enable_confidential_vm = optional(bool, false)<br/> enable_placement = optional(bool, false)<br/> enable_oslogin = optional(bool, true)<br/> enable_shielded_vm = optional(bool, false)<br/> enable_maintenance_reservation = optional(bool, false)<br/> enable_opportunistic_maintenance = optional(bool, false)<br/> gpu = optional(object({<br/> count = number<br/> type = string<br/> }))<br/> dws_flex = object({<br/> enabled = bool<br/> max_run_duration = number<br/> use_job_duration = bool<br/> })<br/> labels = optional(map(string), {})<br/> machine_type = optional(string)<br/> maintenance_interval = optional(string)<br/> instance_properties_json = string<br/> metadata = optional(map(string), {})<br/> min_cpu_platform = optional(string)<br/> network_tier = optional(string, "STANDARD")<br/> network_storage = optional(list(object({<br/> server_ip = string<br/> remote_mount = string<br/> local_mount = string<br/> fs_type = string<br/> mount_options = string<br/> client_install_runner = optional(map(string))<br/> mount_runner = optional(map(string))<br/> })), [])<br/> on_host_maintenance = optional(string)<br/> preemptible = optional(bool, false)<br/> region = optional(string)<br/> service_account = optional(object({<br/> email = optional(string)<br/> scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])<br/> }))<br/> shielded_instance_config = optional(object({<br/> enable_integrity_monitoring = optional(bool, true)<br/> enable_secure_boot = optional(bool, true)<br/> enable_vtpm = optional(bool, true)<br/> }))<br/> source_image_family = optional(string)<br/> source_image_project = optional(string)<br/> source_image = optional(string)<br/> subnetwork_self_link = string<br/> additional_networks = optional(list(object({<br/> network = string<br/> subnetwork = string<br/> subnetwork_project = string<br/> network_ip = string<br/> nic_type = string<br/> stack_type = string<br/> queue_count = number<br/> access_config = list(object({<br/> nat_ip = string<br/> network_tier = string<br/> }))<br/> ipv6_access_config = list(object({<br/> network_tier = string<br/> }))<br/> alias_ip_range = list(object({<br/> ip_cidr_range = string<br/> subnetwork_range_name = string<br/> }))<br/> })))<br/> access_config = optional(list(object({<br/> nat_ip = string<br/> network_tier = string<br/> })))<br/> spot = optional(bool, false)<br/> tags = optional(list(string), [])<br/> termination_action = optional(string)<br/> reservation_name = optional(string)<br/> future_reservation = string<br/> startup_script = optional(list(object({<br/> filename = string<br/> content = string })), [])<br/><br/> zone_target_shape = string<br/> zone_policy_allow = set(string)<br/> zone_policy_deny = set(string)<br/> }))</pre> | `[]` | no |
| <a name="input_nodeset_dyn"></a> [nodeset\_dyn](#input\_nodeset\_dyn) | Defines dynamic nodesets, as a list. | <pre>list(object({<br/> nodeset_name = string<br/> nodeset_feature = string<br/> }))</pre> | `[]` | no |
| <a name="input_nodeset_tpu"></a> [nodeset\_tpu](#input\_nodeset\_tpu) | Define TPU nodesets, as a list. | <pre>list(object({<br/> node_count_static = optional(number, 0)<br/> node_count_dynamic_max = optional(number, 5)<br/> nodeset_name = string<br/> enable_public_ip = optional(bool, false)<br/> node_type = string<br/> accelerator_config = optional(object({<br/> topology = string<br/> version = string<br/> }), {<br/> topology = ""<br/> version = ""<br/> })<br/> tf_version = string<br/> preemptible = optional(bool, false)<br/> preserve_tpu = optional(bool, false)<br/> zone = string<br/> data_disks = optional(list(string), [])<br/> docker_image = optional(string, "")<br/> network_storage = optional(list(object({<br/> server_ip = string<br/> remote_mount = string<br/> local_mount = string<br/> fs_type = string<br/> mount_options = string<br/> })), [])<br/> subnetwork = string<br/> service_account = optional(object({<br/> email = optional(string)<br/> scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])<br/> }))<br/> project_id = string<br/> reserved = optional(string, false)<br/> }))</pre> | `[]` | no |
| <a name="input_partition_conf"></a> [partition\_conf](#input\_partition\_conf) | Slurm partition configuration as a map.<br/>See https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION | `map(string)` | `{}` | no |
108 changes: 106 additions & 2 deletions community/modules/compute/schedmd-slurm-gcp-v6-partition/variables.tf
@@ -54,8 +54,112 @@ variable "nodeset" {
A list of nodesets.
For type definition see community/modules/scheduler/schedmd-slurm-gcp-v6-controller/variables.tf::nodeset
EOD
type = list(any)
default = []
type = list(object({
node_count_static = optional(number, 0)
node_count_dynamic_max = optional(number, 1)
node_conf = optional(map(string), {})
nodeset_name = string
additional_disks = optional(list(object({
disk_name = optional(string)
device_name = optional(string)
disk_size_gb = optional(number)
disk_type = optional(string)
disk_labels = optional(map(string), {})
auto_delete = optional(bool, true)
boot = optional(bool, false)
})), [])
bandwidth_tier = optional(string, "platform_default")
can_ip_forward = optional(bool, false)
disable_smt = optional(bool, false)
disk_auto_delete = optional(bool, true)
disk_labels = optional(map(string), {})
disk_size_gb = optional(number)
disk_type = optional(string)
enable_confidential_vm = optional(bool, false)
enable_placement = optional(bool, false)
enable_oslogin = optional(bool, true)
enable_shielded_vm = optional(bool, false)
enable_maintenance_reservation = optional(bool, false)
enable_opportunistic_maintenance = optional(bool, false)
gpu = optional(object({
count = number
type = string
}))
dws_flex = object({
enabled = bool
max_run_duration = number
use_job_duration = bool
})
labels = optional(map(string), {})
machine_type = optional(string)
maintenance_interval = optional(string)
instance_properties_json = string
metadata = optional(map(string), {})
min_cpu_platform = optional(string)
network_tier = optional(string, "STANDARD")
network_storage = optional(list(object({
server_ip = string
remote_mount = string
local_mount = string
fs_type = string
mount_options = string
client_install_runner = optional(map(string))
mount_runner = optional(map(string))
})), [])
on_host_maintenance = optional(string)
preemptible = optional(bool, false)
region = optional(string)
service_account = optional(object({
email = optional(string)
scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])
}))
shielded_instance_config = optional(object({
enable_integrity_monitoring = optional(bool, true)
enable_secure_boot = optional(bool, true)
enable_vtpm = optional(bool, true)
}))
source_image_family = optional(string)
source_image_project = optional(string)
source_image = optional(string)
subnetwork_self_link = string
additional_networks = optional(list(object({
network = string
subnetwork = string
subnetwork_project = string
network_ip = string
nic_type = string
stack_type = string
queue_count = number
access_config = list(object({
nat_ip = string
network_tier = string
}))
ipv6_access_config = list(object({
network_tier = string
}))
alias_ip_range = list(object({
ip_cidr_range = string
subnetwork_range_name = string
}))
})))
access_config = optional(list(object({
nat_ip = string
network_tier = string
})))
spot = optional(bool, false)
tags = optional(list(string), [])
termination_action = optional(string)
reservation_name = optional(string)
future_reservation = string
startup_script = optional(list(object({
filename = string
content = string })), [])

zone_target_shape = string
zone_policy_allow = set(string)
zone_policy_deny = set(string)
}))
default = []

validation {
condition = length(distinct(var.nodeset[*].nodeset_name)) == length(var.nodeset)
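
A minimal sketch of what a hand-written `nodeset` value might look like under the type above, assuming the constraint is enforced exactly as shown. Every attribute not wrapped in `optional(...)` has to be present in each element: `nodeset_name`, `dws_flex`, `instance_properties_json`, `subnetwork_self_link`, `future_reservation`, `zone_target_shape`, `zone_policy_allow`, and `zone_policy_deny`. The module label, the relative `source` path, the `local.required_nodeset_fields` helper, and all values are hypothetical placeholders, not taken from this commit. The module may also require inputs that are not shown here.

```hcl
locals {
  # Hypothetical placeholder values for the attributes the type marks as required.
  required_nodeset_fields = {
    dws_flex = {
      enabled          = false
      max_run_duration = 3600
      use_job_duration = false
    }
    instance_properties_json = "{}" # placeholder JSON string
    subnetwork_self_link     = "projects/example-project/regions/us-central1/subnetworks/example-subnet"
    future_reservation       = ""                # placeholder
    zone_target_shape        = "ANY_SINGLE_ZONE" # placeholder
    zone_policy_allow        = []
    zone_policy_deny         = []
  }
}

module "example_partition" {
  # Hypothetical relative path to the module changed in this commit.
  source = "./community/modules/compute/schedmd-slurm-gcp-v6-partition"

  partition_name = "example"

  # nodeset_name must differ between elements, otherwise the
  # length(distinct(...)) == length(...) validation condition fails.
  nodeset = [
    merge(local.required_nodeset_fields, { nodeset_name = "nsa" }),
    merge(local.required_nodeset_fields, { nodeset_name = "nsb" }),
  ]
}
```

Attributes declared with `optional(..., default)` can be omitted and take the listed default. Attributes declared with plain `optional(type)` are set to `null` when omitted.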
46 changes: 0 additions & 46 deletions tools/validate_configs/test_configs/node-groups.yaml
@@ -107,51 +107,6 @@ deployment_groups:
settings:
partition_name: multns

## Explicitly set node partition with one nodeset
- id: single_nodeset_explicit_partition
source: community/modules/compute/schedmd-slurm-gcp-v6-partition
settings:
partition_name: explns
is_default: true
nodeset:
- nodeset_name: expl
node_count_static: 0
node_count_dynamic_max: 4
enable_placement: false
node_conf: {}
additional_disks: []
additional_networks: []
bandwidth_tier: null
can_ip_forward: false
enable_smt: true
disk_auto_delete: true
disk_labels: {}
disk_size_gb: 50
disk_type: pd-standard
enable_confidential_vm: false
enable_oslogin: true
enable_shielded_vm: false
enable_spot_vm: false
gpu: null
instance_template: null
labels: $(vars.labels)
machine_type: n2-standard-16
maintenance_interval: ""
metadata: {}
min_cpu_platform: null
on_host_maintenance: TERMINATE
preemptible: false
reservation_name: null # will be replaced by default value empty string
service_account_email: null
shielded_instance_config: null
subnetwork_self_link: $(network.subnetwork_self_link)
spot_instance_config: null
source_image_family: null
source_image_project: null
source_image: null
tags: []
access_config: []

- id: slurm_login
source: community/modules/scheduler/schedmd-slurm-gcp-v6-login
use: [network]
@@ -165,7 +120,6 @@ deployment_groups:
- network
- single_nodeset_partition
- multiple_nodesets
- single_nodeset_explicit_partition
- homefs
- slurm_login
settings: