WARNING: This module is in active development and is therefore not guaranteed to work consistently. Expect the interface to change rapidly while this warning exists.
This module creates a node group data structure intended to be input to the schedmd-slurm-gcp-v5-partition module.
Node groups allow adding heterogeneous node types to a partition, and hence running jobs that mix multiple node characteristics. See the heterogeneous jobs section of the SchedMD documentation for more information. An example of multiple node groups being used can be found in the slurm-gcp-v5-high-io.yaml blueprint.
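As a minimal sketch of the idea above, the blueprint fragment below attaches two node groups with different machine types to a single partition. The module ids node_group_c2 and node_group_c2d, the group names, and the node counts are hypothetical choices for illustration; network1 and homefs are assumed to be defined elsewhere in the blueprint.

```yaml
# Sketch only: ids, group names, and counts are illustrative assumptions.
- id: node_group_c2
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
  settings:
    name: c2                      # group name (overrides the default "ghpc")
    machine_type: c2-standard-30
    node_count_dynamic_max: 20

- id: node_group_c2d
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
  settings:
    name: c2d
    machine_type: c2d-standard-32
    node_count_dynamic_max: 20

- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  use:
  - network1                      # assumed to exist elsewhere in the blueprint
  - homefs                        # assumed to exist elsewhere in the blueprint
  - node_group_c2
  - node_group_c2d
  settings:
    partition_name: compute
```

Giving each node group a distinct name keeps the groups distinguishable when selecting nodes at job submission time.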
To specify nodes from a specific node group in a partition, the --nodelist (or -w) flag can be used, for example:
srun -N 3 -p compute --nodelist cluster-compute-group-[0-2] hostname
Here, the 3 nodes are selected from the nodes cluster-compute-group-[0-2] in the compute partition.
Additionally, depending on how the nodes differ, a constraint can be added via the --constraint (or -C) flag, or other flags such as --mincpus can be used to specify nodes with the desired characteristics.
The following code snippet creates a partition module using the node-group
module as input with:
- a max node count of 200
- VM machine type of c2-standard-30
- partition name of "compute"
- default group name of "ghpc"
- connected to the network1 module via use
- nodes mounted to homefs via use
- id: node_group
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
  settings:
    node_count_dynamic_max: 200
    machine_type: c2-standard-30

- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  use:
  - network1
  - homefs
  - node_group
  settings:
    partition_name: compute
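A variation on the node group above, sketched under the assumption that a custom group name and a mix of static and dynamic nodes are wanted. The group name c2 and the node counts are hypothetical; the setting names come from the inputs of this module.

```yaml
# Sketch only: the name and counts are illustrative assumptions.
- id: node_group
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
  settings:
    name: c2                      # replaces the default group name "ghpc"
    node_count_static: 2          # nodes to be statically created
    node_count_dynamic_max: 198   # maximum number of dynamic nodes
    machine_type: c2-standard-30
```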
The HPC Toolkit team maintains the wrapper around the slurm-on-gcp terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.
Copyright 2022 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Name | Version |
---|---|
terraform | >= 0.13.0 |
google | >= 3.83 |
Name | Version |
---|---|
google | >= 3.83 |
No modules.
Name | Type |
---|---|
google_compute_default_service_account.default | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
access_config | Access configurations, i.e. IPs via which the node group instances can be accessed via the internet. | list(object({ | [] | no |
additional_disks | Configurations of additional disks to be included on the partition nodes. | list(object({ | [] | no |
bandwidth_tier | Configures the network interface card and the maximum egress bandwidth for VMs. - Setting platform_default respects the Google Cloud Platform API default values for networking. - Setting virtio_enabled explicitly selects the VirtioNet network adapter. - Setting gvnic_enabled selects the gVNIC network adapter (without Tier 1 high bandwidth). - Setting tier_1_enabled selects both the gVNIC adapter and Tier 1 high bandwidth networking. - Note: both gVNIC and Tier 1 networking require a VM image with gVNIC support as well as specific VM families and shapes. - See official docs for more details. | string | "platform_default" | no |
can_ip_forward | Enable IP forwarding, for NAT instances for example. | bool | false | no |
disk_auto_delete | Whether or not the boot disk should be auto-deleted. | bool | true | no |
disk_labels | Labels specific to the boot disk. These will be merged with var.labels. | map(string) | {} | no |
disk_size_gb | Size of boot disk to create for the partition compute nodes. | number | 50 | no |
disk_type | Boot disk type, can be either pd-ssd, local-ssd, or pd-standard. | string | "pd-standard" | no |
enable_confidential_vm | Enable the Confidential VM configuration. Note: the instance image must support this option. | bool | false | no |
enable_oslogin | Enables Google Cloud os-login for user login and authentication for VMs. See https://cloud.google.com/compute/docs/oslogin | bool | true | no |
enable_shielded_vm | Enable the Shielded VM configuration. Note: the instance image must support this option. | bool | false | no |
enable_smt | Enables Simultaneous Multi-Threading (SMT) on instance. | bool | false | no |
enable_spot_vm | Enable the partition to use spot VMs (https://cloud.google.com/spot-vms). | bool | false | no |
gpu | Definition of requested GPU resources. | object({ | null | no |
instance_image | Defines the image that will be used in the node group VM instances. This value is overridden if any of source_image, source_image_family or source_image_project are set. Expected Fields: name: The name of the image. Mutually exclusive with family. family: The image family to use. Mutually exclusive with name. project: The project where the image is hosted. Custom images must comply with Slurm on GCP requirements; it is highly advised to use the packer templates provided by Slurm on GCP when constructing custom slurm images. More information can be found in the slurm-gcp docs: https://github.com/SchedMD/slurm-gcp/blob/5.3.0/docs/images.md#public-image. | map(string) | { | no |
instance_template | Self link to a custom instance template, used in place of other VM instance definition variables. | string | null | no |
labels | Labels to add to partition compute instances. List of key, value pairs. | any | {} | no |
machine_type | Compute Platform machine type to use for this partition's compute nodes. | string | "c2-standard-60" | no |
metadata | Metadata, provided as a map. | map(string) | {} | no |
min_cpu_platform | The name of the minimum CPU platform that you want the instance to use. | string | null | no |
name | Name of the node group. | string | "ghpc" | no |
node_conf | Map of Slurm node line configuration. | map(any) | {} | no |
node_count_dynamic_max | Maximum number of dynamic nodes allowed in this partition. | number | 10 | no |
node_count_static | Number of nodes to be statically created. | number | 0 | no |
on_host_maintenance | Instance availability Policy. Note: Placement groups are not supported when on_host_maintenance is set to "MIGRATE" and will be deactivated regardless of the value of enable_placement. To support enable_placement, ensure on_host_maintenance is set to "TERMINATE". | string | "TERMINATE" | no |
preemptible | Should use preemptibles to burst. | string | false | no |
project_id | Project in which the HPC deployment will be created. | string | n/a | yes |
service_account | Service account to attach to the compute instances. If not set, the default compute service account for the given project will be used with the "https://www.googleapis.com/auth/cloud-platform" scope. | object({ | null | no |
shielded_instance_config | Shielded VM configuration for the instance. Note: not used unless enable_shielded_vm is 'true'. - enable_integrity_monitoring: Compare the most recent boot measurements to the integrity policy baseline and return a pair of pass/fail results depending on whether they match or not. - enable_secure_boot: Verify the digital signature of all boot components, and halt the boot process if signature verification fails. - enable_vtpm: Use a virtualized trusted platform module, which is a specialized computer chip you can use to encrypt objects like keys and certificates. | object({ | { | no |
source_image | The custom VM image. It is recommended to use instance_image instead. | string | "" | no |
source_image_family | The custom VM image family. It is recommended to use instance_image instead. | string | "" | no |
source_image_project | The project hosting the custom VM image. It is recommended to use instance_image instead. | string | "" | no |
spot_instance_config | Configuration for spot VMs. | object({ | null | no |
tags | Network tag list. | list(string) | [] | no |
zone_policy_allow | Partition nodes will prefer to be created in the listed zones. If a zone appears in both zone_policy_allow and zone_policy_deny, then zone_policy_deny will take priority for that zone. | set(string) | [] | no |
zone_policy_deny | Partition nodes will not be created in the listed zones. If a zone appears in both zone_policy_allow and zone_policy_deny, then zone_policy_deny will take priority for that zone. | set(string) | [] | no |
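To illustrate how a few of the inputs above might be combined, here is a hedged sketch of a node group that uses spot VMs, the gVNIC adapter, a zone preference, and a custom image. The zone names, image family, and image project are hypothetical placeholders, and project_id is assumed to be supplied through the blueprint's deployment variables rather than set here.

```yaml
# Sketch only: values marked hypothetical are illustrative assumptions.
- id: node_group
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
  settings:
    machine_type: c2-standard-60
    enable_spot_vm: true
    bandwidth_tier: gvnic_enabled          # image must support gVNIC
    zone_policy_allow:                     # hypothetical zones
    - us-central1-a
    - us-central1-b
    disk_type: pd-ssd
    disk_size_gb: 100
    instance_image:
      family: my-slurm-image-family        # hypothetical image family
      project: my-image-project            # hypothetical hosting project
```

Because instance_image is overridden when any of source_image, source_image_family, or source_image_project is set, the sketch leaves those three inputs at their defaults.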
Name | Description |
---|---|
node_groups | Details of the node group. Typically used as input to schedmd-slurm-gcp-v5-partition. |