This module creates a compute partition that can be used as input to the schedmd-slurm-gcp-v6-controller module.

The partition module is designed to work alongside the schedmd-slurm-gcp-v6-nodeset module. A partition can be made up of one or more nodesets, provided either through `use` (preferred) or defined manually in the `nodeset` variable.
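If you prefer not to wire nodesets in via `use`, a nodeset module's output can be passed to the `nodeset` variable directly. The snippet below is a minimal sketch, not taken from this README: it assumes a nodeset module with id `nodeset_1` that exposes a `nodeset` output, and uses the blueprint expression syntax `$(module_id.output)`.

```yaml
  - id: manual_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    settings:
      partition_name: manual
      nodeset:
      - $(nodeset_1.nodeset)  # assumed reference to the nodeset module's output
```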
The following code snippet creates a partition module with:

- 2 nodesets added via `use`.
  - The first nodeset is made up of machines of type `c2-standard-30`.
  - The second nodeset is made up of machines of type `c2-standard-60`.
  - Both nodesets have a maximum count of 200 dynamically created nodes.
- a partition name of "compute".
- a connection to the `network` module via `use`.
- nodes mounted to homefs via `use`.
```yaml
  - id: nodeset_1
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    use:
    - network
    settings:
      name: c30
      node_count_dynamic_max: 200
      machine_type: c2-standard-30

  - id: nodeset_2
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    use:
    - network
    settings:
      name: c60
      node_count_dynamic_max: 200
      machine_type: c2-standard-60

  - id: compute_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use:
    - homefs
    - nodeset_1
    - nodeset_2
    settings:
      partition_name: compute
```
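The resulting partition is then passed to the controller via `use`. The following is a minimal sketch only; the module id `slurm_controller` is an assumption, and the controller's own settings (machine type, images, and so on) are omitted here.

```yaml
  # Sketch: attach the partition to the controller; controller settings omitted.
  - id: slurm_controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
    use:
    - network
    - compute_partition
```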
The Cluster Toolkit team maintains the wrapper around the slurm-on-gcp terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.
| Name | Version |
|------|---------|
| terraform | >= 1.3 |
No providers.
No modules.
No resources.
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| exclusive | Exclusive job access to nodes. When set to true, nodes execute a single job and are deleted after the job exits. If set to false, multiple jobs can be scheduled on one node. | `bool` | `true` | no |
| is_default | Sets this partition as the default partition by updating the partition_conf. If "Default" is already set in partition_conf, this variable will have no effect. | `bool` | `false` | no |
| network_storage | DEPRECATED | `list(object({...}))` | `[]` | no |
| nodeset | A list of nodesets. For type definition see community/modules/scheduler/schedmd-slurm-gcp-v6-controller/variables.tf::nodeset | `list(any)` | `[]` | no |
| nodeset_dyn | Defines dynamic nodesets, as a list. | `list(object({...}))` | `[]` | no |
| nodeset_tpu | Defines TPU nodesets, as a list. | `list(object({...}))` | `[]` | no |
| partition_conf | Slurm partition configuration as a map. See https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION | `map(string)` | `{}` | no |
| partition_name | The name of the slurm partition. | `string` | n/a | yes |
| resume_timeout | Maximum time permitted (in seconds) between when a node resume request is issued and when the node is actually available for use. If null is given, then a smart default will be chosen depending on nodesets in partition. This sets 'ResumeTimeout' in partition_conf. See https://slurm.schedmd.com/slurm.conf.html#OPT_ResumeTimeout_1 for details. | `number` | `300` | no |
| suspend_time | Nodes which remain idle or down for this number of seconds will be placed into power save mode by SuspendProgram. This sets 'SuspendTime' in partition_conf. See https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendTime_1 for details. NOTE: use value -1 to exclude partition from suspend. NOTE 2: if var.exclusive is set to true (default), nodes are deleted immediately after the job finishes. | `number` | `300` | no |
| suspend_timeout | Maximum time permitted (in seconds) between when a node suspend request is issued and when the node is shutdown. If null is given, then a smart default will be chosen depending on nodesets in partition. This sets 'SuspendTimeout' in partition_conf. See https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendTimeout_1 for details. | `number` | `null` | no |
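Several of these inputs map directly onto Slurm partition configuration. The snippet below is illustrative only; the module id `debug_partition`, the nodeset id `nodeset_1`, and the specific values are assumptions, not defaults recommended by this module.

```yaml
  - id: debug_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use:
    - nodeset_1
    settings:
      partition_name: debug
      exclusive: false        # allow multiple jobs to share a node
      is_default: true        # mark this as the default Slurm partition
      suspend_time: 600       # power down nodes idle for 10 minutes
      partition_conf:
        MaxTime: "02:00:00"   # additional slurm.conf partition options, passed as a map
```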
| Name | Description |
|------|-------------|
| nodeset | Details of the nodesets in this partition |
| nodeset_dyn | Details of the dynamic nodesets in this partition |
| nodeset_tpu | Details of the TPU nodesets in this partition |
| partitions | Details of the slurm partition |