This module creates a compute partition that can be used as input to the `schedmd-slurm-gcp-v6-controller`.

The partition module is designed to work alongside the `schedmd-slurm-gcp-v6-nodeset` module. A partition can be made up of one or more nodesets, provided either through `use` (preferred) or defined manually in the `nodeset` variable.

The following code snippet creates a partition module with:

- 2 nodesets added via `use`.
  - The first nodeset is made up of machines of type `c2-standard-30`.
  - The second nodeset is made up of machines of type `c2-standard-60`.
  - Both nodesets have a maximum count of 200 dynamically created nodes.
- partition name of "compute".
- connected to the `network` module via `use`.
- nodes mounted to homefs via `use`.

```yaml
- id: nodeset_1
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    name: c30
    node_count_dynamic_max: 200
    machine_type: c2-standard-30

- id: nodeset_2
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    name: c60
    node_count_dynamic_max: 200
    machine_type: c2-standard-60

- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  use:
  - homefs
  - nodeset_1
  - nodeset_2
  settings:
    partition_name: compute
```

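
Since the partition is meant to be consumed by the controller, a blueprint would typically pass it along via `use`. The fragment below is a minimal sketch of that wiring, not a complete controller configuration: the `slurm_controller` id is a hypothetical name chosen for illustration, and it assumes the `network`, `homefs`, and `compute_partition` modules are defined earlier in the same blueprint. The controller may require further settings that are not shown here.

```yaml
# Sketch only: "slurm_controller" is an illustrative id, and any
# additional controller settings are omitted.
- id: slurm_controller
  source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
  use:
  - network            # assumed to be defined earlier in the blueprint
  - compute_partition  # the partition created in the example above
  - homefs             # assumed to be defined earlier in the blueprint
```
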
The Cluster Toolkit team maintains the wrapper around the slurm-on-gcp terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.
Name | Version |
---|---|
terraform | >= 1.3 |
No providers.
No modules.
No resources.
Name | Description | Type | Default | Required |
---|---|---|---|---|
exclusive | Exclusive job access to nodes. | `bool` | `true` | no |
is_default | Sets this partition as the default partition by updating the partition_conf. If "Default" is already set in partition_conf, this variable will have no effect. | `bool` | `false` | no |
network_storage | DEPRECATED | `list(object({` | `[]` | no |
nodeset | A list of nodesets. For type definition see community/modules/scheduler/schedmd-slurm-gcp-v6-controller/variables.tf::nodeset | `list(any)` | `[]` | no |
nodeset_dyn | Defines dynamic nodesets, as a list. | `list(object({` | `[]` | no |
nodeset_tpu | Define TPU nodesets, as a list. | `list(object({` | `[]` | no |
partition_conf | Slurm partition configuration as a map. See https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION | `map(string)` | `{}` | no |
partition_name | The name of the slurm partition. | `string` | n/a | yes |
resume_timeout | Maximum time permitted (in seconds) between when a node resume request is issued and when the node is actually available for use. If null is given, then a smart default will be chosen depending on nodesets in partition. This sets 'ResumeTimeout' in partition_conf. See https://slurm.schedmd.com/slurm.conf.html#OPT_ResumeTimeout_1 for details. | `number` | `300` | no |
suspend_time | Nodes which remain idle or down for this number of seconds will be placed into power save mode by SuspendProgram. This sets 'SuspendTime' in partition_conf. See https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendTime_1 for details. NOTE: use value -1 to exclude partition from suspend. | `number` | `300` | no |
suspend_timeout | Maximum time permitted (in seconds) between when a node suspend request is issued and when the node is shutdown. If null is given, then a smart default will be chosen depending on nodesets in partition. This sets 'SuspendTimeout' in partition_conf. See https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendTimeout_1 for details. | `number` | `null` | no |
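
To illustrate how the optional inputs combine, the fragment below sketches a second partition that overrides several defaults. It is an example under stated assumptions rather than a recommended configuration: the `debug_partition` id, the `debug` partition name, and the chosen values are hypothetical, and `nodeset_1` is assumed to be defined elsewhere in the blueprint. `MaxTime` is a standard Slurm partition parameter, passed as a string because `partition_conf` is a `map(string)`.

```yaml
# Sketch only: ids, names, and values are illustrative.
- id: debug_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  use:
  - nodeset_1            # assumed to be defined elsewhere in the blueprint
  settings:
    partition_name: debug
    is_default: true     # ignored if "Default" is already set in partition_conf
    exclusive: false     # allow jobs to share nodes
    suspend_time: 600    # seconds idle before power save (sets SuspendTime)
    resume_timeout: 600  # seconds allowed for a node to resume (sets ResumeTimeout)
    partition_conf:
      MaxTime: "60"      # Slurm partition option; a bare number is in minutes
```

Values set through `suspend_time`, `resume_timeout`, and `suspend_timeout` are written into `partition_conf` by the module, so it is likely simpler to use the dedicated inputs than to set those keys in `partition_conf` directly.
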
Name | Description |
---|---|
nodeset | Details of the nodesets in this partition |
nodeset_dyn | Details of the dynamic nodesets in this partition |
nodeset_tpu | Details of the TPU nodesets in this partition |
partitions | Details of a slurm partition |