Merge pull request #2189 from GoogleCloudPlatform/slurm_5_10_2
Update Slurm-GCP release to 5.10.2
tpdownes authored Feb 5, 2024
2 parents f862ce1 + 8ee645c commit 4e85201
Showing 51 changed files with 190 additions and 167 deletions.
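Across the changed blueprints the pattern is consistent: pinned Slurm-GCP image families move from the 5.9 series to the 5.10 series, documentation links move from the SchedMD/slurm-gcp repository to GoogleCloudPlatform/slurm-gcp, and several examples additionally gain `enable_devel: true`. As a rough sketch (not a file from this commit), a blueprint `vars` block updated along these lines might look as follows; the deployment-specific values are placeholders:

```yaml
# Illustrative only: blueprint and deployment values are placeholders;
# the image family/project pair matches the post-update defaults in this commit.
vars:
  enable_devel: true            # added to several examples in this commit
  project_id: my-project-id     # placeholder
  deployment_name: slurm-example
  region: us-central1
  zone: us-central1-c
  instance_image:
    family: slurm-gcp-5-10-hpc-centos-7   # previously slurm-gcp-5-9-hpc-centos-7
    project: schedmd-slurm-public
```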
2 changes: 1 addition & 1 deletion community/examples/AMD/hpc-amd-slurm.yaml
@@ -171,7 +171,7 @@ deployment_groups:
# these images must match the images used by Slurm modules below because
# we are building OpenMPI with PMI support in libraries contained in
# Slurm installation
- family: slurm-gcp-5-9-hpc-centos-7
+ family: slurm-gcp-5-10-hpc-centos-7
project: schedmd-slurm-public

- id: low_cost_node_group
5 changes: 3 additions & 2 deletions community/examples/hpc-slurm-chromedesktop.yaml
@@ -17,15 +17,16 @@
blueprint_name: slurm-crd

vars:
+ enable_devel: true
project_id: ## Set GCP Project ID Here ##
deployment_name: slurm-crd-01
region: us-central1
zone: us-central1-c
instance_image_crd:
- family: slurm-gcp-5-9-debian-11
+ family: slurm-gcp-5-10-debian-11
project: schedmd-slurm-public
instance_image:
- family: slurm-gcp-5-9-hpc-centos-7
+ family: slurm-gcp-5-10-hpc-centos-7
project: schedmd-slurm-public

# Documentation for each of the modules used below can be found at
1 change: 1 addition & 0 deletions community/examples/hpc-slurm-local-ssd.yaml
@@ -17,6 +17,7 @@
blueprint_name: hpc-slurm-local-ssd

vars:
+ enable_devel: true
project_id: ## Set GCP Project ID Here ##
deployment_name: hpc-localssd
region: us-central1
1 change: 1 addition & 0 deletions community/examples/hpc-slurm-ramble-gromacs.yaml
@@ -17,6 +17,7 @@
blueprint_name: hpc-slurm-ramble-gromacs

vars:
+ enable_devel: true
project_id: ## Set GCP Project ID Here ##
deployment_name: hpc-slurm-ramble-gromacs
region: us-central1
5 changes: 3 additions & 2 deletions community/examples/hpc-slurm-ubuntu2004.yaml
@@ -17,14 +17,15 @@
blueprint_name: hpc-slurm-ubuntu2004

vars:
+ enable_devel: true
project_id: ## Set GCP Project ID Here ##
deployment_name: slurm-gcp-v5
region: us-west4
zone: us-west4-c
instance_image:
# Please refer to the following link for the latest images:
- # https://github.com/SchedMD/slurm-gcp/blob/master/docs/images.md#supported-operating-systems
- family: slurm-gcp-5-9-ubuntu-2004-lts
+ # https://github.com/GoogleCloudPlatform/slurm-gcp/blob/master/docs/images.md#supported-operating-systems
+ family: slurm-gcp-5-10-ubuntu-2004-lts
project: schedmd-slurm-public
instance_image_custom: true

3 changes: 2 additions & 1 deletion community/examples/htc-slurm.yaml
@@ -17,12 +17,13 @@

# This blueprint provisions a cluster using the Slurm scheduler configured to
# efficiently run many short duration, loosely-coupled (non-MPI) jobs. See also:
- # https://github.com/SchedMD/slurm-gcp/blob/master/docs/htc.md
+ # https://github.com/GoogleCloudPlatform/slurm-gcp/blob/master/docs/htc.md
# https://slurm.schedmd.com/high_throughput.html

blueprint_name: htc-slurm

vars:
+ enable_devel: true
project_id: ## Set GCP Project ID Here ##
deployment_name: htc-slurm
region: us-west4
1 change: 1 addition & 0 deletions community/examples/tutorial-starccm-slurm.yaml
@@ -17,6 +17,7 @@
blueprint_name: starccm-on-slurm

vars:
+ enable_devel: true
project_id: ## Set GCP Project ID Here ##
deployment_name: starccm-slurm
region: us-central1
@@ -72,8 +72,8 @@ The HPC Toolkit team maintains the wrapper around the [slurm-on-gcp] terraform
modules. For support with the underlying modules, see the instructions in the
[slurm-gcp README][slurm-gcp-readme].
- [slurm-on-gcp]: https://github.com/SchedMD/slurm-gcp
- [slurm-gcp-readme]: https://github.com/SchedMD/slurm-gcp#slurm-on-google-cloud-platform
+ [slurm-on-gcp]: https://github.com/GoogleCloudPlatform/slurm-gcp
+ [slurm-gcp-readme]: https://github.com/GoogleCloudPlatform/slurm-gcp#slurm-on-google-cloud-platform
## License
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
@@ -136,11 +136,12 @@ No modules.
| <a name="input_enable_spot_vm"></a> [enable\_spot\_vm](#input\_enable\_spot\_vm) | Enable the partition to use spot VMs (https://cloud.google.com/spot-vms). | `bool` | `false` | no |
| <a name="input_gpu"></a> [gpu](#input\_gpu) | DEPRECATED: use var.guest\_accelerator | <pre>object({<br> type = string<br> count = number<br> })</pre> | `null` | no |
| <a name="input_guest_accelerator"></a> [guest\_accelerator](#input\_guest\_accelerator) | List of the type and count of accelerator cards attached to the instance. | <pre>list(object({<br> type = string,<br> count = number<br> }))</pre> | `[]` | no |
| <a name="input_instance_image"></a> [instance\_image](#input\_instance\_image) | Defines the image that will be used in the Slurm node group VM instances.<br><br>Expected Fields:<br>name: The name of the image. Mutually exclusive with family.<br>family: The image family to use. Mutually exclusive with name.<br>project: The project where the image is hosted.<br><br>For more information on creating custom images that comply with Slurm on GCP<br>see the "Slurm on GCP Custom Images" section in docs/vm-images.md. | `map(string)` | <pre>{<br> "family": "slurm-gcp-5-9-hpc-centos-7",<br> "project": "schedmd-slurm-public"<br>}</pre> | no |
| <a name="input_instance_image"></a> [instance\_image](#input\_instance\_image) | Defines the image that will be used in the Slurm node group VM instances.<br><br>Expected Fields:<br>name: The name of the image. Mutually exclusive with family.<br>family: The image family to use. Mutually exclusive with name.<br>project: The project where the image is hosted.<br><br>For more information on creating custom images that comply with Slurm on GCP<br>see the "Slurm on GCP Custom Images" section in docs/vm-images.md. | `map(string)` | <pre>{<br> "family": "slurm-gcp-5-10-hpc-centos-7",<br> "project": "schedmd-slurm-public"<br>}</pre> | no |
| <a name="input_instance_image_custom"></a> [instance\_image\_custom](#input\_instance\_image\_custom) | A flag that designates that the user is aware that they are requesting<br>to use a custom and potentially incompatible image for this Slurm on<br>GCP module.<br><br>If the field is set to false, only the compatible families and project<br>names will be accepted. The deployment will fail with any other image<br>family or name. If set to true, no checks will be done.<br><br>See: https://goo.gle/hpc-slurm-images | `bool` | `false` | no |
| <a name="input_instance_template"></a> [instance\_template](#input\_instance\_template) | Self link to a custom instance template. If set, other VM definition<br>variables such as machine\_type and instance\_image will be ignored in favor<br>of the provided instance template.<br><br>For more information on creating custom images for the instance template<br>that comply with Slurm on GCP see the "Slurm on GCP Custom Images" section<br>in docs/vm-images.md. | `string` | `null` | no |
| <a name="input_labels"></a> [labels](#input\_labels) | Labels to add to partition compute instances. Key-value pairs. | `map(string)` | `{}` | no |
| <a name="input_machine_type"></a> [machine\_type](#input\_machine\_type) | Compute Platform machine type to use for this partition compute nodes. | `string` | `"c2-standard-60"` | no |
| <a name="input_maintenance_interval"></a> [maintenance\_interval](#input\_maintenance\_interval) | Specifies the frequency of planned maintenance events. Must be unset (null) or "PERIODIC". | `string` | `null` | no |
| <a name="input_metadata"></a> [metadata](#input\_metadata) | Metadata, provided as a map. | `map(string)` | `{}` | no |
| <a name="input_min_cpu_platform"></a> [min\_cpu\_platform](#input\_min\_cpu\_platform) | The name of the minimum CPU platform that you want the instance to use. | `string` | `null` | no |
| <a name="input_name"></a> [name](#input\_name) | Name of the node group. | `string` | `"ghpc"` | no |
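In a blueprint, the inputs documented above are normally passed through a node-group module's `settings` block. The sketch below is illustrative only: the module id, group name, label, and source path are assumptions, while the setting names come from the table above.

```yaml
# Hypothetical node group; setting names correspond to the inputs table above.
- id: compute_node_group
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group  # path assumed
  settings:
    name: c2group                   # defaults to "ghpc" when omitted
    machine_type: c2-standard-60    # the documented default
    labels:
      team: hpc                     # placeholder label
    instance_image:
      family: slurm-gcp-5-10-hpc-centos-7   # `name` could be used instead; the two are mutually exclusive
      project: schedmd-slurm-public
```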
@@ -58,6 +58,7 @@ locals {
gpu = one(local.guest_accelerator)
labels = local.labels
machine_type = var.machine_type
+ maintenance_interval = var.maintenance_interval
metadata = var.metadata
min_cpu_platform = var.min_cpu_platform
on_host_maintenance = var.on_host_maintenance
@@ -18,12 +18,12 @@ locals {
# Currently supported images and projects
known_project_families = {
schedmd-slurm-public = [
"slurm-gcp-5-9-debian-11",
"slurm-gcp-5-9-hpc-rocky-linux-8",
"slurm-gcp-5-9-ubuntu-2004-lts",
"slurm-gcp-5-9-ubuntu-2204-lts-arm64",
"slurm-gcp-5-9-hpc-centos-7-k80",
"slurm-gcp-5-9-hpc-centos-7"
"slurm-gcp-5-10-debian-11",
"slurm-gcp-5-10-hpc-rocky-linux-8",
"slurm-gcp-5-10-ubuntu-2004-lts",
"slurm-gcp-5-10-ubuntu-2204-lts-arm64",
"slurm-gcp-5-10-hpc-centos-7-k80",
"slurm-gcp-5-10-hpc-centos-7"
]
}

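The `known_project_families` list above is what the image compatibility check validates against. Per the `instance_image_custom` input described earlier, a blueprint pointing at a family outside this list would presumably have to opt out of the check explicitly, roughly like this sketch with placeholder image names:

```yaml
# Hypothetical custom-image configuration; family and project are placeholders.
vars:
  instance_image:
    family: my-slurm-gcp-5-10-custom   # not in known_project_families
    project: my-image-project
  instance_image_custom: true          # acknowledges the compatibility check is skipped
```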
@@ -15,7 +15,7 @@
*/

# Most variables have been sourced and modified from the SchedMD/slurm-gcp
- # github repository: https://github.com/SchedMD/slurm-gcp/tree/5.9.1
+ # github repository: https://github.com/GoogleCloudPlatform/slurm-gcp/tree/5.10.2

variable "project_id" {
description = "Project in which the HPC deployment will be created."
@@ -96,7 +96,7 @@ variable "instance_image" {
type = map(string)
default = {
project = "schedmd-slurm-public"
family = "slurm-gcp-5-9-hpc-centos-7"
family = "slurm-gcp-5-10-hpc-centos-7"
}

validation {
@@ -413,6 +413,18 @@ variable "additional_networks" {
}))
}

variable "maintenance_interval" {
description = "Specifies the frequency of planned maintenance events. Must be unset (null) or \"PERIODIC\"."
default = null
type = string
nullable = true

validation {
condition = var.maintenance_interval == null || var.maintenance_interval == "PERIODIC"
error_message = "var.maintenance_interval must be unset (null) or set to \"PERIODIC\""
}
}
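Per the validation above, the only accepted values are an unset variable or the literal string "PERIODIC". Passed down from a blueprint, that would presumably look like the sketch below; the module id and source path are assumptions:

```yaml
# Hypothetical settings for the new maintenance_interval input.
- id: stable_node_group
  source: community/modules/compute/schedmd-slurm-gcp-v5-node-group  # path assumed
  settings:
    maintenance_interval: PERIODIC   # unset/null is also valid
    # maintenance_interval: MONTHLY  # any other value trips the validation error above
```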

variable "disable_public_ips" {
description = "If set to false. The node group VMs will have a random public IP assigned to it. Ignored if access_config is set."
type = bool
@@ -35,8 +35,8 @@ The HPC Toolkit team maintains the wrapper around the [slurm-on-gcp] terraform
modules. For support with the underlying modules, see the instructions in the
[slurm-gcp README][slurm-gcp-readme].
- [slurm-on-gcp]: https://github.com/SchedMD/slurm-gcp
- [slurm-gcp-readme]: https://github.com/SchedMD/slurm-gcp#slurm-on-google-cloud-platform
+ [slurm-on-gcp]: https://github.com/GoogleCloudPlatform/slurm-gcp
+ [slurm-gcp-readme]: https://github.com/GoogleCloudPlatform/slurm-gcp#slurm-on-google-cloud-platform
## License
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
@@ -69,7 +69,7 @@ No providers.
| Name | Source | Version |
|------|--------|---------|
| <a name="module_slurm_partition"></a> [slurm\_partition](#module\_slurm\_partition) | github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition | 5.9.1 |
| <a name="module_slurm_partition"></a> [slurm\_partition](#module\_slurm\_partition) | github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition | 5.10.2 |
## Resources
@@ -29,7 +29,7 @@ locals {
}

module "slurm_partition" {
source = "github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition?ref=5.9.1"
source = "github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition?ref=5.10.2"

slurm_cluster_name = local.slurm_cluster_name
enable_job_exclusive = var.exclusive
@@ -15,7 +15,7 @@
*/

# Most variables have been sourced and modified from the SchedMD/slurm-gcp
- # github repository: https://github.com/SchedMD/slurm-gcp/tree/5.9.1
+ # github repository: https://github.com/GoogleCloudPlatform/slurm-gcp/tree/5.10.2

variable "deployment_name" {
description = "Name of the deployment."
@@ -110,8 +110,8 @@ The HPC Toolkit team maintains the wrapper around the [slurm-on-gcp] terraform
modules. For support with the underlying modules, see the instructions in the
[slurm-gcp README][slurm-gcp-readme].

- [slurm-on-gcp]: https://github.com/SchedMD/slurm-gcp
- [slurm-gcp-readme]: https://github.com/SchedMD/slurm-gcp#slurm-on-google-cloud-platform
+ [slurm-on-gcp]: https://github.com/GoogleCloudPlatform/slurm-gcp
+ [slurm-gcp-readme]: https://github.com/GoogleCloudPlatform/slurm-gcp#slurm-on-google-cloud-platform

## License
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
@@ -146,7 +146,7 @@ limitations under the License.

| Name | Source | Version |
|------|--------|---------|
| <a name="module_slurm_partition"></a> [slurm\_partition](#module\_slurm\_partition) | github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition | 5.9.1 |
| <a name="module_slurm_partition"></a> [slurm\_partition](#module\_slurm\_partition) | github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition | 5.10.2 |

## Resources

@@ -164,7 +164,7 @@
| <a name="input_exclusive"></a> [exclusive](#input\_exclusive) | Exclusive job access to nodes. | `bool` | `true` | no |
| <a name="input_is_default"></a> [is\_default](#input\_is\_default) | Sets this partition as the default partition by updating the partition\_conf.<br>If "Default" is already set in partition\_conf, this variable will have no effect. | `bool` | `false` | no |
| <a name="input_network_storage"></a> [network\_storage](#input\_network\_storage) | An array of network attached storage mounts to be configured on the partition compute nodes. | <pre>list(object({<br> server_ip = string,<br> remote_mount = string,<br> local_mount = string,<br> fs_type = string,<br> mount_options = string,<br> client_install_runner = map(string)<br> mount_runner = map(string)<br> }))</pre> | `[]` | no |
| <a name="input_node_groups"></a> [node\_groups](#input\_node\_groups) | A list of node groups associated with this partition. See<br>schedmd-slurm-gcp-v5-node-group for more information on defining a node<br>group in a blueprint. | <pre>list(object({<br> node_count_static = number<br> node_count_dynamic_max = number<br> group_name = string<br> node_conf = map(string)<br> access_config = list(object({<br> nat_ip = string<br> network_tier = string<br> }))<br> additional_disks = list(object({<br> disk_name = string<br> device_name = string<br> disk_size_gb = number<br> disk_type = string<br> disk_labels = map(string)<br> auto_delete = bool<br> boot = bool<br> }))<br> additional_networks = list(object({<br> network = string<br> subnetwork = string<br> subnetwork_project = string<br> network_ip = string<br> nic_type = string<br> stack_type = string<br> queue_count = number<br> access_config = list(object({<br> nat_ip = string<br> network_tier = string<br> }))<br> ipv6_access_config = list(object({<br> network_tier = string<br> }))<br> alias_ip_range = list(object({<br> ip_cidr_range = string<br> subnetwork_range_name = string<br> }))<br> }))<br> bandwidth_tier = string<br> can_ip_forward = bool<br> disable_smt = bool<br> disk_auto_delete = bool<br> disk_labels = map(string)<br> disk_size_gb = number<br> disk_type = string<br> enable_confidential_vm = bool<br> enable_oslogin = bool<br> enable_shielded_vm = bool<br> enable_spot_vm = bool<br> gpu = object({<br> count = number<br> type = string<br> })<br> instance_template = string<br> labels = map(string)<br> machine_type = string<br> metadata = map(string)<br> min_cpu_platform = string<br> on_host_maintenance = string<br> preemptible = bool<br> reservation_name = string<br> service_account = object({<br> email = string<br> scopes = list(string)<br> })<br> shielded_instance_config = object({<br> enable_integrity_monitoring = bool<br> enable_secure_boot = bool<br> enable_vtpm = bool<br> })<br> spot_instance_config = object({<br> termination_action = string<br> })<br> source_image_family = string<br> source_image_project = string<br> source_image = string<br> tags = list(string)<br> }))</pre> | `[]` | no |
| <a name="input_node_groups"></a> [node\_groups](#input\_node\_groups) | A list of node groups associated with this partition. See<br>schedmd-slurm-gcp-v5-node-group for more information on defining a node<br>group in a blueprint. | <pre>list(object({<br> node_count_static = number<br> node_count_dynamic_max = number<br> group_name = string<br> node_conf = map(string)<br> access_config = list(object({<br> nat_ip = string<br> network_tier = string<br> }))<br> additional_disks = list(object({<br> disk_name = string<br> device_name = string<br> disk_size_gb = number<br> disk_type = string<br> disk_labels = map(string)<br> auto_delete = bool<br> boot = bool<br> }))<br> additional_networks = list(object({<br> network = string<br> subnetwork = string<br> subnetwork_project = string<br> network_ip = string<br> nic_type = string<br> stack_type = string<br> queue_count = number<br> access_config = list(object({<br> nat_ip = string<br> network_tier = string<br> }))<br> ipv6_access_config = list(object({<br> network_tier = string<br> }))<br> alias_ip_range = list(object({<br> ip_cidr_range = string<br> subnetwork_range_name = string<br> }))<br> }))<br> bandwidth_tier = string<br> can_ip_forward = bool<br> disable_smt = bool<br> disk_auto_delete = bool<br> disk_labels = map(string)<br> disk_size_gb = number<br> disk_type = string<br> enable_confidential_vm = bool<br> enable_oslogin = bool<br> enable_shielded_vm = bool<br> enable_spot_vm = bool<br> gpu = object({<br> count = number<br> type = string<br> })<br> instance_template = string<br> labels = map(string)<br> machine_type = string<br> maintenance_interval = string<br> metadata = map(string)<br> min_cpu_platform = string<br> on_host_maintenance = string<br> preemptible = bool<br> reservation_name = string<br> service_account = object({<br> email = string<br> scopes = list(string)<br> })<br> shielded_instance_config = object({<br> enable_integrity_monitoring = bool<br> enable_secure_boot = bool<br> enable_vtpm = bool<br> })<br> spot_instance_config = object({<br> termination_action = string<br> })<br> source_image_family = string<br> source_image_project = string<br> source_image = string<br> tags = list(string)<br> }))</pre> | `[]` | no |
| <a name="input_partition_conf"></a> [partition\_conf](#input\_partition\_conf) | Slurm partition configuration as a map.<br>See https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION | `map(string)` | `{}` | no |
| <a name="input_partition_name"></a> [partition\_name](#input\_partition\_name) | The name of the slurm partition. | `string` | n/a | yes |
| <a name="input_partition_startup_scripts_timeout"></a> [partition\_startup\_scripts\_timeout](#input\_partition\_startup\_scripts\_timeout) | The timeout (seconds) applied to the partition startup script. If<br>any script exceeds this timeout, then the instance setup process is considered<br>failed and handled accordingly.<br><br>NOTE: When set to 0, the timeout is considered infinite and thus disabled. | `number` | `300` | no |
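The partition inputs above are usually wired together in a blueprint rather than set directly in Terraform; attaching node-group modules through `use` is the usual Toolkit pattern for populating `node_groups`. A hedged sketch, with ids and the source path being assumptions:

```yaml
# Hypothetical partition definition; `use` feeds the node_groups input documented above.
- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition  # path assumed
  use: [compute_node_group, network1]   # network1 is a placeholder network module id
  settings:
    partition_name: compute             # required input
    is_default: true
    exclusive: false                    # defaults to true (exclusive job access to nodes)
```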
@@ -38,7 +38,7 @@ data "google_compute_zones" "available" {
}

module "slurm_partition" {
source = "github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition?ref=5.9.1"
source = "github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_partition?ref=5.10.2"

slurm_cluster_name = local.slurm_cluster_name
partition_nodes = var.node_groups