Commit

Move to SlurmGCP image 6.7
mr0re1 committed Sep 23, 2024
1 parent fe4a73f commit 175608f
Showing 34 changed files with 144 additions and 144 deletions.
2 changes: 1 addition & 1 deletion community/examples/AMD/hpc-amd-slurm.yaml
@@ -168,7 +168,7 @@ deployment_groups:
# these images must match the images used by Slurm modules below because
# we are building OpenMPI with PMI support in libraries contained in
# Slurm installation
- family: slurm-gcp-6-6-hpc-rocky-linux-8
+ family: slurm-gcp-6-7-hpc-rocky-linux-8
project: schedmd-slurm-public

- id: low_cost_nodeset
2 changes: 1 addition & 1 deletion community/examples/hpc-slurm-ubuntu2004.yaml
@@ -24,7 +24,7 @@ vars:
slurm_image:
# Please refer to the following link for the latest images:
# https://github.com/GoogleCloudPlatform/slurm-gcp/blob/master/docs/images.md#supported-operating-systems
- family: slurm-gcp-6-6-ubuntu-2004-lts
+ family: slurm-gcp-6-7-ubuntu-2004-lts
project: schedmd-slurm-public
instance_image_custom: true

2 changes: 1 addition & 1 deletion community/examples/hpc-slurm6-apptainer.yaml
@@ -60,7 +60,7 @@ deployment_groups:
settings:
source_image_project_id: [schedmd-slurm-public]
# see latest in https://github.com/GoogleCloudPlatform/slurm-gcp/blob/master/docs/images.md#published-image-family
- source_image_family: slurm-gcp-6-6-hpc-rocky-linux-8
+ source_image_family: slurm-gcp-6-7-hpc-rocky-linux-8
# You can find size of source image by using following command
# gcloud compute images describe-from-family <source_image_family> --project schedmd-slurm-public
disk_size: $(vars.disk_size)

Large diffs are not rendered by default.

@@ -56,7 +56,7 @@ locals {
}

module "slurm_nodeset_template" {
- source = "github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_instance_template?ref=6.7.0"
+ source = "github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_instance_template?ref=6.8.0"

project_id = var.project_id
region = var.region
@@ -18,10 +18,10 @@ locals {
# Currently supported images and projects
known_project_families = {
schedmd-slurm-public = [
- "slurm-gcp-6-6-debian-11",
- "slurm-gcp-6-6-hpc-rocky-linux-8",
- "slurm-gcp-6-6-ubuntu-2004-lts",
- "slurm-gcp-6-6-ubuntu-2204-lts-arm64"
+ "slurm-gcp-6-7-debian-11",
+ "slurm-gcp-6-7-hpc-rocky-linux-8",
+ "slurm-gcp-6-7-ubuntu-2004-lts",
+ "slurm-gcp-6-7-ubuntu-2204-lts-arm64"
]
}

@@ -68,7 +68,7 @@ variable "instance_image" {
EOD
type = map(string)
default = {
- family = "slurm-gcp-6-6-hpc-rocky-linux-8"
+ family = "slurm-gcp-6-7-hpc-rocky-linux-8"
project = "schedmd-slurm-public"
}

@@ -56,23 +56,23 @@ No resources.
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
- | <a name="input_accelerator_config"></a> [accelerator\_config](#input\_accelerator\_config) | Nodeset accelerator config, see https://cloud.google.com/tpu/docs/supported-tpu-configurations for details. | <pre>object({<br/> topology = string<br/> version = string<br/> })</pre> | <pre>{<br/> "topology": "",<br/> "version": ""<br/>}</pre> | no |
+ | <a name="input_accelerator_config"></a> [accelerator\_config](#input\_accelerator\_config) | Nodeset accelerator config, see https://cloud.google.com/tpu/docs/supported-tpu-configurations for details. | <pre>object({<br> topology = string<br> version = string<br> })</pre> | <pre>{<br> "topology": "",<br> "version": ""<br>}</pre> | no |
| <a name="input_data_disks"></a> [data\_disks](#input\_data\_disks) | The data disks to include in the TPU node | `list(string)` | `[]` | no |
| <a name="input_disable_public_ips"></a> [disable\_public\_ips](#input\_disable\_public\_ips) | DEPRECATED: Use `enable_public_ips` instead. | `bool` | `null` | no |
- | <a name="input_docker_image"></a> [docker\_image](#input\_docker\_image) | The gcp container registry id docker image to use in the TPU vms, it defaults to gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-6-tf-<var.tf\_version> | `string` | `null` | no |
+ | <a name="input_docker_image"></a> [docker\_image](#input\_docker\_image) | The gcp container registry id docker image to use in the TPU vms, it defaults to gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-7-tf-<var.tf\_version> | `string` | `null` | no |
| <a name="input_enable_public_ips"></a> [enable\_public\_ips](#input\_enable\_public\_ips) | If set to true. The node group VMs will have a random public IP assigned to it. Ignored if access\_config is set. | `bool` | `false` | no |
- | <a name="input_name"></a> [name](#input\_name) | Name of the nodeset. Automatically populated by the module id if not set. <br/>If setting manually, ensure a unique value across all nodesets. | `string` | n/a | yes |
- | <a name="input_network_storage"></a> [network\_storage](#input\_network\_storage) | An array of network attached storage mounts to be configured on nodes. | <pre>list(object({<br/> server_ip = string,<br/> remote_mount = string,<br/> local_mount = string,<br/> fs_type = string,<br/> mount_options = string,<br/> }))</pre> | `[]` | no |
+ | <a name="input_name"></a> [name](#input\_name) | Name of the nodeset. Automatically populated by the module id if not set. <br>If setting manually, ensure a unique value across all nodesets. | `string` | n/a | yes |
+ | <a name="input_network_storage"></a> [network\_storage](#input\_network\_storage) | An array of network attached storage mounts to be configured on nodes. | <pre>list(object({<br> server_ip = string,<br> remote_mount = string,<br> local_mount = string,<br> fs_type = string,<br> mount_options = string,<br> }))</pre> | `[]` | no |
| <a name="input_node_count_dynamic_max"></a> [node\_count\_dynamic\_max](#input\_node\_count\_dynamic\_max) | Maximum number of auto-scaling nodes allowed in this partition. | `number` | `5` | no |
| <a name="input_node_count_static"></a> [node\_count\_static](#input\_node\_count\_static) | Number of nodes to be statically created. | `number` | `0` | no |
| <a name="input_node_type"></a> [node\_type](#input\_node\_type) | Specify a node type to base the vm configuration upon it. | `string` | n/a | yes |
| <a name="input_preemptible"></a> [preemptible](#input\_preemptible) | Should use preemptibles to burst. | `bool` | `false` | no |
| <a name="input_preserve_tpu"></a> [preserve\_tpu](#input\_preserve\_tpu) | Specify whether TPU-vms will get preserve on suspend, if set to true, on suspend vm is stopped, on false it gets deleted | `bool` | `false` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project ID to create resources in. | `string` | n/a | yes |
| <a name="input_reserved"></a> [reserved](#input\_reserved) | Specify whether TPU-vms in this nodeset are created under a reservation. | `bool` | `false` | no |
- | <a name="input_service_account"></a> [service\_account](#input\_service\_account) | DEPRECATED: Use `service_account_email` and `service_account_scopes` instead. | <pre>object({<br/> email = string<br/> scopes = set(string)<br/> })</pre> | `null` | no |
+ | <a name="input_service_account"></a> [service\_account](#input\_service\_account) | DEPRECATED: Use `service_account_email` and `service_account_scopes` instead. | <pre>object({<br> email = string<br> scopes = set(string)<br> })</pre> | `null` | no |
| <a name="input_service_account_email"></a> [service\_account\_email](#input\_service\_account\_email) | Service account e-mail address to attach to the TPU-vm. | `string` | `null` | no |
- | <a name="input_service_account_scopes"></a> [service\_account\_scopes](#input\_service\_account\_scopes) | Scopes to attach to the TPU-vm. | `set(string)` | <pre>[<br/> "https://www.googleapis.com/auth/cloud-platform"<br/>]</pre> | no |
+ | <a name="input_service_account_scopes"></a> [service\_account\_scopes](#input\_service\_account\_scopes) | Scopes to attach to the TPU-vm. | `set(string)` | <pre>[<br> "https://www.googleapis.com/auth/cloud-platform"<br>]</pre> | no |
| <a name="input_subnetwork_self_link"></a> [subnetwork\_self\_link](#input\_subnetwork\_self\_link) | The name of the subnetwork to attach the TPU-vm of this nodeset to. | `string` | n/a | yes |
| <a name="input_tf_version"></a> [tf\_version](#input\_tf\_version) | Nodeset Tensorflow version, see https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | `string` | `"2.14.0"` | no |
| <a name="input_zone"></a> [zone](#input\_zone) | Zone in which to create compute VMs. TPU partitions can only specify a single zone. | `string` | n/a | yes |
@@ -103,7 +103,7 @@ variable "data_disks" {
}

variable "docker_image" {
- description = "The gcp container registry id docker image to use in the TPU vms, it defaults to gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-6-tf-<var.tf_version>"
+ description = "The gcp container registry id docker image to use in the TPU vms, it defaults to gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-7-tf-<var.tf_version>"
type = string
default = null
}