
Description

This module is a wrapper around the slurm-controller-hybrid module by SchedMD, part of the slurm-gcp GitHub repository. The hybrid module creates the configurations needed to extend an on-premises Slurm cluster with one or more Google Cloud bursting partitions. These partitions create the requested nodes in a GCP project on demand and scale them down after a period of inactivity, in the same way that the schedmd-slurm-gcp-v5-controller module auto-scales VMs.

NOTE: This is an experimental module; its functionality and documentation will likely be updated in the near future. This module has only been tested in a limited capacity with the HPC Toolkit. On-premises Slurm configurations vary significantly, so this module should be used as a starting point, not a complete solution.

Usage

The hybrid module is intended to be run on the controller of the on-premises Slurm cluster, meaning terraform init/apply are executed against the deployment directory on that machine. This allows the module to infer settings such as the Slurm user and user ID when setting permissions on the created configurations.

If terraform and the other dependencies cannot be installed on the controller directly, the hybrid module can be deployed in a separate build environment and the created configurations copied to the on-premises controller manually. This requires additional configuration and verification of permissions. For more information see the hybrid.md documentation in slurm-gcp.

NOTE: The hybrid module requires a number of dependencies to be installed on the system deploying the module; see the hybrid.md documentation on slurm-gcp for details.

Manual Configuration

This module does not complete the installation of hybrid partitions on your Slurm cluster. After deploying, you must follow the steps listed in the hybrid.md documentation under manual steps.

Example Usage

The hybrid module can be added to a blueprint as follows:

- id: slurm-controller
  source: ./community/modules/scheduler/schedmd-slurm-gcp-v5-hybrid
  use:
  - debug-partition
  - compute-partition
  - pre-existing-storage
  settings:
    output_dir: ./hybrid
    slurm_bin_dir: /usr/local/bin
    slurm_control_host: static-controller

This defines an HPC module that creates a hybrid configuration with the following attributes:

  • Two partitions defined in earlier modules with the IDs debug-partition and compute-partition. These are the same partition modules used by schedmd-slurm-gcp-v5-controller (a sketch of these upstream modules follows this list).
  • Network storage to be mounted on the compute nodes when they are created, defined in pre-existing-storage.
  • output_dir set to ./hybrid. This is where the hybrid configurations will be created.
  • slurm_bin_dir located at /usr/local/bin. Set this to wherever the Slurm executables are installed on your system.
  • slurm_control_host set to static-controller. The name of the on-premises controller host is provided to the module for configuring NFS mounts and for communicating with the controller after VM creation.
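For context, the partition and storage modules referenced by use above might be defined earlier in the blueprint along the following lines. This is a minimal sketch only: the module sources shown, the network1 module ID, and all setting values are illustrative assumptions; consult the documentation of those modules for the exact variables required by your Toolkit version.

- id: pre-existing-storage
  source: ./modules/file-system/pre-existing-network-storage
  settings:
    server_ip: storage.example.com   # hypothetical on-premises NFS server
    remote_mount: /exports/apps
    local_mount: /opt/apps
    fs_type: nfs

- id: compute-partition
  source: ./community/modules/compute/schedmd-slurm-gcp-v5-partition
  use:
  - network1                         # a VPC module defined elsewhere in the blueprint
  settings:
    partition_name: compute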

Assumptions and Limitations

Shared directories from the controller: By default, the following directories are NFS mounted from the on-premises controller onto the created cloud VMs:

  • /home
  • /opt/apps
  • /etc/munge
  • /usr/local/slurm/etc

The expectation is that these directories exist on the controller and contain all files that slurmd needs to keep in sync with the controller.

If this does not match your Slurm cluster, these directories can be overridden with a custom NFS mount, either by using pre-existing-network-storage or by setting the network_storage variable directly on the hybrid module. Any value in network_storage, whether added directly or via use, overrides the default directories above (see the sketch below).
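For example, a /home directory exported from a dedicated NFS server rather than from the controller could be supplied through pre-existing-network-storage. This is a minimal sketch; the server address and mount options are illustrative assumptions:

- id: homefs
  source: ./modules/file-system/pre-existing-network-storage
  settings:
    server_ip: nfs.example.com       # hypothetical NFS server exporting /home
    remote_mount: /export/home
    local_mount: /home
    fs_type: nfs
    mount_options: defaults,hard,intr

- id: slurm-controller
  source: ./community/modules/scheduler/schedmd-slurm-gcp-v5-hybrid
  use:
  - compute-partition
  - homefs                           # supplies a custom /home mount in place of the default
  settings:
    output_dir: ./hybrid
    slurm_control_host: static-controller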

Setting the variable disable_default_mounts to true disables these defaults entirely. Note that, at a minimum, the cloud VMs require /etc/munge and /usr/local/slurm/etc to be mounted from the controller; these mounts must be managed manually if disable_default_mounts is set to true (a sketch follows).
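A minimal sketch of that approach, disabling the defaults and supplying only the two required directories directly through the network_storage setting; the server address and mount options are illustrative assumptions:

- id: slurm-controller
  source: ./community/modules/scheduler/schedmd-slurm-gcp-v5-hybrid
  use:
  - compute-partition
  settings:
    output_dir: ./hybrid
    slurm_control_host: static-controller
    disable_default_mounts: true
    network_storage:                 # the minimum mounts the cloud VMs still require
    - server_ip: static-controller   # assumes the controller exports these paths over NFS
      remote_mount: /etc/munge
      local_mount: /etc/munge
      fs_type: nfs
      mount_options: defaults,hard,intr
    - server_ip: static-controller
      remote_mount: /usr/local/slurm/etc
      local_mount: /usr/local/slurm/etc
      fs_type: nfs
      mount_options: defaults,hard,intr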

Power Saving Logic: The cloud partitions make use of Slurm's power saving logic, and the suspend and resume programs will be set accordingly. If any local partitions also use these slurm.conf variables, a conflict will likely occur. There is currently no support for partition-level suspend and resume scripts, so either the local partitions must stop using them or the hybrid module cannot be used.

Slurm versions: The version of Slurm on the on-premises cluster must match the Slurm version on the cloud VMs created by the hybrid partitions. The version on the cloud VMs is dictated by the disk image, which can be set when defining the partitions using schedmd-slurm-gcp-v5-partition.

If the publicly available images do not suffice, slurm-gcp provides packer templates for creating custom disk images.

SchedMD only supports the current and previous major versions of Slurm, therefore we strongly advise using only versions 21 or 22 with this module. Using this module with any version older than 21 may lead to unexpected results.
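As a sketch of pinning the disk image when defining a partition: the instance_image setting and its family/project fields are an assumption here, as are the family and project names; check the schedmd-slurm-gcp-v5-partition documentation and the image family produced by your packer build.

- id: compute-partition
  source: ./community/modules/compute/schedmd-slurm-gcp-v5-partition
  use:
  - network1                         # a VPC module defined elsewhere in the blueprint
  settings:
    partition_name: compute
    instance_image:
      family: slurm-gcp-5-custom     # hypothetical family from a slurm-gcp packer build
      project: my-project-id         # hypothetical project hosting the image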

License

Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

Name Version
terraform >= 0.14.0
null ~> 3.0

Providers

Name Version
null ~> 3.0

Modules

Name: slurm_controller_instance
Source: github.com/SchedMD/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_controller_hybrid
Version: 5.3.0

Resources

Name Type
null_resource.set_prefix_cloud_conf resource

Inputs

cloud_parameters
  Description: cloud.conf options.
  Type: object({ no_comma_params = bool, resume_rate = number, resume_timeout = number, suspend_rate = number, suspend_timeout = number })
  Default: { "no_comma_params": false, "resume_rate": 0, "resume_timeout": 300, "suspend_rate": 0, "suspend_timeout": 300 }
  Required: no

compute_startup_script
  Description: Startup script used by the compute VMs.
  Type: string
  Default: ""
  Required: no

compute_startup_scripts_timeout
  Description: The timeout (seconds) applied to the compute_startup_script. If any script exceeds this timeout, then the instance setup process is considered failed and handled accordingly. NOTE: When set to 0, the timeout is considered infinite and thus disabled.
  Type: number
  Default: 300
  Required: no

deployment_name
  Description: Name of the deployment.
  Type: string
  Default: n/a
  Required: yes

disable_default_mounts
  Description: Disable default global network storage from the controller: /usr/local/etc/slurm, /etc/munge, /home, /apps. If these are disabled, the slurm etc and munge dirs must be added manually, or some other mechanism must be used to synchronize the slurm conf files and the munge key across the cluster.
  Type: bool
  Default: false
  Required: no

enable_bigquery_load
  Description: Enables loading of cluster job usage into BigQuery. NOTE: Requires the Google BigQuery API.
  Type: bool
  Default: false
  Required: no

enable_cleanup_compute
  Description: Enables automatic cleanup of compute nodes and resource policies (e.g. placement groups) managed by this module when the cluster is destroyed. NOTE: Requires Python and script dependencies. WARNING: Toggling this may impact the running workload. Deployed compute nodes may be destroyed and their jobs will be requeued.
  Type: bool
  Default: false
  Required: no

enable_cleanup_subscriptions
  Description: Enables automatic cleanup of pub/sub subscriptions managed by this module when the cluster is destroyed. NOTE: Requires Python and script dependencies. WARNING: Toggling this may temporarily impact var.enable_reconfigure behavior.
  Type: bool
  Default: false
  Required: no

enable_devel
  Description: Enables development mode. Not for production use.
  Type: bool
  Default: false
  Required: no

enable_reconfigure
  Description: Enables automatic Slurm reconfiguration when the Slurm configuration changes (e.g. slurm.conf.tpl, partition details). Compute instances and resource policies (e.g. placement groups) will be destroyed to align with the new configuration. NOTE: Requires Python and the Google Pub/Sub API. WARNING: Toggling this will impact the running workload. Deployed compute nodes will be destroyed and their jobs will be requeued.
  Type: bool
  Default: false
  Required: no

epilog_scripts
  Description: List of scripts to be used for Epilog. Programs for the slurmd to execute on every node when a user's job completes. See https://slurm.schedmd.com/slurm.conf.html#OPT_Epilog.
  Type: list(object({ filename = string, content = string }))
  Default: []
  Required: no

google_app_cred_path
  Description: Path to Google Application Credentials.
  Type: string
  Default: null
  Required: no

install_dir
  Description: Directory where the hybrid configuration directory will be installed on the on-premises controller. This updates the prefix path for the resume and suspend scripts in the generated cloud.conf file. The value defaults to output_dir if not specified.
  Type: string
  Default: null
  Required: no

network_storage
  Description: Storage to be mounted on all instances.
    - server_ip: Address of the storage server.
    - remote_mount: The location in the remote instance filesystem to mount from.
    - local_mount: The location on the instance filesystem to mount to.
    - fs_type: Filesystem type (e.g. "nfs").
    - mount_options: Options to mount with.
  Type: list(object({ server_ip = string, remote_mount = string, local_mount = string, fs_type = string, mount_options = string }))
  Default: []
  Required: no

output_dir
  Description: Directory where this module will write its files to. These files include: cloud.conf, cloud_gres.conf, config.yaml, resume.py, suspend.py, and util.py. If not specified explicitly, this will also be used as the default value for the install_dir variable.
  Type: string
  Default: null
  Required: no

partition
  Description: Cluster partitions as a list.
  Type:
    list(object({
      compute_list = list(string)
      partition = object({
        enable_job_exclusive    = bool
        enable_placement_groups = bool
        network_storage = list(object({
          server_ip     = string
          remote_mount  = string
          local_mount   = string
          fs_type       = string
          mount_options = string
        }))
        partition_conf = map(string)
        partition_name = string
        partition_nodes = map(object({
          bandwidth_tier         = string
          node_count_dynamic_max = number
          node_count_static      = number
          enable_spot_vm         = bool
          group_name             = string
          instance_template      = string
          node_conf              = map(string)
          access_config = list(object({
            network_tier = string
          }))
          spot_instance_config = object({
            termination_action = string
          })
        }))
        partition_startup_scripts_timeout = number
        subnetwork        = string
        zone_policy_allow = list(string)
        zone_policy_deny  = list(string)
      })
    }))
  Default: []
  Required: no

project_id
  Description: Project ID to create resources in.
  Type: string
  Default: n/a
  Required: yes

prolog_scripts
  Description: List of scripts to be used for Prolog. Programs for the slurmd to execute whenever it is asked to run a job step from a new job allocation. See https://slurm.schedmd.com/slurm.conf.html#OPT_Prolog.
  Type: list(object({ filename = string, content = string }))
  Default: []
  Required: no

slurm_bin_dir
  Description: Path to the directory of Slurm binary commands (e.g. scontrol, sinfo). If 'null', then it will be assumed that binaries are in $PATH.
  Type: string
  Default: null
  Required: no

slurm_cluster_name
  Description: Cluster name, used for resource naming and Slurm accounting. If not provided it will default to the first 8 characters of the deployment name (removing any invalid characters).
  Type: string
  Default: null
  Required: no

slurm_control_addr
  Description: The IP address or a name by which the address can be identified. This value is passed to slurm.conf such that: SlurmctldHost={var.slurm_control_host}({var.slurm_control_addr}). See https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmctldHost.
  Type: string
  Default: null
  Required: no

slurm_control_host
  Description: The short, or long, hostname of the machine where the Slurm control daemon is executed (i.e. the name returned by the command "hostname -s"). This value is passed to slurm.conf such that: SlurmctldHost={var.slurm_control_host}({var.slurm_control_addr}). See https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmctldHost.
  Type: string
  Default: n/a
  Required: yes

slurm_log_dir
  Description: Directory where Slurm logs to.
  Type: string
  Default: "/var/log/slurm"
  Required: no
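As a reference for how a few of these inputs map onto blueprint settings, a minimal sketch follows; all values shown are illustrative assumptions only:

- id: slurm-controller
  source: ./community/modules/scheduler/schedmd-slurm-gcp-v5-hybrid
  use:
  - compute-partition
  settings:
    output_dir: ./hybrid
    install_dir: /etc/slurm/hybrid   # hypothetical final location on the controller
    slurm_control_host: static-controller
    slurm_bin_dir: /usr/local/bin
    slurm_log_dir: /var/log/slurm
    cloud_parameters:
      no_comma_params: false
      resume_rate: 0
      resume_timeout: 300
      suspend_rate: 0
      suspend_timeout: 300
    prolog_scripts:
    - filename: job-start-logger.sh  # hypothetical Prolog script
      content: |
        #!/bin/bash
        logger "Starting job ${SLURM_JOB_ID} on $(hostname)"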

Outputs

No outputs.