This directory contains a set of core modules built for the HPC Toolkit. Modules describe the building blocks of an HPC deployment. The expected fields in a module are listed in more detail below. Blueprints can be extended in functionality by incorporating modules from GitHub repositories.
Modules from various sources are all listed here for visibility. Badges are used to indicate the source and status of many of these resources.
Modules listed below with the core badge are located in this folder and are tested and maintained by the HPC Toolkit team.
Modules labeled with the community badge are contributed by the community (including the HPC Toolkit team, partners, etc.). Community modules are located in the community folder.
Modules that are still in development and less stable are labeled with the experimental badge.
- vm-instance : Creates one or more VM instances.
- SchedMD-slurm-on-gcp-partition : Creates a partition to be used by a slurm-controller.
- schedmd-slurm-gcp-v5-partition : Creates a partition to be used by a slurm-controller.
- schedmd-slurm-gcp-v5-node-group : Creates a node group to be used by the schedmd-slurm-gcp-v5-partition module.
- htcondor-execute-point : Manages a group of execute points for use in an HTCondor pool.
- pbspro-execution : Creates execution hosts for use in a PBS Professional cluster.
- slurm-cloudsql-federation : Creates a Google SQL Instance meant to be integrated with a slurm-controller.
- filestore : Creates a filestore file system.
- pre-existing-network-storage : Specifies a pre-existing file system that can be mounted on a VM.
- DDN-EXAScaler : Creates a DDN EXAScaler Lustre file system. This module has license costs.
- Intel-DAOS : Creates a DAOS file system.
- nfs-server : Creates a VM and configures an NFS server that can be mounted by other VMs.
- dashboard : Creates a monitoring dashboard for visually tracking an HPC Toolkit deployment.
- vpc : Creates a Virtual Private Cloud (VPC) network with regional subnetworks and firewall rules.
- pre-existing-vpc : Used to connect newly built components to a pre-existing VPC network.
- custom-image : Creates a custom VM Image based on the GCP HPC VM image.
- new-project : Creates a Google Cloud Project.
- service-account : Creates service accounts for a GCP project.
- service-enablement : Allows enabling various APIs for a Google Cloud Project.
- batch-job-template : Creates a Google Cloud Batch job template that works with other Toolkit modules.
- batch-login-node : Creates a VM that can be used for submission of Google Cloud Batch jobs.
- schedmd-slurm-gcp-v5-controller : Creates a Slurm controller node using slurm-gcp-version-5.
- schedmd-slurm-gcp-v5-login : Creates a Slurm login node using slurm-gcp-version-5.
- schedmd-slurm-gcp-v5-hybrid : Creates hybrid Slurm partition configuration files using slurm-gcp-version-5.
- SchedMD-slurm-on-gcp-controller : Creates a Slurm controller node using slurm-gcp.
- SchedMD-slurm-on-gcp-login-node : Creates a Slurm login node using slurm-gcp.
- htcondor-configure : Creates Toolkit runners and service accounts to configure an HTCondor pool.
- pbspro-client : Creates a client host for submitting jobs to a PBS Professional cluster.
- pbspro-server : Creates a server host for operating a PBS Professional cluster.
- startup-script : Creates a customizable startup script that can be fed into compute VMs.
- htcondor-install : Creates a startup script to install HTCondor and exports a list of required APIs.
- omnia-install : Installs Slurm via Dell Omnia onto a cluster of VM instances.
- pbspro-preinstall : Creates a Cloud Storage bucket in which to save PBS Professional RPM packages for use by PBS clusters.
- pbspro-install : Creates a Toolkit runner to install PBS Professional from RPM packages.
- pbspro-qmgr : Creates a Toolkit runner to run common qmgr commands when configuring a PBS Professional cluster.
- spack-install : Creates a startup script to install Spack on an instance or a Slurm login or controller.
- wait-for-startup : Waits for successful completion of a startup script on a compute VM.
The id field is used to uniquely identify and reference a defined module. IDs are used in variables and become the name of each module when writing the terraform main.tf file. They are also used in the use and outputs lists described below.
For terraform modules, the ID will be rendered into the terraform module label at the top level main.tf file.
The source is a path or URL that points to the source files for a module. The actual content of those files is determined by the kind of the module.
A source can be a path which may refer to a module embedded in the ghpc binary or a local file. It can also be a URL pointing to a GitHub path containing a conforming module.
Embedded modules are embedded in the ghpc binary during compilation and cannot be edited. To refer to embedded modules, set the source path to modules/<<MODULE_PATH>>.
The paths match the modules in the repository at compilation time. You can review the directory structure of the core modules and community modules to determine which path to use. For example, the following code is using the embedded pre-existing-vpc module:
  - id: network1
    source: modules/network/pre-existing-vpc
Local modules point to a module in the file system and can easily be edited. They are very useful during module development. To use a local module, set the source to a path starting with /, ./, or ../. For instance, the following module definition refers to the local pre-existing-vpc module.
  - id: network1
    source: ./modules/network/pre-existing-vpc
NOTE: This example would have to be run from the HPC Toolkit repository directory, otherwise the path would need to be updated to point at the correct directory.
To use a Terraform module available on GitHub, set the source to a path starting with github.com (over HTTPS) or git@github.com (over SSH). For instance, the following module definitions are sourcing the vpc module by pointing at the HPC Toolkit GitHub repository:
Get module from GitHub over SSH:
  - id: network1
    source: git@github.com:GoogleCloudPlatform/hpc-toolkit.git//modules/network/vpc
Get module from GitHub over HTTPS:
  - id: network1
    source: github.com/GoogleCloudPlatform/hpc-toolkit//modules/network/vpc
Both examples above use the double-slash notation (//) to indicate the root directory of the git repository; the remainder of the path indicates the location of the Terraform module.
Additionally, specific revisions of a remote module can be selected by any valid git reference. Typically, these are a git branch, commit hash or tag. The Intel DAOS blueprint makes extensive use of this feature. For example, to temporarily point to a development copy of the Toolkit vpc module, use:
  - id: network1
    source: github.com/GoogleCloudPlatform/hpc-toolkit//modules/network/vpc?ref=develop
To use a Terraform module available in a non-GitHub git repository such as GitLab, set the source to a path starting with git::. Two standard git protocols are supported: git::https:// for HTTPS or git::git@ (followed by the repository host) for SSH.
Additional formatting and features after git:: are identical to those of the GitHub modules described above.
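As a rough sketch of the generic git form, the definitions below point at a hypothetical GitLab repository. The host, group, repository name, module path, and the v1.0.0 tag are all placeholders chosen for illustration, not real locations.

```yaml
# Hypothetical repository and tag, shown only to illustrate the syntax.

# Generic git over HTTPS, pinned to a tag:
- id: network1
  source: git::https://gitlab.com/example-group/example-modules.git//modules/network/vpc?ref=v1.0.0

# Generic git over SSH, same repository and tag:
- id: network2
  source: git::git@gitlab.com:example-group/example-modules.git//modules/network/vpc?ref=v1.0.0
```

As with the GitHub examples, the double slash separates the repository root from the module path, and ?ref= selects the revision.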
kind refers to the way in which a module is deployed. Currently, kind can be either terraform or packer. It must be specified for modules of type packer. If omitted, it will default to terraform.
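A minimal sketch of how kind might appear in a blueprint is shown below. The source path for custom-image and the choice to place it in a separate deployment group are assumptions made for illustration; check the module's own README for the exact path and requirements.

```yaml
deployment_groups:
- group: primary
  modules:
  # kind is omitted here, so it defaults to terraform.
  - id: network1
    source: modules/network/pre-existing-vpc

- group: packer
  modules:
  # kind must be set explicitly for Packer modules.
  # The source path below is an assumed location for the custom-image module.
  - id: image
    source: modules/packer/custom-image
    kind: packer
```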
The settings field is a map that supplies any user-defined variables for each module. Settings values can be simple strings, numbers or booleans, but can also support complex data types like maps and lists of variable depth. These settings will become the values for the variables defined in either the variables.tf file for Terraform or variable.pkr.hcl file for Packer.
For some modules, there are mandatory variables that must be set, so settings is a required field in that case. In many situations, a combination of sensible defaults, deployment variables and used modules can populate all required settings, and therefore the settings field can be omitted.
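To illustrate the value types described above, here is a sketch of a settings map for a vm-instance module. The setting names and values are illustrative assumptions; the authoritative list is the module's variables.tf file.

```yaml
- id: workstation
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    # Setting names below are assumed for illustration only.
    name_prefix: workstation     # string
    instance_count: 2            # number
    disable_public_ips: true     # boolean
    tags: [ssh, hpc]             # list
    labels:                      # map
      team: research
      purpose: demo
```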
The use field is a powerful way of linking a module to one or more other modules. When a module "uses" another module, the outputs of the used module are compared to the settings of the current module. If they have matching names and the setting has no explicit value, then it will be set to the used module's output. For example, see the following blueprint snippet:
modules:
- id: network1
  source: modules/network/vpc

- id: workstation
  source: modules/compute/vm-instance
  use: [network1]
  settings:
  ...
In this snippet, the VM instance workstation uses the outputs of vpc network1.
In this case both network_self_link and subnetwork_self_link in the workstation settings will be set to $(network1.network_self_link) and $(network1.subnetwork_self_link), which refer to the network1 outputs of the same names.
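For comparison, the inference described above should be equivalent to wiring the two settings by hand. The sketch below drops the use list and sets the same references explicitly; it assumes no other outputs of network1 match settings of workstation.

```yaml
modules:
- id: network1
  source: modules/network/vpc

- id: workstation
  source: modules/compute/vm-instance
  # No use list: the links that use would have inferred are set explicitly.
  settings:
    network_self_link: $(network1.network_self_link)
    subnetwork_self_link: $(network1.subnetwork_self_link)
```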
ghpc infers each setting value using the following order of precedence, highest priority first:
- Explicitly set in the blueprint using the settings field
- Output from a used module, taken in the order provided in the use list
- Deployment variable (vars) of the same name
- Default value for the setting
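The precedence order can be sketched with a small blueprint fragment. The region and zone values are placeholders, and the sketch assumes the vm-instance module declares variables named zone, network_self_link, and subnetwork_self_link.

```yaml
vars:
  zone: us-central1-a          # deployment variable (third in precedence)

deployment_groups:
- group: primary
  modules:
  - id: network1
    source: modules/network/vpc

  - id: workstation
    source: modules/compute/vm-instance
    use: [network1]            # outputs of network1 (second in precedence)
    settings:
      zone: us-central1-c      # explicit setting (first in precedence)
```

Under these assumptions, workstation would get zone from its explicit setting rather than from vars, its network links from the network1 outputs, and every remaining setting from the module's defaults.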
NOTE: See the network storage documentation for more information about mounting network storage file systems via the use field.
The outputs field allows a module-level output to be made available at the deployment group level, so that it is available via terraform output in terraform-based deployment groups. This can be useful for displaying the IP of a login node or simply displaying instructions on how to use a module, as we have in the monitoring dashboard module.
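A sketch of the outputs field applied to the dashboard module follows. The source path and the output name instructions are assumptions for illustration; the names a module actually exports are defined in its outputs.tf file.

```yaml
- id: dashboard1
  # Assumed path to the dashboard module.
  source: modules/monitoring/dashboard
  # Assumed output name; listing it here surfaces it at the deployment group level.
  outputs: [instructions]
```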
Each Toolkit module depends upon Google Cloud services ("APIs") being enabled in the project used by the HPC environment. For example, the creation of VMs requires the Compute Engine API (compute.googleapis.com). The startup-script module requires the Cloud Storage API (storage.googleapis.com) for storage of the scripts themselves. Each module included in the Toolkit source code describes its required APIs internally. The Toolkit will merge the requirements from all modules and automatically validate that all APIs are enabled in the project specified by $(vars.project_id).
For advanced multi-project use cases and for modules not included with the Toolkit, you may manually add required APIs to each module with the following format:
deployment_groups:
- group: primary
  modules:
  ...
  - id: examplevm
    source: modules/example/module
    required_apis:
      $(vars.project_id):
      - compute.googleapis.com
      - storage.googleapis.com
      $(vars.other_project_id):
      - storage.googleapis.com
      explicit-project-id:
      - file.googleapis.com
    settings:
    ...
The following common naming conventions should be used to decrease the verbosity needed to define a blueprint. This is intentional to allow multiple modules to share inferred settings from deployment variables or from other modules listed under the use field.
For example, if all modules are to be created in a single region, that region can be defined as a deployment variable named region, which is shared between all modules without an explicit setting. Similarly, if many modules need to be connected to the same VPC network, they all can add the vpc module ID to their use list so that network_name would be inferred from that vpc module rather than having to set it manually.
- project_id: The GCP project ID in which to create the GCP resources.
- deployment_name: The name of the current deployment of a blueprint. This can help to avoid naming conflicts of modules when multiple deployments are created from the same blueprint.
- region: The GCP region the module will be created in.
- zone: The GCP zone the module will be created in.
- network_name: The name of the network a module will use or connect to.
- labels: Labels added to the module. In order to include any module in advanced monitoring, labels must be exposed. We strongly recommend that all modules expose this variable.
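The region and network example described above might look like the following sketch. All values under vars are placeholders, and the module IDs are arbitrary.

```yaml
vars:
  project_id: my-project-id    # placeholder project
  deployment_name: demo
  region: us-central1          # shared by every module without an explicit region
  zone: us-central1-a

deployment_groups:
- group: primary
  modules:
  - id: network1
    source: modules/network/vpc

  # Both VM modules pick up network settings from network1 through use,
  # and project_id, region, and zone from the deployment variables.
  - id: workstation-a
    source: modules/compute/vm-instance
    use: [network1]

  - id: workstation-b
    source: modules/compute/vm-instance
    use: [network1]
```

Because the variable names follow the shared conventions, neither VM module needs a settings block for these values in this sketch.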
Modules are flexible by design; however, we do define some best practices for creating a new module meant to be used with the HPC Toolkit.
The module source field must point to a single terraform module. We recommend the following structure:
- main.tf file composing the terraform resources using provided variables.
- variables.tf file defining the variables used.
- (Optional) outputs.tf file defining any exported outputs.
- (Optional) modules/ sub-directory pointing to submodules needed to create the top level module.
- Variables for environment-specific values (like project_id) should not be given defaults. This forces the calling module to provide meaningful values.
- Variables should only have zero-value defaults (like null or empty strings) where leaving the variable empty is a valid preference which will not be rejected by the underlying API(s).
- Set good defaults wherever possible. Be opinionated about HPC use cases.
- Follow common variable naming conventions.
Any Terraform-based modules in the HPC Toolkit should implement the following standards:
- terraform-docs is used to generate README files for each module.
- The first parameter listed under a module should be source (when referring to an external implementation).
- The order for parameters in inputs should be:
  - description
  - type
  - default
- The order for parameters in outputs should be:
  - description
  - value