Description

This module creates one or more compute VM instances.

Example

- id: compute
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    instance_count: 8
    name_prefix: compute
    machine_type: c2-standard-60

This creates a cluster of 8 compute VMs that are:

  • named compute-[0-7]
  • on the network defined by the network1 module
  • of type c2-standard-60

NOTE: Simultaneous Multithreading (SMT) is deactivated by default on machine types that support it (threads_per_core defaults to 0), which means only the physical cores are visible on the VM. With SMT disabled, a machine of type c2-standard-60 exposes only its 30 physical cores. To enable SMT, set threads_per_core=2 under settings.
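As a minimal sketch, the example above could be extended as follows to enable SMT. Every value other than threads_per_core is carried over from that example.

```yaml
- id: compute
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    instance_count: 8
    name_prefix: compute
    machine_type: c2-standard-60
    # Enable SMT: each c2-standard-60 VM exposes 60 virtual cores instead of 30.
    threads_per_core: 2
```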

VPC Networks

There are two methods for adding network connectivity to the vm-instance module. The first is shown in the example above, where the vm-instance module uses a vpc module or pre-existing-vpc module. In that case, the network_self_link and subnetwork_self_link outputs of the network module are passed as inputs to vm-instance, and a network interface is defined from them. The same result can be achieved by setting network_self_link and subnetwork_self_link directly under settings.
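A sketch of setting the two self links directly, without the use field. It assumes a network module with the id network1 exists elsewhere in the blueprint and that the blueprint's `$(module_id.output_name)` reference syntax is available; literal self-link strings should work in the same positions.

```yaml
- id: compute
  source: modules/compute/vm-instance
  settings:
    instance_count: 8
    name_prefix: compute
    machine_type: c2-standard-60
    # Assumption: network1 is a vpc or pre-existing-vpc module defined elsewhere.
    network_self_link: $(network1.network_self_link)
    subnetwork_self_link: $(network1.subnetwork_self_link)
```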

The alternative can be used when more than one network must be attached to the vm-instance, or when customization beyond what the other variables offer is needed. In this case, use the network_interfaces variable to set up one or more network interfaces on the VM instance. The format matches the network_interface block of the Terraform google_compute_instance resource; more information can be found in the Terraform docs.

NOTE: When supplying the network_interfaces variable, networks associated with the vm-instance via use will be ignored in favor of the networks added in network_interfaces. In addition, bandwidth_tier and disable_public_ips will not apply to networks defined in network_interfaces.
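The following is an illustrative sketch of a VM with two network interfaces. The module ids network1 and network2 are hypothetical, and the `$(module_id.output_name)` references assume both are network modules defined elsewhere in the blueprint. Every attribute of the object type is listed, with null or an empty list for unused fields, on the assumption that the supported Terraform versions require all attributes of an object type to be present.

```yaml
- id: compute
  source: modules/compute/vm-instance
  settings:
    machine_type: c2-standard-60
    network_interfaces:
    # First interface, on the subnetwork of the (hypothetical) network1 module.
    - network: null
      subnetwork: $(network1.subnetwork_self_link)
      subnetwork_project: null
      network_ip: null
      nic_type: GVNIC
      stack_type: null
      queue_count: null
      access_config: []
      ipv6_access_config: []
      alias_ip_range: []
    # Second interface, on the subnetwork of the (hypothetical) network2 module.
    - network: null
      subnetwork: $(network2.subnetwork_self_link)
      subnetwork_project: null
      network_ip: null
      nic_type: GVNIC
      stack_type: null
      queue_count: null
      access_config: []
      ipv6_access_config: []
      alias_ip_range: []
```

Because bandwidth_tier and disable_public_ips do not apply here, the NIC type and any public access configuration have to be expressed per interface, as with nic_type above.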

SSH key metadata

This module will ignore all changes to the ssh-keys metadata field that are typically set by external Google Cloud tools that automate SSH access when not using OS Login. For example, clicking on the Google Cloud Console SSH button next to VMs in the VM Instances list will temporarily modify VM metadata to include a dynamically-generated SSH public key.

Placement

The placement_policy variable can be used to control where your VM instances are physically located relative to each other within a zone. See the official placement guide and API documentation.

Use the following settings for compact placement:

  ...
  settings:
    instance_count: 4
    machine_type: c2-standard-60
    placement_policy:
      vm_count: null
      collocation: "COLLOCATED"
      availability_domain_count: null

When vm_count is left unset (null), as in the example above, the VMs are added to the placement policy incrementally. This is the recommended way to use placement policies.

If vm_count is specified, the VMs stay in the pending state until the specified number of VMs has been created. See the warning below if using this field.

Warning: When creating a compact placement with more than 10 VMs, you must add the -parallelism=<n> argument on apply. For example, if you have 15 VMs in a placement group: terraform apply -parallelism=15. Terraform limits itself to 10 parallel requests by default, but the create-instance requests will not succeed until all VMs in the placement group have been requested, which results in a deadlock.
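For illustration, a compact placement that sets vm_count explicitly might look like the sketch below. The module id and the use of network1 are carried over from the first example in this document and are assumptions here. The counts are chosen to match the 15-VM case in the warning.

```yaml
- id: compute
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    instance_count: 15
    machine_type: c2-standard-60
    placement_policy:
      # VMs stay pending until all 15 have been requested, so this deployment
      # must be applied with: terraform apply -parallelism=15
      vm_count: 15
      collocation: "COLLOCATED"
      availability_domain_count: null
```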

Use the following settings for spread placement:

  ...
  settings:
    instance_count: 4
    machine_type: n2-standard-4
    placement_policy:
      vm_count: null
      collocation: null
      availability_domain_count: 2

NOTE: Due to this open issue, it may be required to specify the vm_count. Once this issue is resolved, vm_count will no longer be mandatory.

License

Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.14.0 |
| google | >= 4.42 |
| google-beta | >= 4.12 |

Providers

| Name | Version |
|------|---------|
| google | >= 4.42 |
| google-beta | >= 4.12 |

Modules

| Name | Source | Version |
|------|--------|---------|
| netstorage_startup_script | github.com/GoogleCloudPlatform/hpc-toolkit//modules/scripts/startup-script | 64bc890 |

Resources

| Name | Type |
|------|------|
| google-beta_google_compute_instance.compute_vm | resource |
| google_compute_disk.boot_disk | resource |
| google_compute_resource_policy.placement_policy | resource |
| google_compute_image.compute_image | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| auto_delete_boot_disk | Controls if boot disk should be auto-deleted when instance is deleted. | `bool` | `true` | no |
| bandwidth_tier | Tier 1 bandwidth increases the maximum egress bandwidth for VMs. Using the `tier_1_enabled` setting will enable both gVNIC and TIER_1 higher bandwidth networking. Using the `gvnic_enabled` setting will only enable gVNIC and will not enable TIER_1. Note that TIER_1 only works with specific machine families & shapes and must be using an image that supports gVNIC. See official docs for more details. | `string` | `"not_enabled"` | no |
| deployment_name | Name of the deployment, used to name the cluster | `string` | n/a | yes |
| disable_public_ips | If set to true, instances will not have public IPs | `bool` | `false` | no |
| disk_size_gb | Size of disk for instances. | `number` | `200` | no |
| disk_type | Disk type for instances. | `string` | `"pd-standard"` | no |
| enable_oslogin | Enable or Disable OS Login with "ENABLE" or "DISABLE". Set to "INHERIT" to inherit project OS Login setting. | `string` | `"ENABLE"` | no |
| guest_accelerator | List of the type and count of accelerator cards attached to the instance. | `list(object({ type = string, count = number }))` | `null` | no |
| instance_count | Number of instances | `number` | `1` | no |
| instance_image | Instance Image | `object({ family = string, project = string })` | `{ "family": "hpc-centos-7", "project": "cloud-hpc-image-public" }` | no |
| labels | Labels to add to the instances. List key, value pairs. | `any` | n/a | yes |
| local_ssd_count | The number of local SSDs to attach to each VM. See https://cloud.google.com/compute/docs/disks/local-ssd. | `number` | `0` | no |
| local_ssd_interface | Interface to be used with local SSDs. Can be either 'NVME' or 'SCSI'. No effect unless `local_ssd_count` is also set. | `string` | `"NVME"` | no |
| machine_type | Machine type to use for the instance creation | `string` | `"c2-standard-60"` | no |
| metadata | Metadata, provided as a map | `map(string)` | `{}` | no |
| name_prefix | Name Prefix | `string` | `null` | no |
| network_interfaces | A list of network interfaces. The options match that of the terraform network_interface block of google_compute_instance. For descriptions of the subfields or more information see the documentation: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#nested_network_interface<br><br>NOTE: If `network_interfaces` are set, `network_self_link` and `subnetwork_self_link` will be ignored, even if they are provided through the `use` field. `bandwidth_tier` and `disable_public_ips` also do not apply to network interfaces defined in this variable.<br><br>Subfields:<br>network (string, required if subnetwork is not supplied)<br>subnetwork (string, required if network is not supplied)<br>subnetwork_project (string, optional)<br>network_ip (string, optional)<br>nic_type (string, optional, choose from ["GVNIC", "VIRTIO_NET"])<br>stack_type (string, optional, choose from ["IPV4_ONLY", "IPV4_IPV6"])<br>queue_count (number, optional)<br>access_config (object, optional)<br>ipv6_access_config (object, optional)<br>alias_ip_range (list(object), optional) | `list(object({ network = string, subnetwork = string, subnetwork_project = string, network_ip = string, nic_type = string, stack_type = string, queue_count = number, access_config = list(object({ nat_ip = string, public_ptr_domain_name = string, network_tier = string })), ipv6_access_config = list(object({ public_ptr_domain_name = string, network_tier = string })), alias_ip_range = list(object({ ip_cidr_range = string, subnetwork_range_name = string })) }))` | `[]` | no |
| network_self_link | The self link of the network to attach the VM. | `string` | `"default"` | no |
| network_storage | An array of network attached storage mounts to be configured. | `list(object({ server_ip = string, remote_mount = string, local_mount = string, fs_type = string, mount_options = string, client_install_runner = map(string), mount_runner = map(string) }))` | `[]` | no |
| on_host_maintenance | Describes maintenance behavior for the instance. If left blank this will default to `MIGRATE` except for when `placement_policy`, spot provisioning, or GPUs require it to be `TERMINATE`. | `string` | `null` | no |
| placement_policy | Control where your VM instances are physically located relative to each other within a zone. | `object({ vm_count = number, availability_domain_count = number, collocation = string })` | `null` | no |
| project_id | Project in which the HPC deployment will be created | `string` | n/a | yes |
| region | The region to deploy to | `string` | n/a | yes |
| service_account | Service account to attach to the instance. See https://www.terraform.io/docs/providers/google/r/compute_instance_template.html#service_account. | `object({ email = string, scopes = set(string) })` | `{ "email": null, "scopes": [ "https://www.googleapis.com/auth/devstorage.read_write", "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring.write", "https://www.googleapis.com/auth/servicecontrol", "https://www.googleapis.com/auth/service.management.readonly", "https://www.googleapis.com/auth/trace.append" ] }` | no |
| spot | Provision VMs using discounted Spot pricing, allowing for preemption | `bool` | `false` | no |
| startup_script | Startup script used on the instance | `string` | `null` | no |
| subnetwork_self_link | The self link of the subnetwork to attach the VM. | `string` | `null` | no |
| tags | Network tags, provided as a list | `list(string)` | `[]` | no |
| threads_per_core | Sets the number of threads per physical core. By setting `threads_per_core` to 2, Simultaneous Multithreading (SMT) is enabled extending the total number of virtual cores. For example, a machine of type c2-standard-60 will have 60 virtual cores with `threads_per_core` equal to 2. With `threads_per_core` equal to 1 (SMT turned off), only the 30 physical cores will be available on the VM.<br><br>The default value of "0" will turn off SMT for supported machine types, and will fall back to GCE defaults for unsupported machine types (t2d, shared-core instances, or instances with less than 2 vCPU).<br><br>Disabling SMT can be more performant in many HPC workloads, therefore it is disabled by default where compatible.<br><br>null = SMT configuration will use the GCE defaults for the machine type<br>0 = SMT will be disabled where compatible (default)<br>1 = SMT will always be disabled (will fail on incompatible machine types)<br>2 = SMT will always be enabled (will fail on incompatible machine types) | `number` | `0` | no |
| zone | Compute Platform zone | `string` | n/a | yes |
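As a hedged illustration of how several of these inputs combine, the sketch below requests a single Spot VM with one GPU and a local SSD. The module id gpu-vm, the use of network1, the machine type n1-standard-8, and the accelerator type nvidia-tesla-t4 are example values that do not come from this document. Availability of that accelerator in the chosen zone is an assumption.

```yaml
- id: gpu-vm            # hypothetical module id
  source: modules/compute/vm-instance
  use: [network1]       # assumes a network module with this id exists
  settings:
    instance_count: 1
    name_prefix: gpu
    machine_type: n1-standard-8      # example value
    guest_accelerator:
    - type: nvidia-tesla-t4          # example accelerator type
      count: 1
    spot: true                       # discounted pricing, VM may be preempted
    disk_size_gb: 200
    local_ssd_count: 1
    local_ssd_interface: NVME
```

on_host_maintenance is left unset in this sketch because, per its description, the default already resolves to TERMINATE when Spot provisioning or GPUs require it.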

Outputs

| Name | Description |
|------|-------------|
| external_ip | External IP of the instances (if enabled) |
| internal_ip | Internal IP of the instances |
| name | Name of any instance created |