cluster-toolkit/modules/compute/vm-instance at main · GoogleCloudPlatform/cluster-toolkit

Description

This module creates one or more compute VM instances.

Example

- id: compute
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    instance_count: 8
    name_prefix: compute
    machine_type: c2-standard-60

This creates a cluster of 8 compute VMs that are:

- named compute-[0-7]
- on the network defined by the network1 module
- of type c2-standard-60

NOTE: Simultaneous Multithreading (SMT) is deactivated by default on supported machine types (the default threads_per_core=0 behaves like threads_per_core=1 where compatible), so only the physical cores are visible on the VM. With SMT disabled, a machine of type c2-standard-60 exposes only its 30 physical cores. To enable SMT, set threads_per_core=2 under settings.
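As a minimal sketch, the example above could be adjusted as follows to enable SMT. Only the threads_per_core line is new; everything else is copied from the example.

```yaml
- id: compute
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    instance_count: 8
    name_prefix: compute
    machine_type: c2-standard-60
    # Enable SMT so that all 60 virtual cores of c2-standard-60 are visible.
    threads_per_core: 2
```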

VPC Networks

There are two methods for adding network connectivity to the vm-instance module. The first is shown in the example above, where a vpc module or pre-existing-vpc module is used by the vm-instance module. In this case, the network_self_link and subnetwork_self_link outputs from the network module are passed as inputs to the vm-instance module, and a network interface is defined from them. The same result can be achieved by setting network_self_link and subnetwork_self_link directly.

The alternative option can be used when more than one network needs to be added to the vm-instance or further customization is needed beyond what is provided via other variables. For this option, the network_interfaces variable can be used to set up one or more network interfaces on the VM instance. The format is consistent with the terraform google_compute_instance network_interface block, and more information can be found in the terraform docs.

NOTE: When supplying the network_interfaces variable, networks associated with the vm-instance via use will be ignored in favor of the networks added in network_interfaces. In addition, bandwidth_tier and disable_public_ips will not apply to networks defined in network_interfaces.
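The second method might look like the sketch below, which attaches two interfaces. The module ids multi-nic-vm, network1 and network2 are hypothetical, and the $(module.output) references assume the network modules expose subnetwork_self_link as described above. Because the variable type in the Inputs table declares every subfield without a default, the sketch sets each one explicitly and uses null or an empty list for anything unused; this is an assumption drawn from that type, not a confirmed requirement.

```yaml
- id: multi-nic-vm            # hypothetical module id
  source: modules/compute/vm-instance
  settings:
    machine_type: c2-standard-60
    network_interfaces:
    # First interface, on the subnetwork of a hypothetical network1 module.
    - network: null
      subnetwork: $(network1.subnetwork_self_link)
      subnetwork_project: null
      network_ip: null
      nic_type: GVNIC
      stack_type: null
      queue_count: null
      access_config: []
      ipv6_access_config: []
      alias_ip_range: []
    # Second interface, on the subnetwork of a hypothetical network2 module.
    - network: null
      subnetwork: $(network2.subnetwork_self_link)
      subnetwork_project: null
      network_ip: null
      nic_type: GVNIC
      stack_type: null
      queue_count: null
      access_config: []
      ipv6_access_config: []
      alias_ip_range: []
```

With network_interfaces set this way, bandwidth_tier and disable_public_ips have no effect on these interfaces, as the note above explains.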

SSH key metadata

This module will ignore all changes to the ssh-keys metadata field that are typically set by external Google Cloud tools that automate SSH access when not using OS Login. For example, clicking on the Google Cloud Console SSH button next to VMs in the VM Instances list will temporarily modify VM metadata to include a dynamically-generated SSH public key.

Placement

The placement_policy variable can be used to control where your VM instances are physically located relative to each other within a zone. See the official placement guide and API documentation.

Use the following settings for compact placement:

  ...
  settings:
    instance_count: 4
    machine_type: c2-standard-60
    placement_policy:
      collocation: "COLLOCATED"

By default, the above placement policy always results in the most compact set of VMs available. If you would rather have provisioning fail when a given level of compactness cannot be obtained, you can enforce this with the max_distance setting:

  ...
  settings:
    instance_count: 4
    machine_type: c2-standard-60
    placement_policy:
      collocation: "COLLOCATED"
      max_distance: 1

Use the following settings for spread placement:

  ...
  settings:
    instance_count: 4
    machine_type: n2-standard-4
    placement_policy:
      availability_domain_count: 2

When vm_count is not set, as shown in the examples above, then the VMs will be added to the placement policy incrementally. This is the recommended way to use placement policies.

If vm_count is specified then VMs will stay in pending state until the specified number of VMs are created. See the warning below if using this field.

Warning

When creating a compact placement using vm_count with more than 10 VMs, you must add the -parallelism=<n> argument on apply. For example, if you have 15 VMs in a placement group: terraform apply -parallelism=15. This is necessary because Terraform limits itself to 10 parallel requests by default, but the create instance requests will not succeed until all VMs in the placement group have been requested, which results in a deadlock.
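For illustration, a placement policy that sets vm_count might look like the sketch below. The module id and the count of 15 are hypothetical and chosen to match the warning above; with this configuration the deployment group would need to be applied with terraform apply -parallelism=15.

```yaml
- id: compute                 # hypothetical module id
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    instance_count: 15
    machine_type: c2-standard-60
    placement_policy:
      collocation: "COLLOCATED"
      # VMs stay pending until all 15 have been requested.
      vm_count: 15
```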

GPU Support

More information on GPU support in vm-instance and other Cluster Toolkit modules can be found at docs/gpu-support.md

Lifecycle

The vm-instance module will be replaced when the instance_image variable is changed and terraform apply is run on the deployment group folder or gcluster deploy is run. However, it will not be automatically replaced if a new image is created in a family.
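As a sketch of where this variable is set, the fragment below spells out instance_image under settings. The family and project values are simply the defaults listed in the Inputs table, and the module id and network1 reference are copied from the earlier example. Changing either value and re-applying would trigger the replacement described above.

```yaml
- id: compute
  source: modules/compute/vm-instance
  use: [network1]
  settings:
    instance_image:
      # Defaults from the Inputs table; edit these to switch images.
      family: hpc-rocky-linux-8
      project: cloud-hpc-image-public
```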

To selectively replace the vm-instance(s), consider running terraform apply with the -replace=ADDRESS option. See https://developer.hashicorp.com/terraform/cli/commands/plan#replace-address for the precise syntax. For example:

terraform state list
# search for the module ID and resource
terraform apply -replace="address"

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

Name	Version
terraform	>= 1.3.0
google	>= 4.73.0
google-beta	>= 6.13.0
null	>= 3.0

Providers

Name	Version
google	>= 4.73.0
google-beta	>= 6.13.0
null	>= 3.0

Modules

Name	Source	Version
netstorage_startup_script	../../scripts/startup-script	n/a

Resources

Name	Type
google-beta_google_compute_instance.compute_vm	resource
google-beta_google_compute_resource_policy.placement_policy	resource
google_compute_address.compute_ip	resource
google_compute_disk.boot_disk	resource
null_resource.image	resource
null_resource.replace_vm_trigger_from_placement	resource
google_compute_image.compute_image	data source

Inputs

Name	Description	Type	Default	Required
add_deployment_name_before_prefix	If true, the names of VMs and disks will always be prefixed with `deployment_name` to enable uniqueness across deployments. See `name_prefix` for further details on resource naming behavior.	`bool`	`false`	no
allocate_ip	If not null, allocate IPs with the given configuration. See details at https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_address	object({ address_type = optional(string, "INTERNAL"), purpose = optional(string), network_tier = optional(string), ip_version = optional(string, "IPV4"), })	`null`	no
allow_automatic_updates	If false, disables automatic system package updates on the created instances. This feature is only available on supported images (or images derived from them). For more details, see https://cloud.google.com/compute/docs/instances/create-hpc-vm#disable_automatic_updates	`bool`	`true`	no
auto_delete_boot_disk	Controls if boot disk should be auto-deleted when instance is deleted.	`bool`	`true`	no
automatic_restart	Specifies if the instance should be restarted if it was terminated by Compute Engine (not a user).	`bool`	`null`	no
bandwidth_tier	Tier 1 bandwidth increases the maximum egress bandwidth for VMs. Using the `tier_1_enabled` setting will enable both gVNIC and TIER_1 higher bandwidth networking. Using the `gvnic_enabled` setting will only enable gVNIC and will not enable TIER_1. Note that TIER_1 only works with specific machine families & shapes and must be using an image that supports gVNIC. See official docs for more details.	`string`	`"not_enabled"`	no
deployment_name	Name of the deployment, will optionally be used to name resources according to `name_prefix`	`string`	n/a	yes
disable_public_ips	If set to true, instances will not have public IPs	`bool`	`false`	no
disk_size_gb	Size of disk for instances.	`number`	`200`	no
disk_type	Disk type for instances.	`string`	`"pd-standard"`	no
enable_oslogin	Enable or Disable OS Login with "ENABLE" or "DISABLE". Set to "INHERIT" to inherit project OS Login setting.	`string`	`"ENABLE"`	no
guest_accelerator	List of the type and count of accelerator cards attached to the instance.	list(object({ type = string, count = number }))	`[]`	no
instance_count	Number of instances	`number`	`1`	no
instance_image	Instance Image	`map(string)`	{ "family": "hpc-rocky-linux-8", "project": "cloud-hpc-image-public" }	no
labels	Labels to add to the instances. Key-value pairs.	`map(string)`	n/a	yes
local_ssd_count	The number of local SSDs to attach to each VM. See https://cloud.google.com/compute/docs/disks/local-ssd.	`number`	`0`	no
local_ssd_interface	Interface to be used with local SSDs. Can be either 'NVME' or 'SCSI'. No effect unless `local_ssd_count` is also set.	`string`	`"NVME"`	no
machine_type	Machine type to use for the instance creation	`string`	`"c2-standard-60"`	no
metadata	Metadata, provided as a map	`map(string)`	`{}`	no
min_cpu_platform	The name of the minimum CPU platform that you want the instance to use.	`string`	`null`	no
name_prefix	An optional name for all VM and disk resources. If not supplied, `deployment_name` will be used. When `name_prefix` is supplied, and `add_deployment_name_before_prefix` is set, then resources are named by "<`deployment_name`>-<`name_prefix`>-<#>".	`string`	`null`	no
network_interfaces	A list of network interfaces. The options match those of the terraform network_interface block of google_compute_instance. For descriptions of the subfields or more information see the documentation: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#nested_network_interface NOTE: If `network_interfaces` are set, `network_self_link` and `subnetwork_self_link` will be ignored, even if they are provided through the `use` field. `bandwidth_tier` and `disable_public_ips` also do not apply to network interfaces defined in this variable. Subfields: network (string, required if subnetwork is not supplied); subnetwork (string, required if network is not supplied); subnetwork_project (string, optional); network_ip (string, optional); nic_type (string, optional, choose from ["GVNIC", "VIRTIO_NET"]); stack_type (string, optional, choose from ["IPV4_ONLY", "IPV4_IPV6"]); queue_count (number, optional); access_config (object, optional); ipv6_access_config (object, optional); alias_ip_range (list(object), optional).	list(object({ network = string, subnetwork = string, subnetwork_project = string, network_ip = string, nic_type = string, stack_type = string, queue_count = number, access_config = list(object({ nat_ip = string, public_ptr_domain_name = string, network_tier = string })), ipv6_access_config = list(object({ public_ptr_domain_name = string, network_tier = string })), alias_ip_range = list(object({ ip_cidr_range = string, subnetwork_range_name = string })) }))	`[]`	no
network_self_link	The self link of the network to attach the VM. Can use "default" for the default network.	`string`	`null`	no
network_storage	An array of network attached storage mounts to be configured.	list(object({ server_ip = string, remote_mount = string, local_mount = string, fs_type = string, mount_options = string, client_install_runner = map(string), mount_runner = map(string) }))	`[]`	no
on_host_maintenance	Describes maintenance behavior for the instance. If left blank this will default to `MIGRATE` except for when `placement_policy`, spot provisioning, or GPUs require it to be `TERMINATE`	`string`	`null`	no
placement_policy	Control where your VM instances are physically located relative to each other within a zone. See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_resource_policy#nested_group_placement_policy	`any`	`null`	no
project_id	Project in which the HPC deployment will be created	`string`	n/a	yes
region	The region to deploy to	`string`	n/a	yes
reservation_name	Name of the reservation to use for VM resources. Should be in one of the following formats: projects/PROJECT_ID/reservations/RESERVATION_NAME or RESERVATION_NAME. Must be a "SPECIFIC_RESERVATION". Set to empty string if using no reservation or automatically-consumed reservations.	`string`	`""`	no
service_account	DEPRECATED - Use `service_account_email` and `service_account_scopes` instead.	object({ email = string, scopes = set(string) })	`null`	no
service_account_email	Service account e-mail address to use with the node pool	`string`	`null`	no
service_account_scopes	Scopes to use with the node pool.	`set(string)`	[ "https://www.googleapis.com/auth/cloud-platform" ]	no
spot	Provision VMs using discounted Spot pricing, allowing for preemption	`bool`	`false`	no
startup_script	Startup script used on the instance	`string`	`null`	no
subnetwork_self_link	The self link of the subnetwork to attach the VM.	`string`	`null`	no
tags	Network tags, provided as a list	`list(string)`	`[]`	no
threads_per_core	Sets the number of threads per physical core. By setting threads_per_core to 2, Simultaneous Multithreading (SMT) is enabled, extending the total number of virtual cores. For example, a machine of type c2-standard-60 will have 60 virtual cores with threads_per_core equal to 2. With threads_per_core equal to 1 (SMT turned off), only the 30 physical cores will be available on the VM. The default value of "0" will turn off SMT for supported machine types, and will fall back to GCE defaults for unsupported machine types (t2d, shared-core instances, or instances with less than 2 vCPU). Disabling SMT can be more performant in many HPC workloads, therefore it is disabled by default where compatible. Values: null = SMT configuration will use the GCE defaults for the machine type; 0 = SMT will be disabled where compatible (default); 1 = SMT will always be disabled (will fail on incompatible machine types); 2 = SMT will always be enabled (will fail on incompatible machine types).	`number`	`0`	no
zone	Compute Platform zone	`string`	n/a	yes

Outputs

Name	Description
external_ip	External IP of the instances (if enabled)
instructions	Instructions on how to SSH into the created VM. Commands may fail depending on VM configuration and IAM permissions.
internal_ip	Internal IP of the instances
name	Names of instances created
self_link	The tuple URIs of the created instances
