Update NCCL link and rename a3-mega GKE in terraform module (#370)
samcmho authored May 2, 2024
1 parent 48efec4 commit b80910d
Showing 3 changed files with 5 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -126,7 +126,7 @@ the same as any other terraform:
# assuming the directory containing main.tf is the current working directory

# create/update the cluster
-terraform init && terraform validate && terraform apply
+terraform init && terraform validate && terraform apply -var-file="terraform.tfvars"

# destroy the cluster
terraform init && terraform validate && terraform apply -destroy
4 changes: 3 additions & 1 deletion a3-mega/examples/gke/main.tf
@@ -1,11 +1,13 @@
variable "node_pools" {}
variable "project_id" {}
variable "resource_prefix" {}
+variable "region" {}

-module "a3-gke" {
+module "a3-mega-gke" {
source = "github.com/GoogleCloudPlatform/ai-infra-cluster-provisioning//a3-mega/terraform/modules/cluster/gke"

node_pools = var.node_pools
project_id = var.project_id
resource_prefix = var.resource_prefix
+region = var.region
}
2 changes: 1 addition & 1 deletion a3-mega/terraform/modules/cluster/gke/main.tf
@@ -309,7 +309,7 @@ module "kubectl-apply" {
daemonsets = {
device_plugin = "https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/cmd/nvidia_gpu/device-plugin.yaml"
nvidia_driver = "https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml"
-nccl_plugin = "https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-fastrak/nccl-fastrak-installer.yaml" # TODO dead link
+nccl_plugin = "https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpxo/nccl-tcpxo-installer.yaml"
}
enable = var.ksa != null
ksa = var.ksa