Setup Infra

The Platform module (to be renamed Infra) creates the GKE cluster and related resources on which AI applications and workloads are deployed.

Update platform.tfvars with the required configuration. See tfvars_examples for sample configurations.
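As a rough illustration, a partial platform.tfvars might set the top-level switches below. The variable names come from the Inputs table further down. Every value is a hypothetical placeholder, and tfvars_examples remains the reference for a complete configuration. The Modules table lists private/public and Autopilot/Standard cluster modules, so autopilot_cluster and private_cluster presumably choose between them. That mapping is an assumption and is not stated in this README.

```hcl
# Partial platform.tfvars sketch: all values are hypothetical placeholders.
# Other required inputs (node pools, subnetwork, secondary ranges, etc.)
# must also be set; see tfvars_examples for complete configurations.
project_id = "my-gcp-project" # hypothetical project ID
region     = "us-central1"

# Network (create_network presumably creates a new VPC for the cluster)
create_network = true
network_name   = "ml-network" # hypothetical name

# Cluster: these flags are assumed to select the private GKE Standard module
create_cluster    = true
cluster_name      = "ml-cluster" # hypothetical name
cluster_region    = "us-central1"
cluster_regional  = true
autopilot_cluster = false
private_cluster   = true
```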

Prerequisites

The GCP project in which the infra resources are created must meet the following prerequisites:

  • Billing is enabled
  • GPU quotas are in place

The following service APIs must be enabled:

  • container.googleapis.com
  • gkehub.googleapis.com

If they are not already enabled, enable them with:

gcloud services enable container.googleapis.com gkehub.googleapis.com

Network Connectivity

Private GKE Cluster with internal endpoint

The default configuration in platform.tfvars creates a private GKE cluster with an internal endpoint and registers the cluster to the project-scoped Anthos fleet. Admin access to the cluster goes through the Anthos Connect Gateway.

Private GKE Cluster with external endpoint

Clusters with an external endpoint can be accessed by configuring authorized networks. The VPC network range (10.100.0.0/16) is already configured as a control plane authorized network.
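For a cluster with an external endpoint, authorized networks are set through the master_authorized_networks input, whose type appears in the Inputs table. The fragment below is a sketch. It assumes the existing 10.100.0.0/16 entry is defined in platform.tfvars; if the module adds that entry on its own, list only the extra range. The range 203.0.113.0/24 is a hypothetical documentation range standing in for an admin network, and both display names are hypothetical.

```hcl
# Control plane authorized networks (cluster with an external endpoint).
master_authorized_networks = [
  {
    # VPC range the README describes as already authorized
    cidr_block   = "10.100.0.0/16"
    display_name = "vpc-network" # hypothetical label
  },
  {
    # Hypothetical admin network; display_name is optional per the variable type
    cidr_block   = "203.0.113.0/24"
    display_name = "admin-workstations"
  },
]
```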

GPU Drivers

Lorum Ipsum

Outputs

  • cluster_name
  • cluster_region
  • project_id

Requirements

| Name | Version |
|------|---------|
| helm | ~> 2.8.0 |
| kubernetes | 2.18.1 |

Providers

No providers.

Modules

| Name | Source | Version |
|------|--------|---------|
| cloud-nat | terraform-google-modules/cloud-nat/google | 5.0.0 |
| custom-network | terraform-google-modules/network/google | 8.0.0 |
| private-gke-autopilot-cluster | ../modules/gke-autopilot-private-cluster | n/a |
| private-gke-standard-cluster | ../modules/gke-standard-private-cluster | n/a |
| public-gke-autopilot-cluster | ../modules/gke-autopilot-public-cluster | n/a |
| public-gke-standard-cluster | ../modules/gke-standard-public-cluster | n/a |

Resources

No resources.

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| all_node_pools_labels | n/a | map(string) | n/a | yes |
| all_node_pools_metadata | n/a | map(string) | n/a | yes |
| all_node_pools_oauth_scopes | n/a | list(string) | n/a | yes |
| all_node_pools_tags | n/a | list(string) | n/a | yes |
| autopilot_cluster | n/a | bool | n/a | yes |
| cluster_labels | GKE cluster labels | map | n/a | yes |
| cluster_name | n/a | string | n/a | yes |
| cluster_region | n/a | string | n/a | yes |
| cluster_regional | n/a | bool | n/a | yes |
| cluster_zones | n/a | list(string) | n/a | yes |
| cpu_pools | n/a | list(map(any)) | n/a | yes |
| create_cluster | n/a | bool | n/a | yes |
| create_network | n/a | bool | n/a | yes |
| deletion_protection | n/a | bool | false | no |
| enable_gpu | Set to true to create GPU node pool | bool | true | no |
| enable_tpu | Set to true to create TPU node pool | bool | false | no |
| gcs_fuse_csi_driver | n/a | bool | false | no |
| gpu_pools | n/a | list(map(any)) | n/a | yes |
| ip_range_pods | n/a | string | n/a | yes |
| ip_range_services | n/a | string | n/a | yes |
| kubernetes_version | n/a | string | "latest" | no |
| master_authorized_networks | n/a | list(object({ cidr_block = string, display_name = optional(string) })) | [] | no |
| monitoring_enable_managed_prometheus | n/a | bool | false | no |
| network_name | n/a | string | n/a | yes |
| network_secondary_ranges | n/a | map(list(object({ range_name = string, ip_cidr_range = string }))) | n/a | yes |
| private_cluster | n/a | bool | true | no |
| project_id | GCP project ID | string | "umeshkumhar" | no |
| region | GCP project region or zone | string | "us-central1" | no |
| subnetwork_cidr | n/a | string | n/a | yes |
| subnetwork_description | n/a | string | n/a | yes |
| subnetwork_name | n/a | string | n/a | yes |
| subnetwork_private_access | n/a | string | n/a | yes |
| subnetwork_region | n/a | string | n/a | yes |
| tpu_pools | n/a | list(map(any)) | n/a | yes |
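The subnetwork inputs could be filled in as sketched below. This assumes two things. First, the network_secondary_ranges map is keyed by subnetwork name, which is the secondary_ranges convention of terraform-google-modules/network, the module this one uses. Second, ip_range_pods and ip_range_services name ranges defined in that map. All names and CIDRs are hypothetical.

```hcl
# Hypothetical subnetwork settings; names and CIDRs are placeholders.
subnetwork_name           = "ml-subnet"
subnetwork_cidr           = "10.100.0.0/16" # assumed to match the VPC range above
subnetwork_region         = "us-central1"
subnetwork_private_access = "true" # declared as string, not bool
subnetwork_description    = "Subnetwork for the GKE cluster"

# Secondary ranges, assumed to be keyed by subnetwork name
network_secondary_ranges = {
  "ml-subnet" = [
    {
      range_name    = "pods"
      ip_cidr_range = "192.168.0.0/20"
    },
    {
      range_name    = "services"
      ip_cidr_range = "192.168.48.0/20"
    },
  ]
}

# Assumed to reference range_name values defined above
ip_range_pods     = "pods"
ip_range_services = "services"
```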

Outputs

| Name | Description |
|------|-------------|
| cluster_name | n/a |
| cluster_region | n/a |
| project_id | n/a |