Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Custom DC] Various setup improvements #2349

Merged
merged 5 commits into from
Mar 2, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions deploy/helm_charts/dc_website/templates/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ spec:
name: schema-mapping{{ .Values.resourceSuffix }}
containers:
- name: website
image: "{{ .Values.website.image.repository }}:{{ .Values.website.image.tag }}"
image: "{{ .Values.website.image.registry }}/{{ .Values.website.image.project }}/{{ .Values.website.image.repository }}:{{ .Values.website.image.tag }}"
imagePullPolicy: {{ .Values.website.image.pullPolicy }}
args: []
ports:
Expand Down Expand Up @@ -117,7 +117,7 @@ spec:
- name: BOUNCE
value: "dummy"
- name: mixer
image: "{{ .Values.mixer.image.repository }}:{{ .Values.mixer.image.tag | default .Chart.AppVersion }}"
image: "{{ .Values.mixer.image.registry }}/{{ .Values.mixer.image.project }}/{{ .Values.mixer.image.repository }}:{{ .Values.mixer.image.tag }}"
imagePullPolicy: {{ .Values.mixer.image.pullPolicy }}
resources:
limits:
Expand Down
5 changes: 2 additions & 3 deletions deploy/helm_charts/dc_website/templates/ingress.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,10 +29,9 @@ kind: Ingress
metadata:
name: {{ .Values.ingress.name }}
namespace: {{ .Values.namespace.name }}
{{- with .Values.ingress.annotations }}
annotations:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is good, would be nice to have more static resources like this (so it's easier to know what it is)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, will follow this style from now on!

{{- toYaml . | nindent 4 }}
{{- end }}
kubernetes.io/ingress.global-static-ip-name: dc-website-ip
ingress.gcp.kubernetes.io/pre-shared-cert: dc-website-cert{{ .Values.resourceSuffix }}
spec:
rules:
- http:
Expand Down
14 changes: 9 additions & 5 deletions deploy/helm_charts/dc_website/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

resourceSuffix:
resourceSuffix: ""

# Website service config.
website:
Expand All @@ -13,11 +13,13 @@ website:
githash:

image:
repository: gcr.io/datcom-ci/datacommons-website
registry: "gcr.io"
project: "datcom-ci"
repository: "datacommons-website"
pullPolicy: Always
tag:

flaskEnv:
flaskEnv: "custom"
secretGCPProjectID:
enableModel: false

Expand Down Expand Up @@ -45,7 +47,6 @@ serviceAccount:
ingress:
name: website-ingress
enabled:
annotations: { kubernetes.io/ingress.global-static-ip-name: mixer-ip }

###############################################################################
# Config for Mixer helm chart
Expand All @@ -55,7 +56,10 @@ mixer:
githash:

image:
repository: gcr.io/datcom-ci/datacommons-mixer
registry: "gcr.io"
Fructokinase marked this conversation as resolved.
Show resolved Hide resolved
project: "datcom-ci"
repository: "datacommons-mixer"

pullPolicy: Always
tag:

Expand Down

This file was deleted.

49 changes: 2 additions & 47 deletions deploy/terraform-datacommons-website/examples/website_v1/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ module "apikeys" {
source = "../../modules/apikeys"
project_id = var.project_id
dc_website_domain = var.dc_website_domain
location = var.region
location = var.location

resource_suffix = local.resource_suffix
}
Expand All @@ -78,7 +78,7 @@ module "esp" {
module "cluster" {
source = "../../modules/gke"
project_id = var.project_id
region = var.region
location = var.location
cluster_name_prefix = var.cluster_name_prefix
web_robot_sa_email = local.web_robot_sa_email

Expand All @@ -99,48 +99,3 @@ resource "google_compute_managed_ssl_certificate" "dc_website_cert" {
domains = [format("%s.", var.dc_website_domain)]
}
}

# IMPORTANT NOTE: This script assumes that
# "~/.kube/config" already exists. This is because provider cannot depend on data or resources,
# as provider blocks need to be determined before resources/data states are fetched.
# In install_custom_dc.sh, currentlythe kubeconfig is fetched before calling terraform apply.
# .kube/config is the location where gcloud command for GKE stores cluster config, which
# is required to access the cluster, including using helm.
provider "kubernetes" {
alias = "datcom"
kubernetes {
config_path = "~/.kube/config"
}
}

provider "helm" {
alias = "datcom"
kubernetes {
config_path = "~/.kube/config"
}
}

module "k8s_resources" {
providers = {
kubernetes = kubernetes.datcom
helm = helm.datcom
}

resource_suffix = local.resource_suffix
website_githash = var.website_githash
mixer_githash = var.mixer_githash

source = "../../modules/helm"
project_id = var.project_id

cluster_name = module.cluster.name
cluster_region = var.region
dc_website_domain = var.dc_website_domain
global_static_ip_name = format("dc-website-ip%s", local.resource_suffix)
managed_cert_name = google_compute_managed_ssl_certificate.dc_website_cert.name

depends_on = [
google_compute_managed_ssl_certificate.dc_website_cert,
module.cluster
]
}
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@ output "cluster_name" {
value = module.cluster.name
}

output "cluster_region" {
value = module.cluster.region
output "cluster_location" {
value = module.cluster.location
}
Original file line number Diff line number Diff line change
Expand Up @@ -45,10 +45,21 @@ variable "cluster_name_prefix" {
default = "datacommons"
}

variable "region" {
variable "location" {
type = string
description = "GCP region where the cluster will be created in."
default = "us-central1"
description = <<EOF
Location of the GCP resources to be created.

Can be regional, like "us-central1". Or zonal like "us-central1-a"

Major difference between regional and zonal is that for GKE cluster, regional
clusters will have nodes in each zone of that region, giving higher availability,
but is more expensive.

For regional only resources, if zonal location is specified, the region
will be parsed from the zone.
EOF
default = "us-central1-a"
}

variable "global_static_ip_name" {
Expand Down
3 changes: 2 additions & 1 deletion deploy/terraform-datacommons-website/modules/apikeys/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,8 @@ resource "google_secret_manager_secret" "maps_api_key_secret" {
replication {
user_managed {
replicas {
location = var.location
# location needs to be a region here, so if var.location is a zone, make it a region.
location = length(split(var.location, "-")) == 2 ? var.location : join("-", slice(split( "-", var.location), 0, 2))
shifucun marked this conversation as resolved.
Show resolved Hide resolved
}
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,13 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
if [[ $LOCATION =~ ^[a-z]+-[a-z0-9]+$ ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a quick comment on what the regex is looking for

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. thanks!

REGION=$LOCATION
else
ZONE=$LOCATION
fi
gcloud container clusters get-credentials $CLUSTER_NAME \
--region $REGION --project=$PROJECT_ID
${REGION:+--region $REGION} ${ZONE:+--zone $ZONE} --project=$PROJECT_ID

# Create namespace if it does not exist.
kubectl create namespace website \
Expand Down
24 changes: 0 additions & 24 deletions deploy/terraform-datacommons-website/modules/gke/create_cluster.sh

This file was deleted.

52 changes: 38 additions & 14 deletions deploy/terraform-datacommons-website/modules/gke/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -13,24 +13,47 @@
# limitations under the License.

locals {
# Example cluster name: datacommons-us-central1
cluster_name = format("%s-%s%s",var.cluster_name_prefix,var.region, var.resource_suffix)
# Example cluster name: datacommons-us-central1 or datacommons-us-central1-a
cluster_name = format("%s-%s%s",var.cluster_name_prefix,var.location, var.resource_suffix)
}

resource "null_resource" "gke_cluster" {
provisioner "local-exec" {
command = "sh create_cluster.sh"
working_dir = path.module
resource "google_container_cluster" "primary" {
name = local.cluster_name
location = var.location

environment = {
PROJECT_ID = var.project_id
CLUSTER_NAME = local.cluster_name
NODES = var.num_nodes
REGION = var.region
}
# We can't create a cluster with no node pool defined, but we want to only use
# separately managed node pools. So we create the smallest possible default
# node pool and immediately delete it.
remove_default_node_pool = true
initial_node_count = 1
networking_mode = "VPC_NATIVE"

workload_identity_config {
workload_pool = "${var.project_id}.svc.id.goog"
}

ip_allocation_policy {
cluster_ipv4_cidr_block = "/14"
}
}

resource "google_container_node_pool" "gke_node_pools" {
name = local.cluster_name
location = var.location
cluster = google_container_cluster.primary.name
node_count = var.num_nodes

node_config {
machine_type = "e2-highmem-4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check if we can use a weaker machine_type?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following our kustomize template, I'm changing the memory from website container 8G -> 3G, ESP 2G -> 1G. Which will be something like 12.75G occupied of 15.X G available in a e2-standard-4.

However, the cost saving is only $30 or so (from ~$132 to ~$98). If there's anything that we shouldn't cheap out on imo it's the machine type, because k8s under pressure will give all sorts of weird errors. We may also need more resources for new services. I think keeping this machine type is appropriate but will leave the final desicion to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then sounds good to keep this. In that case, can revert the mem change above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the mem changes for now to conserve resources for other potential pods (like argo). Please see the response to Julia's comment.

oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
}

depends_on = [
google_container_cluster.primary
]
}

resource "null_resource" "gke_cluster_configuration" {
provisioner "local-exec" {
Expand All @@ -40,12 +63,13 @@ resource "null_resource" "gke_cluster_configuration" {
environment = {
PROJECT_ID = var.project_id
CLUSTER_NAME = local.cluster_name
REGION = var.region
LOCATION = var.location
WEB_ROBOT_SA_EMAIL = var.web_robot_sa_email
}
}

depends_on = [
null_resource.gke_cluster
google_container_cluster.primary,
google_container_node_pool.gke_node_pools
]
}
4 changes: 2 additions & 2 deletions deploy/terraform-datacommons-website/modules/gke/outputs.tf
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,6 @@ output "name" {
value = local.cluster_name
}

output "region" {
value = var.region
output "location" {
value = var.location
}
Loading