Skip to content

Commit

Permalink
gke-cluster-autopilot: add monitoring configuration (#1646)
Browse files Browse the repository at this point in the history
* gke-cluster-autopilot: add monitoring configuration block (monitoring_config)
  • Loading branch information
olliefr authored and simonebruzzechesse committed Sep 5, 2023
1 parent 42ebbcc commit e63167f
Show file tree
Hide file tree
Showing 4 changed files with 94 additions and 9 deletions.
46 changes: 38 additions & 8 deletions modules/gke-cluster-autopilot/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,35 @@ module "cluster-1" {
# tftest modules=1 resources=1 inventory=logging-config.yaml
```

### Monitoring configuration

This example shows how to [configure collection of Kubernetes control plane metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-control-plane-metrics). The metrics for these components are not collected by default.

> **Note**
> System metrics collection is pre-configured for Autopilot clusters and cannot be disabled.
> **Warning**
> GKE **workload metrics** is deprecated and removed in GKE 1.24 and later. Workload metrics is replaced by [Google Cloud Managed Service for Prometheus](https://cloud.google.com/stackdriver/docs/managed-prometheus), which is Google's recommended way to monitor Kubernetes applications by using Cloud Monitoring.
```hcl
module "cluster-1" {
source = "./fabric/modules/gke-cluster-autopilot"
project_id = var.project_id
name = "cluster-1"
location = "europe-west1"
vpc_config = {
network = var.vpc.self_link
subnetwork = var.subnet.self_link
}
monitoring_config = {
enable_api_server_metrics = true
enable_controller_manager_metrics = true
enable_scheduler_metrics = true
}
}
# tftest modules=1 resources=1 inventory=monitoring-config-control-plane.yaml
```

### Backup for GKE

This example shows how to [enable the Backup for GKE agent and configure a Backup Plan](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/concepts/backup-for-gke) for GKE Standard clusters.
Expand Down Expand Up @@ -120,9 +149,9 @@ module "cluster-1" {
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [location](variables.tf#L110) | Autopilot cluster are always regional. | <code>string</code> || |
| [name](variables.tf#L155) | Cluster name. | <code>string</code> || |
| [project_id](variables.tf#L181) | Cluster project id. | <code>string</code> || |
| [vpc_config](variables.tf#L209) | VPC-level configuration. | <code title="object&#40;&#123;&#10; network &#61; string&#10; subnetwork &#61; string&#10; master_ipv4_cidr_block &#61; optional&#40;string&#41;&#10; secondary_range_blocks &#61; optional&#40;object&#40;&#123;&#10; pods &#61; string&#10; services &#61; string&#10; &#125;&#41;&#41;&#10; secondary_range_names &#61; optional&#40;object&#40;&#123;&#10; pods &#61; string&#10; services &#61; string&#10; &#125;&#41;, &#123; pods &#61; &#34;pods&#34;, services &#61; &#34;services&#34; &#125;&#41;&#10; master_authorized_ranges &#61; optional&#40;map&#40;string&#41;&#41;&#10; stack_type &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> || |
| [name](variables.tf#L170) | Cluster name. | <code>string</code> || |
| [project_id](variables.tf#L196) | Cluster project id. | <code>string</code> || |
| [vpc_config](variables.tf#L224) | VPC-level configuration. | <code title="object&#40;&#123;&#10; network &#61; string&#10; subnetwork &#61; string&#10; master_ipv4_cidr_block &#61; optional&#40;string&#41;&#10; secondary_range_blocks &#61; optional&#40;object&#40;&#123;&#10; pods &#61; string&#10; services &#61; string&#10; &#125;&#41;&#41;&#10; secondary_range_names &#61; optional&#40;object&#40;&#123;&#10; pods &#61; string&#10; services &#61; string&#10; &#125;&#41;, &#123; pods &#61; &#34;pods&#34;, services &#61; &#34;services&#34; &#125;&#41;&#10; master_authorized_ranges &#61; optional&#40;map&#40;string&#41;&#41;&#10; stack_type &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> || |
| [backup_configs](variables.tf#L17) | Configuration for Backup for GKE. | <code title="object&#40;&#123;&#10; enable_backup_agent &#61; optional&#40;bool, false&#41;&#10; backup_plans &#61; optional&#40;map&#40;object&#40;&#123;&#10; encryption_key &#61; optional&#40;string&#41;&#10; include_secrets &#61; optional&#40;bool, true&#41;&#10; include_volume_data &#61; optional&#40;bool, true&#41;&#10; namespaces &#61; optional&#40;list&#40;string&#41;&#41;&#10; region &#61; string&#10; schedule &#61; string&#10; retention_policy_days &#61; optional&#40;string&#41;&#10; retention_policy_lock &#61; optional&#40;bool, false&#41;&#10; retention_policy_delete_lock_days &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;, &#123;&#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [description](variables.tf#L37) | Cluster description. | <code>string</code> | | <code>null</code> |
| [enable_addons](variables.tf#L43) | Addons enabled in the cluster (true means enabled). | <code title="object&#40;&#123;&#10; cloudrun &#61; optional&#40;bool, false&#41;&#10; config_connector &#61; optional&#40;bool, false&#41;&#10; dns_cache &#61; optional&#40;bool, false&#41;&#10; horizontal_pod_autoscaling &#61; optional&#40;bool, false&#41;&#10; http_load_balancing &#61; optional&#40;bool, false&#41;&#10; istio &#61; optional&#40;object&#40;&#123;&#10; enable_tls &#61; bool&#10; &#125;&#41;&#41;&#10; kalm &#61; optional&#40;bool, false&#41;&#10; network_policy &#61; optional&#40;bool, false&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; horizontal_pod_autoscaling &#61; true&#10; http_load_balancing &#61; true&#10;&#125;">&#123;&#8230;&#125;</code> |
Expand All @@ -132,11 +161,12 @@ module "cluster-1" {
| [logging_config](variables.tf#L115) | Logging configuration. | <code title="object&#40;&#123;&#10; enable_api_server_logs &#61; optional&#40;bool, false&#41;&#10; enable_scheduler_logs &#61; optional&#40;bool, false&#41;&#10; enable_controller_manager_logs &#61; optional&#40;bool, false&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [maintenance_config](variables.tf#L126) | Maintenance window configuration. | <code title="object&#40;&#123;&#10; daily_window_start_time &#61; optional&#40;string&#41;&#10; recurring_window &#61; optional&#40;object&#40;&#123;&#10; start_time &#61; string&#10; end_time &#61; string&#10; recurrence &#61; string&#10; &#125;&#41;&#41;&#10; maintenance_exclusions &#61; optional&#40;list&#40;object&#40;&#123;&#10; name &#61; string&#10; start_time &#61; string&#10; end_time &#61; string&#10; scope &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; daily_window_start_time &#61; &#34;03:00&#34;&#10; recurring_window &#61; null&#10; maintenance_exclusion &#61; &#91;&#93;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [min_master_version](variables.tf#L149) | Minimum version of the master, defaults to the version of the most recent official release. | <code>string</code> | | <code>null</code> |
| [node_locations](variables.tf#L160) | Zones in which the cluster's nodes are located. | <code>list&#40;string&#41;</code> | | <code>&#91;&#93;</code> |
| [private_cluster_config](variables.tf#L167) | Private cluster configuration. | <code title="object&#40;&#123;&#10; enable_private_endpoint &#61; optional&#40;bool&#41;&#10; master_global_access &#61; optional&#40;bool&#41;&#10; peering_config &#61; optional&#40;object&#40;&#123;&#10; export_routes &#61; optional&#40;bool&#41;&#10; import_routes &#61; optional&#40;bool&#41;&#10; project_id &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [release_channel](variables.tf#L186) | Release channel for GKE upgrades. Clusters created in the Autopilot mode must use a release channel. Choose between \"RAPID\", \"REGULAR\", and \"STABLE\". | <code>string</code> | | <code>&#34;REGULAR&#34;</code> |
| [service_account](variables.tf#L197) | The Google Cloud Platform Service Account to be used by the node VMs created by GKE Autopilot. | <code>string</code> | | <code>null</code> |
| [tags](variables.tf#L203) | Network tags applied to nodes. | <code>list&#40;string&#41;</code> | | <code>null</code> |
| [monitoring_config](variables.tf#L155) | Monitoring configuration. System metrics collection cannot be disabled for Autopilot clusters. Control plane metrics are optional. Google Cloud Managed Service for Prometheus is enabled by default. | <code title="object&#40;&#123;&#10; enable_api_server_metrics &#61; optional&#40;bool, false&#41;&#10; enable_controller_manager_metrics &#61; optional&#40;bool, false&#41;&#10; enable_scheduler_metrics &#61; optional&#40;bool, false&#41;&#10; enable_managed_prometheus &#61; optional&#40;bool, true&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [node_locations](variables.tf#L175) | Zones in which the cluster's nodes are located. | <code>list&#40;string&#41;</code> | | <code>&#91;&#93;</code> |
| [private_cluster_config](variables.tf#L182) | Private cluster configuration. | <code title="object&#40;&#123;&#10; enable_private_endpoint &#61; optional&#40;bool&#41;&#10; master_global_access &#61; optional&#40;bool&#41;&#10; peering_config &#61; optional&#40;object&#40;&#123;&#10; export_routes &#61; optional&#40;bool&#41;&#10; import_routes &#61; optional&#40;bool&#41;&#10; project_id &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [release_channel](variables.tf#L201) | Release channel for GKE upgrades. Clusters created in the Autopilot mode must use a release channel. Choose between \"RAPID\", \"REGULAR\", and \"STABLE\". | <code>string</code> | | <code>&#34;REGULAR&#34;</code> |
| [service_account](variables.tf#L212) | The Google Cloud Platform Service Account to be used by the node VMs created by GKE Autopilot. | <code>string</code> | | <code>null</code> |
| [tags](variables.tf#L218) | Network tags applied to nodes. | <code>list&#40;string&#41;</code> | | <code>null</code> |

## Outputs

Expand Down
15 changes: 14 additions & 1 deletion modules/gke-cluster-autopilot/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,20 @@ resource "google_container_cluster" "cluster" {
}
}

monitoring_config {
enable_components = toset(compact([
# System metrics collection cannot be disabled for Autopilot clusters.
"SYSTEM_COMPONENTS",
# Control plane metrics.
var.monitoring_config.enable_api_server_metrics ? "APISERVER" : null,
var.monitoring_config.enable_controller_manager_metrics ? "CONTROLLER_MANAGER" : null,
var.monitoring_config.enable_scheduler_metrics ? "SCHEDULER" : null,
]))
managed_prometheus {
enabled = var.monitoring_config.enable_managed_prometheus
}
}

dynamic "notification_config" {
for_each = var.enable_features.upgrade_notifications != null ? [""] : []
content {
Expand Down Expand Up @@ -305,7 +319,6 @@ resource "google_gke_backup_backup_plan" "backup_plan" {
}
}


resource "google_compute_network_peering_routes_config" "gke_master" {
count = (
try(var.private_cluster_config.peering_config, null) != null ? 1 : 0
Expand Down
15 changes: 15 additions & 0 deletions modules/gke-cluster-autopilot/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,21 @@ variable "min_master_version" {
default = null
}

variable "monitoring_config" {
description = "Monitoring configuration. System metrics collection cannot be disabled for Autopilot clusters. Control plane metrics are optional. Google Cloud Managed Service for Prometheus is enabled by default."
type = object({
# Control plane metrics
enable_api_server_metrics = optional(bool, false)
enable_controller_manager_metrics = optional(bool, false)
enable_scheduler_metrics = optional(bool, false)
# Google Cloud Managed Service for Prometheus
# GKE Autopilot clusters running GKE version 1.25 or greater must have this on.
enable_managed_prometheus = optional(bool, true)
})
default = {}
nullable = false
}

variable "name" {
description = "Cluster name."
type = string
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

values:
module.cluster-1.google_container_cluster.cluster:
monitoring_config:
- enable_components:
- APISERVER
- CONTROLLER_MANAGER
- SCHEDULER
- SYSTEM_COMPONENTS
managed_prometheus:
- enabled: true

counts:
google_container_cluster: 1

0 comments on commit e63167f

Please sign in to comment.