Getting visibility into CMMS metrics using Google Cloud Monitoring.
We can use Google Managed Prometheus (GMP) on GKE clusters to get metrics into Google Cloud Monitoring. Using OpenTelemetry we can also collect more detailed metrics from CMMS component pods.
The configuration for the dev environment can be used as the basis for deploying CMMS components using Google Kubernetes Engine (GKE) on another Google Cloud project.
Many operations can be done either via the gcloud CLI or the Google Cloud web console. This guide picks whichever is most convenient for each operation; feel free to use whichever you prefer.
The metrics configuration deploys the following K8s objects:

- OpenTelemetryCollector: `default`
- OpenTelemetry Instrumentation: `open-telemetry-java-agent`
- GMP ClusterPodMonitoring: `opentelemetry-collector-pod-monitor`
- GMP PodMonitoring: `collector-pod-monitor`
- NetworkPolicy: `opentelemetry-collector-network-policy`
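Once everything is deployed (see the steps below), a quick way to confirm that these objects exist is to list them. This is a sketch; the resource kind names assume the OpenTelemetry Operator and GMP CRDs are installed as described later in this guide.

```shell
# Objects managed by the OpenTelemetry Operator CRDs.
kubectl get opentelemetrycollectors,instrumentations

# GMP monitoring objects (ClusterPodMonitoring is cluster-scoped).
kubectl get clusterpodmonitorings
kubectl get podmonitorings

# The NetworkPolicy for the collector.
kubectl get networkpolicy opentelemetry-collector-network-policy
```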
Deploy a Halo component. See the related guides: Create Kingdom Cluster, Create Duchy Cluster, or Create Reporting Cluster.
Enabling Managed Prometheus can be done via the Google Cloud Console under "Features", or using the gcloud CLI. For example, assuming a cluster named "kingdom":
gcloud container clusters update kingdom --enable-managed-prometheus
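To confirm that managed collection is enabled, you can check the cluster's monitoring configuration. This is a sketch; it assumes your default project and cluster location are already configured for gcloud.

```shell
# Prints "True" once managed collection is enabled on the cluster.
gcloud container clusters describe kingdom \
  --format="value(monitoringConfig.managedPrometheusConfig.enabled)"
```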
Make sure that the least-privilege service account you created for the cluster has permissions to access the Cloud Monitoring API. See Cluster Configuration.
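For example, granting metric-write access to the cluster's least-privilege service account might look like the following. PROJECT_ID and GSA_EMAIL are placeholders for your project and service account; adjust to match your setup.

```shell
# Allow the cluster's service account to write metrics to Cloud Monitoring.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:GSA_EMAIL" \
  --role="roles/monitoring.metricWriter"
```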
Deploying to the cluster is generally done by applying a K8s object configuration file. You can use the dev configurations as a base to get started. The dev configurations are YAML files that are generated from files written in CUE using Bazel rules. You can customize the generated object configuration as needed.
The default dev configuration for OpenTelemetry collection is in `open_telemetry_gke.cue`, which depends on `open_telemetry.cue`. The default build target is `//src/main/k8s/dev:open_telemetry_gke`.
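For example, the YAML can be generated with Bazel. This is a sketch; the exact output location under bazel-bin may differ.

```shell
# Generate the OpenTelemetry K8s object configuration from the CUE sources.
bazel build //src/main/k8s/dev:open_telemetry_gke

# The generated YAML is written under bazel-bin.
ls bazel-bin/src/main/k8s/dev/
```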
The dev configuration for Prometheus scraping is in `prometheus_gke.cue`. The build target is `//src/main/k8s/dev:prometheus_gke`.
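The Prometheus configuration can be generated the same way (again, the output location under bazel-bin may differ):

```shell
bazel build //src/main/k8s/dev:prometheus_gke
ls bazel-bin/src/main/k8s/dev/
```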
You must use a cert-manager, OpenTelemetry Operator, and collector image that are compatible with each other. See the Compatibility matrix and the collector image specified in `open_telemetry.cue`.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.2/cert-manager.yaml
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/download/v0.77.0/opentelemetry-operator.yaml
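Before applying the CMMS configuration, it is worth checking that both are running. The namespaces shown are the defaults used by the upstream manifests.

```shell
# cert-manager pods should be Running; the operator needs it for its webhook certificate.
kubectl -n cert-manager get pods

# The OpenTelemetry Operator runs in its own namespace.
kubectl -n opentelemetry-operator-system get pods
```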
You can just use `kubectl apply`, specifying the configuration files you created in the previous step.
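For example, assuming the YAML files generated earlier (the paths are illustrative; adjust to wherever your generated or customized configuration lives):

```shell
kubectl apply -f bazel-bin/src/main/k8s/dev/open_telemetry_gke.yaml
kubectl apply -f bazel-bin/src/main/k8s/dev/prometheus_gke.yaml
```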
You will need to restart all the Deployments to pick up the Java agent instrumentation.
for deployment in $(kubectl get deployments -o name); do kubectl rollout restart $deployment; done
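To spot-check that the Java agent was injected after the restart, you can inspect one of the restarted pods for the init container added by the OpenTelemetry Operator. This is a sketch; POD_NAME is a placeholder, and the init container name may vary slightly by operator version.

```shell
# The operator injects an init container (named something like
# "opentelemetry-auto-instrumentation") into pods selected by the Instrumentation resource.
kubectl describe pod POD_NAME | grep -A 2 opentelemetry-auto-instrumentation
```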
Visit the Managed Prometheus page in the Cloud Console. Query `up` and `scrape_samples_scraped`. The former shows which targets have been discovered and whether they are up; the latter is a good way to confirm that scraping is actually occurring. Immediately after deployment the latter may show all zeros, but after a couple of minutes you should see results for every target that is up.
The configuration above adds OpenTelemetry JVM and RPC metrics. Using it as a base, you can add other metrics to be scraped.
See kubelet
- rpc_client_duration_bucket
- rpc_client_duration_count
- rpc_client_duration_sum
- rpc_server_duration_bucket
- rpc_server_duration_count
- rpc_server_duration_sum
- process_runtime_jvm_buffer_count
- process_runtime_jvm_buffer_limit
- process_runtime_jvm_buffer_usage
- process_runtime_jvm_classes_current_loaded
- process_runtime_jvm_classes_loaded
- process_runtime_jvm_classes_unloaded
- process_runtime_jvm_cpu_utilization
- process_runtime_jvm_memory_committed
- process_runtime_jvm_memory_init
- process_runtime_jvm_memory_limit
- process_runtime_jvm_memory_usage
- process_runtime_jvm_system_cpu_load_1m
- process_runtime_jvm_system_cpu_utilization
- process_runtime_jvm_threads_count
- active_non_daemon_thread_count
- jni_wall_clock_duration_millis
- stage_wall_clock_duration_millis
- stage_cpu_time_duration_millis
- initialization_phase_crypto_cpu_time_duration_millis
- setup_phase_crypto_cpu_time_duration_millis
- execution_phase_one_crypto_cpu_time_duration_millis
- execution_phase_two_crypto_cpu_time_duration_millis
- execution_phase_three_crypto_cpu_time_duration_millis