From 683dedc94d1a3ac5aa3508dfd0c38a504a390c3f Mon Sep 17 00:00:00 2001
From: Yiqi Gao
Date: Tue, 18 Feb 2020 15:22:28 -0500
Subject: [PATCH] Add user guide on viewing metrics in Stackdriver

---
 incubator/hnc/README.md                      | 14 +++
 incubator/hnc/doc/metrics/stackdriver-gke.md | 95 ++++++++++++++++++++
 2 files changed, 109 insertions(+)
 create mode 100644 incubator/hnc/doc/metrics/stackdriver-gke.md

diff --git a/incubator/hnc/README.md b/incubator/hnc/README.md
index 445fabc94..edd7d9462 100644
--- a/incubator/hnc/README.md
+++ b/incubator/hnc/README.md
@@ -58,6 +58,20 @@ scripts](https://docs.google.com/document/d/1tKQgtMSf0wfT3NOGQx9ExUQ-B8UkkdVZB6m
 to get an idea of what HNC can do. For a more in-depth understanding, check out
 the [HNC Concepts doc](http://bit.ly/38YYhE0).
 
+### Viewing metrics
+You should be able to view all HNC metrics in your preferred backend:
+* [Stackdriver on GKE](doc/metrics/stackdriver-gke.md)
+* Prometheus (see [#433](https://github.com/kubernetes-sigs/multi-tenancy/issues/433))
+
+| Metric                                             | Description |
+|:---------------------------------------------------|:------------|
+| hnc/reconcilers/hierconfig/total                   | The total number of HierarchyConfiguration (HC) reconciliations |
+| hnc/reconcilers/hierconfig/concurrent_peak         | The peak number of concurrent HC reconciliations in the past 60s, which is the minimum Stackdriver reporting period and the one HNC uses |
+| hnc/reconcilers/hierconfig/hierconfig_writes_total | The number of HC writes made during HC reconciliations |
+| hnc/reconcilers/hierconfig/namespace_writes_total  | The number of namespace writes made during HC reconciliations |
+| hnc/reconcilers/object/total                       | The total number of object reconciliations |
+| hnc/reconcilers/object/concurrent_peak             | The peak number of concurrent object reconciliations in the past 60s, which is the minimum Stackdriver reporting period and the one HNC uses |
+
 ### Uninstalling HNC
 **WARNING:** this will also delete all the hierarchical relationships between
 your namespaces. Reinstalling HNC will _not_ recreate these relationships. There
diff --git a/incubator/hnc/doc/metrics/stackdriver-gke.md b/incubator/hnc/doc/metrics/stackdriver-gke.md
new file mode 100644
index 000000000..ad9ffdb29
--- /dev/null
+++ b/incubator/hnc/doc/metrics/stackdriver-gke.md
@@ -0,0 +1,95 @@
+# Stackdriver on GKE
+
+To view HNC metrics in Stackdriver, you need a GKE cluster with HNC installed
+and a way to access Cloud APIs, specifically the Stackdriver Monitoring API, from GKE.
+This guide describes two methods, along with their pros and cons:
+* (Recommended) Use [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity),
+  which has improved security properties and manageability. Note that Workload Identity
+  is in a pre-release state (Beta) and might change.
+* Use the [Compute Engine default service account](https://cloud.google.com/compute/docs/access/service-accounts#default_service_account)
+  on your GCE nodes, which is easy to set up but can grant more permissions than HNC needs.
+
+Once either method is set up, you can view the metrics in Stackdriver [Metrics Explorer](https://cloud.google.com/monitoring/charts/metrics-explorer)
+by searching for the HNC metric names.
+
+## Option 1: Use Workload Identity (Recommended)
+To set up HNC on GKE with Workload Identity, follow these steps (each is described below):
+1. Ensure your GKE clusters have Workload Identity enabled
+2. Install HNC
+3. Create a suitable GSA and map it to the KSA
+
+### Ensure your GKE clusters have Workload Identity enabled
+You can enable [Workload Identity (WI)](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) on either a new or an existing cluster:
+* [Enable WI on a new cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_workload_identity_on_a_new_cluster)
+* [Enable WI on an existing cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_workload_identity_on_an_existing_cluster)
+
+### Install HNC
+Install HNC and make sure that the `hnc-system/default` Kubernetes service account
+exists:
+1. [Install HNC](https://github.com/kubernetes-sigs/multi-tenancy/tree/master/incubator/hnc#installing-or-upgrading-hnc).
+2. Run `kubectl get serviceaccounts -n hnc-system` and make sure `default` is listed:
+   ```
+   NAME      SECRETS   AGE
+   default   1         5d
+   ```
+
+### Create a suitable GSA and map it to the KSA
+This section creates a [Cloud IAM policy binding](https://cloud.google.com/sdk/gcloud/reference/iam/service-accounts/add-iam-policy-binding)
+between the Kubernetes service account (KSA) and a Google service account (GSA).
+The binding allows the KSA to act as the GSA, so that HNC can export its metrics
+to Stackdriver. The KSA to use is `hnc-system/default` from the previous section. This
+requires the Security Admin role, which includes the `iam.serviceAccounts.setIamPolicy`
+permission; your user account already has this permission if it has the Owner role.
+With that permission, run the following commands to set up the IAM policy binding.
+
+Steps:
+1. [Create a Google service account (GSA)](https://cloud.google.com/docs/authentication/production#creating_a_service_account):
+   ```bash
+   gcloud iam service-accounts create [GSA_NAME]
+   ```
+2. Grant the [Monitoring Metric Writer](https://cloud.google.com/monitoring/access-control#mon_roles_desc)
+   role to the GSA:
+   ```bash
+   gcloud projects add-iam-policy-binding [PROJECT_ID] --member \
+     "serviceAccount:[GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
+     --role "roles/monitoring.metricWriter"
+   ```
+3. Create a [Cloud IAM policy binding](https://cloud.google.com/sdk/gcloud/reference/iam/service-accounts/add-iam-policy-binding)
+   between the `hnc-system/default` KSA and the newly created GSA:
+   ```bash
+   gcloud iam service-accounts add-iam-policy-binding \
+     --role roles/iam.workloadIdentityUser \
+     --member "serviceAccount:[PROJECT_ID].svc.id.goog[hnc-system/default]" \
+     [GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
+   ```
+4. Add the `iam.gke.io/gcp-service-account` annotation to the KSA, setting its value
+   to the email address of the GSA:
+   ```bash
+   kubectl annotate serviceaccount \
+     --namespace hnc-system \
+     default \
+     iam.gke.io/gcp-service-account=[GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com
+   ```
+5. Verify that the service accounts are configured correctly by creating a Pod that
+   uses the KSA and runs the `cloud-sdk` container image, then connecting
+   to it with an interactive session:
+   ```bash
+   kubectl run --rm -it \
+     --generator=run-pod/v1 \
+     --image google/cloud-sdk:slim \
+     --serviceaccount default \
+     --namespace hnc-system \
+     workload-identity-test
+   ```
+6. You are now connected to an interactive shell within the created Pod. Run the following command:
+   ```bash
+   gcloud auth list
+   ```
+   If the service accounts are configured correctly, the GSA email address is listed
+   as the active (and only) identity.
+
+## Option 2: Use the GCE default service account
+By default, GKE clusters without Workload Identity use the GCE default service account.
+Since this service account already has permission to write metrics to Stackdriver, no extra
+steps are required beyond creating a GKE cluster and installing HNC: the HNC
+workloads running on the GCE nodes use the nodes' service account by default.
\ No newline at end of file
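Steps 1 to 4 of Option 1 reuse the same placeholders, `[PROJECT_ID]` and `[GSA_NAME]`, across several commands. The shell script below is a minimal sketch, not part of HNC, that derives the GSA email address and the Workload Identity member string once and prints the resulting commands for review instead of running them. The default values (`my-project`, `hnc-metrics-writer`) and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: prints (does not run) the Option 1 Workload Identity commands.
# PROJECT_ID and GSA_NAME default to hypothetical values; override them via the environment.
set -eu

PROJECT_ID="${PROJECT_ID:-my-project}"
GSA_NAME="${GSA_NAME:-hnc-metrics-writer}"

# HNC runs as the default KSA in the hnc-system namespace.
KSA_NAMESPACE="hnc-system"
KSA_NAME="default"

# Email address of the Google service account (GSA).
gsa_email() {
  echo "${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
}

# Workload Identity member string that identifies the KSA to Cloud IAM.
wi_member() {
  echo "serviceAccount:${PROJECT_ID}.svc.id.goog[${KSA_NAMESPACE}/${KSA_NAME}]"
}

# Print steps 1-4 of Option 1 with the placeholders filled in.
print_setup_commands() {
  echo "gcloud iam service-accounts create ${GSA_NAME}"
  echo "gcloud projects add-iam-policy-binding ${PROJECT_ID} --member serviceAccount:$(gsa_email) --role roles/monitoring.metricWriter"
  # Single-quote the member string in the printed command so that, when pasted
  # into a shell, the [...] part is not treated as a glob pattern.
  echo "gcloud iam service-accounts add-iam-policy-binding --role roles/iam.workloadIdentityUser --member '$(wi_member)' $(gsa_email)"
  echo "kubectl annotate serviceaccount --namespace ${KSA_NAMESPACE} ${KSA_NAME} iam.gke.io/gcp-service-account=$(gsa_email)"
}

print_setup_commands
```

To use it, set `PROJECT_ID` and `GSA_NAME` in the environment before running the script, then check the printed commands before running them yourself. Printing rather than executing keeps the sketch safe to run without `gcloud` or cluster credentials.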