
Commit

Added support to collect control plane component metrics; etcd (#384)
jvoravong authored Mar 2, 2022
1 parent ce28e77 commit f5b1918
Showing 20 changed files with 428 additions and 83 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -4,6 +4,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## Unreleased

### Added

- Control plane metrics support: etcd (#384)

## [0.43.5] - 2022-03-02

### Fixed
102 changes: 100 additions & 2 deletions docs/advanced-configuration.md
@@ -193,8 +193,11 @@ for the Fargate distribution has two primary differences between regular `eks` t

## Control Plane metrics

By setting `agent.controlPlaneEnabled=true` the helm chart will set up the otel-collector agent to collect metrics from
the control plane.
By setting `agent.controlPlaneMetrics.{component}.enabled=true` the helm chart will set up the otel-collector agent to
collect metrics from a particular control plane component. Most metrics can be collected from the control plane with no
extra configuration; however, collecting metrics from etcd requires extra configuration steps
([see below](#setting-up-etcd-metrics)) due to TLS security requirements.

To collect control plane metrics, the helm chart has the otel-collector agent on each node use the
[receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/receivercreator/README.md)
@@ -226,11 +229,106 @@ The default configurations for the control plane receivers can be found in
Here are the documentation links that contain configuration options and supported metrics information for each receiver
used to collect metrics from the control plane.
* [smartagent/coredns](https://docs.splunk.com/Observability/gdi/coredns/coredns.html)
* [smartagent/etcd](https://docs.splunk.com/Observability/gdi/etcd/etcd.html)
* [smartagent/kube-controller-manager](https://docs.splunk.com/Observability/gdi/kube-controller-manager/kube-controller-manager.html)
* [smartagent/kubernetes-apiserver](https://docs.splunk.com/Observability/gdi/kubernetes-apiserver/kubernetes-apiserver.html)
* [smartagent/kubernetes-proxy](https://docs.splunk.com/Observability/gdi/kubernetes-proxy/kubernetes-proxy.html)
* [smartagent/kubernetes-scheduler](https://docs.splunk.com/Observability/gdi/kubernetes-scheduler/kubernetes-scheduler.html)
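For example, several components can be enabled together in values.yaml (a sketch; the `coredns`, `apiserver`, and `etcd` keys are the component names used by this chart's `agent.controlPlaneMetrics` values):

```yaml
agent:
  controlPlaneMetrics:
    coredns:
      enabled: true
    apiserver:
      enabled: true
    # etcd needs extra TLS configuration, see "Setting up etcd metrics".
    etcd:
      enabled: false
```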

### Setting up etcd metrics

The etcd metrics cannot be collected out of the box because etcd requires TLS authentication for communication. Below
are a couple of methods for setting up TLS authentication between etcd and the otel-collector agent. The etcd TLS client
certificate and key play a critical role in the security of the cluster; handle them with care and avoid storing them in
unsecured locations. To limit unnecessary access to the etcd certificate and key, deploy the helm chart into a namespace
that is isolated from other unrelated resources.

#### Method 1: Deploy the helm chart with the etcd certificate and key as values
The easiest way to set up TLS authentication for etcd metrics is to retrieve the client certificate and key from an etcd
pod and use them directly in the values.yaml (or via `--set=`). The helm chart sets up the rest: it adds the client
certificate and key to a newly created kubernetes secret and then configures the etcd receiver to use them.

You can get the contents of the certificate and key by running these commands. The path to the certificate and key can
vary depending on your Kubernetes distribution.
```bash
# The steps for kubernetes and openshift are listed here.
# For kubernetes:
etcd_pod_name=$(kubectl get pods -n kube-system -l k8s-app=etcd-manager-events -o=name | sed "s/^.\{4\}//" | head -n 1)
kubectl exec -it -n kube-system $etcd_pod_name -- cat /etc/kubernetes/pki/etcd-manager/etcd-clients-ca.crt
kubectl exec -it -n kube-system $etcd_pod_name -- cat /etc/kubernetes/pki/etcd-manager/etcd-clients-ca.key
# For openshift:
etcd_pod_name=$(kubectl get pods -n openshift-etcd -l k8s-app=etcd -o=name | sed "s/^.\{4\}//" | head -n 1)
kubectl exec -it -n openshift-etcd $etcd_pod_name -- cat /etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-metrics-$etcd_pod_name.crt
kubectl exec -it -n openshift-etcd $etcd_pod_name -- cat /etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-metrics-$etcd_pod_name.key
```
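A note on the pipeline above: `kubectl get -o=name` prints names prefixed with the resource type, and the `sed` expression strips the first four characters (`pod/`) to leave the bare pod name. A minimal sketch of that step (the pod name here is made up for illustration):

```shell
# "kubectl get pods -o=name" prints e.g. "pod/etcd-manager-events-abc";
# stripping the first four characters removes the "pod/" prefix.
name="pod/etcd-manager-events-abc"
bare=$(printf '%s\n' "$name" | sed "s/^.\{4\}//")
echo "$bare"   # etcd-manager-events-abc
```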

Once you have the contents of your certificate and key, insert them into your values.yaml. Since the helm chart will
create the secret, you must set `agent.controlPlaneMetrics.etcd.secret.create=true`. Then install your helm chart.
```yaml
agent:
  controlPlaneMetrics:
    etcd:
      enabled: true
      secret:
        create: true
        # The PEM-format CA certificate for this client.
        clientCert: |
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        # The private key for this client.
        clientKey: |
          -----BEGIN RSA PRIVATE KEY-----
          ...
          -----END RSA PRIVATE KEY-----
        # Optional. The CA cert that has signed the TLS cert.
        # caFile: |
```
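Under the hood, the chart base64-encodes these values into a kubernetes secret, since Secret `data` fields always hold base64-encoded bytes. A quick sketch of the equivalent encoding step (assuming GNU coreutils `base64`; the sample string is ours, not a real certificate):

```shell
# Kubernetes Secret "data" fields are base64-encoded; Helm's b64enc does
# the equivalent of:
printf '%s' "hello" | base64          # prints aGVsbG8=
# and the value is decoded back to the original bytes when mounted:
printf '%s' "aGVsbG8=" | base64 -d    # prints hello
```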

#### Method 2: Deploy the helm chart with a secret that contains the etcd certificate and key
To set up TLS authentication for etcd metrics with this method, the otel-collector agents need access to a kubernetes
secret that contains the etcd TLS client certificate and key. The name of this kubernetes secret must be supplied in the
helm chart (`.Values.agent.controlPlaneMetrics.etcd.secret.name`). When installed, the helm chart mounts the specified
kubernetes secret onto the `/otel/etc/etcd` directory of the otel-collector agent containers so the agent can use it.

Here are the commands for creating a kubernetes secret named `splunk-monitoring-etcd`.
```bash
# The steps for kubernetes and openshift are listed here.
# For kubernetes:
etcd_pod_name=$(kubectl get pods -n kube-system -l k8s-app=etcd-manager-events -o=name | sed "s/^.\{4\}//" | head -n 1)
kubectl exec -n kube-system $etcd_pod_name -- cat /etc/kubernetes/pki/etcd-manager/etcd-clients-ca.crt > ./tls.crt
kubectl exec -n kube-system $etcd_pod_name -- cat /etc/kubernetes/pki/etcd-manager/etcd-clients-ca.key > ./tls.key
# For openshift:
etcd_pod_name=$(kubectl get pods -n openshift-etcd -l k8s-app=etcd -o=name | sed "s/^.\{4\}//" | head -n 1)
kubectl exec -n openshift-etcd $etcd_pod_name -- cat /etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-metrics-$etcd_pod_name.crt > ./tls.crt
kubectl exec -n openshift-etcd $etcd_pod_name -- cat /etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-metrics-$etcd_pod_name.key > ./tls.key
# Create the secret.
# The input file names must be one of: tls.crt, tls.key, cacert.pem
kubectl create secret generic splunk-monitoring-etcd --from-file=./tls.crt --from-file=./tls.key
# Optional. Include the CA cert that has signed the TLS cert.
# kubectl create secret generic splunk-monitoring-etcd --from-file=./tls.crt --from-file=./tls.key --from-file=./cacert.pem
# Clean up the local files.
rm ./tls.crt ./tls.key
```
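Before creating the secret, it can be worth checking that the extracted certificate and private key actually belong together. A sketch, assuming `openssl` is available and an RSA key; the `certkey_match` helper is ours, not part of the chart:

```shell
# Returns success (exit 0) when the certificate's public-key modulus matches
# the private key's modulus, i.e. the pair belongs together.
certkey_match() {
  crt_mod=$(openssl x509 -noout -modulus -in "$1")
  key_mod=$(openssl rsa -noout -modulus -in "$2")
  [ "$crt_mod" = "$key_mod" ]
}
# Usage: certkey_match ./tls.crt ./tls.key && echo "pair matches"
```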

Once your kubernetes secret is created, specify the secret's name in values.yaml. Since the helm chart will be using the
secret you created, make sure to set `.Values.agent.controlPlaneMetrics.etcd.secret.create=false`. Then install your
helm chart.
```yaml
agent:
  controlPlaneMetrics:
    etcd:
      enabled: true
      secret:
        create: false
        name: splunk-monitoring-etcd
```

### Using custom configurations for nonstandard control plane components

A user may need to override the default configuration values used to connect to the control plane for a couple different
27 changes: 12 additions & 15 deletions helm-charts/splunk-otel-collector/templates/_helpers.tpl
@@ -136,7 +136,7 @@ Whether profiling data is enabled (applicable to Splunk Observability only).
{{- end -}}

{{/*
Define name for the Secret
Define name for the Splunk Secret
*/}}
{{- define "splunk-otel-collector.secret" -}}
{{- if .Values.secret.name -}}
@@ -146,6 +146,17 @@ Define name for the Secret
{{- end -}}
{{- end -}}

{{/*
Define name for the etcd Secret
*/}}
{{- define "splunk-otel-collector.etcdSecret" -}}
{{- if .Values.agent.controlPlaneMetrics.etcd.secret.name -}}
{{- printf "%s" .Values.agent.controlPlaneMetrics.etcd.secret.name -}}
{{- else -}}
{{- default .Chart.Name .Values.nameOverride | printf "%s-etcd" | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
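For reference, the default branch of this helper behaves roughly like the following shell sketch of Sprig's `printf "%s-etcd" | trunc 63 | trimSuffix "-"` pipeline (the shell approximation is ours; the chart name is this chart's):

```shell
# Approximate the template's default etcd secret name:
# default .Chart.Name ... | printf "%s-etcd" | trunc 63 | trimSuffix "-"
chart_name="splunk-otel-collector"
secret_name=$(printf '%s-etcd' "$chart_name" | cut -c1-63 | sed 's/-$//')
echo "$secret_name"   # splunk-otel-collector-etcd
```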

{{/*
Create the name of the service account to use
*/}}
@@ -184,20 +195,6 @@ Get Splunk Observability Access Token.
{{- .Values.splunkObservability.accessToken | default .Values.splunkAccessToken | default "" -}}
{{- end -}}

{{/*
Helper that returns the controlPlaneEnabled parameter taking care of backward compatibility with the old parameter
name "autodetect.controlPlane".
*/}}
{{- define "splunk-otel-collector.controlPlaneEnabled" -}}
{{- if ne (toString .Values.agent.controlPlaneEnabled) "<nil>" }}
{{- .Values.agent.controlPlaneEnabled }}
{{- else if ne (toString .Values.autodetect.controlPlane) "<nil>" }}
{{- .Values.autodetect.controlPlane }}
{{- else }}
{{- true }}
{{- end -}}
{{- end -}}

{{/*
Create the fluentd image name.
*/}}
@@ -79,7 +79,7 @@ receivers:
# Verified with Kubernetes v1.22 and Openshift v4.9.
# Below, the TLS certificate verification is often skipped because the k8s default certificate is self signed and
# will fail the verification.
{{- if (eq (include "splunk-otel-collector.controlPlaneEnabled" .) "true") }}
{{- if .Values.agent.controlPlaneMetrics.coredns.enabled }}
smartagent/coredns:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && namespace == "openshift-dns" && name contains "dns"
@@ -98,6 +98,32 @@ receivers:
    {{- else }}
    port: 9153
    {{- end }}
{{- end }}
{{- if .Values.agent.controlPlaneMetrics.etcd.enabled }}
smartagent/etcd:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && labels["k8s-app"] == "etcd"
  {{- else }}
  rule: type == "pod" && (labels["k8s-app"] == "etcd-manager-events" || labels["k8s-app"] == "etcd-manager-main")
  {{- end }}
  config:
    clientCertPath: /otel/etc/etcd/tls.crt
    clientKeyPath: /otel/etc/etcd/tls.key
    useHTTPS: true
    type: etcd
    {{- if .Values.agent.controlPlaneMetrics.etcd.skipVerify }}
    skipVerify: true
    {{- else }}
    caCertPath: /otel/etc/etcd/cacert.pem
    skipVerify: false
    {{- end }}
    {{- if eq .Values.distribution "openshift" }}
    port: 9979
    {{- else }}
    port: 4001
    {{- end }}
{{- end }}
{{- if .Values.agent.controlPlaneMetrics.controllerManager.enabled }}
smartagent/kube-controller-manager:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && labels["app"] == "kube-controller-manager" && labels["kube-controller-manager"] == "true"
@@ -112,6 +138,8 @@ receivers:
    type: kube-controller-manager
    useHTTPS: true
    useServiceAccount: true
{{- end }}
{{- if .Values.agent.controlPlaneMetrics.apiserver.enabled }}
smartagent/kubernetes-apiserver:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "port" && port == 6443 && pod.labels["app"] == "openshift-kube-apiserver" && pod.labels["apiserver"] == "true"
@@ -125,6 +153,8 @@ receivers:
    type: kubernetes-apiserver
    useHTTPS: true
    useServiceAccount: true
{{- end }}
{{- if .Values.agent.controlPlaneMetrics.proxy.enabled }}
smartagent/kubernetes-proxy:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && labels["app"] == "sdn"
@@ -140,6 +170,8 @@ receivers:
    {{- else }}
    port: 10249
    {{- end }}
{{- end }}
{{- if .Values.agent.controlPlaneMetrics.scheduler.enabled }}
smartagent/kubernetes-scheduler:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && labels["app"] == "openshift-kube-scheduler" && labels["scheduler"] == "true"
10 changes: 10 additions & 0 deletions helm-charts/splunk-otel-collector/templates/daemonset.yaml
@@ -352,6 +352,11 @@ spec:
  mountPath: /otel/etc
  readOnly: true
{{- end }}
{{- if .Values.agent.controlPlaneMetrics.etcd.enabled }}
- name: etcd-secret
  mountPath: /otel/etc/etcd
  readOnly: true
{{- end }}
{{- if and (eq (include "splunk-otel-collector.logsEnabled" $) "true") (eq .Values.logsEngine "otel") }}
{{- if .Values.isWindows }}
- name: varlog
@@ -463,6 +468,11 @@
  secret:
    secretName: {{ template "splunk-otel-collector.secret" . }}
{{- end }}
{{- if .Values.agent.controlPlaneMetrics.etcd.enabled }}
- name: etcd-secret
  secret:
    secretName: {{ template "splunk-otel-collector.etcdSecret" . }}
{{- end }}
- name: otel-configmap
  configMap:
    name: {{ template "splunk-otel-collector.fullname" . }}-otel-agent
23 changes: 23 additions & 0 deletions helm-charts/splunk-otel-collector/templates/secret-etcd.yaml
@@ -0,0 +1,23 @@
{{- if .Values.agent.controlPlaneMetrics.etcd.secret.create -}}
apiVersion: v1
kind: Secret
metadata:
  name: {{ template "splunk-otel-collector.etcdSecret" . }}
  labels:
    {{- include "splunk-otel-collector.commonLabels" . | nindent 4 }}
    app: {{ template "splunk-otel-collector.name" . }}
    chart: {{ template "splunk-otel-collector.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
type: Opaque
data:
  {{- with .Values.agent.controlPlaneMetrics.etcd.secret.clientCert }}
  tls.crt: {{ . | b64enc }}
  {{- end }}
  {{- with .Values.agent.controlPlaneMetrics.etcd.secret.clientKey }}
  tls.key: {{ . | b64enc }}
  {{- end }}
  {{- with .Values.agent.controlPlaneMetrics.etcd.secret.caFile }}
  cacert.pem: {{ . | b64enc }}
  {{- end }}
{{- end -}}
