Added support to collect control plane component metrics; controller-manager, coredns, proxy, scheduler (#383)
jvoravong authored Feb 17, 2022
1 parent b2dfe48 commit 7ae8f61
Showing 14 changed files with 274 additions and 39 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## Unreleased

### Added

- Added support to collect control plane component metrics; controller-manager, coredns, proxy, scheduler (#383)

## [0.43.2] - 2022-02-02

### Added
3 changes: 3 additions & 0 deletions ci_scripts/sck_otel_values.yaml
@@ -33,6 +33,9 @@ logsEngine: otel

clusterName: "functional-test"

agent:
  controlPlaneEnabled: true

# Metadata to be set on the telemetry data from Kubernetes objects.
# https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sprocessor.
#k8sMetadata:
108 changes: 80 additions & 28 deletions docs/advanced-configuration.md
@@ -191,6 +191,86 @@ for the Fargate distribution has two primary differences between regular `eks` t
node label. The Collector's ClusterRole for `eks/fargate` will allow the `patch` verb on `nodes` resources for the default API groups to allow the cluster
receiver's init container to add this node label for designated self monitoring.

## Control Plane metrics

Setting `agent.controlPlaneEnabled=true` makes the helm chart set up the otel-collector agent to collect metrics from
the control plane. This parameter defaults to `true`.

To collect control plane metrics, the helm chart has the otel-collector agent on each node use the
[receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/receivercreator/README.md)
to instantiate control plane receivers at runtime. The receiver creator uses a set of discovery rules to decide
which control plane receivers to create. The default discovery rules can vary depending on the Kubernetes distribution
and version. If your control plane uses nonstandard specs, you can provide a custom configuration
([see below](#using-custom-configurations-for-nonstandard-control-plane-components))
so the otel-collector agent can still connect successfully.
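
For illustration, this is roughly what the default receiver creator configuration looks like once rendered for a
non-OpenShift cluster, reduced to two of the control plane receivers. Each `rule` is a discovery rule, and the `config`
block is handed to the receiver once the rule matches. The values are the chart defaults introduced with this change
and may differ in other chart versions or distributions.

```yaml
receiver_creator:
  watch_observers:
    - k8s_observer
  receivers:
    smartagent/coredns:
      # Created for every discovered pod labeled k8s-app=kube-dns.
      rule: type == "pod" && labels["k8s-app"] == "kube-dns"
      config:
        type: coredns
        port: 9153
        extraDimensions:
          metric_source: k8s-coredns
    smartagent/kube-controller-manager:
      # Created for every discovered pod labeled k8s-app=kube-controller-manager.
      rule: type == "pod" && labels["k8s-app"] == "kube-controller-manager"
      config:
        type: kube-controller-manager
        port: 10257
        useHTTPS: true
        useServiceAccount: true
        # The default k8s certificate is self signed, so verification is skipped.
        skipVerify: true
        extraDimensions:
          metric_source: kubernetes-controller-manager
```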

The otel-collector agent needs pod level network access to the control plane pods to collect their metrics.
Because most managed (Kubernetes as a service) cloud distributions don't expose the control plane pods to the
end user, collecting control plane metrics from these distributions is not supported.

* Supported Distributions:
  * kubernetes 1.22 (kops created)
  * openshift v4.9
* Unsupported Distributions:
  * aks
  * eks
  * eks/fargate
  * gke
  * gke/autopilot
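
On an unsupported distribution it may make sense to switch control plane collection off, so the agent does not carry
discovery rules that can never match. A minimal sketch of the values, assuming an EKS cluster:

```yaml
# values.yaml (sketch for a managed cluster where control plane pods are not reachable)
distribution: eks

agent:
  # Defaults to true; set to false to skip the control plane receivers.
  controlPlaneEnabled: false
```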

The default configurations for the control plane receivers can be found in
[_otel-agent.tpl](../helm-charts/splunk-otel-collector/templates/config/_otel-agent.tpl).

### Receiver documentation

The following links document the configuration options and supported metrics for each receiver
used to collect metrics from the control plane.
* [smartagent/coredns](https://docs.splunk.com/Observability/gdi/coredns/coredns.html)
* [smartagent/kube-controller-manager](https://docs.splunk.com/Observability/gdi/kube-controller-manager/kube-controller-manager.html)
* [smartagent/kubernetes-apiserver](https://docs.splunk.com/Observability/gdi/kubernetes-apiserver/kubernetes-apiserver.html)
* [smartagent/kubernetes-proxy](https://docs.splunk.com/Observability/gdi/kubernetes-proxy/kubernetes-proxy.html)
* [smartagent/kubernetes-scheduler](https://docs.splunk.com/Observability/gdi/kubernetes-scheduler/kubernetes-scheduler.html)

### Using custom configurations for nonstandard control plane components

You may need to override the default configuration values used to connect to the control plane for a couple of different
reasons. If your control plane uses nonstandard ports or custom TLS settings, you will need to override the default
configurations. Here is an example of how you could connect to a nonstandard apiserver that uses port 3443 for metrics
and custom TLS certs stored in the `/etc/myapiserver/` directory.

```yaml
agent:
  config:
    receivers:
      receiver_creator:
        receivers:
          # Template for overriding the discovery rule and config.
          # smartagent/{control_plane_receiver}:
          #   rule: {rule_value}
          #   config:
          #     {config_value}
          smartagent/kubernetes-apiserver:
            rule: type == "port" && port == 3443 && pod.labels["k8s-app"] == "kube-apiserver"
            config:
              clientCertPath: /etc/myapiserver/clients-ca.crt
              clientKeyPath: /etc/myapiserver/clients-ca.key
              skipVerify: true
              useHTTPS: true
              useServiceAccount: false
```
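
The same template applies to the other control plane receivers. As a further sketch, suppose the scheduler in your
cluster serves metrics over HTTPS on port 10259 instead of the chart default of 10251. The port is an assumption made
for illustration, so check what your cluster actually exposes. Only the keys that differ from the defaults are listed,
in the same style as the apiserver example:

```yaml
agent:
  config:
    receivers:
      receiver_creator:
        receivers:
          smartagent/kubernetes-scheduler:
            # Same discovery rule as the chart default for non-OpenShift clusters.
            rule: type == "pod" && labels["k8s-app"] == "kube-scheduler"
            config:
              # Hypothetical port; replace it with the one your scheduler listens on.
              port: 10259
              useHTTPS: true
              useServiceAccount: true
              # Skipped here on the assumption that the serving certificate is self signed.
              skipVerify: true
```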

### Known issues

Kube Proxy
* https://github.com/kubernetes/kops/issues/6472
  * Problem
    * When using a kops created Kubernetes cluster, a network connectivity issue has been reported that prevents proxy
      metrics from being collected.
  * Solution
    * This issue can be addressed by updating the kubeProxy metric bind address in the kops cluster spec:
      * Set "kubeProxy.metricsBindAddress: 0.0.0.0" in the kops cluster spec.
      * Deploy the change with "kops update cluster {cluster_name}" and "kops rolling-update cluster {cluster_name}".
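
As a sketch, the relevant part of the kops cluster spec (opened with `kops edit cluster {cluster_name}`) would look like
the fragment below. Only the `kubeProxy.metricsBindAddress` setting comes from the steps above; the surrounding layout
is an assumption about a typical kops cluster spec, and the rest of the spec is omitted.

```yaml
spec:
  kubeProxy:
    # Bind the kube-proxy metrics endpoint to all interfaces so the agent can reach it.
    # This also makes the metrics port reachable from outside the pod's loopback interface.
    metricsBindAddress: 0.0.0.0
```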

## Logs collection

The helm chart currently utilizes [fluentd](https://docs.fluentd.org/) for Kubernetes logs
@@ -340,31 +420,3 @@ autodetect:
## Override underlying OpenTelemetry agent configuration

If you want to use your own OpenTelemetry Agent configuration, you can override it by providing a custom configuration in the `agent.config` parameter in the values.yaml. It will be merged into the default agent configuration, but list parts of the configuration (for example, `service.pipelines.logs.processors`) have to be fully re-defined.
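
For example, to add a processor to the logs pipeline you would restate the whole processor list rather than only the
new entry. The processor names and the `resource/example` definition below are placeholders, not the chart's actual
defaults; copy the real list from the rendered default configuration and modify it.

```yaml
agent:
  config:
    processors:
      # Hypothetical extra processor that adds one attribute to every resource.
      resource/example:
        attributes:
          - key: team
            value: platform
            action: insert
    service:
      pipelines:
        logs:
          # The full list must be given here, including the entries you want to keep.
          processors:
            - memory_limiter
            - batch
            - resource/example
```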

### Override a control plane configuration

If your control plane is using non-standard ports or custom TLS certificates, then you can provide a custom
configuration so the otel-collector agent can still successfully connect to it.

To collect control plane metrics, we use a
[receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/receivercreator/README.md)
that instantiates
[smartagent/{control_plane_component}](https://docs.splunk.com/observability/gdi/orchestration.html#nav-Orchestration)
receivers at runtime.

Below is an example configuration of how you could set up the agent to connect to an apiserver that is running on port
8443 (instead of the normal 443) and use custom TLS configurations.

```yaml
agent:
  config:
    receivers:
      receiver_creator:
        receivers:
          smartagent/kubernetes-apiserver:
            rule: type == "port" && port == 8443 && pod.labels["k8s-app"] == "kube-apiserver"
            config:
              clientCertPath: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              clientKeyPath: /var/run/secrets/kubernetes.io/serviceaccount/token
              useServiceAccount: false
```
14 changes: 14 additions & 0 deletions helm-charts/splunk-otel-collector/templates/_helpers.tpl
@@ -184,6 +184,20 @@ Get Splunk Observability Access Token.
{{- .Values.splunkObservability.accessToken | default .Values.splunkAccessToken | default "" -}}
{{- end -}}

{{/*
Helper that returns the controlPlaneEnabled parameter taking care of backward compatibility with the old parameter
name "autodetect.controlPlane".
*/}}
{{- define "splunk-otel-collector.controlPlaneEnabled" -}}
{{- if ne (toString .Values.agent.controlPlaneEnabled) "<nil>" }}
{{- .Values.agent.controlPlaneEnabled }}
{{- else if ne (toString .Values.autodetect.controlPlane) "<nil>" }}
{{- .Values.autodetect.controlPlane }}
{{- else }}
{{- true }}
{{- end -}}
{{- end -}}

{{/*
Create the fluentd image name.
*/}}
64 changes: 62 additions & 2 deletions helm-charts/splunk-otel-collector/templates/config/_otel-agent.tpl
@@ -76,7 +76,42 @@ receivers:
{{- end }}

# Receivers for collecting k8s control plane metrics.
{{- if .Values.autodetect.controlPlane }}
# Verified with Kubernetes v1.22 and Openshift v4.9.
# Below, the TLS certificate verification is often skipped because the k8s default certificate is self signed and
# will fail the verification.
{{- if (eq (include "splunk-otel-collector.controlPlaneEnabled" .) "true") }}
smartagent/coredns:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && namespace == "openshift-dns" && name contains "dns"
  {{- else }}
  rule: type == "pod" && labels["k8s-app"] == "kube-dns"
  {{- end }}
  config:
    extraDimensions:
      metric_source: k8s-coredns
    type: coredns
    {{- if eq .Values.distribution "openshift" }}
    port: 9154
    skipVerify: true
    useHTTPS: true
    useServiceAccount: true
    {{- else }}
    port: 9153
    {{- end }}
smartagent/kube-controller-manager:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && labels["app"] == "kube-controller-manager" && labels["kube-controller-manager"] == "true"
  {{- else }}
  rule: type == "pod" && labels["k8s-app"] == "kube-controller-manager"
  {{- end }}
  config:
    extraDimensions:
      metric_source: kubernetes-controller-manager
    port: 10257
    skipVerify: true
    type: kube-controller-manager
    useHTTPS: true
    useServiceAccount: true
smartagent/kubernetes-apiserver:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "port" && port == 6443 && pod.labels["app"] == "openshift-kube-apiserver" && pod.labels["apiserver"] == "true"
@@ -86,11 +121,36 @@ receivers:
  config:
    extraDimensions:
      metric_source: kubernetes-apiserver
    # We skip verifying here because the k8s default certificate is self signed and will fail this verification.
    skipVerify: true
    type: kubernetes-apiserver
    useHTTPS: true
    useServiceAccount: true
smartagent/kubernetes-proxy:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && labels["app"] == "sdn"
  {{- else }}
  rule: type == "pod" && labels["k8s-app"] == "kube-proxy"
  {{- end }}
  config:
    extraDimensions:
      metric_source: kubernetes-proxy
    type: kubernetes-proxy
    {{- if eq .Values.distribution "openshift" }}
    port: 29101
    {{- else }}
    port: 10249
    {{- end }}
smartagent/kubernetes-scheduler:
  {{- if eq .Values.distribution "openshift" }}
  rule: type == "pod" && labels["app"] == "openshift-kube-scheduler" && labels["scheduler"] == "true"
  {{- else }}
  rule: type == "pod" && labels["k8s-app"] == "kube-scheduler"
  {{- end }}
  config:
    extraDimensions:
      metric_source: kubernetes-scheduler
    port: 10251
    type: kubernetes-scheduler
{{- end}}

kubeletstats:
6 changes: 5 additions & 1 deletion helm-charts/splunk-otel-collector/values.schema.json
@@ -225,7 +225,8 @@
"additionalProperties": false,
"properties": {
"controlPlane": {
"type": "boolean"
"type": "boolean",
"deprecated": true
},
"prometheus": {
"type": "boolean"
@@ -287,6 +288,9 @@
"enabled": {
"type": "boolean"
},
"controlPlaneEnabled": {
"type": "boolean"
},
"ports": {
"type": "object",
"patternProperties": {
9 changes: 4 additions & 5 deletions helm-charts/splunk-otel-collector/values.yaml
@@ -151,18 +151,13 @@ distribution: ""

################################################################################
# Optional: Automatic detection of additional metric sources.
# We collect k8s control plane metrics by default, you can disable this by
# setting autodetect.controlPlane=false. This controlPlane integration
# relies on having access the k8s control plane which k8s clusters run as a
# managed service do not support.
# Set autodetect.prometheus=true if you want the otel-collector agent to scrape
# prometheus metrics from pods that have prometheus-style annotations like
# "prometheus.io/scrape".
# Set autodetect.istio=true in istio environment.
################################################################################

autodetect:
  controlPlane: true
  prometheus: false
  istio: false

@@ -216,6 +211,10 @@ extraAttributes:
agent:
  enabled: true

  # This flag enables k8s control plane metric collection.
  # Details about control plane monitoring and related configurations are located in docs/advanced-configuration.md
  controlPlaneEnabled: true

  # The ports to be exposed by the agent to the host.
  # Make sure that only necessary ports are exposed, <hostIP, hostPort, protocol> combination must
  # be unique across all the nodes in k8s cluster. Any port can be disabled,
31 changes: 31 additions & 0 deletions rendered/manifests/agent-only/configmap-agent.yaml
@@ -176,6 +176,23 @@ data:
- ${K8S_POD_IP}:8889
receiver_creator:
  receivers:
    smartagent/coredns:
      config:
        extraDimensions:
          metric_source: k8s-coredns
        port: 9153
        type: coredns
      rule: type == "pod" && labels["k8s-app"] == "kube-dns"
    smartagent/kube-controller-manager:
      config:
        extraDimensions:
          metric_source: kubernetes-controller-manager
        port: 10257
        skipVerify: true
        type: kube-controller-manager
        useHTTPS: true
        useServiceAccount: true
      rule: type == "pod" && labels["k8s-app"] == "kube-controller-manager"
    smartagent/kubernetes-apiserver:
      config:
        extraDimensions:
@@ -185,6 +202,20 @@ data:
        useHTTPS: true
        useServiceAccount: true
      rule: type == "port" && port == 443 && pod.labels["k8s-app"] == "kube-apiserver"
    smartagent/kubernetes-proxy:
      config:
        extraDimensions:
          metric_source: kubernetes-proxy
        port: 10249
        type: kubernetes-proxy
      rule: type == "pod" && labels["k8s-app"] == "kube-proxy"
    smartagent/kubernetes-scheduler:
      config:
        extraDimensions:
          metric_source: kubernetes-scheduler
        port: 10251
        type: kubernetes-scheduler
      rule: type == "pod" && labels["k8s-app"] == "kube-scheduler"
  watch_observers:
  - k8s_observer
signalfx:
2 changes: 1 addition & 1 deletion rendered/manifests/agent-only/daemonset.yaml
@@ -27,7 +27,7 @@ spec:
app: splunk-otel-collector
release: default
annotations:
checksum/config: 7ceaa2db448ab1a0ebc167bc1aa68fbb65dddeb17d91a9c1fb9c3b0d0917515d
checksum/config: aa7528ec9c709cc3ba572b82a01e5655bc4b4f3dfc033dfe63416f7dfc938f1e
kubectl.kubernetes.io/default-container: otel-collector
spec:
hostNetwork: true
31 changes: 31 additions & 0 deletions rendered/manifests/metrics-only/configmap-agent.yaml
@@ -167,6 +167,23 @@ data:
- ${K8S_POD_IP}:8889
receiver_creator:
  receivers:
    smartagent/coredns:
      config:
        extraDimensions:
          metric_source: k8s-coredns
        port: 9153
        type: coredns
      rule: type == "pod" && labels["k8s-app"] == "kube-dns"
    smartagent/kube-controller-manager:
      config:
        extraDimensions:
          metric_source: kubernetes-controller-manager
        port: 10257
        skipVerify: true
        type: kube-controller-manager
        useHTTPS: true
        useServiceAccount: true
      rule: type == "pod" && labels["k8s-app"] == "kube-controller-manager"
    smartagent/kubernetes-apiserver:
      config:
        extraDimensions:
@@ -176,6 +193,20 @@ data:
        useHTTPS: true
        useServiceAccount: true
      rule: type == "port" && port == 443 && pod.labels["k8s-app"] == "kube-apiserver"
    smartagent/kubernetes-proxy:
      config:
        extraDimensions:
          metric_source: kubernetes-proxy
        port: 10249
        type: kubernetes-proxy
      rule: type == "pod" && labels["k8s-app"] == "kube-proxy"
    smartagent/kubernetes-scheduler:
      config:
        extraDimensions:
          metric_source: kubernetes-scheduler
        port: 10251
        type: kubernetes-scheduler
      rule: type == "pod" && labels["k8s-app"] == "kube-scheduler"
  watch_observers:
  - k8s_observer
signalfx: