add eks/fargate distribution with cluster-receiver-observer deployment
Ryan Fitzpatrick committed Jan 6, 2022
1 parent d6f2ca7 commit 2871834
Showing 29 changed files with 1,628 additions and 8 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.idea
*.iml
12 changes: 12 additions & 0 deletions Makefile
@@ -49,3 +49,15 @@ render:
default helm-charts/splunk-otel-collector; \
mv "$$dir"/splunk-otel-collector/templates/* "$$dir"; \
rm -rf "$$dir"/splunk-otel-collector

# eks/fargate deployment (with recommended gateway)
dir=rendered/manifests/eks-fargate; \
mkdir -p "$$dir"; \
helm template \
--namespace default \
--values rendered/values.yaml \
--output-dir "$$dir" \
--set distribution=eks/fargate,gateway.enabled=true,cloudProvider=aws \
default helm-charts/splunk-otel-collector; \
mv "$$dir"/splunk-otel-collector/templates/* "$$dir"; \
rm -rf "$$dir"/splunk-otel-collector
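The `--set` flags in the eks/fargate recipe can also be written as a values file. The following is a minimal sketch. It assumes that every other setting still comes from `rendered/values.yaml`:

```yaml
# Equivalent of: --set distribution=eks/fargate,gateway.enabled=true,cloudProvider=aws
distribution: eks/fargate
cloudProvider: aws
gateway:
  # Recommended on Fargate, where the agent daemonset is not deployed.
  enabled: true
```

You could pass this file to `helm template` as a second `--values` flag. Helm gives later values files precedence over earlier ones.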
1 change: 1 addition & 0 deletions README.md
@@ -92,6 +92,7 @@ Kubernetes distributions:

- [Vanilla (unmodified version) Kubernetes](https://kubernetes.io)
- [Amazon Elastic Kubernetes Service](https://aws.amazon.com/eks)
including [with Fargate profiles](docs/advanced-configuration.md#eks-fargate-support)
- [Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks)
- [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine)
including [GKE Autopilot](docs/advanced-configuration.md#gke-autopilot-support)
32 changes: 31 additions & 1 deletion docs/advanced-configuration.md
@@ -43,10 +43,11 @@ Use the `distribution` parameter to provide information about underlying
Kubernetes deployment. This parameter allows the connector to automatically
scrape additional metadata. The supported options are:

- `aks` - Azure AKS
- `eks` - Amazon EKS
- `eks/fargate` - Amazon EKS with Fargate profiles
- `gke` - Google GKE / Standard mode
- `gke/autopilot` - Google GKE / Autopilot mode
- `aks` - Azure AKS
- `openshift` - Red Hat OpenShift

This value can be omitted if none of the values apply.
@@ -121,6 +122,35 @@ the following line to your custom values.yaml:
priorityClassName: splunk-otel-agent-priority
```

## EKS Fargate support

If you want to run the Splunk OpenTelemetry Collector in [Amazon Elastic Kubernetes Service
with Fargate profiles](https://docs.aws.amazon.com/eks/latest/userguide/fargate.html),
make sure to set the required `distribution` value to `eks/fargate`:

```yaml
distribution: eks/fargate
```

**NOTE:** Fluentd and native OTel logs collection are not yet automatically configured in EKS with Fargate profiles.

This distribution will operate similarly to the `eks` distribution but with the following distinctions:

1. The Collector agent daemonset is not applied, since Fargate doesn't support daemonsets. Any Collector instances you want
running as agents must be configured manually as sidecar containers in your custom deployments, and the same applies to any
application logging services such as Fluentd. If no agent instances are used in your cluster, we recommend setting
`gateway.enabled` to `true` and configuring your instrumented applications to report metrics, traces, and logs to the
gateway's `<release-name>-splunk-otel-collector` service address.
2. The Cluster Receiver's single-replica deployment is configured with a
[Kubernetes Observer extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/observer/k8sobserver/README.md)
that discovers the cluster's nodes and pods.
3. The Cluster Receiver's single-replica deployment is also configured with a dynamically created
[Kubelet Stats receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/kubeletstatsreceiver/README.md)
that reports kubelet metrics for all observed Fargate nodes, except the Cluster Receiver's own node, whose kubelet is
unreachable from the Collector's Pod.
4. An additional "Cluster Receiver Observer" single-replica deployment, similar to the Cluster Receiver's, is configured to
report only the kubelet stats for the Cluster Receiver's node, which provides additional Collector monitoring. This is made
possible by a Fargate-specific deployment label.
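The gateway setup recommended for clusters without agent instances can be sketched as an application Deployment that sends its OpenTelemetry SDK output to the gateway Service. This sketch relies on several assumptions: a hypothetical Helm release name `my-otel`, which under the chart's default naming would produce a Service called `my-otel-splunk-otel-collector`; the `default` namespace; a placeholder application name and image; and OTLP/gRPC on port 4317 being exposed by that Service. Check the Service the chart actually creates before you rely on any of these values.

```yaml
# Hypothetical application Deployment scheduled onto a Fargate profile.
# Assumes the chart was installed with gateway.enabled=true, release name "my-otel",
# in the "default" namespace.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:latest   # placeholder image
          env:
            # Standard OpenTelemetry SDK exporter endpoint, pointed at the gateway Service.
            # Port 4317 (OTLP/gRPC) is an assumption about the Service's exposed ports.
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://my-otel-splunk-otel-collector.default.svc:4317
```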

## Logs collection

The helm chart currently utilizes [fluentd](https://docs.fluentd.org/) for Kubernetes logs
15 changes: 15 additions & 0 deletions helm-charts/splunk-otel-collector/templates/_helpers.tpl
@@ -308,3 +308,18 @@ compatibility with the old config group name: "otelK8sClusterReceiver".
{{- deepCopy .Values.otelK8sClusterReceiver | mustMergeOverwrite (deepCopy .Values.clusterReceiver) | toYaml }}
{{- end }}
{{- end -}}

{{/*
"clusterReceiverObserver" configuration values
*/}}
{{- define "splunk-otel-collector.clusterReceiverObserver" -}}
{{- .Values.clusterReceiverObserver | toYaml }}
{{- end -}}

{{/*
"clusterReceiverObserverEnabled" that's based on enabled flags and distribution
*/}}
{{- define "splunk-otel-collector.clusterReceiverObserverEnabled" -}}
{{- $clusterReceiverObserver := fromYaml (include "splunk-otel-collector.clusterReceiverObserver" .) }}
{{- and (eq (include "splunk-otel-collector.metricsEnabled" .) "true") (or (eq (toString $clusterReceiverObserver.enabled) "true") (and (eq (toString $clusterReceiverObserver.enabled) "false-default") (eq (include "splunk-otel-collector.distribution" .) "eks/fargate"))) }}
{{- end -}}
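According to the `clusterReceiverObserverEnabled` helper, the observer deployment is rendered only when metrics are enabled and one of two conditions holds. Either `clusterReceiverObserver.enabled` is `true`, or it is left at the `"false-default"` sentinel while `distribution` is `eks/fargate`. Any other value disables it. The following values fragment is therefore a sketch of how to opt out on Fargate. It assumes the chart's default for `enabled` is the `"false-default"` string, which the helper implies but this diff does not show:

```yaml
distribution: eks/fargate
clusterReceiverObserver:
  # toString(false) is "false", which matches neither "true" nor "false-default",
  # so the Cluster Receiver Observer deployment is not rendered even on eks/fargate.
  enabled: false
```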
@@ -69,7 +69,7 @@ resourcedetection:
- env
{{- if hasPrefix "gke" (include "splunk-otel-collector.distribution" .) }}
- gke
{{- else if eq (include "splunk-otel-collector.distribution" .) "eks" }}
{{- else if hasPrefix "eks" (include "splunk-otel-collector.distribution" .) }}
- eks
{{- else if eq (include "splunk-otel-collector.distribution" .) "aks" }}
- aks
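Replacing `eq` with `hasPrefix` means that both `eks` and `eks/fargate` now select the `eks` resource detector. Previously, only an exact `eks` value did. For `distribution: eks/fargate`, the part of the detector list visible in this hunk would render roughly as follows. This is a sketch, and any entries outside the hunk are omitted:

```yaml
# Excerpt of the rendered detector list for distribution: eks/fargate
- env
- eks
```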
@@ -0,0 +1,205 @@
{{/*
Config for the otel-collector eks/fargate cluster receiver observer deployment.
The values can be overridden in .Values.clusterReceiverObserver.config
*/}}
{{- define "splunk-otel-collector.clusterReceiverObserverConfig" -}}
{{ $gateway := fromYaml (include "splunk-otel-collector.gateway" .) -}}
{{ $clusterReceiverObserver := fromYaml (include "splunk-otel-collector.clusterReceiverObserver" .) -}}
extensions:
health_check:

memory_ballast:
size_mib: ${SPLUNK_BALLAST_SIZE_MIB}

# k8s_observer w/ pod and node detection for eks/fargate deployment
k8s_observer:
auth_type: serviceAccount
observe_pods: true
observe_nodes: true

receivers:
# Prometheus receiver scraping metrics from the pod itself
prometheus/k8s_cluster_receiver:
config:
scrape_configs:
- job_name: 'otel-k8s-cluster-receiver-observer'
scrape_interval: 10s
static_configs:
- targets: ["${K8S_POD_IP}:8889"]
{{- if $clusterReceiverObserver.k8sEventsEnabled }}
smartagent/kubernetes-events:
type: kubernetes-events
alwaysClusterReporter: true
whitelistedEvents:
- reason: Created
involvedObjectKind: Pod
- reason: Unhealthy
involvedObjectKind: Pod
- reason: Failed
involvedObjectKind: Pod
- reason: FailedCreate
involvedObjectKind: Job
{{- end }}

# dynamically created kubeletstats receiver to report kubelet stats for cluster receiver "node"
receiver_creator/eks-fargate-cluster-receiver:
receivers:
kubeletstats:
rule: type == "k8s.node" && name contains "fargate" && labels["otel-eks-fargate-is-cluster-receiver-node"] == "true"
config:
auth_type: serviceAccount
collection_interval: 10s
endpoint: "`endpoint`:`kubelet_endpoint_port`"
extra_metadata_labels:
- container.id
metric_groups:
- container
- pod
- node
watch_observers:
- k8s_observer


processors:
{{- include "splunk-otel-collector.otelMemoryLimiterConfig" . | nindent 2 }}

batch:

{{- include "splunk-otel-collector.resourceDetectionProcessor" . | nindent 2 }}

{{- if and $clusterReceiverObserver.k8sEventsEnabled (eq (include "splunk-otel-collector.o11yMetricsEnabled" .) "true") }}
resource/add_event_k8s:
attributes:
- action: insert
key: kubernetes_cluster
value: {{ .Values.clusterName }}
{{- end }}

# Resource attributes specific to the collector itself.
resource/add_collector_k8s:
attributes:
- action: insert
key: k8s.node.name
value: "${K8S_NODE_NAME}"
- action: insert
key: k8s.pod.name
value: "${K8S_POD_NAME}"
- action: insert
key: k8s.pod.uid
value: "${K8S_POD_UID}"
- action: insert
key: k8s.namespace.name
value: "${K8S_NAMESPACE}"

resource:
attributes:
# TODO: Remove once available in mapping service.
- action: insert
key: metric_source
value: kubernetes
# XXX: Added so that Smart Agent metrics and OTel metrics don't map to the same MTS identity
# (same metric and dimension names and values) after mappings are applied. This would be
# the case if somebody uses the same cluster name from Smart Agent and OTel in the same org.
- action: insert
key: receiver
value: k8scluster
- action: upsert
key: k8s.cluster.name
value: {{ .Values.clusterName }}
{{- range .Values.extraAttributes.custom }}
- action: upsert
key: {{ .name }}
value: {{ .value }}
{{- end }}
# Extract "container.image.tag" attribute from "container.image.name" here until k8scluster
# receiver does it natively.
- key: container.image.name
pattern: ^(?P<temp_container_image_name>[^\:]+)(?:\:(?P<temp_container_image_tag>.*))?
action: extract
- key: container.image.name
from_attribute: temp_container_image_name
action: upsert
- key: temp_container_image_name
action: delete
- key: container.image.tag
from_attribute: temp_container_image_tag
action: upsert
- key: temp_container_image_tag
action: delete

exporters:
{{- if eq (include "splunk-otel-collector.o11yMetricsEnabled" $) "true" }}
signalfx:
{{ if $gateway.enabled }}
ingest_url: http://{{ include "splunk-otel-collector.fullname" . }}:9943
api_url: http://{{ include "splunk-otel-collector.fullname" . }}:6060
{{- else }}
ingest_url: {{ include "splunk-otel-collector.o11yIngestUrl" . }}
api_url: {{ include "splunk-otel-collector.o11yApiUrl" . }}
{{- end }}
access_token: ${SPLUNK_OBSERVABILITY_ACCESS_TOKEN}
timeout: 10s
{{- end }}

{{- if and (eq (include "splunk-otel-collector.logsEnabled" $) "true") $clusterReceiverObserver.k8sEventsEnabled }}
splunk_hec/o11y:
endpoint: {{ include "splunk-otel-collector.o11yIngestUrl" . }}/v1/log
token: "${SPLUNK_OBSERVABILITY_ACCESS_TOKEN}"
sourcetype: kube:events
source: kubelet
{{- end }}

{{- if (eq (include "splunk-otel-collector.platformMetricsEnabled" .) "true") }}
{{- include "splunk-otel-collector.splunkPlatformMetricsExporter" . | nindent 2 }}
{{- end }}

service:
extensions: [health_check, memory_ballast, k8s_observer]
pipelines:
# k8s metrics pipeline
metrics:
receivers: [receiver_creator/eks-fargate-cluster-receiver]
processors: [memory_limiter, batch, resource]
exporters:
{{- if (eq (include "splunk-otel-collector.o11yMetricsEnabled" .) "true") }}
- signalfx
{{- end }}
{{- if (eq (include "splunk-otel-collector.platformMetricsEnabled" $) "true") }}
- splunk_hec/platform_metrics
{{- end }}

{{- if or (eq (include "splunk-otel-collector.splunkO11yEnabled" $) "true") (eq (include "splunk-otel-collector.platformMetricsEnabled" $) "true") }}
# Pipeline for metrics collected about the collector pod itself.
metrics/collector:
receivers: [prometheus/k8s_cluster_receiver]
processors:
- memory_limiter
- batch
- resource
- resource/add_collector_k8s
- resourcedetection
exporters:
{{- if (eq (include "splunk-otel-collector.o11yMetricsEnabled" .) "true") }}
- signalfx
{{- end }}
{{- if (eq (include "splunk-otel-collector.platformMetricsEnabled" $) "true") }}
- splunk_hec/platform_metrics
{{- end }}
{{- end }}

{{- if and $clusterReceiverObserver.k8sEventsEnabled (eq (include "splunk-otel-collector.o11yMetricsEnabled" .) "true") }}
logs/events:
receivers:
- smartagent/kubernetes-events
processors:
- memory_limiter
- batch
- resource
- resource/add_event_k8s
exporters:
- signalfx
{{- if (eq (include "splunk-otel-collector.o11yLogsEnabled" .) "true") }}
- splunk_hec/o11y
{{- end }}
{{- end }}
{{- end }}
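The two `receiver_creator` rules divide the kubelet scraping between the two deployments. To illustrate this, consider two hypothetical Fargate Node objects; their names and IPs are assumptions. The Cluster Receiver's rule, `type == "k8s.node" && name contains "fargate" && not ( name contains "${K8S_NODE_NAME}" )`, skips the node it runs on. The Observer's rule, `... && labels["otel-eks-fargate-is-cluster-receiver-node"] == "true"`, picks up exactly that node. This sketch assumes the label ends up on the Cluster Receiver's node, as the Observer's rule requires.

```yaml
# Hypothetical node hosting the Cluster Receiver pod (its K8S_NODE_NAME).
# Skipped by the Cluster Receiver's rule; matched by the Observer's rule.
apiVersion: v1
kind: Node
metadata:
  name: fargate-ip-192-168-10-11.us-west-2.compute.internal
  labels:
    otel-eks-fargate-is-cluster-receiver-node: "true"
---
# Hypothetical node hosting an application pod.
# Matched by the Cluster Receiver's rule; skipped by the Observer's rule (no label).
apiVersion: v1
kind: Node
metadata:
  name: fargate-ip-192-168-10-12.us-west-2.compute.internal
```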
@@ -11,6 +11,14 @@ extensions:
memory_ballast:
size_mib: ${SPLUNK_BALLAST_SIZE_MIB}

{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }}
# k8s_observer w/ pod and node detection for eks/fargate deployment
k8s_observer:
auth_type: serviceAccount
observe_pods: true
observe_nodes: true
{{- end }}

receivers:
# Prometheus receiver scraping metrics from the pod itself, both otel and fluentd
prometheus/k8s_cluster_receiver:
@@ -42,6 +50,26 @@ receivers:
- reason: FailedCreate
involvedObjectKind: Job
{{- end }}
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }}
# dynamically created kubeletstats receiver to report kubelet stats for all Fargate "nodes"
# except the collector's own node, since Fargate forbids that connection.
receiver_creator:
receivers:
kubeletstats:
rule: type == "k8s.node" && name contains "fargate" && not ( name contains "${K8S_NODE_NAME}" )
config:
auth_type: serviceAccount
collection_interval: 10s
endpoint: "`endpoint`:`kubelet_endpoint_port`"
extra_metadata_labels:
- container.id
metric_groups:
- container
- pod
- node
watch_observers:
- k8s_observer
{{- end }}

processors:
{{- include "splunk-otel-collector.otelMemoryLimiterConfig" . | nindent 2 }}
@@ -137,11 +165,20 @@ exporters:
{{- end }}

service:
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }}
extensions: [health_check, memory_ballast, k8s_observer]
{{- else }}
extensions: [health_check, memory_ballast]
{{- end }}
pipelines:
# k8s metrics pipeline
metrics:
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }}
receivers: [k8s_cluster, receiver_creator]
{{- else }}
receivers: [k8s_cluster]
{{- end }}

processors: [memory_limiter, batch, resource]
exporters:
{{- if (eq (include "splunk-otel-collector.o11yMetricsEnabled" .) "true") }}
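For `distribution: eks/fargate`, the conditionals in this `service` section would render roughly as shown below. For any other distribution, `k8s_observer` and `receiver_creator` are omitted. This is a sketch only: the hunk is truncated, so exporters and the remaining pipelines are left out.

```yaml
# Approximate rendering for distribution: eks/fargate (exporters and other pipelines omitted)
service:
  extensions: [health_check, memory_ballast, k8s_observer]
  pipelines:
    # k8s metrics pipeline
    metrics:
      receivers: [k8s_cluster, receiver_creator]
      processors: [memory_limiter, batch, resource]
```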
@@ -1,5 +1,8 @@
{{ $agent := fromYaml (include "splunk-otel-collector.agent" .) }}
{{ if $agent.enabled }}
{{/*
Fargate doesn't support daemonsets, so never render the agent for that platform
*/}}
{{- if and $agent.enabled (ne (include "splunk-otel-collector.distribution" .) "eks/fargate") }}
apiVersion: v1
kind: ConfigMap
metadata: