docs: Add OTLP Logs PoC documentation #762

Merged
merged 4 commits into from
Feb 2, 2024
131 changes: 131 additions & 0 deletions docs/contributor/assets/otel-logs-values.yaml
@@ -0,0 +1,131 @@
mode: daemonset

presets:
  logsCollection:
    enabled: true
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true

config:
  receivers:
    filelog:
      include: [ /var/log/pods/*/*/*.log ]
      exclude: []
      # Exclude collector container's logs. The file format is /var/log/pods/<namespace_name>_<pod_name>_<pod_uid>/<container_name>/<run_id>.log
      start_at: end
      retry_on_failure:
        enabled: true
      include_file_path: true
      include_file_name: false
      operators:
        # Find out which format is used by kubernetes
        - type: router
          id: get-format
          routes:
            - output: parser-docker
              expr: 'body matches "^\\{"'
            - output: parser-crio
              expr: 'body matches "^[^ Z]+ "'
            - output: parser-containerd
              expr: 'body matches "^[^ Z]+Z"'
        # Parse CRI-O format
        - type: regex_parser
          id: parser-crio
          regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
          timestamp:
            parse_from: attributes.time
            layout_type: gotime
            layout: '2006-01-02T15:04:05.999999999Z07:00'
        - type: recombine
          id: crio-recombine
          output: extract_metadata_from_filepath
          combine_field: attributes.log
          source_identifier: attributes["log.file.path"]
          is_last_entry: "attributes.logtag == 'F'"
          combine_with: ""
        # Parse CRI-Containerd format
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: recombine
          id: containerd-recombine
          output: extract_metadata_from_filepath
          combine_field: attributes.log
          source_identifier: attributes["log.file.path"]
          is_last_entry: "attributes.logtag == 'F'"
          combine_with: ""
        # Parse Docker format
        - type: json_parser
          id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        # Extract metadata from file path
        - type: regex_parser
          id: extract_metadata_from_filepath
          regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
          parse_from: attributes["log.file.path"]
        # Rename attributes
        - type: move
          from: attributes.stream
          to: attributes["log.iostream"]
        - type: move
          from: attributes.container_name
          to: resource["k8s.container.name"]
        - type: move
          from: attributes.namespace
          to: resource["k8s.namespace.name"]
        - type: move
          from: attributes.pod_name
          to: resource["k8s.pod.name"]
        - type: move
          from: attributes.restart_count
          to: resource["k8s.container.restart_count"]
        - type: move
          from: attributes.uid
          to: resource["k8s.pod.uid"]
        # Clean up log body
        - type: move
          from: attributes.log
          to: body
        # Extract JSON attributes
        - type: json_parser
          if: 'body matches "^{.*}$"'
          parse_from: body
          parse_to: attributes
        - type: copy
          from: body
          to: attributes.original
        - type: move
          from: attributes.message
          to: body
          if: 'attributes.message != nil'
        - type: move
          from: attributes.msg
          to: body
          if: 'attributes.msg != nil'
        - type: severity_parser
          parse_from: attributes.level
          if: 'attributes.level != nil'

  exporters:
    otlp:
      endpoint: ${ingest-otlp-endpoint}
      tls:
        insecure: false
        cert_pem: ${ingest-otlp-cert}
        key_pem: ${ingest-otlp-key}
  service:
    pipelines:
      logs:
        exporters:
          - otlp

extraEnvsFrom:
  - secretRef:
      name: sap-cloud-logging
106 changes: 106 additions & 0 deletions docs/contributor/pocs/otlp-logs.md
@@ -0,0 +1,106 @@
# OpenTelemetry Logs PoC

## Scope and Goals

When integrating an OTLP-compliant logging backend, applications can either send their logs to the backend directly or emit them to STDOUT and rely on a log collector to process and forward them.
With this PoC, we evaluated how the OpenTelemetry Collector's [filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver) can be configured to transform structured JSON logs that Kubernetes workloads emit to STDOUT into the [OTLP logs data model](https://opentelemetry.io/docs/specs/otel/logs/data-model/).
The OpenTelemetry Collector should move JSON attributes to the **attributes** map of the log record, extract other fields like **severity** or **timestamp**, write the actual log message to the **body** field, and add any missing information to ensure that the **attributes** and **resource** attributes comply with the semantic conventions.

This PoC does not cover logs that the application itself sends using OTLP. For that case, we assume that the application already fills the log record fields with the intended values.

## Setup

We created a Helm values file for the `open-telemetry/opentelemetry-collector` chart that parses and transforms container logs as described above. We use an SAP Cloud Logging instance as the OTLP-compliant logging backend. To deploy the setup, follow these steps:

1. Create an SAP Cloud Logging instance. Store the endpoint, client certificate, and key under the keys `ingest-otlp-endpoint`, `ingest-otlp-cert`, and `ingest-otlp-key`, respectively, in a Kubernetes Secret named `sap-cloud-logging` (the name that the values file references in `extraEnvsFrom`) within the `otel-logging` namespace.

2. Deploy the OpenTelemetry Collector Helm chart with the values file [otel-logs-values.yaml](../assets/otel-logs-values.yaml):

```bash
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install -n otel-logging logging open-telemetry/opentelemetry-collector \
-f ../assets/otel-logs-values.yaml
```

## Results

We tested different log formats to evaluate the filelog receiver configuration. The following example of a log record emitted by telemetry-metric-agent demonstrates the transformation. The original log record looks as follows:

```json
{"level":"info","ts":1706781583.437593,"caller":"exporterhelper/retry_sender.go:129","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"metrics","name":"otlp","error":"rpc error: code = Unavailable desc = no healthy upstream","interval":"6.132976949s"}
```
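
On the node, the container runtime wraps each STDOUT or STDERR line in its own line format before the filelog receiver reads it. The following Python sketch mirrors the regular expression of the `parser-containerd` operator from the values file. The sample line is an assumption: it is reconstructed from the `time`, `log.iostream`, and `logtag` attributes of the processed record in this section and carries a shortened JSON payload, so it is not captured output.

```python
import re

# Regular expression of the parser-containerd operator in the values file.
CONTAINERD_LINE = re.compile(
    r"^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$"
)


def parse_containerd_line(line: str) -> dict:
    """Split a CRI log line into time, stream, logtag, and log payload."""
    match = CONTAINERD_LINE.match(line)
    if match is None:
        raise ValueError("not a containerd CRI log line")
    return match.groupdict()


# Reconstructed, shortened sample line (an assumption, not captured output).
sample = (
    "2024-02-01T09:59:43.437812119Z stderr F "
    '{"level":"info","msg":"Exporting failed."}'
)
parsed = parse_containerd_line(sample)
print(parsed["stream"], parsed["logtag"], parsed["log"])
```

The `logtag` value `F` marks a complete line. The `recombine` operators in the values file use it to decide when a record that the runtime split across several lines is complete.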

The processed log record arrives in SAP Cloud Logging (OpenSearch) as follows:

```json
{
"_index": "logs-otel-v1-2024.02.01",
"_type": "_doc",
"_id": "20ccZI0BYhUzrpscNwrE",
"_version": 1,
"_score": null,
"_source": {
"traceId": "",
"spanId": "",
"severityText": "info",
"flags": 0,
"time": "2024-02-01T09:59:43.437812119Z",
"severityNumber": 9,
"droppedAttributesCount": 0,
"serviceName": null,
"body": "Exporting failed. Will retry the request after interval.",
"observedTime": "2024-02-01T09:59:43.580359394Z",
"schemaUrl": "",
"log.attributes.time": "2024-02-01T09:59:43.437812119Z",
"log.attributes.original": "{\"level\":\"info\",\"ts\":1706781583.437593,\"caller\":\"exporterhelper/retry_sender.go:129\",\"msg\":\"Exporting failed. Will retry the request after interval.\",\"kind\":\"exporter\",\"data_type\":\"metrics\",\"name\":\"otlp\",\"error\":\"rpc error: code = Unavailable desc = no healthy upstream\",\"interval\":\"6.132976949s\"}",
"resource.attributes.k8s@namespace@name": "kyma-system",
"resource.attributes.k8s@container@name": "collector",
"resource.attributes.security@istio@io/tlsMode": "istio",
"log.attributes.log@iostream": "stderr",
"log.attributes.name": "otlp",
"resource.attributes.k8s@pod@name": "telemetry-metric-agent-8wxcx",
"resource.attributes.k8s@node@name": "...",
"resource.attributes.service@istio@io/canonical-name": "telemetry-metric-agent",
"resource.attributes.service@istio@io/canonical-revision": "latest",
"resource.attributes.app@kubernetes@io/name": "telemetry-metric-agent",
"log.attributes.level": "info",
"resource.attributes.k8s@daemonset@name": "telemetry-metric-agent",
"log.attributes.logtag": "F",
"log.attributes.data_type": "metrics",
"resource.attributes.k8s@pod@start_time": "2024-02-01 09:59:25 +0000 UTC",
"resource.attributes.controller-revision-hash": "7758d58497",
"log.attributes.error": "rpc error: code = Unavailable desc = no healthy upstream",
"resource.attributes.pod-template-generation": "2",
"log.attributes.log@file@path": "/var/log/pods/kyma-system_telemetry-metric-agent-8wxcx_a01b36e5-28a0-4e31-9ee5-615ceed08321/collector/0.log",
"resource.attributes.k8s@pod@uid": "a01b36e5-28a0-4e31-9ee5-615ceed08321",
"resource.attributes.sidecar@istio@io/inject": "true",
"log.attributes.ts": 1706781583.437593,
"log.attributes.kind": "exporter",
"resource.attributes.k8s@container@restart_count": "0",
"log.attributes.interval": "6.132976949s",
"log.attributes.caller": "exporterhelper/retry_sender.go:129"
},
"fields": {
"observedTime": [
"2024-02-01T09:59:43.580Z"
],
"time": [
"2024-02-01T09:59:43.437Z"
]
},
"sort": [
1706781583437
]
}
```
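
The `k8s.*` resource attributes for namespace, Pod, container, and restart count in the record above can be derived from the log file path alone. As a rough illustration, the following Python sketch applies the regular expression of the `extract_metadata_from_filepath` operator to the path from `log.attributes.log@file@path`. The function name `resource_from_path` and the `RESOURCE_KEYS` table are hypothetical helpers that stand in for the chain of `move` operators.

```python
import re

# Regular expression of the extract_metadata_from_filepath operator.
POD_LOG_PATH = re.compile(
    r"^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)"
    r"\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$"
)

# Target resource attribute for each capture group (mirrors the move operators).
RESOURCE_KEYS = {
    "namespace": "k8s.namespace.name",
    "pod_name": "k8s.pod.name",
    "uid": "k8s.pod.uid",
    "container_name": "k8s.container.name",
    "restart_count": "k8s.container.restart_count",
}


def resource_from_path(path: str) -> dict:
    """Return the resource attributes encoded in a Pod log file path."""
    match = POD_LOG_PATH.match(path)
    if match is None:
        return {}
    return {RESOURCE_KEYS[name]: value for name, value in match.groupdict().items()}


path = (
    "/var/log/pods/kyma-system_telemetry-metric-agent-8wxcx_"
    "a01b36e5-28a0-4e31-9ee5-615ceed08321/collector/0.log"
)
resource = resource_from_path(path)
print(resource)
```

The restart count stays a string because the regular expression only captures text, which matches the `"0"` value in the stored record.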

The OpenTelemetry Collector configuration moves all JSON fields to the **attributes** map. The user-given log message emitted in the **msg** JSON field is moved to the OTLP **body** field.
The **level** JSON field determines the **severityText** and **severityNumber** fields. The `severity_parser` operator performs this mapping automatically.
Operators of the filelog receiver determine the emitting Pod from the log file path. The k8sattributes processor adds other resource attributes to fulfill the semantic conventions.
The k8sattributes processor is also used to create resource attributes for Pod labels. The same could be done with annotations.
An operator of the filelog receiver preserves the originating filesystem path of the record to comply with the semantic conventions for logs.
In this configuration, we copy the original log record to the **original** attribute for debugging purposes.
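
The steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the collector's implementation: the function name `transform` is hypothetical, the sample record is a shortened version of the one shown earlier, and the severity table lists only the base OTLP severity numbers, whereas the real `severity_parser` operator recognizes more aliases.

```python
import json

# Base OTLP severity numbers for common level names (simplified assumption).
SEVERITY_NUMBERS = {
    "trace": 1, "debug": 5, "info": 9, "warn": 13, "error": 17, "fatal": 21,
}


def transform(raw: str) -> dict:
    """Turn a raw JSON log line into body, attributes, and severity fields."""
    attributes = json.loads(raw)      # json_parser: JSON fields -> attributes
    attributes["original"] = raw      # copy: keep the raw record for debugging
    record = {"body": raw, "attributes": attributes}
    if attributes.get("msg") is not None:
        record["body"] = attributes.pop("msg")   # move: attributes.msg -> body
    level = attributes.get("level")
    if level is not None:             # severity_parser on attributes.level
        record["severityText"] = level
        record["severityNumber"] = SEVERITY_NUMBERS.get(level.lower(), 0)
    return record


# Shortened version of the telemetry-metric-agent record shown earlier.
raw = (
    '{"level":"info","ts":1706781583.437593,'
    '"msg":"Exporting failed. Will retry the request after interval.",'
    '"kind":"exporter"}'
)
record = transform(raw)
print(record["body"], record["severityText"], record["severityNumber"])
```

As in the stored record, **level** remains available as an attribute after the severity fields are set, while **msg** no longer appears among the attributes because it was moved to the body.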

The OpenTelemetry Collector setup can extract the log message from different attributes (**message** or **msg** in this configuration), depending on which one is present. This makes it possible to support different logging libraries.
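
A minimal Python sketch of this fallback, assuming the two conditional `move` operators run in the order in which the values file lists them (`message` first, then `msg`). The function name and the sample records are hypothetical.

```python
def move_message_to_body(body: str, attributes: dict) -> str:
    """Apply the conditional move operators for message and msg in order."""
    for key in ("message", "msg"):
        if attributes.get(key) is not None:
            # A move removes the attribute and overwrites the body.
            body = attributes.pop(key)
    return body


# Hypothetical records from two logging libraries with different field names.
attrs_a = {"message": "user logged in", "level": "info"}
attrs_b = {"msg": "cache miss", "level": "debug"}
attrs_c = {"level": "warn"}

body_a = move_message_to_body("raw-a", attrs_a)
body_b = move_message_to_body("raw-b", attrs_b)
body_c = move_message_to_body("raw-c", attrs_c)
print(body_a, body_b, body_c)
```

If neither attribute is present, the body keeps its previous content. Supporting a further logging library would mean adding one more conditional `move` operator for its message field.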

Non-JSON logs are preserved in the **body** field until the enrichment with resource attributes is completed.
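
Non-JSON logs skip the JSON extraction because the second `json_parser` operator runs only if the body matches `^{.*}$`. A small Python sketch of that guard follows; the function name `parse_body` and the sample lines are hypothetical.

```python
import json
import re

# Same condition as the json_parser operator: body looks like a JSON object.
JSON_OBJECT = re.compile(r"^\{.*\}$")


def parse_body(body: str) -> tuple:
    """Return (body, attributes); plain-text bodies pass through unchanged."""
    if JSON_OBJECT.match(body) is None:
        return body, {}
    # A body that matches the pattern but is not valid JSON raises here.
    return body, json.loads(body)


plain_body, plain_attrs = parse_body("Starting server on port 8080")
json_body, json_attrs = parse_body('{"level":"info","msg":"ready"}')
print(plain_attrs, json_attrs)
```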