fix(logging): Update otel and tracing config (#5291)
- fixes dataflow engine logging errors by specifying the otel exporter protocol

the updates are needed because the opentelemetry-java-instrumentation library requires an http(s) URI for the otel collector endpoint, regardless of the actual protocol. As our default is grpc, the OTEL_EXPORTER_OTLP_PROTOCOL environment variable is now set explicitly on dataflow pods

- fixes otel-collector config to remove the deprecated jaeger exporter (jaeger now supports otel directly).

add the OTEL_EXPORTER_OTLP_PROTOCOL key to the seldon-tracing configMap and update the operator, crds and helm charts to support getting the value for this key from the tracing config, similarly to how OTEL_EXPORTER_OTLP_ENDPOINT is fetched

- update versions used by ansible for jaeger and opentelemetry-operator
- port 14250 no longer needs to be exposed under any config
- fix dependency ordering for dataflow/gradle.

the previous ordering prevented kafka-streams from finding the slf4j logging provider,
which led to logs produced by kafka-streams not being recorded
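
The endpoint and protocol interaction described in the first item can be sketched in Go as follows. This is an illustration only, not code from this commit: the helper name `otlpEnv`, the scheme-prefixing step, and the fallback to `grpc` are assumptions made for the example. Only the two environment variable names and the sample endpoint come from the commit itself.

```go
package main

import (
	"fmt"
	"strings"
)

// otlpEnv is a hypothetical helper (not part of the Seldon operator).
// It builds the two OTLP exporter variables a JVM-based component would need.
func otlpEnv(endpoint, protocol string) map[string]string {
	// Assumption: a bare host:port is given an http:// scheme, because the
	// Java instrumentation expects an http(s) URI even when the wire
	// protocol is gRPC.
	if !strings.HasPrefix(endpoint, "http://") && !strings.HasPrefix(endpoint, "https://") {
		endpoint = "http://" + endpoint
	}
	// Assumption: fall back to grpc when no protocol is given, mirroring the
	// `protocol: grpc` default added to the helm values in this commit.
	if protocol == "" {
		protocol = "grpc"
	}
	return map[string]string{
		"OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
		"OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
	}
}

func main() {
	env := otlpEnv("seldon-collector.seldon-mesh:4317", "grpc")
	fmt.Println("OTEL_EXPORTER_OTLP_ENDPOINT =", env["OTEL_EXPORTER_OTLP_ENDPOINT"])
	fmt.Println("OTEL_EXPORTER_OTLP_PROTOCOL =", env["OTEL_EXPORTER_OTLP_PROTOCOL"])
}
```

The point of the sketch is that the URI scheme alone does not select the transport, so the protocol has to travel as a separate setting.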

Fixes #
Internal issue references:

#INFRA-568 Jaeger latest is crashing
#INFRA-464 Otel is not able to parse its config (deprecated exporters)

Public issues:

otel-collector-1 container not starting: #5189 
related PR with partial otel functionality: #5170
lc525 authored Feb 10, 2024
1 parent 94c107d commit b24107e
Showing 30 changed files with 142 additions and 55 deletions.
2 changes: 1 addition & 1 deletion ansible/roles/jaeger/defaults/main.yaml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
jaeger_namespace: observability

jaeger_version: v1.33.0
jaeger_version: v1.53.0
jaeger_yaml: "https://github.com/jaegertracing/jaeger-operator/releases/download/{{ jaeger_version }}/jaeger-operator.yaml"

jaeger_wait_for_deployments: true
2 changes: 1 addition & 1 deletion ansible/roles/opentelemetry/defaults/main.yaml
@@ -1,7 +1,7 @@
---
opentelemetry_namespace: opentelemetry-operator-system

opentelemetry_version: v0.49.0
opentelemetry_version: v0.92.0
opentelemetry_yaml: "https://github.com/open-telemetry/opentelemetry-operator/releases/download/{{ opentelemetry_version }}/opentelemetry-operator.yaml"

opentelemetry_wait_for_deployments: true
4 changes: 4 additions & 0 deletions docs/source/contents/getting-started/configuration/index.md
@@ -79,6 +79,10 @@ The top level keys are:

* `enable` : whether to enable tracing
* `otelExporterEndpoint` : The host and port for the OTEL exporter
* `otelExporterProtocol` : The protocol for the OTEL exporter. Currently used for
jvm-based components only (such as dataflow-engine), because `opentelemetry-java-instrumentation`
requires a http(s) URI for the endpoint but defaults to `http/protobuf` as a protocol.
Because of this, gRPC connections (over http) can only be set up by setting this option to `grpc`
* `ratio` : The ratio of requests to trace. Takes values between 0 and 1 inclusive.


2 changes: 1 addition & 1 deletion docs/source/contents/kubernetes/tracing/index.md
@@ -1,6 +1,6 @@
# Tracing

We support Open Telemetry tracing. By default all components will attempt to send OLTP events to `seldon-collector.seldon-mesh:4317` which will export to Jaeger at `simplest-collector.seldon-mesh:14250`.
We support Open Telemetry tracing. By default all components will attempt to send OLTP events to `seldon-collector.seldon-mesh:4317` which will export to Jaeger at `simplest-collector.seldon-mesh:4317`.

The components can be installed from the `tracing/k8s` folder. In future an Ansible playbook will be created. This installs a Open Telemetry collector and a simple Jaeger install with a service that can be port forwarded to at `simplest.seldon-mesh:16686`.

3 changes: 2 additions & 1 deletion k8s/Makefile
@@ -1,5 +1,6 @@
CUSTOM_IMAGE_TAG ?= latest
NEW_VERSION ?= 0.0.0
SELDON_MESH_NAMESPACE ?= seldon-mesh

HELM_CRD_BASE := helm-charts/seldon-core-v2-crds/templates
HELM_COMPONENTS_BASE := helm-charts/seldon-core-v2-setup/templates
@@ -25,7 +26,7 @@ create-helm-charts:

.PHONY: create-yaml
create-yaml:
helm template seldon-core-v2-certs ./helm-charts/seldon-core-v2-certs | grep -v "namespace:" > yaml/certs.yaml
helm template -n ${SELDON_MESH_NAMESPACE} seldon-core-v2-certs ./helm-charts/seldon-core-v2-certs | grep -v "namespace:" > yaml/certs.yaml
helm template seldon-core-v2-crds ./helm-charts/seldon-core-v2-crds > yaml/crds.yaml
helm template seldon-core-v2-components ./helm-charts/seldon-core-v2-setup | grep -v "namespace:" > yaml/components.yaml
helm template seldon-core-v2-runtime ./helm-charts/seldon-core-v2-runtime | grep -v "namespace:" > yaml/runtime.yaml
41 changes: 28 additions & 13 deletions k8s/helm-charts/seldon-core-v2-crds/templates/seldon-v2-crds.yaml
@@ -370,8 +370,11 @@ spec:
type: string
type: array
joinType:
description: One of inner (default), outer, or any (see above
for details)
default: inner
enum:
- inner
- outer
- any
type: string
joinWindowMs:
description: msecs to wait for messages from multiple inputs to
@@ -385,8 +388,11 @@ spec:
-> input1
type: object
triggersJoinType:
description: One of inner (default), outer, or any (see above
for details)
default: inner
enum:
- inner
- outer
- any
type: string
type: object
output:
@@ -403,8 +409,11 @@ spec:
type: string
type: array
stepsJoin:
description: One of inner (default), outer, or any (see above
for details)
default: inner
enum:
- inner
- outer
- any
type: string
tensorMap:
additionalProperties:
@@ -436,11 +445,11 @@ spec:
type: string
type: array
inputsJoinType:
description: 'One of inner (default), outer, or any inner -
do an inner join: data must be available from all inputs outer
- do an outer join: data will include any data from any inputs
at end of window any - first data input that arrives will
be forwarded'
default: inner
enum:
- inner
- outer
- any
type: string
joinWindowMs:
description: msecs to wait for messages from multiple inputs
@@ -462,8 +471,10 @@ spec:
type: string
type: array
triggersJoinType:
description: One of inner (default), outer, or any (see above
for details)
enum:
- inner
- outer
- any
type: string
required:
- name
@@ -8145,6 +8156,8 @@ spec:
type: boolean
otelExporterEndpoint:
type: string
otelExporterProtocol:
type: string
ratio:
type: string
type: object
@@ -8258,6 +8271,8 @@ spec:
type: boolean
otelExporterEndpoint:
type: string
otelExporterProtocol:
type: string
ratio:
type: string
type: object
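
The join-type fields in this CRD now carry an `enum` of `inner`, `outer` and `any`, and most of them also declare `default: inner`. A rough Go sketch of the equivalent check is given here for illustration. The names `validJoinTypes` and `normalizeJoinType` are hypothetical and do not come from the operator; the per-value comments paraphrase the descriptions that this diff removes.

```go
package main

import "fmt"

// validJoinTypes mirrors the enum values added to the CRD schema.
//   inner: data must be available from all inputs
//   outer: include whatever data has arrived from any input at end of window
//   any:   forward the first input that arrives
var validJoinTypes = map[string]bool{"inner": true, "outer": true, "any": true}

// normalizeJoinType is a hypothetical helper: an empty value falls back to
// "inner" (as for fields that declare `default: inner`), and anything
// outside the enum is rejected.
func normalizeJoinType(v string) (string, error) {
	if v == "" {
		return "inner", nil
	}
	if !validJoinTypes[v] {
		return "", fmt.Errorf("unsupported join type %q: must be one of inner, outer, any", v)
	}
	return v, nil
}

func main() {
	for _, v := range []string{"", "outer", "left"} {
		jt, err := normalizeJoinType(v)
		fmt.Printf("input=%q result=%q err=%v\n", v, jt, err)
	}
}
```

With the schema expressing these rules directly, a check like this would normally be enforced at admission time rather than in component code.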
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@ apiVersion: mlops.seldon.io/v1alpha1
kind: SeldonRuntime
metadata:
name: seldon
namespace: '{{ .Release.Namespace }}'
namespace: '{{ .Release.Namespace }}'
spec:
seldonConfig: {{ .Values.seldonConfig }}
disableAutoUpdate: {{ .Values.disableAutoUpdate }}
@@ -12,19 +12,19 @@ spec:
replicas: {{ .Values.hodometer.replicas }}
- name: seldon-scheduler
disable: {{ .Values.scheduler.disable }}
serviceType: {{ .Values.scheduler.serviceType }}
serviceType: {{ .Values.scheduler.serviceType }}
- name: seldon-envoy
disable: {{ .Values.envoy.disable }}
replicas: {{ .Values.envoy.replicas }}
serviceType: {{ .Values.envoy.serviceType }}
serviceType: {{ .Values.envoy.serviceType }}
- name: seldon-dataflow-engine
disable: {{ .Values.dataflow.disable }}
replicas: {{ .Values.dataflow.replicas }}
replicas: {{ .Values.dataflow.replicas }}
- name: seldon-modelgateway
disable: {{ .Values.modelgateway.disable }}
replicas: {{ .Values.modelgateway.replicas }}
replicas: {{ .Values.modelgateway.replicas }}
- name: seldon-pipelinegateway
disable: {{ .Values.pipelinegateway.disable }}
disable: {{ .Values.pipelinegateway.disable }}
replicas: {{ .Values.pipelinegateway.replicas }}
config:
agentConfig:
@@ -55,4 +55,5 @@ spec:
tracingConfig:
disable: {{ .Values.config.tracingConfig.disable }}
otelExporterEndpoint: {{ .Values.config.tracingConfig.otelExporterEndpoint }}
otelExporterProtocol: {{ .Values.config.tracingConfig.otelExporterProtocol }}
ratio: {{ .Values.config.tracingConfig.ratio }}
1 change: 1 addition & 0 deletions k8s/helm-charts/seldon-core-v2-runtime/values.yaml
@@ -42,6 +42,7 @@ config:
tracingConfig:
disable:
otelExporterEndpoint:
otelExporterProtocol:
ratio:
serviceConfig:
serviceGRPCPrefix:
Original file line number Diff line number Diff line change
@@ -1432,6 +1432,11 @@ spec:
configMapKeyRef:
key: OTEL_EXPORTER_OTLP_ENDPOINT
name: seldon-tracing
- name: OTEL_EXPORTER_OTLP_PROTOCOL
valueFrom:
configMapKeyRef:
key: OTEL_EXPORTER_OTLP_PROTOCOL
name: seldon-tracing
- name: SELDON_POD_NAMESPACE
valueFrom:
fieldRef:
@@ -1480,6 +1485,7 @@ spec:
tracingConfig:
disable: {{ .Values.opentelemetry.disable }}
otelExporterEndpoint: '{{ .Values.opentelemetry.endpoint }}'
otelExporterProtocol: '{{ .Values.opentelemetry.protocol }}'
ratio: '{{ .Values.opentelemetry.ratio }}'
---
apiVersion: mlops.seldon.io/v1alpha1
1 change: 1 addition & 0 deletions k8s/helm-charts/seldon-core-v2-setup/values.yaml
@@ -75,6 +75,7 @@ imagePullSecrets:
opentelemetry:
# This will need to be customized to your open telemetry installation endpoint
endpoint: seldon-collector.seldon-mesh:4317
protocol: grpc
disable: false
ratio: 1

1 change: 1 addition & 0 deletions k8s/helm-charts/seldon-core-v2-setup/values.yaml.template
@@ -75,6 +75,7 @@ imagePullSecrets:
opentelemetry:
# This will need to be customized to your open telemetry installation endpoint
endpoint: seldon-collector.seldon-mesh:4317
protocol: grpc
disable: false
ratio: 1

1 change: 1 addition & 0 deletions k8s/kustomize/helm-components-sc/patch_tracingconfig.yaml
@@ -8,4 +8,5 @@ spec:
tracingConfig:
disable: HACK_REMOVE_ME{{ .Values.opentelemetry.disable }}
otelExporterEndpoint: '{{ .Values.opentelemetry.endpoint }}'
otelExporterProtocol: '{{ .Values.opentelemetry.protocol }}'
ratio: '{{ .Values.opentelemetry.ratio }}'
6 changes: 6 additions & 0 deletions k8s/yaml/components.yaml
@@ -1052,6 +1052,11 @@ spec:
configMapKeyRef:
key: OTEL_EXPORTER_OTLP_ENDPOINT
name: seldon-tracing
- name: OTEL_EXPORTER_OTLP_PROTOCOL
valueFrom:
configMapKeyRef:
key: OTEL_EXPORTER_OTLP_PROTOCOL
name: seldon-tracing
- name: SELDON_POD_NAMESPACE
valueFrom:
fieldRef:
@@ -1097,6 +1102,7 @@ spec:
tracingConfig:
disable: false
otelExporterEndpoint: 'seldon-collector.seldon-mesh:4317'
otelExporterProtocol: 'grpc'
ratio: '1'
---
# Source: seldon-core-v2-setup/templates/seldon-v2-components.yaml
41 changes: 28 additions & 13 deletions k8s/yaml/crds.yaml
@@ -374,8 +374,11 @@ spec:
type: string
type: array
joinType:
description: One of inner (default), outer, or any (see above
for details)
default: inner
enum:
- inner
- outer
- any
type: string
joinWindowMs:
description: msecs to wait for messages from multiple inputs to
@@ -389,8 +392,11 @@ spec:
-> input1
type: object
triggersJoinType:
description: One of inner (default), outer, or any (see above
for details)
default: inner
enum:
- inner
- outer
- any
type: string
type: object
output:
@@ -407,8 +413,11 @@ spec:
type: string
type: array
stepsJoin:
description: One of inner (default), outer, or any (see above
for details)
default: inner
enum:
- inner
- outer
- any
type: string
tensorMap:
additionalProperties:
@@ -440,11 +449,11 @@ spec:
type: string
type: array
inputsJoinType:
description: 'One of inner (default), outer, or any inner -
do an inner join: data must be available from all inputs outer
- do an outer join: data will include any data from any inputs
at end of window any - first data input that arrives will
be forwarded'
default: inner
enum:
- inner
- outer
- any
type: string
joinWindowMs:
description: msecs to wait for messages from multiple inputs
@@ -466,8 +475,10 @@ spec:
type: string
type: array
triggersJoinType:
description: One of inner (default), outer, or any (see above
for details)
enum:
- inner
- outer
- any
type: string
required:
- name
@@ -8150,6 +8161,8 @@ spec:
type: boolean
otelExporterEndpoint:
type: string
otelExporterProtocol:
type: string
ratio:
type: string
type: object
@@ -8264,6 +8277,8 @@ spec:
type: boolean
otelExporterEndpoint:
type: string
otelExporterProtocol:
type: string
ratio:
type: string
type: object
11 changes: 6 additions & 5 deletions k8s/yaml/runtime.yaml
@@ -26,19 +26,19 @@ spec:
replicas: 1
- name: seldon-scheduler
disable: false
serviceType: LoadBalancer
serviceType: LoadBalancer
- name: seldon-envoy
disable: false
replicas: 1
serviceType: LoadBalancer
serviceType: LoadBalancer
- name: seldon-dataflow-engine
disable: false
replicas: 1
replicas: 1
- name: seldon-modelgateway
disable: false
replicas: 1
replicas: 1
- name: seldon-pipelinegateway
disable: false
disable: false
replicas: 1
config:
agentConfig:
@@ -57,4 +57,5 @@ spec:
tracingConfig:
disable:
otelExporterEndpoint:
otelExporterProtocol:
ratio:
4 changes: 4 additions & 0 deletions operator/apis/mlops/v1alpha1/seldonconfig_types.go
@@ -74,6 +74,7 @@ type RcloneConfiguration struct {
type TracingConfig struct {
Disable bool `json:"disable,omitempty"`
OtelExporterEndpoint string `json:"otelExporterEndpoint,omitempty"`
OtelExporterProtocol string `json:"otelExporterProtocol,omitempty"`
Ratio string `json:"ratio,omitempty"`
}

@@ -188,6 +189,9 @@ func (t *TracingConfig) addDefaults(defaults TracingConfig) {
if t.OtelExporterEndpoint == "" {
t.OtelExporterEndpoint = defaults.OtelExporterEndpoint
}
if t.OtelExporterProtocol == "" {
t.OtelExporterProtocol = defaults.OtelExporterProtocol
}
}

func (sc *ServiceConfig) addDefaults(defaults ServiceConfig) {
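
The effect of the `addDefaults` change in this file can be tried out with the standalone sketch below. It copies the `TracingConfig` fields shown in this diff and reproduces only the two string checks that are visible here; the real method may treat other fields differently. The values in `main` are example inputs, with the endpoint and `grpc` taken from the helm defaults in this commit.

```go
package main

import "fmt"

// TracingConfig copies the struct fields shown in this diff.
type TracingConfig struct {
	Disable              bool   `json:"disable,omitempty"`
	OtelExporterEndpoint string `json:"otelExporterEndpoint,omitempty"`
	OtelExporterProtocol string `json:"otelExporterProtocol,omitempty"`
	Ratio                string `json:"ratio,omitempty"`
}

// addDefaults fills unset string fields from defaults. Only the endpoint and
// protocol checks visible in the diff are reproduced; this is a simplified
// stand-in, not the full operator method.
func (t *TracingConfig) addDefaults(defaults TracingConfig) {
	if t.OtelExporterEndpoint == "" {
		t.OtelExporterEndpoint = defaults.OtelExporterEndpoint
	}
	if t.OtelExporterProtocol == "" {
		t.OtelExporterProtocol = defaults.OtelExporterProtocol
	}
}

func main() {
	defaults := TracingConfig{
		OtelExporterEndpoint: "seldon-collector.seldon-mesh:4317",
		OtelExporterProtocol: "grpc",
	}
	// The override sets only the protocol; the endpoint is inherited.
	override := TracingConfig{OtelExporterProtocol: "http/protobuf"}
	override.addDefaults(defaults)
	fmt.Println("endpoint:", override.OtelExporterEndpoint)
	fmt.Println("protocol:", override.OtelExporterProtocol)
}
```

An explicitly set protocol survives the merge, while an empty one picks up the value from the defaults, which is the same precedence the endpoint field already followed.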