
Mixing Prometheus Exporter with Collector's ServiceMonitor leads to double-scraping of metrics #2251

Closed
chris-pinola-rf opened this issue Oct 19, 2023 · 4 comments · Fixed by #2573
Labels: bug (Something isn't working), needs triage

chris-pinola-rf commented Oct 19, 2023

Component(s)

operator, collector

What happened?

Description

When defining a Prometheus Exporter named "prometheus" on an OTel Collector CRD resource, the OTel Operator (very cleverly!) adds a TCP port named "prometheus" to the v1.Service resources for the Collector with the same port number declared in the exporter's configured endpoint.

The issue appears when also opting-in to creating a ServiceMonitor resource for the Collector by setting OpenTelemetryCollector.spec.observability.metrics.enableMetrics to true. When enableMetrics is true, the Operator creates a third v1.Service for the Collector with the "-monitoring" suffix. Notably, the "-monitoring" v1.Service does not receive the named "prometheus" port. The issue is that the named port is added to both the ClusterIP v1.Service and the "-headless" v1.Service resource (but again, not the "-monitoring" one).

When enableMetrics is true and the Collector's config specifies a Prometheus Exporter, the Operator (again, quite cleverly) adds a - port: prometheus to the generated ServiceMonitor's .spec.endpoints array. That ServiceMonitor resource spec also bears the following selectors:

  namespaceSelector:
    matchNames:
    - my-namespace
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-namespace.otel
      app.kubernetes.io/managed-by: opentelemetry-operator

which unfortunately matches all three of the v1.Service resources associated with the Collector. Since the named "prometheus" port appears on both the (unsuffixed) ClusterIP v1.Service and the "-headless" v1.Service, the Prometheus Operator matches both Services and discovers the port on each, so all metrics from the Prometheus Exporter are scraped and stored twice (Prometheus treats each Service as a distinct scrape target): once with a job label of "otel-collector" and again with a job label of "otel-collector-headless".

[Screenshot: Prometheus showing the same exporter metrics scraped under both the "otel-collector" and "otel-collector-headless" jobs]

Steps to Reproduce

Define an OTel Collector CRD as follows:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: my-namespace
spec:
  mode: "deployment"
  upgradeStrategy: "automatic"
  observability:
    metrics:
      enableMetrics: true
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:

    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s

    exporters:
      logging:
      otlp/tempo:
        endpoint: "..."
      prometheus:
        endpoint: "0.0.0.0:9999"
        send_timestamps: true
        resource_to_telemetry_conversion:
          enabled: true

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheus]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [logging]

Observe that v1.Service + ServiceMonitor resources are generated as shown below in the 'Actual Result' section.

Expected Result

Since I'm adding the Prometheus Exporter with the intention of having those exposed metrics scraped into my Prometheus, I would much prefer for the named "prometheus" port to be added only to the "-monitoring" v1.Service when setting OpenTelemetryCollector.spec.observability.metrics.enableMetrics to true.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: my-namespace
spec:
  endpoints:
  - port: monitoring
  - port: prometheus
  namespaceSelector:
    matchNames:
    - my-namespace
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-namespace.otel
      app.kubernetes.io/managed-by: opentelemetry-operator
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otel-collector
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
  name: otel-collector
  namespace: my-namespace
spec:
  clusterIP: ...
  clusterIPs:
  - ...
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: grpc
    name: otlp-grpc
    port: 4317
    protocol: TCP
    targetPort: 4317
  - appProtocol: http
    name: otlp-http
    port: 4318
    protocol: TCP
    targetPort: 4318
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
  sessionAffinity: None
  type: ClusterIP
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otel-collector
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
    operator.opentelemetry.io/collector-headless-service: Exists
  name: otel-collector-headless
  namespace: my-namespace
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: grpc
    name: otlp-grpc
    port: 4317
    protocol: TCP
    targetPort: 4317
  - appProtocol: http
    name: otlp-http
    port: 4318
    protocol: TCP
    targetPort: 4318
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
  sessionAffinity: None
  type: ClusterIP
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otel-collector-monitoring
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
  name: otel-collector-monitoring
  namespace: my-namespace
spec:
  clusterIP: ...
  clusterIPs:
  - ...
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: monitoring
    port: 8888
    protocol: TCP
    targetPort: 8888
  - name: prometheus
    port: 9999
    protocol: TCP
    targetPort: 9999
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
  sessionAffinity: None
  type: ClusterIP

Actual Result

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: my-namespace
spec:
  endpoints:
  - port: monitoring
  - port: prometheus
  namespaceSelector:
    matchNames:
    - my-namespace
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-namespace.otel
      app.kubernetes.io/managed-by: opentelemetry-operator
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otel-collector
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
  name: otel-collector
  namespace: my-namespace
spec:
  clusterIP: ...
  clusterIPs:
  - ...
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: grpc
    name: otlp-grpc
    port: 4317
    protocol: TCP
    targetPort: 4317
  - appProtocol: http
    name: otlp-http
    port: 4318
    protocol: TCP
    targetPort: 4318
  - name: prometheus
    port: 9999
    protocol: TCP
    targetPort: 9999
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
  sessionAffinity: None
  type: ClusterIP
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otel-collector
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
    operator.opentelemetry.io/collector-headless-service: Exists
  name: otel-collector-headless
  namespace: my-namespace
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: grpc
    name: otlp-grpc
    port: 4317
    protocol: TCP
    targetPort: 4317
  - appProtocol: http
    name: otlp-http
    port: 4318
    protocol: TCP
    targetPort: 4318
  - name: prometheus
    port: 9999
    protocol: TCP
    targetPort: 9999
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
  sessionAffinity: None
  type: ClusterIP
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otel-collector-monitoring
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
  name: otel-collector-monitoring
  namespace: my-namespace
spec:
  clusterIP: ...
  clusterIPs:
  - ...
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: monitoring
    port: 8888
    protocol: TCP
    targetPort: 8888
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: my-namespace.otel
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
  sessionAffinity: None
  type: ClusterIP

Kubernetes Version

1.25.6

Operator version

v0.86.0

Collector version

0.87.0

Environment information

There's a monitoring.coreos.com/v1.Prometheus CRD in my cluster from the prometheus-operator which is configured to scrape all ServiceMonitor resources in the cluster, e.g.

  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}

Log output

No response

Additional context

No response

@chris-pinola-rf chris-pinola-rf added bug Something isn't working needs triage labels Oct 19, 2023
bmbbms commented Oct 24, 2023

Actually, your expected result isn't ideal when there are other receiver or exporter ports to add. We don't really need to care which Services the prometheus port is added to; what we need is a way to make the ServiceMonitor match only one prometheus Service. In my case, I just overwrote the "app.kubernetes.io/instance" label of the "XXX-headless" Service with a "-headless" suffix, and it works well with my Prometheus. To make this change persist, you need to change the operator code in "internal/manifests/collector/service.go" like this:

const (
	headlessLabel  = "operator.opentelemetry.io/collector-headless-service"
	headlessExists = "Exists"
	// add this constant
	instanceLabel = "app.kubernetes.io/instance"
)
...
	h.Labels[headlessLabel] = headlessExists
	// overwrite the instance label so the headless Service no longer
	// matches the generated ServiceMonitor's selector
	h.Labels[instanceLabel] = fmt.Sprintf("%s.%s-headless", otelcol.Namespace, otelcol.Name)

Finally, rebuild the operator image using the project's Dockerfile.

chris-pinola-rf (Author) commented

@bmbbms That seems like it would work, but I'm trying to avoid building the operator binaries myself. My current workaround has been to set OpenTelemetryCollector.spec.observability.metrics.enableMetrics to false and define my own (more selective) ServiceMonitor out-of-band.

However, I do agree with you that not all exporter ports should be included in the ServiceMonitor. The Prometheus Exporter is a bit of a special use case, especially when paired with the Operator's ability to generate a ServiceMonitor.

If someone could help me figure out how we collectively want these features to interact, I'd be happy to try implementing a fix.
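For reference, one possible shape for that out-of-band, more selective ServiceMonitor (a sketch only, assuming enableMetrics is left at false so the operator does not generate its own ServiceMonitor; the resource names and labels come from this issue's example, and the ServiceMonitor name is hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector-prometheus  # hypothetical name
  namespace: my-namespace
spec:
  endpoints:
  - port: prometheus
  namespaceSelector:
    matchNames:
    - my-namespace
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-namespace.otel
      app.kubernetes.io/managed-by: opentelemetry-operator
    matchExpressions:
    # The "-headless" Service carries the same labels and ports as the
    # ClusterIP Service, so exclude it via the operator's marker label.
    - key: operator.opentelemetry.io/collector-headless-service
      operator: DoesNotExist
```

With this selector only the unsuffixed ClusterIP Service is matched, so the exporter endpoint is scraped exactly once.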

jaronoff97 (Contributor) commented

@yuriolisa I want to continue the conversation here, because this is a bit of an odd one. I don't think we should remove the prometheus exporter port from the main collector Service as recommended here, as that may break existing users and IMO would be a bit confusing (why would the port only be exposed on the monitoring Service?). I think the solution should be similar to the headless label that we add, i.e. we should add a label to the generated monitoring Service that makes it clear which Service the ServiceMonitor should be selecting. I don't think any changes are required if someone generates a PodMonitor, as there wouldn't be multiple potential targets.
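Sketching that idea with the example resources from this issue (the marker label key below is hypothetical, chosen to mirror the existing headless label; the eventual fix may use a different key):

```yaml
# Label the operator would stamp on the generated "-monitoring" Service:
metadata:
  labels:
    operator.opentelemetry.io/collector-monitoring-service: Exists
---
# ...and the generated ServiceMonitor would then select on it:
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-namespace.otel
      app.kubernetes.io/managed-by: opentelemetry-operator
      operator.opentelemetry.io/collector-monitoring-service: Exists
```

This keeps the exporter port on the main Service for in-cluster consumers while giving the ServiceMonitor a single unambiguous target.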

asm76 commented Feb 16, 2024

Can we just exclude headless Services from the ServiceMonitor selector?
Something like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/instance: tenant1.simple
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: simple-collector
  name: simple-collector
  namespace: tenant1
spec:
  endpoints:
  - port: monitoring
  - port: prometheus
  namespaceSelector:
    matchNames:
    - tenant1
  selector:
    matchExpressions:
    - key: operator.opentelemetry.io/collector-headless-service
      operator: NotIn
      values:
      - Exists
    matchLabels:
      app.kubernetes.io/instance: tenant1.simple
      app.kubernetes.io/managed-by: opentelemetry-operator
