DD Connector missing APM stats in Traces #32219

Closed
Zurvarian opened this issue Apr 8, 2024 · 6 comments

Component(s)

connector/datadog, exporter/datadog

What happened?

Description

We upgraded from v0.82.0 to v0.97.0 and, as a condition of the upgrade, changed our setup to use datadog/connector to calculate the APM metrics.
We can see the APM metrics in the main Traces dashboard; however, the APM metrics on individual traces are gone.

Steps to Reproduce

Configure OTEL to consume Traces from zipkin.
Route traces via datadog/connector into datadog/exporter.
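
The two steps above amount to roughly the following wiring. This is only a minimal sketch reduced from the full configuration in this report, with all processors and the other receivers left out; it has not been run on its own, and it assumes the same component and pipeline names used there.

```yaml
# Minimal reproduction sketch (processors omitted; names taken from the full config).
receivers:
  zipkin:
    endpoint: ${env:MY_POD_IP}:9411

connectors:
  datadog/connector:

exporters:
  datadog/exporter:
    api:
      key: ${env:DATADOG_APIKEY}

service:
  pipelines:
    # Incoming traces are handed to the connector, which computes APM stats.
    traces/receiver:
      receivers: [ zipkin ]
      exporters: [ datadog/connector ]
    # The connector re-emits the traces to the exporter...
    traces/exporter:
      receivers: [ datadog/connector ]
      exporters: [ datadog/exporter ]
    # ...and emits the computed APM stats as metrics.
    metrics:
      receivers: [ datadog/connector ]
      exporters: [ datadog/exporter ]
```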

Expected Result

APM Stats are attached to each Trace so we can see them when investigating issues.

Actual Result

APM Stats are only present in the main section of the Traces page.

Collector version

v0.96.0, v0.97.0

Environment information

We use the OTel distribution installed by the official Helm chart, deployed on a Kubernetes cluster in GCP.

OpenTelemetry Collector configuration

processors:
  metricstransform:
    transforms:
      - include: (.*)$$
        match_type: regexp
        action: update
        new_name: dna.clp.$${1}


  # this processor helps us to set an environment globally in opentelemetry
  resource/environment:
    attributes:
      - key: deployment.environment
        value: "dev"
        action: upsert

  # sadly, in queue-proxy all the knative names use the pod name as the service name in otel,
  # so we're using this to truncate the bad naming in knative.
  transform/knative:
    trace_statements:
      - context: resource
        statements:
          - replace_pattern(attributes["service.name"], "-[0-9]{1,5}-deployment-[a-fA-F0-9]{8,10}-[a-zA-Z0-9]{5}", "-xxx")
          - set(attributes["service.name"] , Concat(["our_prefix",attributes["service.name"]] , "-"))
    metric_statements:
      - context: resource
        statements:
          - replace_pattern(attributes["service.name"], "-[0-9]{1,5}-deployment-[a-fA-F0-9]{8,10}-[a-zA-Z0-9]{5}", "-xxx")
          - set(attributes["service.name"] , Concat(["our_prefix",attributes["service.name"]] , "-"))
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20

  batch:
    send_batch_max_size: 1000
    send_batch_size: 100
    timeout: 10s

  # k8sattributes processor adds the necessary attributes to enable trace/metrics
  # correlation by means of container tags.
  k8sattributes:
    passthrough: false
    auth_type: "serviceAccount"

    # Pod association using resource attributes and connection
    pod_association:
      - sources: # Match by connection (IP of the source request)
          - from: connection
      - sources: # Match by the unique Pod UID
          - from: resource_attribute
            name: k8s.pod.uid
      - sources: # Match by the Pod IP
          - from: resource_attribute
            name: k8s.pod.ip

    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.node.name
        - k8s.namespace.name
        - k8s.pod.start_time
        - k8s.replicaset.name
        - k8s.replicaset.uid
        - k8s.daemonset.name
        - k8s.daemonset.uid
        - k8s.job.name
        - k8s.job.uid
        - k8s.cronjob.name
        - k8s.statefulset.name
        - k8s.statefulset.uid
        - k8s.container.name
        - container.image.name
        - container.image.tag
        - container.id

      labels:
        - tag_name: kube_app_name
          key: app.kubernetes.io/name
          from: pod
        - tag_name: kube_app_instance
          key: app.kubernetes.io/instance
          from: pod
        - tag_name: kube_app_version
          key: app.kubernetes.io/version
          from: pod
        - tag_name: kube_app_component
          key: app.kubernetes.io/component
          from: pod
        - tag_name: kube_app_part_of
          key: app.kubernetes.io/part-of
          from: pod
        - tag_name: kube_app_managed_by
          key: app.kubernetes.io/managed-by
          from: pod
receivers:
  opencensus:
    endpoint: ${env:MY_POD_IP}:55678
  jaeger: null
  prometheus: null
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318
  zipkin:
    endpoint: ${env:MY_POD_IP}:9411
connectors:
  datadog/connector:
    traces:
      ignore_resources: [ "(GET|POST) /healthcheck", "(GET|POST) /probe" ] ## ignore healthcheck
exporters:
  datadog/exporter:
    api:
      key: ${env:DATADOG_APIKEY}
    sending_queue:
      queue_size: 10000
service:
  telemetry:
    logs:
      level: "INFO"
      development: false
      encoding: "json"
  pipelines:
    # Traces must go first through the DD connector to provide APM metrics.
    # Then, from the traces/exporter pipeline we can emit traces with APM metrics associated to the root spans.
    # And emit global APM metrics via the metrics pipeline.
    traces/receiver:
      receivers: [ zipkin, otlp ]
      processors: [ memory_limiter, resource/environment, transform/knative, k8sattributes, batch ]
      exporters: [ datadog/connector ]
    traces/exporter:
      receivers: [ datadog/connector ]
      processors: [ batch ]
      exporters: [ datadog/exporter ]
    metrics:
      receivers: [ opencensus, otlp, datadog/connector ]
      processors: [ memory_limiter, metricstransform, resource/environment, k8sattributes, batch ]
      exporters: [ datadog/exporter ]
    logs:
      receivers: [ otlp ]

Log output

No relevant logs found.

Additional context

When running the same version (v0.97.0) without the connector and with the exporter.datadogexporter.DisableAPMStats feature gate disabled (--feature-gates=-exporter.datadogexporter.DisableAPMStats), the exporter does produce the APM stats per trace (and even per span).
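
For comparison, the connector-free setup described above could look roughly like the sketch below when deployed through the Helm chart. The `command.extraArgs` and `config` values keys are an assumption about the chart and may differ between chart versions; the single `traces` pipeline name is hypothetical, and the receiver and exporter names are the ones from the configuration in this report.

```yaml
# Helm values sketch (assumption: the chart passes command.extraArgs to the collector binary).
command:
  extraArgs:
    # A leading "-" disables the gate, so the exporter computes APM stats itself.
    - --feature-gates=-exporter.datadogexporter.DisableAPMStats

config:
  service:
    pipelines:
      # Traces go straight to the exporter; no datadog/connector in between.
      traces:
        receivers: [ zipkin, otlp ]
        processors: [ batch ]
        exporters: [ datadog/exporter ]
```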

Zurvarian added the bug and needs triage labels Apr 8, 2024
Contributor

github-actions bot commented Apr 8, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

mx-psi added the priority:p1 and data:traces labels and removed the needs triage label Apr 9, 2024
Member

mx-psi commented Apr 9, 2024

@Zurvarian We will take a look, but I recommend that you also file a support ticket with Datadog (see https://www.datadoghq.com/support/) so that we can work with you without you having to share details of your setup in the open.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label Jun 10, 2024
@Zurvarian
Author

Hi @mx-psi @dineshg13 @liustanley @songy23 @mackjmr @ankitpatel96,

Is there any news on this?

github-actions bot removed the Stale label Jun 11, 2024
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label Aug 12, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned Oct 11, 2024