datadog_agent source is not processing V2 API payload from Datadog agent accurately. #18690

Open
rpriyanshu9 opened this issue Sep 27, 2023 · 8 comments
Labels
source: datadog_agent Anything `datadog_agent` source related type: bug A code related bug.

Comments

@rpriyanshu9

rpriyanshu9 commented Sep 27, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Hey there,

After upgrading the Datadog Agent from 7.39.2 to 7.45.0, we observed that some metrics that use the device tag stopped arriving. We investigated further and found that the device tag had been renamed to resource.device after the upgrade. This left many dashboards empty and set off monitors in Datadog. We had to revert the upgrade to fix the issue. We then looked into the source code of the Datadog Agent and Vector to find the root cause.

Here's what we think is causing this:

Starting with the 7.43.2 release of the Datadog Agent, the device tag is sent as part of the resources array: DataDog/datadog-agent#16264.

The datadog_agent source accepts the V2 API payload with the resources field, but it does not correctly handle tags that arrive in the resources array rather than in the tags array. Ref:

    serie.resources.into_iter().for_each(|r| {
        // As per https://github.com/DataDog/datadog-agent/blob/a62ac9fb13e1e5060b89e731b8355b2b20a07c5b/pkg/serializer/internal/metrics/iterable_series.go#L180-L189
        // the hostname can be found in MetricSeries::resources and that is the only value stored there.
        if r.r#type.eq("host") {
            log_schema()
                .host_key()
                .and_then(|key| tags.replace(key.to_string(), r.name));
        } else {
            // But to avoid losing information if this situation changes, any other resource type/name will be saved in the tags map
            tags.replace(format!("resource.{}", r.r#type), r.name);
        }
    });

Because of the block above, the device tag that arrives as an element of resources is remapped to resource.device by the datadog_agent source. As a result, the metrics sent out by the datadog_metrics sink carry the resource.device tag, which is incorrect; the tag should be just device.
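
To make the remapping concrete, here is a minimal, self-contained sketch of the behavior described above, next to one possible adjustment. It does not use Vector's real types: `Resource`, `map_resources_current`, and `map_resources_adjusted` are hypothetical names, a `BTreeMap` stands in for Vector's tag set, the `kind` field stands in for the generated `r#type` field, and the literal `"host"` stands in for `log_schema().host_key()`. The adjusted variant is only an illustration of the expected output, not the maintainers' fix.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one entry of the v2 `resources` array.
/// `kind` plays the role of the `r#type` field in the quoted source.
struct Resource {
    kind: String,
    name: String,
}

impl Resource {
    fn new(kind: &str, name: &str) -> Self {
        Resource {
            kind: kind.to_string(),
            name: name.to_string(),
        }
    }
}

/// Mirrors the quoted block: `host` becomes the host tag, and every other
/// resource type is namespaced as `resource.<type>`.
fn map_resources_current(resources: Vec<Resource>) -> BTreeMap<String, String> {
    let mut tags = BTreeMap::new();
    for Resource { kind, name } in resources {
        if kind == "host" {
            // Assumption: the configured host key is simply "host".
            tags.insert("host".to_string(), name);
        } else {
            tags.insert(format!("resource.{}", kind), name);
        }
    }
    tags
}

/// Illustrative adjustment (an assumption, not the actual fix): treat
/// `device` like `host` and keep it un-namespaced, while still preserving
/// unknown resource types under the `resource.` prefix.
fn map_resources_adjusted(resources: Vec<Resource>) -> BTreeMap<String, String> {
    let mut tags = BTreeMap::new();
    for Resource { kind, name } in resources {
        let key = if kind == "host" || kind == "device" {
            kind
        } else {
            format!("resource.{}", kind)
        };
        tags.insert(key, name);
    }
    tags
}
```

With a `host` and a `device` resource as input, the first function yields a `resource.device` tag (matching the example data in this report), while the second yields a plain `device` tag.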

Seeking assistance in resolving this issue.

Discord thread: https://discord.com/channels/742820443487993987/1155850005391880214

cc @datsabk @jszwedko

Configuration

    api:
      address: 0.0.0.0:8686
      enabled: true
      playground: false
    data_dir: /data/vector
    sinks:
      datadog_metrics:
        batch:
          max_bytes: 512000
        buffer:
          max_events: 10000
          type: memory
          when_full: block
        default_api_key: ${DD_API_KEY}
        inputs:
        - modify_tags_for_datadog
        type: datadog_metrics
    sources:
      datadog_agent:
        address: 0.0.0.0:8282
        disable_logs: true
        disable_traces: true
        multiple_outputs: false
        store_api_key: true
        type: datadog_agent
      vector_source:
        address: 0.0.0.0:9000
        type: vector
    transforms:
      filter_for_datadog:
        condition:
          source: "true"
          type: vrl
        inputs:
        - datadog_agent
        - vector_source
        type: filter
      modify_tags_for_datadog:
        inputs:
        - filter_for_datadog
        source: |-
          del(.tags."k2.version")
          del(.tags."k2.skip_checks")
          del(.tags.container_id)
          del(.tags.display_container_name)
          del(.tags."git.commit.sha")
          del(.tags.kube_ownerref_name)
          del(.tags.kube_replica_set)
          del(.tags."io.kubernetes.pod.uid")
          del(.tags.image_id)
        type: remap

Version

vector 0.30.0

Debug Output

No response

Example Data

{
    "metric": {
        "name": "disk.in_use",
        "namespace": "system",
        "tags": {
            "device_name": "loop0",
            "host": "i-03fe32ac191d77928",
            "resource.device": "/dev/loop0",
            "source_type_name": "System"
        },
        "timestamp": "2023-09-26T15:25:14Z",
        "kind": "absolute",
        "gauge": {
            "value": 0.192
        }
    }
}

Additional Context

No response

References

No response

@rpriyanshu9 rpriyanshu9 added the type: bug A code related bug. label Sep 27, 2023
@rpriyanshu9 rpriyanshu9 changed the title datadog_agent source is not processing the V2 API payload from the Datadog agent accurately. datadog_agent source is not processing V2 API payload from Datadog agent accurately. Sep 27, 2023
@neuronull
Contributor

👋 Thanks for the thorough report and analysis here. After reviewing everything, I agree with the consensus.

Basically, the datadog_metrics sink encoder does not know that the v2 parser in the agent source has namespaced the device tag as resource.device. Since we want to handle both the v1 and v2 endpoints, the sink encoder should check for the presence of both. Alternatively, the agent source could be consistent about whether or not it namespaces the tag.
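
As a rough illustration of the "check for the presence of both" idea, the sketch below pulls the device value out of a tag map whether it arrived as `device` (v1 parsing) or as `resource.device` (v2 parsing). The function name `take_device` and the use of a plain `BTreeMap` are assumptions made for the example; this is not the sink's actual encoder code.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: extract the device value regardless of which
/// source endpoint produced the tags.
///
/// Both keys are removed so that neither is also emitted as a regular tag.
/// If both are present, the plain `device` value wins.
fn take_device(tags: &mut BTreeMap<String, String>) -> Option<String> {
    let plain = tags.remove("device");
    let namespaced = tags.remove("resource.device");
    plain.or(namespaced)
}
```

Preferring the plain `device` key when both exist is a design choice for this sketch; a real implementation might instead log or reject the conflict.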

Relatedly, this is the type of behavior we will want to test in the end-to-end test cases for the Datadog components, which are in progress. I will link this issue there.

In the meantime, I believe this could be worked around by configuring the Agent to send on the v1 series endpoint instead of using the default of v2. This will mean Vector uses the parsing for v1, which doesn't namespace the device. Another workaround could be to have a transform that intercepts and removes the namespace for that tag.

@jszwedko
Member

jszwedko commented Oct 3, 2023

For the workaround, to configure the Agent to use the v1 endpoint you can set use_v2_api.series: false in the Agent configuration file (or set DD_USE_V2_API_SERIES=false).

@neuronull
Contributor

Another thought: there are in-progress changes to migrate the datadog_metrics sink to send to the v2 series endpoint. In those changes, I'm handling this discrepancy in the source's decoding. Essentially, once that is merged, this issue should also be resolved.

@rpriyanshu9
Author

> For the workaround, to configure the Agent to use the v1 endpoint you can set use_v2_api.series: false in the Agent configuration file (or set DD_USE_V2_API_SERIES=false).

Yeah, for now we're using this variable to get past the issue. BTW, it's the datadog_agent source that is at fault, right?

@jszwedko jszwedko added the source: datadog_agent Anything `datadog_agent` source related label Oct 5, 2023
@neuronull
Contributor

👋 This issue was addressed in #18761, which is included in the recent v0.34.0 release.

@neuronull
Contributor

Reopening since v0.34.1 will contain #19138, which reverts to the v1 behavior.

@neuronull neuronull reopened this Nov 15, 2023
@rpriyanshu9
Author

Hi @neuronull @jszwedko, are there any updates on this issue?

@jszwedko
Member

jszwedko commented Apr 5, 2024

> Hi @neuronull @jszwedko, are there any updates on this issue?

No updates unfortunately; I believe this issue still exists. The fix we'd like to do is to switch the datadog_metrics sink to using the /v2 metrics API.
