Clarification on "blessed" approach for running multiple different datadog agent setups within a single cluster #1280

dlmather · 2024-07-09T23:45:09Z

Describe what happened:
I need to run multiple (currently just two) different datadog-agent setups within a single cluster with only very slight variations. Ideally, all that differs between the setups is the node selectors / affinity (so they don't overlap), slightly different labels, and one env var. Everything else can be the same. It's not clear to me how to easily make this work with the datadog operator:

- It's unclear whether creating multiple concurrent DatadogAgent objects will work for this. For one, the operator seems to hit various issues and fail to converge. This setup would also seem to create redundant cluster-agents, which is less than ideal.
- It's unclear to me whether something like extended daemonsets would support my use-case correctly.
- The DAPs (DatadogAgentProfiles) setup that's being worked on looks close to what I'd want as a simple system for overriding the base datadog agent configuration, but the set of settings that can be overridden still seems very limited.

I'm seeking clarification on what the right path to configure this should be, based on my requirements. For additional context, we currently deploy the datadog-operator through helm via the terraform helm add-on. If it will help my use-case I can upgrade to any version; we're currently running a somewhat older one.

khewonc · 2024-07-11T16:36:04Z

Hi @dlmather, thanks for reaching out. I'm leaning towards recommending DAPs from the description you've given so far, but would you be willing to share your use case with us? DAPs can only override settings in the node agent, so I want to make sure I'm not suggesting this if it won't work for the changes you'd like to make.

We're adding the ability to override env vars in DAPs in v1.8.0 of the operator, which we expect to start QA on within the coming weeks. While we don't have label overrides available for DAPs yet, I can add a feature request to our backlog for that.

dlmather · 2024-07-11T22:21:36Z

Yeah, happy to share our use-case and thanks for taking a look:

- We are trying to set up a network egress proxy to manage our out-of-cluster traffic, which includes telemetry from datadog-agent -> the datadog public APIs.
- Datadog Agents on most of the nodes in the cluster should direct traffic to the proxy (for example by setting the HTTP_PROXY env var to point at the proxy's address).
- HOWEVER, on the k8s nodes that run the proxy (which we have on separately identified hardware from the rest), we still want to capture telemetry from the proxy itself, but crucially we don't want to send that traffic back through the proxy, as this would create an infinite feedback loop, since the proxy generates metrics in response to receiving traffic.
- So in effect HTTP_PROXY should be set on all Datadog Agents not running on the set of hosts that run the egress proxy.

In terms of the labels, that requirement is a little less certain and we have a few other options there, but it's slightly different from the env var. We use a label selector to attach pod-level AWS security group policy to pods, and we have different security group requirements for datadog-agent pods running on the regular hosts vs the egress-proxy hosts. Datadog Agents on the regular hosts should only be able to talk to the egress proxy, while Datadog Agents on the egress-proxy hosts should be able to talk out to the public internet. Since labels and selectors are what drive the security groups that get attached in our system, we would need to set different labels on the agents based on the hosts they're running on.

dlmather · 2024-07-11T22:23:20Z

In general, if we can override the env vars via the DAPs, that should very nicely meet at least the first requirement.

khewonc · 2024-07-15T14:37:21Z

Thanks for the explanation! It sounds like DAPs could work for your case once env vars are supported in 1.8.0. I've also cleared adding label overrides to DAPs for 1.8.0 with the team so we'll try to get a PR for that in soon.

dlmather · 2024-07-15T18:34:27Z

Great! Thanks for that update, we'll look forward to that release then.

khewonc · 2024-08-16T21:13:48Z

@dlmather Hi, operator v1.8.0 was released recently, and it supports overrides for node agent labels and env vars in DAPs. We have docs on DAPs here to help get you started: https://github.com/DataDog/datadog-operator/blob/main/docs/datadog_agent_profiles.md. The feature is still in beta, so it comes with the caveats that we don't recommend using it in production environments and that the CRD could have breaking changes in the future. Let me know if you have any questions about profiles that I can help with.
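
For the egress-proxy use case described earlier in this thread, a profile might look roughly like the sketch below. The field paths reflect how the v1alpha1 DatadogAgentProfile CRD is understood to be laid out and should be verified against the docs linked above. The profile name, the node label key `example.com/egress-proxy`, the pod label, and the proxy address are all hypothetical placeholders.

```yaml
# Sketch only, not a tested manifest.
# Targets every node that does NOT carry the (hypothetical) egress-proxy
# node label, so agents on regular nodes send traffic through the proxy.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgentProfile
metadata:
  name: regular-nodes                     # hypothetical name
spec:
  profileAffinity:
    profileNodeAffinity:
      - key: example.com/egress-proxy     # hypothetical node label
        operator: DoesNotExist
  config:
    override:
      nodeAgent:
        labels:
          egress-policy: via-proxy        # hypothetical pod label for the security group selector
        containers:
          agent:
            env:
              - name: HTTP_PROXY
                value: "http://egress-proxy.example.internal:3128"  # hypothetical address
```

With this layout, agents on the egress-proxy nodes would keep the base DatadogAgent configuration, with no HTTP_PROXY set. Any other node agent containers that send data out of the cluster would presumably need the same env entry.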

levan-m added the "question" (Further information is requested) and "feature request" labels · Jul 24, 2024
