[opentelemetry-helm-charts/opentelemetry-operator] Allow deploying the OTEL Operator as a Daemonset #1023
Comments
@AlecAttwood the OpenTelemetry Operator does not support being run as a Daemonset. Can you describe in more detail how pods are starting before the Operator? Is the Operator deployment getting evicted and spun up somewhere else? PodDisruptionBudgets may help.
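For reference, a minimal PodDisruptionBudget protecting the operator Deployment during node drains might look roughly like the sketch below. The namespace and labels are assumptions and would need to match what the Helm release actually renders; a PDB with `minAvailable: 1` only makes sense if the operator runs with more than one replica, otherwise it can block drains entirely.

```yaml
# Illustrative PodDisruptionBudget for the operator Deployment.
# Namespace and label selector are assumptions; adjust them to match
# the labels rendered by your Helm release.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: opentelemetry-operator-pdb
  namespace: opentelemetry-operator-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: opentelemetry-operator
```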
Hey @TylerHelmuth, sorry for the delayed response. The issue occurs during a Kube version upgrade, e.g. going from 1.26 -> 1.27. During an upgrade a new node is created on the new version, and for some reason communication between 1.26 and 1.27 nodes is interrupted during that window. In this scenario, if a new OTEL operator pod isn't created on the new 1.27 node, any app that starts there won't have the OTEL sidecar injected, because the operator only exists on 1.26 nodes. This can be worked around, Pod Topology spread constraints being one option, but I feel like a daemonset is a guaranteed way to avoid this particular issue. Even if people aren't experiencing our particular issue, a daemonset is still useful for managing deployments on large clusters. Having an option in the Helm Chart to deploy using either a Deployment or a Daemonset would be a good feature in my opinion.
I think Topology is definitely what you want. If the chart allowed installing the operator as a daemonset we'd be deploying the operator in a pattern unsupported by the OpenTelemetry Operator SIG.
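As a sketch of the topology approach suggested above, a spread constraint on the operator pod spec can force replicas onto different nodes, so at least one operator pod should land on the new node pool during an upgrade. The label selector below is a placeholder, and whether the chart exposes this field directly depends on the chart version:

```yaml
# Illustrative topologySpreadConstraints for the operator pod spec.
# The label selector is a placeholder; it must match the operator pods.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: opentelemetry-operator
```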
I'm curious why the Daemonset pattern would be unsupported. What is the rationale behind that? Just asking out of curiosity; I can address my issues with pod topology and some other config, but daemonsets are very commonly used. Is there any documentation, or are there articles available, that explain the decision not to support daemonsets?
Other @open-telemetry/operator-approvers please chime in, but I believe having multiple operators managing the same webhooks/custom resources would create a problem. In general I think K8s Operators are not run as daemonsets because that makes it much harder to centralize the management of the objects they control. I don't know of any Operators that run that way.
The otel operator does not scale naturally. Increasing the number of replicas in a naive way only gives you hot standbys, as the replicas will elect a leader, and the leader will do all the work. It is possible to partition the workload manually, but not in a way that would make sense for a DaemonSet. In general, running as a DaemonSet isn't much different from running as a Deployment with a lot of replicas, so allowing it would be giving users a footgun with no upside.
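Concretely, "electing a leader" here means the replicas compete for a Kubernetes Lease and only the holder reconciles; the others sit idle until the leader dies. The object below is purely illustrative (lease name, namespace, and holder identity are placeholders), but it shows what the standbys-versus-leader situation looks like on the cluster side:

```yaml
# Illustrative Lease used for operator leader election.
# All names below are placeholders; only the current holderIdentity
# does any reconciliation work, the remaining replicas are hot standbys.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: opentelemetry-operator-leader-election   # placeholder lease name
  namespace: opentelemetry-operator-system
spec:
  holderIdentity: opentelemetry-operator-7d9c6c4f9b-abcde   # current leader pod
  leaseDurationSeconds: 15
```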
That doesn't sound right to me. The operator doesn't care about Nodes at all; it talks directly to the API Server. Can you reliably reproduce this problem? I don't think running the operator as a DaemonSet would help with it; it sounds more like a potential bug in K8s itself.
@swiatekm-sumo thanks for the context. I was unaware the operator elects a leader; that's very useful to know. I was wondering why a daemonset wasn't viable when a deployment with replicas was.
Overview
Allow the option to deploy the OTEL Operator as a Daemonset.
Why do we want this?
Currently the only option for deploying the Operator is as a Kubernetes Deployment. Adding the ability to deploy it as a daemonset would allow greater flexibility when choosing how to run OTEL. It also addresses a very specific issue where, when deploying new nodes or AKS versions, pods can start up before the operator does and therefore do not have the OTEL collector sidecar injected. Deploying the operator as a daemonset would ensure the operator is present on every node.
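For context on the injection mechanism: the sidecar is added by a mutating webhook served by the operator, and application pods opt in via the `sidecar.opentelemetry.io/inject` annotation. If no operator pod is reachable when the webhook fires, injection can be silently skipped depending on the webhook failure policy. A minimal opt-in looks roughly like this (names are placeholders):

```yaml
# Illustrative application pod opting in to sidecar injection.
# "my-app" and its image are placeholders; the annotation value can also be
# the name of a sidecar-mode OpenTelemetryCollector instead of "true".
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    sidecar.opentelemetry.io/inject: "true"
spec:
  containers:
    - name: app
      image: my-app:latest
```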