Kubernetes agent stops sending data #4187
Comments
By comparison, here's some output from an agent that is sending metrics. It has been up for ~2x as long, but has processed ~100x as many metrics packets. (I haven't checked how skewed our load is ATM, but it is definitely not 100x...)
So I found some of these in
It varies a bit what is healthy and unhealthy, but we are seeing quite a few of them.
Hi @msiebuhr, thanks for reaching out
If that doesn't help, feel free to reach out to support. They'll ask you to send a flare from one of these agents so we can troubleshoot deeper.
Thanks, @hkaj. I have updated and things haven't broken for a few hours (but we're not at peak traffic yet) and I'm in contact with support to move things further along. But what about the error? We do see quite a few of them, and it would be nice to fix it...
Fixed by applying the following initContainer before starting the datadog agent: containernetworking/plugins#123 (comment)
Upon closer inspection of the service, having an initContainer running
Output of the info page (if this is a bug)
kubectl exec -it datadog-agent-ltqjf agent status
Gives
Describe what happened:
The container/pod above has stopped sending data to Datadog. It looks to be related to a liveness-probe failing, causing a restart of the pod.
We also get a lot of the following in the logs, which I suspect may be correlated.
Describe what you expected:
Fewer errors, more metrics.
Steps to reproduce the issue:
Additional environment details (Operating System, Cloud provider, etc):
Running the service agent (1.12.1) on Kubernetes (GKE), with non-local Statsd enabled (DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true).
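For anyone trying to check whether an agent with non-local traffic enabled is still accepting packets, a minimal sketch along these lines may help. It assumes the agent's DogStatsD listener is reachable over UDP on the default port 8125. The helper names, the metric name and the tags are hypothetical and not taken from this setup.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a DogStatsD datagram: <name>:<value>|<type>[|#tag1,tag2]."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def send_metric(host, name, value, port=8125, metric_type="c", tags=None):
    """Send one metric over UDP (assumed default port 8125).

    UDP is fire-and-forget: the return value is only the number of bytes
    handed to the network stack, not proof that the agent processed them.
    """
    datagram = format_metric(name, value, metric_type, tags)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(datagram, (host, port))
```

Calling send_metric with the pod or node IP of one agent, for example with a throwaway counter such as debug.test_counter, and then re-running agent status on that pod should show whether its packet counts still increase. Since UDP gives no delivery confirmation, a successful send on its own does not prove the agent is healthy.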