Grafana-agent-operator: integration pod failing (grafana-agent-metrics does not exist) #3282
Comments
I'm experiencing the same problem.
I'm getting the same issue :(
Same here.
I also ran into this. The fix worked.
I'm seeing the same error.
Thanks for the workaround!
I, too, experienced this on at least two clusters (Kubernetes 1.26.3, Grafana Agent 0.33.1), deployed via the Grafana Cloud instructions using the operator method installed via Helm, configured to collect metrics, logs, and events. Restarting the daemonset worked in each case. Thanks for the workaround!
I've had some difficulty reproducing, but I think I have now. I can only reproduce it if I:
The integrations daemonset continues crash-looping, even though the secret is correct. Ordering seems to matter here: a reconcile where the Integration exists but the MetricsInstance does not seems to cause a problem. I'd expect reconcile to fail in that case, but instead it puts things in a bad state. The second problem is why the daemonsets do not recover once the secret is correct. The volumes all look right. Perhaps something is off with the reloader there. I am continuing to dig on this.
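For illustration only (the file names below are hypothetical, not from the original report), the ordering that seems to trigger the bad state looks roughly like this:
# Apply the Integration before the MetricsInstance it depends on
kubectl apply -f integration.yaml
# Reconcile runs while no MetricsInstance exists; the integrations daemonset starts crash-looping
kubectl apply -f metricsinstance.yaml
# The generated secret is now correct, but the daemonset keeps crash-looping until restarted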
Ah, the reload fails from the sidecar container. That explains why it does not self-resolve:
But I'm still not sure why a crash-looping agent container would not get the updates if the reloader is otherwise working. Need to dig more.
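A sketch of how to check the reloader sidecar (the container name is an assumption; list the containers first to confirm it, and the pod name is a placeholder):
kubectl -n monitoring get pod <integrations-pod> -o jsonpath='{.spec.containers[*].name}'
kubectl -n monitoring logs <integrations-pod> -c config-reloader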
Fix to the reloader is in thanos-io/thanos#6519. If that is accepted, I will try to get it into prometheus-operator, which owns the reloader image, and then into the operator defaults.
The fix for this has been merged upstream, and the main build of prometheus-config-reloader now includes the fix for this deadlock. It can be used by setting a field on your
If that field is not available, you may need to update your CRD definitions to the latest in the repo. I will leave this issue open until I can update the default image the operator uses to a stable release version.
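A minimal sketch of re-applying the CRDs from a checkout of the repo (the CRD directory varies by release, so the path here is a placeholder):
kubectl apply -f <path-to-crds-in-repo>/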
I'm installing the Grafana Agent Operator on our AWS EKS cluster (I tried on a local kind cluster and did not get this error).
The grafana-agent-integrations-ds pod lands in this state and the relevant logs show (see bottom for complete logs)
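As a hedged sketch, the same state and logs can be pulled with standard kubectl commands (the pod name is a placeholder; the namespace matches the workaround below):
kubectl -n monitoring get pods
kubectl -n monitoring logs <grafana-agent-integrations-ds-pod> --previous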
Fix
I realized that this is fixed simply by restarting the integration daemonset...
kubectl -n monitoring rollout restart daemonset grafana-agent-integrations-ds
Not sure why this is happening, but it would be nice not to have to restart the daemonset every time.
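To confirm the restart completed (a sanity-check sketch, not from the original report):
kubectl -n monitoring rollout status daemonset grafana-agent-integrations-ds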
Steps to reproduce
where the manifest is
complete logs of the restarting pod are