Loki POD crashes randomly #605
Comments
Please ignore this. I think it is because the node health check is not working properly.
I fixed the node liveness probe issue, but the problem is still not resolved. As soon as I search logs in Grafana, the Loki pod crashes. Please find the Loki pod details attached: kubectl describe pod loki-d86549668-2c4r7 -n prometheus
Hello @gauravbist, can you share the logs of the crashing pod, please? Thank you!
Is this what you need?
I don't see any crash log. Can you try removing the liveness and readiness checks for a while and see what happens?
How do I do that? Should I remove the lines mentioned below from the running Loki deployment YAML file?
Yes, please! I'm not sure yet why Loki returns a 500.
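For anyone following along, here is a minimal sketch of what removing the checks amounts to, assuming the Deployment manifest has been loaded into a Python dict (for example from `kubectl get deployment <name> -n <namespace> -o json`). The function name `strip_probes` and the sample manifest, including its probe path and port, are hypothetical and only illustrate the structure. The only thing it relies on is that Kubernetes keeps `livenessProbe` and `readinessProbe` on each container under `spec.template.spec.containers`. This is a temporary diagnostic step, not a fix.

```python
import copy


def strip_probes(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest with liveness and readiness probes removed."""
    stripped = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    containers = (
        stripped.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    for container in containers:
        # pop() with a default is a no-op when the probe is not defined
        container.pop("livenessProbe", None)
        container.pop("readinessProbe", None)
    return stripped


# Hypothetical manifest fragment; the probe path and port are placeholders.
example_deployment = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "loki",
                        "livenessProbe": {"httpGet": {"path": "/placeholder", "port": 1234}},
                        "readinessProbe": {"httpGet": {"path": "/placeholder", "port": 1234}},
                    }
                ]
            }
        }
    },
}

without_probes = strip_probes(example_deployment)
print(without_probes["spec"]["template"]["spec"]["containers"][0])
```

The stripped manifest can then be written back out and applied. Once the root cause is found, the probes should be restored.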
After removing the liveness and readiness checks, there have been no crashes so far, but old logs are not showing up in Grafana...
It is still not stable. Sometimes it works, but most of the time it crashes. Please let me know how to troubleshoot or debug this. I have attached the loki-promtail logs too.
I think your Loki is getting killed. Can you look at memory usage and Kubernetes events?
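One way to check whether the pod was killed is to look at the last terminated state that Kubernetes records for each container. The sketch below assumes the pod has been dumped with `kubectl get pod <name> -n <namespace> -o json` and parsed into a dict. The helper name `find_terminations` and the sample data are hypothetical. The fields it reads (`status.containerStatuses[].lastState.terminated` and `restartCount`) are part of the standard pod status.

```python
def find_terminations(pod: dict) -> list:
    """Summarize containers in a pod whose previous run was terminated."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append(
                {
                    "container": status.get("name"),
                    "reason": terminated.get("reason"),
                    "exit_code": terminated.get("exitCode"),
                    "restarts": status.get("restartCount", 0),
                }
            )
    return results


# Hypothetical pod status, made up for illustration only.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "loki",
                "restartCount": 7,
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "restartCount": 0, "lastState": {}},
        ]
    }
}

for entry in find_terminations(sample_pod):
    print(entry)
```

A reason of `OOMKilled` means the container exceeded its memory limit, which can happen even when the node itself has plenty of free memory. Any other reason combined with a growing restart count points elsewhere, for example at failing liveness probes.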
I don't think it is being killed because of memory or CPU usage: the node has 30 GB of memory and 4 vCPU cores, and only a few pods are running on it. The events don't show it either. Is there a setting or configuration option to enable debug mode so that we can trace the root cause?
Can you try upgrading your Helm deployment to the latest version, and then delete all Loki pods once that's done?
@gauravbist, please check the memory usage. We have seen an unbounded memory leak in Loki.
Ran into this problem today with the latest version of Loki. Health/liveness checks keep failing and k8s kills the pod. Loki itself seems fine, and I can query logs until k8s kills the pod.
After a while it comes back online.
See #613.
Yeah, let's keep only the first issue; this is a dup of #613.
Describe the bug
The Loki pod crashes randomly with the error: "Liveness probe failed: HTTP probe failed with statuscode: 500"
To Reproduce
Searching in Grafana gives the error: "Unknown error during query transaction. Please check JS console logs". The Loki pod was then found crashed with the error: "Liveness probe failed: HTTP probe failed with statuscode: 500"