container_oom_events_total always returns 0 #3015
Comments
Just tested this with 0.44.0 - issue persists.
I believe the problem is that we are updating the OOMEvent count on the container itself on this line (line 1256 in 24e7a98).
To my understanding, when an OOM event occurs the container is destroyed, effectively removing it from the metric data. An example from my testing in AWS:
I0525 14:24:39.889468 1 manager.go:1044] Destroyed container: "/ecs/ID/CONTAINER_ID" (aliases: [alias], namespace: "docker")
So we increment the OOM metric and then deregister the metric :( Is my understanding of your implementation correct, @kragniz? If the expectation is for the container to be restarted after an OOM, this makes the metric unusable in environments where containers are always replaced rather than restarted (such as ECS).
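To illustrate the sequencing problem described above, here is a minimal sketch using prometheus/client_golang. This is not cadvisor's actual code, and the label name and container ID are made up: the point is that if the per-container series is deleted when the container is destroyed, an increment that happens just before deletion may never be scraped, so the metric appears to stay at 0.

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical per-container OOM counter, keyed by a container label.
// This illustrates the race described above, not cadvisor's real implementation.
var oomEvents = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "container_oom_events_total",
		Help: "Count of OOM events observed for the container.",
	},
	[]string{"container"},
)

func main() {
	prometheus.MustRegister(oomEvents)

	// 1. An OOM event arrives and the counter for the container is incremented.
	oomEvents.WithLabelValues("/ecs/ID/CONTAINER_ID").Inc()

	// 2. The container is destroyed almost immediately afterwards and its
	//    series is deleted. If no scrape happened between steps 1 and 2,
	//    the increment is never exposed, so the metric looks like it never moved.
	oomEvents.DeleteLabelValues("/ecs/ID/CONTAINER_ID")
}
```

Anything that ties the counter's lifetime to the container's lifetime has this problem; a counter that survives container churn, or that is guaranteed to be scraped before deregistration, would not.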
I have the same issue. One of my containers is running out of memory, and I can see the OOM event in syslog and kmsg, but container_oom_events_total stays at 0. compose.yml:
Having the same issue here. The event is not showing up in Kubernetes either.
Any idea how to solve this? Thanks!
Also having the same issue with cAdvisor 0.46.0.
Hitting this issue in Kubernetes as well. Commenting for visibility.
I have done various tests of OOMKills under Kubernetes. I have, so far, seen only one use-case where I have observed the OOM metric not being lost.
See #3278 (comment)
We encountered this problem just now too.
@Creatone @bobbypage could I gently drag you over to this issue about the bugs with the OOM metrics? Is there anything to be done to get this problem addressed or the PR reviewed?
An update on my previous comment: k8s …
regardless of the k8s version or whether …
This is exactly my point: the only use-case where I have seen that the OOM metric was not lost was removed in k8s 1.28. Whether or not cAdvisor should provide the OOM metric is a separate discussion. It is only relevant if the container is not deleted after being OOMKilled, which doesn't make a lot of sense for any managed container environment, to be honest.
On my side, upgrading conmon solved the issue (Debian 12 at minimum).
Essentially, in a DevSecOps world, engineers want to be able to track container application behavior when memory usage exceeds the memory requests and the kubelet has to step in to say, "whoah". That turns into an OOM kill (exit code 137, as stated above). For example, I want to have a query like … or something similar.
missing-container-metrics offers this exact metric, albeit 3 years old and littered with CVEs now.
Given that, I am unsure what container_oom_events_total is intended for if not the above user story. Please do help all of us following this thread if we are misunderstanding the intention of this metric.
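As a point of comparison, one way to count OOM kills independently of container lifetime is to watch the kernel log and increment a long-lived counter. The sketch below is only an illustration of that idea, not how missing-container-metrics or cadvisor actually work; the metric name and the kmsg matching string are assumptions.

```go
package main

import (
	"bufio"
	"log"
	"os"
	"strings"

	"github.com/prometheus/client_golang/prometheus"
)

// A counter that outlives any individual container, so OOM kills are not
// lost when the container is destroyed and replaced.
var oomKills = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "oom_kills_total", // hypothetical metric name
	Help: "OOM kills observed via /dev/kmsg.",
})

func main() {
	prometheus.MustRegister(oomKills)
	// Exposing the registry over HTTP (promhttp) is omitted for brevity.

	// Reading the kernel ring buffer requires sufficient privileges.
	f, err := os.Open("/dev/kmsg")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// The kernel logs a line containing "Killed process" when the OOM
		// killer fires; the exact wording varies by kernel version.
		if strings.Contains(scanner.Text(), "Killed process") {
			oomKills.Inc()
		}
	}
}
```

A counter like this survives container replacement, which is what ECS-style environments need, though it lacks the per-container labels that make container_oom_events_total attractive in the first place.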
Running Docker (swarm): when OOM events occur, the counter never increases. For reference, the node-exporter metric node_vmstat_oom_kill does increase.
Running cAdvisor v0.43.0.