Windows exporter frequently shows container collector success status as 0 #1473
Comments
@jkroepke Can you please check and share your opinion on this?
No idea. I am not an expert in Windows container monitoring, but there is currently nothing to configure here.
@jsturtevant @breed808 Can you please check and suggest a solution for this?
There are two different errors. It's a bit difficult to understand why the errors are happening since we don't have a picture of the containers on the system, but I would guess it is due to the containers being in a state of starting up, shutting down, or something in between. The first The second
Instead of ignoring errors, can we check beforehand whether the container is in a ready state?
Not really, since there are two different systems working on the same object at the same time. We could query whether it is in a ready state, but between the time we query and the time we execute the call to get the stats, it could begin its shutdown process because kubelet (or whatever is controlling the containers) has told it to shut down. This is essentially what is happening now: we should only get containers that are "ready", but by the time we actually execute the statistics command the container is already being shut down.
@jkroepke @jsturtevant The containers are not in a ready state. They do not exist at all. If they are already shut down then why
I am not sure about your setup (Kubernetes, etc.). If you can share that, it could help. In a lot of environments there will be something creating, stopping, and deleting containers (such as kubelet in Kubernetes), and then you have windows_exporter, which is monitoring the system. windows_exporter doesn't know about the lifecycle of the containers; it just queries the system, collects the stats, and reports them back. Stats collection in windows_exporter is a three-step process: get all the containers, then get stats for each container, then get the network stats. This means there could be a change on the system between the time it got the containers and the time it got the stats for a given container (i.e. the container is deleted). Using the Kubernetes use case as an example, imagine the following: In a successful run you will get:
In an unsuccessful run you will get:
This can happen when starting up a container too, where steps 1, 2, and 3 (from scenario 1) happen microseconds apart and the container hasn't fully booted and therefore doesn't have stats yet. As this is an expected occurrence, particularly in high-volume systems, the solution is to handle the errors properly and not error out. An example of that is in the Windows containerd shim: https://github.com/jsturtevant/hcsshim/blob/6103d69d1f2604098781c8e848ab196239bb9aa6/cmd/containerd-shim-runhcs-v1/task_wcow_podsandbox.go#L251-L254 and in kubelet (which I linked above, but the code has since been removed from kubelet).
I will implement a toggle where end-users can decide to ignore "Element not found" errors.
In #1473 I decided not to implement a toggle; instead, the exporter no longer fails hard if a container can't be scraped. I also took note of microsoft/hcsshim#933: if the container can't be found, the error will be logged as a debug message. Lastly, fetching statistics is now done only once. That should also solve the issue related to the warning. However, I have to test the changes in #1473, which may take some time.
Could someone help verify whether the changes from #1473 are fine?
Pre-built binaries are available here: https://github.com/prometheus-community/windows_exporter/actions/runs/10400066649/artifacts/1814688310
Windows exporter frequently shows container collector success status as 0.
The following error logs, observed in the windows-exporter container logs, refer to containers that do not exist:
All the data and metrics are visible in Grafana on the Windows dashboard. Kindly suggest how to resolve this issue, what is causing the container collector success status to be 0, and what the significance of these error logs is.