windows_exporter service failed to start on reboot #551
Comments
+1, we had to restart the service after updates. I think the startup type should be Automatic (Delayed Start) instead of Automatic. |
@f1-outsourcing Note that those services have a "Subsequent failures" set to "Take no action", meaning it will simply stop trying if it fails to start twice. The first number, reset after, doesn't matter much when subsequent failures is set to restart. We could possibly set the restart interval to something higher to space restarts out, but before this report, we've never heard of this being a problem. |
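To make the recovery-settings point concrete, here is a minimal Go sketch that models how a list of recovery actions is consumed as the failure count grows. It is a simplified model, not the exporter's installer code: the `Action` type, the `actionFor` helper and both example policies are hypothetical names introduced for illustration. It assumes the usual behaviour in which the last configured action is reused once the failure count runs past the list.

```go
package main

import "fmt"

// Action is a simplified stand-in for a Windows service recovery action.
// (Hypothetical type, used only for this illustration.)
type Action string

const (
	ActionNone    Action = "take no action"
	ActionRestart Action = "restart the service"
)

// actionFor returns the recovery action for the Nth failure (1-based).
// Once the failure count runs past the configured list, the last entry
// is reused; this corresponds to the "Subsequent failures" setting.
func actionFor(actions []Action, failureCount int) Action {
	if len(actions) == 0 || failureCount < 1 {
		return ActionNone
	}
	if failureCount > len(actions) {
		return actions[len(actions)-1]
	}
	return actions[failureCount-1]
}

func main() {
	// Two hypothetical policies: one gives up after two restarts,
	// the other keeps restarting on every failure.
	giveUp := []Action{ActionRestart, ActionRestart, ActionNone}
	keepTrying := []Action{ActionRestart, ActionRestart, ActionRestart}

	for n := 1; n <= 4; n++ {
		fmt.Printf("failure %d: give-up=%q keep-trying=%q\n",
			n, actionFor(giveUp, n), actionFor(keepTrying, n))
	}
}
```

With the first policy the service is left stopped from the third failure onwards, which is the "stop trying if it fails to start twice" case described above.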
@carlpett I've also noticed this behavior, and I can reproduce it consistently by rebooting one of the servers I manage. I'd fully expect to see entries in the "Application" event log from the source "windows_exporter" when the service fails to start, but I don't. All I see is the same thing reported by @f1-outsourcing: events are created for the service failing to start due to a timeout. It's also worth noting that I've seen this issue on pretty much all of the ~200 Windows machines we have. See the following screenshots: The service fails to start due to timeout: The service manager fails the service: The application event queue has no windows_exporter entries in this time period: Should I circumvent Event Viewer? I know stdout is a logger option, but I didn't see an option to log to a flat file. If you've got some ideas for troubleshooting this, I'd be willing to run whatever is needed. This issue has been quite troublesome for us during patching. |
I do have the same issue on Windows Server 2016. |
Same issue here. |
Same issue on Server 2019. Fresh installed machines running the windows_exporter agent do not start the agent on reboot. Playing with the automatic restart options did not resolve the issue. |
I agree, and at the very least you should have the First and Second failure set to |
same issue on windows 8.1 |
I've still been unable to reproduce this, unfortunately, so anything you can find about why it is happening on your systems, but not all, would be useful. @babunatarajan You seem to have a completely different issue, since your error is a timeout during metric collection from a running exporter. |
@carlpett If I see this correctly, it works with Delayed Start, so my best guess is that the windows_exporter service starts and immediately exits again during its first try, probably because a dependency is not fulfilled at that early stage of boot. |
I set Delayed Start as soon as the service failed to start at boot, but I never really tested it because it is a prod environment. Thanks |
I set my servers to delayed start and it seemed to at least start correctly when Windows started up. I'm unsure if it would restart on failure correctly or not though. |
There's a lot of different threads flying here, and a few misconceptions. Then, on the topic of Delayed starts. I'm not in principle against it (it will mean you will not have metrics for ~2 minutes longer than otherwise after a reboot, but that is probably not a huge deal in most cases), but there seems to be a mixed bag of experiences reported on whether it helps or not. I've now tried booting completely without networking and related services enabled, and it does not appear to prevent the windows_exporter from starting. So there's something deeper going on. Are any of you overriding the service account for the service, so you could have a dependency on Active Directory being available? |
The 2019 machines I was seeing the problem on are AD joined and hardened with the CIS guidelines. I never had issues last year when I was still using Windows Server 2012 R2 and an older version of the exporter with the service starting correctly on reboot so maybe it's a 2019 Server issue? |
Hi everyone, I was able to get through this issue by running this command : Delayed start
Restart option
Tested on Windows Server 2012 R2 / 2016 / 2019. Hope it helps. |
I have the same problem on freshly provisioned Azure Windows VMs: windows_exporter fails to start after VM reboot.
|
solved for me with a folder exclusion rule on Windows Defender |
Same issue here.
I can confirm setting the service to Delayed Start fixed the issue. Why can't this be set to Delayed Start by default? |
@josephB Good call on the exclusion, in our case looks like our AV tools needed an exception following aug updates. |
@chinhodado As I mention in my comment above, it doesn't seem to solve it very reliably. If we could figure out why it fixed it for you, that'd be a big step forward towards making a change. If it is related to antivirus starting up, as indicated by some other commenters lately, we'd be much better served by setting the correct service dependency. |
I'll see if I can get more detail. |
Setting delayed start doesn't help. Until it's fixed, I'm using a scheduled task which starts windows_exporter if it's not running every 5 mins. |
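The scheduled-task workaround above boils down to "check whether the service is running; if not, start it; repeat on an interval". A minimal Go sketch of that loop follows. The `Service` interface, `ensureRunning`, `watch` and `fakeService` are hypothetical names for illustration; a real version would back the interface with the Windows service manager, which is deliberately left out here so the example runs anywhere.

```go
package main

import (
	"fmt"
	"time"
)

// Service is a hypothetical abstraction over a service manager.
// A real implementation would query and start the Windows service.
type Service interface {
	Running() (bool, error)
	Start() error
}

// ensureRunning starts the service only if it is not already running.
// It reports whether a start was attempted.
func ensureRunning(s Service) (bool, error) {
	running, err := s.Running()
	if err != nil {
		return false, fmt.Errorf("query service state: %w", err)
	}
	if running {
		return false, nil
	}
	if err := s.Start(); err != nil {
		return true, fmt.Errorf("start service: %w", err)
	}
	return true, nil
}

// watch calls ensureRunning the given number of times, sleeping for
// interval between checks, and returns how many successful starts it made.
func watch(s Service, interval time.Duration, checks int) int {
	starts := 0
	for i := 0; i < checks; i++ {
		if i > 0 {
			time.Sleep(interval)
		}
		started, err := ensureRunning(s)
		if err != nil {
			fmt.Println("watchdog:", err)
			continue
		}
		if started {
			starts++
		}
	}
	return starts
}

// fakeService simulates a service that is stopped after a reboot.
type fakeService struct{ running bool }

func (f *fakeService) Running() (bool, error) { return f.running, nil }
func (f *fakeService) Start() error           { f.running = true; return nil }

func main() {
	svc := &fakeService{}
	// The workaround above checks every 5 minutes; a short interval is
	// used here only so the demonstration finishes quickly.
	n := watch(svc, 10*time.Millisecond, 3)
	fmt.Printf("start attempts: %d, running: %v\n", n, svc.running)
}
```

Like the scheduled task, this only masks the startup failure; it does not explain why the first start fails.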
@dry4ng It'd be interesting to see if your case is solved with an exception in Windows Defender as mentioned above? |
In my case, almost all my Windows Server 2016/2019 machines will start the service with the automatic delayed startup after a reboot. I seem to always have a few that do not and I have to go manually start them once I get alerted. I can confirm that I've removed the Windows Defender feature from my Windows 2019 servers because I am using a third-party AV software. I was also thinking of having some kind of work around to start up the service when it is stopped but had been hesitant to put one in place so far. |
Is there any log that we can look at to debug why the service doesn't start? AFAIK the service doesn't generate any log file. |
I installed 0.15 yesterday because I noticed it added a dependency for the Windows service on the WMI service. I experienced the same problem with 0.15: the service would not start when the startup type was set to Automatic. When I changed the startup type to Automatic (Delayed Start) after upgrading to 0.15, the service did start correctly after a reboot. Looking in the event viewer, I noticed that the windows_exporter service did start but had problems collecting metrics, and I guess stopped itself, before the event that says the "Windows Management Instrumentation" service was started. Maybe this is the service that should be the dependency instead of, or in addition to, "WMI Performance Adapter"? |
I'm happy to test this, but I'm not great with Git, so it would be great if you could provide an MSI. |
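If the exporter really does exit because a dependency such as WMI is not yet available, one alternative to a service dependency is for the process to wait for the dependency itself before collecting. The Go sketch below shows that idea only; it is not how the exporter behaves, and `waitReady`, `errNotReady` and the probe in `main` are hypothetical names standing in for a real readiness check.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical error a probe returns while the
// dependency is still starting.
var errNotReady = errors.New("dependency not ready")

// waitReady calls probe until it succeeds or maxAttempts is reached,
// sleeping for interval between attempts. It returns nil on success
// and wraps the last probe error otherwise.
func waitReady(probe func() error, interval time.Duration, maxAttempts int) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = probe(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	attempts := 0
	// Hypothetical probe that fails twice, standing in for a dependency
	// that comes up late during boot.
	probe := func() error {
		attempts++
		if attempts < 3 {
			return errNotReady
		}
		return nil
	}

	err := waitReady(probe, 10*time.Millisecond, 5)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

A wait like this would have to stay well inside the service start timeout, or run after the service has already reported itself as started, otherwise it could trigger the very timeout reported in this issue.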
See attached. I've included both the EXE and MSI built from |
Can confirm works now. Especially with an older server where even delayed start and setting |
Finally got a chance to test this. Happy to say that it appears fixed from my testing. I was able to get it to consistently fail with previous versions but the provided version above seems to have done the trick and it now starts successfully. Thanks to all involved in getting this over the line. |
Thanks all. I'll aim to get a new release with this fix out in the next few days, then hopefully we can close this one off 🤞 |
Hi @breed808 tested the windows_exporter.zip provided above. It fixed the cpu usage issues and timeouts which I was having. What I noticed however is that I am experiencing a memory leak. At one point agent hit 1GB ram usage. |
@matthewsc05 is the memory leak present on the latest version or just on the build I provided earlier? |
I've had the build from the post above (on 26th August by breed808) installed for the last week or so on 3 servers (Windows Server 2016) and can't see any high memory from it. I've compared it to the rest of my servers running an older version of Windows Exporter and the memory levels look similar across the versions. Thanks |
Hi @breed808 it's with the previous build provided above (windows_exporter.zip). Could this be related to a particular Windows version? This was tested on Windows Server 2019; we had to remove the agent due to the high memory usage. |
@matthewsc05 it's more likely to be the collectors you have enabled. We've identified some problem collectors using WMI as a metric source in #813, and there's been a recently identified leak in the That said, if you're running the same collectors between versions and there's a noticeable difference in the new version, we'll need to investigate. I'm concerned that we may be introducing a new issue in the next release while trying to fix this one. |
Hi @breed808 I agree. I was using the default configuration, so everything was enabled. I am moving to a dedicated configuration, so this outcome might change for me soon. The fix is still important in my opinion, as in extreme cases the 30s timeout is being hit. When I had this problem, using the package provided above and deleting the registry keys of the previous installation fixed my issues, until I hit this memory issue, which I am now looking into. |
Fair enough. If we can't identify the cause of the issue in the next few days, I'll cut a release and list it as a known bug. Let me know if you find anything while using a dedicated configuration. |
@breed808 really appreciate your attention on this. Do you have any timeframe on an update? Thanks! |
Apologies for the delay, life got in the way again. I've released |
Behaviour of init functions has been centralised in `collector/init.go`, and can be called during exporter startup. This allows the exporter to control the timing of collector initialisation, rather than relying on the import & `init()` method. This should reduce unexpected behaviour arising from the use of `init()`, such as prometheus-community#551. Signed-off-by: Ben Reedy <[email protected]>
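The commit message above describes moving collector setup out of per-file `init()` functions and into one place that the exporter calls explicitly at startup. A minimal Go sketch of that pattern follows. It is not the code in `collector/init.go`: the `Collector` interface, `Registry`, `registerAll` and the two collector names are hypothetical and chosen only to illustrate the idea.

```go
package main

import (
	"fmt"
	"sort"
)

// Collector is a hypothetical minimal collector interface.
type Collector interface{ Name() string }

// builder constructs a collector; it may fail if a dependency is missing.
type builder func() (Collector, error)

// Registry holds builders. Nothing is constructed until Build is called,
// so the program decides when initialisation happens.
type Registry struct{ builders map[string]builder }

func NewRegistry() *Registry { return &Registry{builders: map[string]builder{}} }

func (r *Registry) Register(name string, b builder) { r.builders[name] = b }

// Build constructs the enabled collectors in name order and stops at the
// first failure, returning an error the caller can log or retry.
func (r *Registry) Build(enabled []string) ([]Collector, error) {
	names := append([]string(nil), enabled...)
	sort.Strings(names)
	out := make([]Collector, 0, len(names))
	for _, n := range names {
		b, ok := r.builders[n]
		if !ok {
			return nil, fmt.Errorf("unknown collector %q", n)
		}
		c, err := b()
		if err != nil {
			return nil, fmt.Errorf("build collector %q: %w", n, err)
		}
		out = append(out, c)
	}
	return out, nil
}

type simple string

func (s simple) Name() string { return string(s) }

// registerAll plays the role of a central init file: one explicit call
// replaces init() functions scattered across collector files.
func registerAll(r *Registry) {
	r.Register("cpu", func() (Collector, error) { return simple("cpu"), nil })
	r.Register("os", func() (Collector, error) { return simple("os"), nil })
}

func main() {
	r := NewRegistry()
	registerAll(r) // called explicitly during startup, not at import time

	collectors, err := r.Build([]string{"os", "cpu"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range collectors {
		fmt.Println("built", c.Name())
	}
}
```

The design point is that `init()` functions run at import time, before `main` has any say, whereas an explicit call lets startup code choose when initialisation happens and handle its errors.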
After installing updates and rebooting the server, the windows_exporter service was not running.
The windows_exporter service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
When I look at the recovery options of the windows_exporter service, they are not like those of other 'standard' Windows services. It looks like none of the others has "Reset fail count after" set to 0 and "Restart service after" set to 0.
exporter:
other examples:
I am not really an expert on service recovery settings, but maybe someone should look at these. Maybe it would be better to set these values to 3 or 5 minutes?
https://docs.microsoft.com/en-us/archive/blogs/jcalev/some-tricks-with-service-restart-logic
https://social.microsoft.com/Forums/ro-RO/3db76753-4607-4a20-97a0-790c73e379cc/the-actions-after-system-service-failure?forum=winserver8gen