-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: Windows service timeout during system startup #23479
Comments
Thanks for the report. Can you try instrumenting the runtime with timestamped logs, following the code being executed before main? Just compile Go from source, and use the produced binary to compile your application. Then put print calls in the runtime code and just recompile your software and test. Within the runtime, you can use println() to print without going through fmt, and nanotime() or walltime() to get a timer reading to print. Start from runtime.main() and runtime.init() and check if you can find where time is being lost. |
I cannot reproduce it here (I am running it on my WIndows 10 computer) - your service starts and stops fine. I do not have any bright ideas. Alex
Maybe some code that runs during std packages init functions is broken when running as a service. We don't test code you import to run as a service. Maybe try removing some imports and see if it improve your situation. Alex |
and you are not checking for errors here and when you Kill and Wait for process to exit. Not that it matter for your problem. Alex |
I rebuilt 1.9.2 from source with instrumentation in the I also reduced the number of unused standard package imports in The simple C++ Windows service (mentioned previously) was modified to do the following:
The C++ service was used instead of the Go There are thus three types of timestamped logs in the log file when the C++ service starts and launches the Go application:
The results of a set of runs can be found in the following Google Sheet. There is one manual start of the service (ie system already running), followed by 15 boot-ups. The fourth column is the time delta between that log and the previous one, in seconds. It has been conditionally formatted to highlight any value >= 1 second. For the manual start, there were no significant delays. A separate set of 10 manual service starts on a running system (data not shown) also showed no significant delays between any of the logs. For the boot-up runs, there are cases where there was no significant delays (such as boot 2 & 7) and others where there were noticeable delays. These delays tend to be seen in several places:
There are runs where there was only one significant delay and others where there were many. For the delay between the The current hypothesis is that when Windows is starting services during boot-up, there are often times when resources available to the processes (possibly disk and/or CPU?) are limited. These resource constraints might affect a Go application more than others due to its size, as a larger executable requires more disk I/O and CPU time to load into memory. If the application doesn't hit any significant resource constraints, it starts without delay. If it does, the delay(s) depend when the constraints occur. |
I do not know. Maybe @aclements knows. Austin, can you suggest something here to understand why it is so slow to start?
It sounds reasonable to me. Windows is trying to get system up and running as soon as possible. It would delay anything that it consider might affect that goal.
I don't think executable size maters here. Maybe memory allocations affect things. Go reserves large amount of address space upfront, but it is not expensive for a normal process. And, as you have discovered, simple Go service does not suffers delays. Have you tried to build simple Go service with many inits (not starting Go executable with many inits from a service)? I really don't have any good ideas. Sorry. Alex |
We have seen simple Go service experience delays during startup, though the probability of it happening seem to be less compared to one with one with many imports. A Go service with many imports behaves similar to one that runs as a 'normal' executable launched by a service, often suffering start-up delays. In fact, we started debugging with this first and moved to having the Go executable (in 'non-service' mode) launched by a service (C++ or Go) to get a measure of the time for the program to be launched. When the delay issues occur during Windows start-up, it doesn't seem to make much difference whether the Go executable is running as the service itself, or launched by the another service. |
Thank you for explaining. Just an idea - I believe Docker is running as a Windows service. If that service executable is written in Go, how come they don't have problems like yours? Alex |
Good point. When I get a chance, I'll try installing Docker on a problematic VM and see how it behaves. |
It is. Runtime initialization calls a few functions before |
I instrumented the earlier runtime initialization calls, with more focus on the middle of The results are in the second tab of the Google Sheets spreadsheet. Six of the boot-ups had significant times in the newly instrumented code:
Other boot-ups either had delays elsewhere, or none of significance. Seems similar to the previous results, with there being locations in the code where delays are more likely to happen than others, but they don't happen in these places all the time. |
Indeed. I have nothing else that helps you. Sorry. Alex |
Does your binary use cgo? If so, can you try commenting out anything that depends on cgo and see if you still those random delays? Also, if that's not the case, you can still try dropping parts of your binary (like, entire packages commenting out functionalities) and see if you can bisect what causes the initial delay as side effect. I'm assuming this is something specific to your binary and not all Windows go binaries of course. Which DLLs your binary depend on? You can use a tool like DependencyWalker to see a complete list of dependencies. |
That's good question - if your service executable is depending on anything network related then you might experience such delays (because of complexities how TCP-stack is getting up'n'running). Either mark your service as "Delayed start" or remove TCP-related code from the stages which send Windows control service signal that it's up. |
Our actual application does not use CGo. Nor does the sample winserv program I have put on GitHub (basic Windows service with a lot of unused imports and some timestamped logging). Using DependencyWalker on winserv takes almost 2 minutes to analyze the binary and shows a large tree with three top-level DLLS: WINMM.DLL, WS2_32.DLL, & KERNEL32.DLL. Feeding it a simple Go Hello World (just fmt.Println()) gives a similar result, at least time terms of analysis time, top-level DLLs, and functions listed. There are also a number of errors reported for both executables, including being unable to open many DLL files and the following (this is from the gohello.exe, but is the same for winserv.exe):
Is there anything specific in the DependencyWalker results I should be looking for? When I use
|
Hi, Thanks & Regards, |
I am having the exact same problem using Go to generate a windows service.
|
I assume 'debug' and 'release' refer to with and without debug symbols ( I rebuilt our application with those link flags, which resulted in a Windows executable that was about 1/3 smaller. However, it behaved the same as before, timing out when starting as a normal Windows service at boot-up. |
I think the correct way to solve this would be to build the service support into the runtime. A Windows service can be pending starting without be cancelled after sending a SERVICE_START_PENDING message:
The next step could be to write a proposal for this. However having these kind of changes is probably hard getting approved and it won't release before Feb 2019. On the other hand, there a quite a lot good service wrappers out there. |
Part of the problem won't be solved by building service support into the runtime, at least not without some very specific modifications. |
@derekwbrown I think the point here would be to hook into the service manager before firing up the runtime... before "runtime/proc.go main()" but inside the system's entry point (main/WinMain). More details: https://msdn.microsoft.com/en-us/library/windows/desktop/ms685477(v=vs.85).aspx |
@fkollmann I understand that; however, even if your service hooked into the service manager during its own Unless you're talking about changing the language to have a |
As far as I know, This problem should be fixed by calling SetServiceStatus on regular intervals. And the interval can be changed on registory. |
I think it should be implemented directly into the OS main() function. You need specific functions anyway (see servicemain()). I would suggest the following steps:
|
@mattn I don't think that is a good idea. Microsoft explicitly discourages from doing this as it can keep the service stuck forever. I also think it's not required, as this is no heartbeat mechanism. You simply have to set an appropriate timeout. |
@kneden @derekwbrown can you try https://golang.org/cl/110160 to see if it helps with your problem. Thank you. Alex |
@alexbrainman, this issue only appears to show up (for me) if you are heavily disk I/O constrained. |
Do you have any suggestions on how I can reproduce it? What version of Go do you use? What is your program? Grasping for straws. Thank you. Alex |
From this thread it looks like even the simplest program will exhibit this, currently I am trying to repro myself from scratch in GCE. I have a repo (I didn't build the image though so not sure what was done) so I know its related to disk speed, but vanilla images don't have the issue. I think additional services need to be installed to slow boot down. |
That would be nice, if you can repro on GCE. What version of Go is your service built with?
What is your environment where this error happen to you? Does the error happens consistently? Alex |
I was unable to repro the timeout, but I was able to see the issue using Windows Performance Recorder and measuring boot performance. One a really low specced system (1CPU, really low IO limit) my service took 27s for container init time. One a system with higher specs it was only 1s. I also installed lots of random windows server features to further slow down the boot process. Here is a link that talks about further digging into why service start is slow: |
I just submitted https://golang.org/cl/246317 change that can possibly fix this bug. Please, check, if you still can reproduce this bug after CL 246317 ? Thank you. Alex |
Since no one replied to my comment last week, I will assume that fixes this. So I will close this issue. Please, let me know, if I am wrong. Thank you. Alex |
This issue is frustrating and I've seen it on busy Windows Server Core machines. One workaround for this issue is to edit the Another option for services is to change the startup type to Automatic (Delayed). |
@alexbrainman can we reopen this issue? |
@Jonathan727 , @Sigurthorb , have you been able to test with Alex's CL 246317? |
Yes, we are seeing this issue even with Alex's CL 246317 fix in our vendor-ed sys package |
Thanks. Reopening. |
We're facing the same issue in our environment, could share a permanent workaround. |
I encountered this problem, putting all DLL related initialization actions into init method solves my problem. |
Thanks for this shirou/gopsutil#570 |
What version of Go are you using (
go version
)?go version go1.9.2 windows/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?What did you do?
Configured our Go language application to start as a Windows Service, with start type of 'Automatic'. The service is expected to start when the system boots up. Sometimes the service process will not be running after the system has booted up and the 'Windows Logs / 'System' logs in Event Viewer show two logs:
These events do not appear in the logs when the service successfully starts.
It appears that Windows terminates the service process when the service fails to respond to the Windows Service start event within the 30 s time frame. Our application contains the Windows Service code based on the sample code from
sys/windows/svc/example
.This timeout issue occurs both with 64-bit and 32-bit versions of our executable.
What did you expect to see?
The Go language Windows service start reliably every time the system boots up.
What did you see instead?
The Go language service does not start reliably on some systems on boot-up.
Some observations from systems where the service often fails to start during boot-up:
main()
starts. We found thatmain()
was often not reached when the timeout occurred. Sometimes even the first imported package's initialization was not reached.CreateProcess()
call that started our executable) often showed large time deltas. The logs indicated it often took a long time for the Go executable to get launched, and also to reach themain()
.While we have some systems (VirtualBox VMs) that reliably show these startup problems, the issue is harder to reproduce on other systems. We have VirtualBox VMs where the problem occurs only occasionally and we have yet to see the timeout occur on non-virtual Windows 10 installations. Timestamp-instrumented loads on those systems sometimes show time deltas of up to 10 s during startup. While this is not long enough to trigger the 30 s timeout on a powerful real system, the concern is that in virtualized environments and/or slower systems the timeout will be reached and the service will not start.
Sample Programs
Sample code can be found in this GitHub repository, including instructions for building and installing (README.md).
Both the
launchserv.exe
andwinserv.exe
programs contain the Go Windows Service code (again, based one the sample code). The Go Windows Service sample code was combined into a single file in both cases in order to 'flatten' the main package for tracking imported package initialization.It is intended that only one of the two should run as a Windows Service. The recommended configuration is to install
launchserv.exe
as a Windows Service ("my service"), and it will in turn launchwinserv.exe
as a child process (winserv.exe
will run as a 'normal' program, not a service).launchserv.exe
writes a timestamped log just prior to startingwinserv.exe
, which allows measuring the time it takes forwinserv.exe
to start.winserv.exe
contains a large number of standard package imports, to mimic a real program. These imports are marked as 'unused' and as such only their initializations run (variable initialization andinit()
).The VirtualBox version of the Win10 MSEdge test VM can be downloaded from Microsoft here to run these test programs. The following changes may help to increase the likelihood of seeing the issue with this, or other systems:
Multiple boot-ups of the system may be required to see the problem. In many cases, a VM may sometimes demonstrate the problem, but then not do so at other times. YMMV.
When delays are seen, the two most common places are:
winserv.exe
; that is, between the lastlaunchserv.exe
log and the firstwinserv.exe
log. Time deltas as high as 26 s have been seen (and as low as less than 400 ms) have been seen on an Win10/MSEdge VM.winserv.exe
main.go
, between the 'b' and 'c' logs. These are the large number of unused standard package imports. Times as high as 9 s (and less than 100 ms on the low end) have been seen on the same Win10/MSEdge VM.This service is an important component of our product; we are adding more functionality all the time. As the amount of code increases, it seems even more likely that this problem will occur. If there is anything else we can do to help resolve this issue (experiments, measurements, debugging, etc), please let us know.
The text was updated successfully, but these errors were encountered: