-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move Windows Service init out to seperate package #2061
Conversation
Signed-off-by: Jamie Milton <[email protected]>
Signed-off-by: Jamie Milton <[email protected]>
Looks like there are some linting errors and issues building on linux. |
Thanks for the pointers. Trying to get my local environment sorted so i can resolve locally and not rely on the ci linter... This PR in its current form (Ignoring the lint errors) only goes part way to achieving what I'm trying to do. Because 'Entrypoint' is required to start the service we still have to import a BUNCH of other packages just to initiate the service so it still takes a long time when CPU is starved. I'm not too sure what direction to take with this right now, Ideally I would want to decouple the 'Entrypoint' type from the starting of the service and have its status maintained back in main() but that might end up being a larger change. Once I've got my local stuff sorted ill need to take a crack at separating the two, unless someone else wants to take a crack at the last part now that the issue has been 'defined' |
Signed-off-by: Jamie Milton <[email protected]>
Signed-off-by: Jamie Milton <[email protected]>
Signed-off-by: Jamie Milton <[email protected]>
Going to give this a bit of thought. I hesitant to set successful running until we have loaded the configuration. Though the worst-case scenario is it comes up and then immediately fails. |
No worries, I'm very much doing this from the perspective of understanding windows well but golang not well so whilst any solution i may come up with might work, it might not necessarily be 'correct' :) For reference with the current state the service will start (Quickly) but the stop command will be ignored as the channel triggered by a stop command is not hooked up with the signal handler on the entrypoint. Loading the config during the initiate wouldn't be too terrible, as long as it doesn't depend on some of the weighty packages that were being imported before. It should be noted that based on the behaviour I've seen svc.run() sets the windows service into 'Starting' and its not until you hit main() (or possibly init within the main package, i haven't tested that) that the service is set to 'Started' |
Granted that could just be because I've seen it broken so often now xD either way I'll do some more on this in the next few days |
For comparison so far, previously under load it took 59s to get to Service Initiation (Trace in the linked issue) with this shuffle around it takes 2s, Still quiet a long time for a windows service but well within the 30s grace period: Click Here
|
Signed-off-by: Jamie Milton <[email protected]>
Signed-off-by: Jamie Milton <[email protected]>
Nearly there, it actually works on windows (stopping/starting services + Checking healthy/metrics URLs) so I just need to sort the cross platform stuff. I was hoping to use IsWindowsService to toggle behaviour based on being windows or not but that doesn't see to be an option so once I resolve that it SHOULD be able to pass checks. I see now why originally it just followed a separate path for windows rather than trying to have the two co-exist... |
Signed-off-by: Jamie Milton <[email protected]>
Signed-off-by: Jamie Milton <[email protected]>
I'm not entirely happy with how the last commit turned out but apart from modifying the code under /pkg i couldn't see another way of achieving my goal which was to keep server out of initiate but maintain the toggle between windows event log and 'normal' logger. Open to suggestions on that one :\ |
Signed-off-by: Jamie Milton <[email protected]>
Signed-off-by: Jamie Milton <[email protected]>
Fyi i made a version where we call 'StartPending' in the initiate and 'Running' in main() but it had the same timeout issue as before :( |
In some very severe cases the underlying GO code can fail to even reach the executing code, the only option in such cases (Besides fixing the underlying golang issue, unlikely) is to opt for a delayed start on the service to buy some time. How do you feel about setting the default service type to delayed as part of this PR? Or does it warrant a separate PR/Discussion? |
This PR has been automatically marked as stale because it has not had any activity in the past 30 days. |
Any thoughts/comments? |
// Default name for the Grafana Agent under Windows | ||
const ( | ||
ServiceName = "Grafana Agent" | ||
) | ||
|
||
func init() { | ||
// If Windows is trying to run as a service, go through that | ||
// path instead. | ||
if IsWindowsService() { | ||
go func() { | ||
err := svc.Run(ServiceName, &AgentService{stopCh: ServiceExit}) | ||
if err != nil { | ||
log.Fatalf("Failed to start service: %v", err) | ||
} | ||
}() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this isn't an ideal situation to be in; moving logic to init
functions is error prone and generally not fun to maintain.
I recognize that this was the solution used in prometheus-community/windows_exporter#1047, and for a short-term solution it might be okay.
Long-term, though, it's best for windows_exporter (and any of its transitive dependencies causing this issue) to move away from init functions and towards lazy loading so we can remove the need for this. This is something we can assist with since we're a direct downstream dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree, This is definitely a tactical fix thought up by someone with limited Go skills, but if nothing else im hoping this effort has helped articulate what the issue is on windows machines for more competent contributors to come up with a more suitable strategic fix :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like for @mattdurham to have the final say on approval/merging this PR, but from my perspective I'd be much more comfortable if we could add a TODO comment here to move eventually away from init functions. Something like:
// TODO: Remove init function when possible.
//
// An init function is needed because some collectors in
// windows_exporter have init functions which can delay the
// startup process. We need to run our init function first
// to set the status of the Windows service as early as
// possible.
//
// Long term, helping windows_exporter move away from init
// functions and towards lazy initialization will remove the
// need for this workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will give it another run-through today/tomorrow and see if I find any oddness with using the init. It does solve a problem a lot of our users have and the PR is very much appreciated @jammiemil.
I believe this was forgotten about. The issue is still impacting me, could we take another look at this PR? |
maybe @erikbaranowski wants to take a look? |
@jammiemil what's the status of this PR? Do you plan to keep working on it or should we close it? |
For now, I think we're going to close this one; it's been open for over a year, and we've had talks with Jamie on the community slack, where he expressed being OK with it not being merged. The main reason I'm choosing to close this is because, while it fixes the issue, I'm worried it comes at the cost of making the codebase harder to reason about, and leaves the package import order volatile and prone to bugs. We also believe the workarounds are sufficient enough such that this PR isn't desperately needed; the workaround for static mode is to increase the service timeout on Windows. In Flow mode, this issue doesn't exist because there's a separate program which implements the service and manages a static mode binary. Thanks for your effort here @jammiemil, and I'm sorry this one didn't get merged. |
PR Description
Move Windows service Initiation out of main to allow it to run ASAP during process initiation and avoid the 30s timeout on windows services
Which issue(s) this PR fixes
grafana/alloy#529
Notes to the Reviewer
PR Checklist