Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Collector to be run as a Windows service #1120

Merged
merged 1 commit into from
Jun 25, 2020

Conversation

james-bebbington
Copy link
Member

Link to tracking Issue:
#1119

Description:
When running the Collector on Windows, users will generally want to run this as a Windows service. It's not possible to just run a .exe directly as a Windows service, so some minor modifications needed to be made to the startup/service code.

This PR adds a separate main method on Windows (using build flags) that checks if the app is running interactively (i.e. invoked directly via the command line) vs running as a Windows service (i.e. invoked from service control manager).

If the executable is invoked from the command line, there are no changes to how the app is run. If the executable has been invoked from Windows service control manager, we call svc.Run on a service handler that is as a wrapper around service.Application.

  • In order to make minimal changes to existing code, the signalsChannel, that is used to be notified of OS SIGTERM signals, is made available for this wrapper to use when the Windows svc.Stop or svc.Shutdown signals are received.
  • In order to prevent logs from being lost, allow a zap hook to be supplied as part of service.Parameters, and redirect non-debug logs to the Windows Event Viewer.

Testing:
Have installed and run this as a Windows service (created via MSI or sc.exe) to validate that it runs as expected.

@codecov
Copy link

codecov bot commented Jun 15, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@a8db627). Click here to learn what that means.
The diff coverage is 95.23%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1120   +/-   ##
=========================================
  Coverage          ?   86.91%           
=========================================
  Files             ?      201           
  Lines             ?    14532           
  Branches          ?        0           
=========================================
  Hits              ?    12631           
  Misses            ?     1452           
  Partials          ?      449           
Impacted Files Coverage Δ
service/telemetry.go 85.00% <90.00%> (ø)
service/logger.go 68.42% <100.00%> (ø)
service/service.go 55.13% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a8db627...081424c. Read the comment docs.

@james-bebbington
Copy link
Member Author

I will look into adding a couple of tests for the service_windows.go file, but would appreciate a review of the approach here.

Note one thing I haven't implemented is the ability to install/uninstall/start/stop the Windows Service via command line flags. I don't really feel this is necessary as sc.exe (or various other tools) can be used instead, but if anyone feels strongly about this I can look into implementing it (I would probably consider switching to using https://github.com/kardianos/service).

@tigrannajaryan
Copy link
Member

@jchengsfx please review.

@jchengsfx
Copy link
Contributor

I will look into adding a couple of tests for the service_windows.go file, but would appreciate a review of the approach here.

Note one thing I haven't implemented is the ability to install/uninstall/start/stop the Windows Service via command line flags. I don't really feel this is necessary as sc.exe (or various other tools) can be used instead, but if anyone feels strongly about this I can look into implementing it (I would probably consider switching to using https://github.com/kardianos/service).

The SignalFx agent also uses kardianos and https://github.com/StackExchange/wmi for its windows service, and it has worked well for us so far (https://github.com/signalfx/signalfx-agent/blob/master/cmd/agent/agentmain/windows.go).

@james-bebbington
Copy link
Member Author

The SignalFx agent also uses kardianos and https://github.com/StackExchange/wmi for its windows service, and it has worked well for us so far (https://github.com/signalfx/signalfx-agent/blob/master/cmd/agent/agentmain/windows.go).

Normally I'd be keen to use a library with a higher level interface, but in this case I'm not sure kardianos is much different than using the x/sys/windows/svc code directly. I explored switching this code over but don't think its worth it. If you're interested, see the potential changes here: james-bebbington#2

@james-bebbington
Copy link
Member Author

(I intend to get back to this PR to add some tests tomorrow)


import "log"

func main() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this in the main.go and move the helpers in a different file? Main other sounds like a second hand possibility :))

Copy link
Member Author

@james-bebbington james-bebbington Jun 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea having multiple main functions was probably a bad idea. I've restructured this slightly so there's only helper methods in this file - are you okay with how it is now or did you want me to rename the files as well?

(I feel like this is the right convention now where the xxx_{os}.go files contain helper functions for the xxx.go file)

@james-bebbington james-bebbington force-pushed the windows-service branch 3 times, most recently from 6ca71bc to dc6ccdb Compare June 19, 2020 08:25
@@ -40,7 +40,8 @@ var (
)

type appTelemetry struct {
views []*view.View
views []*view.View
server *http.Server
Copy link
Member Author

@james-bebbington james-bebbington Jun 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q. Why did I change this file that's unrelated to the rest of the PR?

A. Because this PR introduces a second test that calls app.Start(). Without this change, the app fails to start up on the second attempt because we never stopped listening to the metricsAddr port.

@james-bebbington
Copy link
Member Author

I added tests and this is now ready for review

@james-bebbington james-bebbington changed the title Enable to Collector to be run as a Windows service Enable Collector to be run as a Windows service Jun 19, 2020
Copy link
Member

@nilebox nilebox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, mostly code cleanup comments.

cmd/otelcol/main.go Outdated Show resolved Hide resolved
cmd/otelcol/main_windows.go Outdated Show resolved Hide resolved
cmd/otelcol/main.go Outdated Show resolved Hide resolved
cmd/otelcol/main_windows.go Outdated Show resolved Hide resolved

func (s *WindowsService) Execute(args []string, requests <-chan svc.ChangeRequest, changes chan<- svc.Status) (ssec bool, errno uint32) {
if len(args) == 0 {
log.Fatal("expected first argument supplied to service.Execute to be the service name")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagate error, here and everywhere else :)

Copy link
Member Author

@james-bebbington james-bebbington Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't really propogate errors in this file back to the main function as this function is invoked in a background goroutine via a syscall.

However, you did send me a on a path of reviewing how we handle errors here. Windows is not great at giving much information when services startup - you don't get any helpful error message from the startup command/dialog, & it takes a while to return any error. I tested the following:

  1. Log to event log (Event Viewer) directly: Where possible this is probably the best option as the error will come up in Application event logs, and include whatever info you specify.
  2. Return a system errorcode: This is useful to do where 1 is not possible or in addition to it. This will create an error representing the relevant error code in the System event logs.
  3. Uselog.Fatal: This is probably the worst option as you just get a generic error message in the System event logs with no details of the error.

So I've refactored this file quite a bit to do 1 & 2 instead of 3 - you might want to take another look

service/service_windows.go Outdated Show resolved Hide resolved
cmd/otelcol/main.go Outdated Show resolved Hide resolved
log.Fatalf("failed to construct the application: %v", err)
}

err = app.Start()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rant: we should really rename this function to Run (change #615 was rejected, but there were quite a few other breaking changes in the codebase since then).

Copy link
Member Author

@james-bebbington james-bebbington Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this is admittedly particularly confusing in this PR since we wait for the response of app.Start in the windowsservice.stop function. But if we rename this, I think it should be done in a separate PR

@james-bebbington james-bebbington force-pushed the windows-service branch 2 times, most recently from bf3e5c8 to a8b8ba4 Compare June 22, 2020 03:59

changes <- svc.Status{State: svc.StartPending}
if err = s.start(elog, appErrorChannel); err != nil {
elog.Error(3, fmt.Sprintf("failed to start service: %v", err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

service/service_windows.go Outdated Show resolved Hide resolved
service/service_windows.go Outdated Show resolved Hide resolved
service/service_windows.go Outdated Show resolved Hide resolved
service/service_windows_test.go Outdated Show resolved Hide resolved
service/telemetry.go Outdated Show resolved Hide resolved
@james-bebbington james-bebbington force-pushed the windows-service branch 7 times, most recently from 78e5077 to dcb86e2 Compare June 24, 2020 08:03
@james-bebbington
Copy link
Member Author

@jchengsfx @dmitryax - can you please take a look?

(this is also blocking #1153)

Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants