Windows: deploy via ansible rc: 0 while "The service process could not connect to the service controller." in eventlog #1760

Closed
one1zero1one opened this issue Sep 13, 2016 · 13 comments
Labels
bug unexpected problem or unintended behavior help wanted Request for community participation, code, contribution platform/windows

@one1zero1one

System info:

Win2012R2

Steps to reproduce:

Running in powershell/console works OK (both running interactively and installing the service).
Deploying via winrm (ansible) on the same box returns rc:0 with no stdout or stderr output, plus an eventlog error.

  1. execute telegraf.exe via ansible's raw module (powershell via winrm)
  2. check output (rc:0, no err/stdout) + eventID3 in eventlog

Expected behavior:

installing/running telegraf
rc:0, stderr NOT "" and/or stdout NOT "" and no eventID

Actual behavior:

"rc": 0, "stderr": "", "stdout": "", "stdout_lines": [] + eventID:3 "The service process could not connect to the service controller."

Use case:

testing telegraf on windows works fine, mass-deploying it using ansible (or any other deployment method that uses winrm) doesn't, and this is a blocker for using telegraf on windows.

@sparrc
Contributor

sparrc commented Sep 13, 2016

I don't quite understand what the issue is here, is it because telegraf is not sending anything to the event log?

@sparrc sparrc added bug unexpected problem or unintended behavior help wanted Request for community participation, code, contribution platform/windows labels Sep 13, 2016
@sparrc
Contributor

sparrc commented Sep 13, 2016

@one1zero1one would you happen to be able to try it out via the old NSSM installation method? see https://github.com/influxdata/telegraf/blob/9320a6e115b0bc2d7a832ae56ef0c8329df9db79/docs/WINDOWS_SERVICE.md

@sparrc
Contributor

sparrc commented Sep 13, 2016

cc @butitsnotme

@one1zero1one
Author

@sparrc - I've managed to register the service via ansible with chocolatey->nssm - but that's a lot of overhead and workarounds for registering a simple metric agent service...

Back to your first question: the issue here is that, with the same admin user, running telegraf.exe from console/powershell works (both testing and registering the service). However, when running it over WinRM, I get rc:0 and nothing on stderr/stdout, but "The service process could not connect to the service controller." in the windows event log.

@sparrc
Contributor

sparrc commented Sep 14, 2016

yep, it's not a permanent workaround but just to try to diagnose what part of the stack is failing.

Do you have any other debug information that may be of use? Do you know what commands ansible is running to install the service? which user it's running under? Is it specific to ansible or to WinRM in general?

@sparrc
Contributor

sparrc commented Sep 14, 2016

I've also opened an issue with the service installation library here: kardianos/service#72

@one1zero1one
Author

@sparrc ansible is using winrm and powershell. So it actually runs the exact same command using the exact same user as in console (where it works) - only it does it over winrm.
I'll look into it further during the weekend to see if I can rule ansible out for this behaviour.

@one1zero1one
Author

one1zero1one commented Sep 14, 2016

@sparrc I jumped to the conclusion that deploying via nssm works.
It does register the service:

C:\telegraf>nssm install Telegraf c:\telegraf\telegraf.exe -config c:\telegraf\telegraf.config
Service "Telegraf" installed successfully!

However, the service registered points to "C:\ProgramData\chocolatey\lib\NSSM\Tools\nssm-2.24\win64\nssm.exe" (http://imgur.com/gBWu0rE) instead of telegraf :/

So - currently the only way that works to use/install telegraf as a service on windows2012r2 is to do it interactively in console.

I've seen from the issue you opened with the service installation library that it could be a permission issue. However, my issue is that whatever path I take to automate (other than running in console), there is no error/output from telegraf.exe to give any clue about what's wrong. I'm using full admin everywhere for these tests.

@one1zero1one
Author

@sparrc I finally got it to work using sc.exe under winrm (from ansible).

EXEC sc.exe create telegraf binpath= "C:\telegraf\telegraf.exe -config c:\telegraf\telegraf.conf"
WINRM RESULT u' Response code 0, out "[SC] CreateService S", err "" '

I held back from trying sc.exe in the beginning because of the whole debate here #860 - however I'm happy it finally works, we can automate its deployment now.

Not sure about this bug, -service install still won't work from winrm - however I'm happy with sc.exe if it holds up in time. Thanks!
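For anyone scripting the sc.exe workaround above from a Go-based deployment tool rather than ansible, here is a minimal sketch of how the argument list could be built. `scCreateArgs` is a hypothetical helper name, not part of telegraf or any library. The one detail taken from the command above is that sc.exe expects `binpath=` (written `binPath=` here, assuming sc.exe option names are case-insensitive) and its value as two separate tokens.

```go
package main

import "fmt"

// scCreateArgs is a hypothetical helper (not part of telegraf) that builds
// the argument list for "sc.exe create", mirroring the command in the thread.
// "binPath=" and its value are separate arguments, matching the space that
// follows "binpath=" in the original command line.
func scCreateArgs(name, exePath, configPath string) []string {
	binPath := fmt.Sprintf("%s -config %s", exePath, configPath)
	return []string{"create", name, "binPath=", binPath}
}

func main() {
	args := scCreateArgs("telegraf", `C:\telegraf\telegraf.exe`, `c:\telegraf\telegraf.conf`)
	// Print each argument quoted so the token boundaries are visible.
	fmt.Printf("sc.exe %q\n", args)
}
```

On Windows, the slice could then be passed to `exec.Command("sc.exe", args...)` from `os/exec`; that step is left out here so the sketch runs on any platform.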

@sparrc
Contributor

sparrc commented Sep 14, 2016

glad you found a workaround, I'll leave this open until kardianos/service#72 is solved, as it's not ideal

@peter-murray
Contributor

peter-murray commented Sep 15, 2016

We ran into the same issue with telegraf not being able to be installed using Ansible. The problem is the way that the code handles interactive and non-interactive sessions.

The library for the service wrapper and reloadLoop combine in such a way that, if you are in a non-interactive session, the code assumes you are running as a service and are only performing a start or stop.

We had to modify the code to move the handling of the -service flags, and then managed to get it working as intended. We could provide this change as a PR if desired.

The second issue is that there is no logging whatsoever when running under Windows, which is a real issue as you cannot debug a broken config.
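The ordering problem described in this comment can be modeled with a small, simplified Go sketch. This is not telegraf's actual code: both function names and the returned strings are hypothetical, and `interactive` stands in for whatever session check the real code uses (possibly something like `service.Interactive()` from kardianos/service, which is an assumption). The only point is that checking the `-service` flag before the session type lets an install request through in a non-interactive WinRM session.

```go
package main

import "fmt"

// chooseActionSessionFirst models the ordering described in the thread:
// a non-interactive session is treated as "running as a service" before
// the -service flag is ever looked at. Hypothetical, simplified logic.
func chooseActionSessionFirst(interactive bool, serviceFlag string) string {
	if !interactive {
		return "run-as-service"
	}
	if serviceFlag != "" {
		return "control:" + serviceFlag
	}
	return "run-in-foreground"
}

// chooseAction models the reordered logic: an explicit -service flag is
// handled first, regardless of whether the session is interactive.
func chooseAction(interactive bool, serviceFlag string) string {
	if serviceFlag != "" {
		return "control:" + serviceFlag
	}
	if !interactive {
		return "run-as-service"
	}
	return "run-in-foreground"
}

func main() {
	// Simulate "-service install" issued over WinRM (non-interactive).
	fmt.Println("session check first:", chooseActionSessionFirst(false, "install"))
	fmt.Println("flag check first:   ", chooseAction(false, "install"))
}
```

In the first ordering the install request is silently swallowed, which would be consistent with the rc:0 and empty output reported at the top of this issue.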

@sparrc
Contributor

sparrc commented Sep 15, 2016

@peter-murray yes please submit a PR!

logging is another issue that I'm also not quite sure how to handle, can you open a separate issue for that?

@sparrc
Contributor

sparrc commented Sep 28, 2016

closed in 1.1 by #1772

4 participants